Select rows from a DataFrame based on values in a vector in R
Last Updated: 09 May, 2021
In this article, we will discuss how to select rows from a DataFrame based on values in a vector in R Programming Language.
Method 1: Using %in% operator
The %in% operator in R checks whether each element on its left-hand side is present in the vector (or DataFrame column) on its right-hand side. It takes the values and looks them up in the specified object, which makes it convenient for selecting the elements that satisfy this condition.
Syntax:
val %in% vec
It returns a logical vector with one TRUE or FALSE per element of val, depending on whether that element is found in vec. This logical vector is then used as a row index to extract the matching rows from the DataFrame. The result is a new subset; the original DataFrame is not modified. To filter on a particular column, access it with df$colname and match it against the vector with %in%.
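To see what the operator produces before it is used for indexing, here is a minimal sketch on small illustrative vectors (the names letters_col, wanted, hits, kept and dropped are hypothetical, not from the article). It prints the logical vector returned by %in%, uses it to select rows, and negates it with ! to keep the rows whose value is not in the vector.

```r
# Illustrative values (hypothetical names): a column of letters and the values to look for
letters_col <- c("A", "B", "C", "D")
wanted <- c("A", "C", "Z")

# %in% is vectorised: one TRUE/FALSE per element of the left-hand side
hits <- letters_col %in% wanted
print(hits)  # TRUE FALSE TRUE FALSE

df <- data.frame(id = 1:4, letter = letters_col)

# The logical vector selects the matching rows...
kept <- df[hits, ]
# ...and its negation selects the rows whose letter is not in `wanted`
dropped <- df[!hits, ]
print(kept)
print(dropped)
```

Values in wanted that never occur in the column (here "Z") simply produce no TRUE and are ignored.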
Example:
data_frame <- data.frame(col1 = c(1:7), col2 = LETTERS[1:7])
print("Original DataFrame")
print(data_frame)
vec <- c('A', 'a', 'C')
# keep the rows whose col2 value occurs in vec
sub_df <- data_frame[data_frame$col2 %in% vec, ]
print("Resultant DataFrame")
print(sub_df)
Output
[1] "Original DataFrame"
col1 col2
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
6 6 F
7 7 G
[1] "Resultant DataFrame"
col1 col2
1 1 A
3 3 C
Method 2: Using is.element() function
is.element() checks whether the elements of one vector are present in another vector or DataFrame column. is.element(x, y) is identical to x %in% y: it returns a logical vector that is TRUE where a value is found and FALSE otherwise.
Syntax:
is.element(val,vec)
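The equivalence can be checked directly. Base R documents %in% in terms of match(x, table, nomatch = 0) > 0, so this short sketch compares all three forms on an illustrative vector (x, y and the via_* names are hypothetical).

```r
x <- c("a", "C", "D", "q")
y <- LETTERS[1:7]

via_is_element <- is.element(x, y)
via_in <- x %in% y
# %in% is defined in base R via match(): position 0 means "not found"
via_match <- match(x, y, nomatch = 0) > 0

print(via_is_element)  # FALSE TRUE TRUE FALSE
```

Because all three return the same logical vector, they can be used interchangeably as a row index.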
In the example below, rbind() combines two subsets of the DataFrame: the rows whose col2 value is in the vector, followed by the rows whose col3 value is in the vector.
Example:
data_frame <- data.frame(
  col1 = c(1:7), col2 = LETTERS[1:7], col3 = letters[1:7])
print("Original DataFrame")
print(data_frame)
vec <- c('a', 'C', 'D')
# rows matched on col2, followed by rows matched on col3
sub_df <- rbind(data_frame[is.element(data_frame$col2, vec), ],
                data_frame[is.element(data_frame$col3, vec), ])
print("Resultant DataFrame")
print(sub_df)
print ( "Resultant DataFrame" )
print (sub_df)
Output
[1] "Original DataFrame"
col1 col2 col3
1 1 A a
2 2 B b
3 3 C c
4 4 D d
5 5 E e
6 6 F f
7 7 G g
[1] "Resultant DataFrame"
col1 col2 col3
3 3 C c
4 4 D d
1 1 A a
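Because the rbind() approach above stacks two independent subsets, the rows come out grouped by condition rather than in their original order, and a row matching both conditions would be returned twice. One possible alternative, sketched here on the same data, combines the two conditions with the element-wise OR operator | so that each row is tested once (keep, vec2 and both are illustrative names).

```r
data_frame <- data.frame(col1 = 1:7, col2 = LETTERS[1:7], col3 = letters[1:7])
vec <- c("a", "C", "D")

# A single logical vector: TRUE if the col2 OR the col3 value is in vec
keep <- is.element(data_frame$col2, vec) | is.element(data_frame$col3, vec)
sub_df <- data_frame[keep, ]
print(sub_df)  # rows 1, 3 and 4, in their original order

# With rbind(), a row matching both conditions is returned twice
vec2 <- c("A", "a")
both <- rbind(data_frame[is.element(data_frame$col2, vec2), ],
              data_frame[is.element(data_frame$col3, vec2), ])
print(nrow(both))  # 2: row 1 appears twice
```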
Method 3: Using data.table package
The data.table package provides the data.table class, an enhanced version of the DataFrame; it is not part of base R and has to be installed and loaded explicitly. Its setDT() function converts a DataFrame to a data.table by reference, that is, without making a copy.
Syntax: setDT(df, keep.rownames=FALSE, key=NULL, check.names=FALSE)
Parameters:
- df – the DataFrame to convert
- key – a character vector of one or more column names, passed to setkeyv() to set the key of the resulting data.table
The table is then subset with J(vec). Inside the [ ] of a data.table, J() is an alias for list(), and passing it as the first argument performs a join: each value of vec is looked up in the key column set through the key argument of setDT() (col1 in the example below).
The following points are worth noting about this approach:
- Since the DataFrame is converted to a data.table, each row of the printed result is led by a row number followed by ":".
- One output row is produced for each value of the vector, in the order of the vector.
- By default the join keeps values that are not present in the key column and fills the remaining columns with NA (as for 4 in the example below), so the result can contain rows that do not exist in the original data. Passing nomatch = 0 inside [ ] drops such rows.
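To make these points concrete without installing data.table, the base R sketch below reproduces the lookup with match() on the same data as the example that follows. It is only an analogy, not how data.table implements joins: match() returns NA for a value that is not found, and indexing a DataFrame with NA yields a row of NAs, much like the join's default behaviour. The names idx, looked_up and matched_only are illustrative.

```r
data_frame <- data.frame(
  col1 = 6:9,
  col2 = c(4.5, 6.7, 89.0, 6.2),
  col3 = factor(letters[1:4])
)
vec <- c(4, 6)

# Position of each vec value in col1, in the order of vec; NA if absent
idx <- match(vec, data_frame$col1)   # NA 1

# Indexing with NA gives an all-NA row, like the join's default of keeping unmatched values
looked_up <- data_frame[idx, ]
looked_up$col1 <- vec                 # the join fills the key column from vec
rownames(looked_up) <- NULL
print(looked_up)

# Dropping the unmatched values instead, comparable to nomatch = 0 in data.table
matched_only <- data_frame[idx[!is.na(idx)], ]
print(matched_only)
```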
Example:
library("data.table")
data_frame <- data.frame(
  col1 = c(6:9),
  col2 = c(4.5, 6.7, 89.0, 6.2),
  col3 = factor(letters[1:4])
)
print("Original DataFrame")
print(data_frame)
vec <- c(4, 6)
# convert to a data.table keyed on col1, then join on the values in vec
data_frame <- setDT(data_frame, key = "col1")[J(vec)]
print("Modified Dataframe")
print(data_frame)
Output
[1] "Original DataFrame"
col1 col2 col3
1 6 4.5 a
2 7 6.7 b
3 8 89.0 c
4 9 6.2 d
[1] "Modified Dataframe"
col1 col2 col3
1: 4 NA <NA>
2: 6 4.5 a
Method 4: Using dplyr package
The dplyr package provides a set of functions for data manipulation. It is not part of base R, so it has to be installed and then loaded as a library. Its filter() function returns a subset of the original DataFrame in which the columns are unchanged and only the rows for which the given conditions evaluate to TRUE are kept. Note that, unlike base R indexing with [ ], filter() treats a condition that evaluates to NA as FALSE, so such rows are dropped from the result.
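The difference in NA handling can be illustrated in base R alone, using a small illustrative DataFrame (df, cond and the with_na_row, via_which and via_subset names are hypothetical). Indexing with [ ] keeps a row of NAs where the condition is NA, while which() and subset() drop it, which is the behaviour filter() follows. Note also that %in% itself returns FALSE rather than NA for a missing value, so conditions built only from %in% avoid the issue.

```r
df <- data.frame(x = c(1, NA, 3))

cond <- df$x > 1          # FALSE NA TRUE
# Base [ ] indexing: the NA in cond produces a row of NAs
with_na_row <- df[cond, , drop = FALSE]
# which() keeps only the positions that are TRUE, dropping the NA
via_which <- df[which(cond), , drop = FALSE]
# subset() also treats NA as FALSE
via_subset <- subset(df, x > 1)

print(nrow(with_na_row))  # 2
print(nrow(via_which))    # 1
print(nrow(via_subset))   # 1

# %in% returns FALSE (not NA) for the missing value
print(df$x %in% c(3))     # FALSE FALSE TRUE
```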
Syntax: filter(df, condition)
Parameters:
- df – a DataFrame
- condition – one or more expressions written in terms of the columns of df (data masking), each returning a logical value for every row
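Here, filter() is combined with the %in% operator to keep the rows whose col1 value is in the vector. Since col1 is stored as character and vec is an integer vector, %in% compares them after coercing both to a common type (character).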
Example:
library(dplyr)
data_frame <- data.frame(
  "col1" = as.character(6:9),
  "col2" = c(4.5, 6.7, 89.0, 6.2),
  "col3" = factor(letters[1:4])
)
print("Original DataFrame")
print(data_frame)
vec <- (8:11)
# keep the rows whose col1 value occurs in vec
data_frame <- filter(data_frame, col1 %in% vec)
print("Modified DataFrame")
print(data_frame)
Output
[1] "Original DataFrame"
col1 col2 col3
1 6 4.5 a
2 7 6.7 b
3 8 89.0 c
4 9 6.2 d
[1] "Modified DataFrame"
col1 col2 col3
1 8 89.0 c
2 9 6.2 d