R Data Frame
• A data frame is a two-dimensional data structure which can
store data in tabular format.
• Data frames have rows and columns and each column can be
a different vector.
• Different vectors can be of different data types.
Characteristics of a data frame.
•Column names should be non-empty.
•Row names should be unique.
•The data which is stored in a data frame can be a factor,
numeric, or character type.
•Each column contains the same number of data items.
Create a Data Frame in R
Use the data.frame() function to create a Data Frame.
The syntax of the data.frame() function is
dataframe1 <- data.frame(
first_col = c(val1, val2, ...),
second_col = c(val1, val2, ...), ... )
Create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)
print(dataframe1)
Access Data Frame Column
There are different ways to extract columns from a data frame. We can use [ ],
[[ ]], or $ to access a specific column of a data frame in R.
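The three access forms can be sketched as follows. This is a minimal example that rebuilds the dataframe1 object from the earlier slide; the variable names name_df, age_vec, and vote_vec are chosen here only for illustration.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

# [ ] with a column name returns a data frame holding that single column
name_df <- dataframe1["Name"]

# [[ ]] and $ both return the column itself as a vector
age_vec <- dataframe1[["Age"]]
vote_vec <- dataframe1$Vote

print(name_df)
print(age_vec)
print(vote_vec)
```

Note that [ ] keeps the data frame structure, while [[ ]] and $ drop down to the underlying vector.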
Combine Data Frames
In R, we use the rbind() and cbind() functions to combine two
data frames.
rbind() - combines two data frames vertically (both must have the same column names)
cbind() - combines two data frames horizontally (both must have the same number of rows)
Combine Vertically
Using rbind()
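A minimal sketch of vertical combination, assuming two small data frames (df1 and df2 are hypothetical names and values) that share the same column names:

```r
# Two hypothetical data frames with identical column names
df1 <- data.frame(Name = c("Juan", "Alcaraz"), Age = c(22, 15))
df2 <- data.frame(Name = c("Yiruma", "Bach"), Age = c(46, 89))

# rbind() stacks the rows of df2 below the rows of df1
combined_rows <- rbind(df1, df2)
print(combined_rows)
```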
Combine Horizontally
Using cbind()
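A minimal sketch of horizontal combination, assuming two small data frames (df_names and df_hobby are hypothetical names and values) with the same number of rows:

```r
# Two hypothetical data frames with the same number of rows
df_names <- data.frame(Name = c("Juan", "Alcaraz"), Age = c(22, 15))
df_hobby <- data.frame(Hobby = c("Tennis", "Piano"))

# cbind() places the columns of df_hobby to the right of df_names
combined_cols <- cbind(df_names, df_hobby)
print(combined_cols)
```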
Length of a Data Frame in R
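The slide gives no example here, so the following is a sketch under the assumption that "length" refers to the built-in length() function. For a data frame, length() returns the number of columns, while nrow(), ncol(), and dim() report the rows and columns explicitly.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

n_columns <- length(dataframe1)  # number of columns, not rows
n_rows <- nrow(dataframe1)       # number of rows
dims <- dim(dataframe1)          # c(rows, columns)

cat("Columns:", n_columns, "\n")
cat("Rows:", n_rows, "\n")
cat("Dimensions:", dims, "\n")
```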
R Arrays
Compared to matrices, arrays can have more than two dimensions.
We can use the array() function to create an array, and the dim parameter to
specify the dimensions.
Arrays are R data objects that can store data in more than two dimensions.
For example, if we create an array of dimension (2, 3, 4), it creates 4
rectangular matrices, each with 2 rows and 3 columns. Arrays can store only one
data type.
R Array Syntax
The syntax for creating an array in R is:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
dimnames: an optional list of names for each dimension. Default value = NULL.
Create The Array
1.In the first step, we will create two vectors of different lengths.
2.Once our vectors are created, we take these vectors as inputs
to the array.
# create an array of two 2 by 3 matrices
array1 <- array(c(1:12), dim = c(2,3,2))
print(array1)
Access Array Elements
We use the vector index operator [ ] to access specific elements of an array in R.
The syntax to access an array element is
array[n1, n2, mat_level]
array1 <- array(c(1:12), dim = c(2,3,2))
print(array1)
# access element at 1st row, 3rd column of 2nd matrix
cat("\nDesired Element:", array1[1, 3, 2])
Naming rows and columns
#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)
#Initializing names for rows, columns and matrices
col_names <- c("Col1","Col2","Col3")
row_names <- c("Row1","Row2","Row3")
matrix_names <- c("Matrix1","Matrix2")
#Taking the vectors as input to the array
res <- array(c(vec1, vec2), dim = c(3, 3, 2),
dimnames = list(row_names, col_names, matrix_names))
print(res)
Access Entire Row or Column
In R, we can also access the entire row or column based on the value
passed inside [].
[c(n), , mat_level] - returns all elements of the nth row.
[ , c(n), mat_level] - returns all elements of the nth column.
# create an array of two 2 by 3 matrices
array1 <- array(c(1:12), dim = c(2,3,2))
print(array1)
# access all elements of the 2nd column of the 1st matrix
cat("\n2nd Column Elements of 1st matrix:", array1[, c(2), 1])
# access all elements of the 1st row of the 2nd matrix
cat("\n1st Row Elements of 2nd Matrix:", array1[c(1), , 2])
R Strings
A string is a sequence of characters.
message1 <- 'Hola Amigos'
print(message1)
message2 <- "Welcome to Programiz"
print(message2)
String Operations in R
• Find the length of a string
• Join two strings
• Compare two strings
• Change the string case
1. Find Length of R String
We use the nchar() function to find the length of a string.
message1 <- "Programiz"
# use of nchar() to find length of message1
nchar(message1) # 9
2. Join Strings Together
In R, we can use the paste() function to join two or more strings together.
message1 <- "Programiz"
message2 <- "Pro"
# use paste() to join two strings
paste(message1, message2)
Output
[1] "Programiz Pro"
3. Compare Two Strings in R Programming
We use the == operator to compare two strings. If two strings are equal, the operator returns
TRUE. Otherwise, it returns FALSE.
message1 <- "Hello, World!"
message2 <- "Hola, Mundo!"
message3 <- "Hello, World!"
# compare message1 and message2
print(message1 == message2)
# compare message1 and message3
print(message1 == message3)
Output
[1] FALSE
[1] TRUE
4. Change Case of R String
toupper() - convert string to uppercase
tolower() - convert string to lowercase
message <- "R Programming"
# change string to uppercase
message_upper <- toupper(message)
cat("Uppercase:", message_upper)
# change string to lowercase
message_lower <- tolower(message)
cat("\nLowercase:", message_lower)
Output
Uppercase: R PROGRAMMING
Lowercase: r programming
Control Structures in R
Control Structures can be divided into three categories.
1.Conditional statements.
2.Looping Statements
3.Jump Statements
1.Conditional statements
if statement
The conditional if statement is used to test an expression. If
the test_expression is TRUE, the statement gets executed. But
if it’s FALSE, nothing happens.
if (test_expression)
{
statement
}
Example:Check for Positive Number
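The example body is missing from the slide, so the following is only a sketch of what such a check might look like. The value of x and the is_positive flag are hypothetical, introduced for illustration.

```r
x <- 12  # hypothetical value to test

is_positive <- FALSE
if (x > 0) {
  # this block runs only when the test expression is TRUE
  is_positive <- TRUE
  print("x is a positive number")
}
```

If x were zero or negative, the block would be skipped and nothing would be printed.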
if-else Condition in R
The conditional if...else statement is used to test an expression, similar
to the if statement. However, instead of nothing happening when the
test_expression is FALSE, the else block is evaluated.
# at the top level, else must be on the same line as the closing brace of the if block
if (test_expression)
{
statement
} else
{
statement
}
val1 <- 10
val2 <- 5
if (val1 > val2)
{
print("Value 1 is greater than Value 2")
} else if (val1 < val2)
{
print("Value 1 is less than Value 2")
}
x <- c(8, 3, -2, 5)
if (any(x < 0))
{
print("x contains negative numbers")
} else
{
print("x contains all positive numbers")
}
Output:
[1] "x contains negative numbers"
If-Else if Ladder
If-else statements can be chained using an if-else if ladder.
Syntax:
if (condition1)
{
Statement1
} else if (condition2)
{
Statement2
} else
{
Statement3
}
Example:
# Create vector quantity
quantity <- 10
if (quantity < 20)
{
print('Not enough for today')
} else if (quantity >= 20 && quantity <= 30)
{
print('Average day')
} else
{
print('What a great day!')
}
Output:
[1] "Not enough for today"
Switch:
1. A switch statement allows a variable to be tested for equality against a list of values.
2. Each value is called a case, and the variable being switched on is checked for each
case.
Syntax:
switch(Expression, "Option 1", "Option 2", "Option 3", ..., "Option N")
Example: Print Department names using Switch
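The example body is missing from the slide, so the following is only a sketch. The department codes and names are hypothetical. It relies on the fact that when the expression is a character string, switch() matches it against the names of the alternatives, and an unnamed final alternative acts as the default.

```r
# Hypothetical department code to look up
dept_code <- "CSE"

dept_name <- switch(dept_code,
  "CSE" = "Computer Science and Engineering",
  "ECE" = "Electronics and Communication Engineering",
  "ME" = "Mechanical Engineering",
  "Unknown department"  # unnamed last value acts as the default
)
print(dept_name)
```

When the expression is a number instead, switch() selects the alternative at that position, which is the form shown in the syntax above.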
2.Looping Statements
While Loop:
Description: The while loop executes the same code repeatedly as long as its condition is TRUE, and stops once the condition becomes FALSE.
Syntax:
while (condition)
{
Statement
}
Example:
i <- 1
while (i < 6)
{
print(i)
i <- i + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
For Loop in R
1. It is used to iterate over a collection of objects, such as a vector, a list, a
matrix, or a dataframe, and apply the same set of operations on each
item of a given data structure.
2. The sequence is a collection of objects (e.g., a vector).
3. The keywords for and in are compulsory, as are the parentheses.
The basic syntax of a for-loop in R is the following:
for (variable in sequence)
{
expression
}
A for loop is used for iterating over a sequence:
Example:
for (x in 1:10)
{
print(x)
}
Example: Print every item in a list:
fruits <- list("apple", "banana", "cherry")
for (x in fruits) {
print(x)
}
Output:
[1] "apple"
[1] "banana"
[1] "cherry"
Example:
values <- c(1,2,3,4,5)
for (id in 1:4)
{
print(values[id])
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
repeat loop
1. A repeat loop is used to execute a block of code multiple times.
2. A repeat loop has no built-in test expression to end the loop; the exit condition must be tested inside the loop body.
3. Use the break statement to exit the loop. Failing to do so will result in an
infinite loop.
Syntax:
repeat
{
Commands
if (condition)
break
}
Example:
x <- 1
# Repeat loop
repeat {
print(x)
# Break statement to terminate if x > 4
if (x > 4) {
break
}
# Increment x by 1
x <- x + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
3.Jump Statements
Break :
Terminates the loop statement and transfers execution to the statement
immediately following the loop.
Syntax: break
Example: print numbers till 5
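The example body is missing from the slide, so the following is only a sketch. The loop range 1:10 and the printed vector (used to record what was shown) are assumptions made for illustration.

```r
printed <- c()  # hypothetical record of the values that were printed

for (i in 1:10) {
  if (i > 5) {
    # leave the loop as soon as i goes past 5
    break
  }
  print(i)
  printed <- c(printed, i)
}
```

Execution continues with the first statement after the loop once break is reached, so the values 6 to 10 are never printed.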
next Statement in R
1. next jumps to the next cycle of the loop without completing the current iteration.
2. The next statement lets you skip the current iteration of a loop without
terminating the loop.
The syntax of the next statement is:
if (test_condition)
{
next
}
In this example, the for loop iterates over each element in x. When it
reaches the element that equals 3, it skips printing that element and
jumps to the next iteration.
x <- 1:5
for (i in x)
{
if (i == 3)
{
next
}
print(i)
}
## [1] 1
## [1] 2
## [1] 4
## [1] 5
R - Functions
A function is a set of statements organized together to perform a
specific task. R has a large number of in-built functions and the user
can create their own functions.
An R function is created by using the keyword function.
Syntax:
function_name <- function(arg_1, arg_2, ...)
{
Function body
}
Components of Functions
Function Types
Creating a Function
To create a function, use the function() keyword:
Example:
my_function <- function()
{
print("Hello World!")
}
Call a Function
To call a function, use the function name
followed by parentheses, like my_function():
Example:
my_function <- function()
{
print("Hello World!")
}
my_function() # call the function named my_function
Function with Arguments:
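The slide gives no example here, so the following is only a sketch. The function full_name and its parameters first and last are hypothetical names chosen for illustration.

```r
# Hypothetical function that takes two arguments
full_name <- function(first, last) {
  # the value of the last evaluated expression is returned
  paste(first, last)
}

result <- full_name("Juan", "Alcaraz")
print(result)
```

Arguments are matched by position here; they could also be passed by name, as in full_name(last = "Alcaraz", first = "Juan").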
Function with Default Argument
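The slide gives no example here, so the following is only a sketch. The function power and its parameters base and exponent are hypothetical names chosen for illustration.

```r
# Hypothetical function whose second argument has a default value
power <- function(base, exponent = 2) {
  base^exponent
}

squared <- power(4)    # exponent falls back to the default value 2
cubed <- power(4, 3)   # the default is overridden by the caller

print(squared)
print(cubed)
```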
LOOPING FUNCTIONS IN R
apply() function
apply() takes a data frame or matrix as input and returns a vector, list, or
array. It is primarily used to avoid explicit loop constructs. It is the most
basic of the apply family of functions and can be used over a matrix.
This function takes 3 arguments:
apply(X, MARGIN, FUN)
Here:
-X: an array or matrix
-MARGIN: a value of 1, 2, or c(1, 2) that defines where to apply the function:
-MARGIN=1: the manipulation is performed on rows
-MARGIN=2: the manipulation is performed on columns
-MARGIN=c(1,2): the manipulation is performed on rows and columns
-FUN: the function to apply. Built-in functions like mean, median, sum, min, max, and even user-defined
functions can be applied.
m1 <- matrix(1:10, nrow = 5, ncol = 6) # 1:10 is recycled to fill all 30 cells
m1
a_m1 <- apply(m1, 2, sum) # sum of each column
a_m1
Output of a_m1:
[1] 15 40 15 40 15 40
lapply()
Definition: Loop over a list and evaluate a function on each element
Syntax:
lapply(X, FUN)
Arguments:
-X: A vector or an object
-FUN: Function applied to each element of X
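A minimal sketch of lapply(), assuming a small character vector (names_vec is a hypothetical name) and the built-in tolower() function. The result is always a list with one entry per input element.

```r
# Hypothetical input vector
names_vec <- c("JUAN", "ALCARAZ", "SIMANTHA")

# apply tolower() to each element; the result is a list
lower_list <- lapply(names_vec, tolower)
print(lower_list)
```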
sapply()
Definition: Same as lapply(), but tries to simplify the result to a vector or matrix.
Syntax:
sapply(X, FUN)
Arguments:
-X: A vector or an object
-FUN: Function applied to each element of X
# create sample data
sample_data <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(3, 2, 4, 2, 34, 5))
print("original data:")
sample_data
# apply sapply() function to get the maximum of each column
print("data after sapply():")
sapply(sample_data, max)
tapply()
We use the tapply() function for calculating summary statistics (such as mean, median,
min, max, sum, etc.) for different factors (i.e., categories). It has the following syntax:
Syntax:
tapply(X, INDEX, FUN)
Example:
salaries <- c(80000, 62000, 113000, 68000, 75000, 79000,
112000, 118000, 65000, 117000)
jobs <- c('DS', 'DA', 'DE', 'DA', 'DS', 'DS', 'DE', 'DE', 'DA', 'DE')
print(tapply(salaries, jobs, mean))
Output:
    DA     DE     DS
 65000 115000  78000