Department of CSE
COURSE NAME: BDO
COURSE CODE: 21CS3276R
TOPIC: FOUNDATIONS OF R
Session - 4
AIM OF THE SESSION
To familiarize students with the basic concepts of R programming
INSTRUCTIONAL OBJECTIVES
This Session is designed to:
1. Demonstrate R Basics
2. Describe the Functions of R
3. Explain how to control the execution of R functions
LEARNING OUTCOMES
At the end of this session, you should be able to:
1. Explain the fundamentals of R
2. Define and execute R functions
SESSION INTRODUCTION
Objects of R
Every programming language has its own data types for storing values or other
information, so that the user can assign these data types to variables and
perform operations on them. The operations that can be performed depend on the
data type.
These data types can be character, integer, float, long, etc. Based on the data
type, memory/storage is allocated to the variable.
Unlike many other programming languages, R does not require variables to be
declared with a data type; a variable is simply assigned an object, and its type
comes from that object. The following are the objects commonly used in R:
1. Vectors
2. List
3. Matrices
4. Factors
5. Data Frames
SESSION DESCRIPTION
VECTOR
Atomic vectors are one of the basic types of objects in R programming.
Atomic vectors can store homogeneous data types such as character, doubles,
integers, raw, logical, and complex.
A variable holding a single element is also a vector (of length 1).
Here are some key characteristics and concepts related to vectors in R:
• Homogeneous Data Type: Vectors in R are homogeneous, meaning that all the
elements in a vector must be of the same data type. For example, you can have a
numeric vector, a character vector, or a logical vector, but if you combine values of
different types, R coerces them all to a single common type (for example, c(1, "a")
becomes a character vector).
• Atomic Data Types: Vectors can contain elements of atomic data types, such as
numeric (real or integer values), character (text), logical (TRUE or FALSE), and
complex (complex numbers).
• Creation of Vectors:
• Using the c() function: The most common way to create a vector is by using the c()
function, which combines its arguments into a single vector.
Example for VECTOR
# Create vectors
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c", "d")
z <- 5
# Print each vector and its class (console output shown as comments)
print(x)          # [1] 1 2 3 4
print(class(x))   # [1] "numeric"
print(y)          # [1] "a" "b" "c" "d"
print(class(y))   # [1] "character"
print(z)          # [1] 5
print(class(z))   # [1] "numeric"
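A minimal sketch of how R keeps vectors homogeneous: instead of raising an error, c() coerces mixed values to the most flexible common type. It also shows that a single value is a vector of length 1. The variable names mixed and nums are illustrative.

```r
# Mixing types in c() does not raise an error: R coerces
# every element to the most flexible common type (here, character)
mixed <- c(1, "a", TRUE)
print(class(mixed))   # [1] "character"
print(mixed[3])       # [1] "TRUE"  (the logical became a string)

# Numbers and logicals combine into a numeric vector (TRUE becomes 1)
nums <- c(1.5, TRUE, FALSE)
print(nums)           # [1] 1.5 1.0 0.0

# A single value is simply a vector of length 1
z <- 5
print(length(z))      # [1] 1
print(is.vector(z))   # [1] TRUE
```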
List
A list is another type of object in R programming. A list can contain heterogeneous
data types, such as vectors or other lists.
Lists are designed to store heterogeneous data, meaning that each element
within a list can be of a different data type, and they can be of varying
lengths. This flexibility makes lists particularly useful for organizing and
managing complex and structured data.
Example
my_list <- list(
  name = "John Doe",
  age = 30,
  city = "New York",
  hobbies = c("Reading", "Hiking", "Cooking"),
  scores = c(95, 89, 78, 92)
)
print(my_list)
Output
$name
[1] "John Doe"

$age
[1] 30

$city
[1] "New York"

$hobbies
[1] "Reading" "Hiking"  "Cooking"

$scores
[1] 95 89 78 92
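Once a list exists, its elements are accessed by name or position. A short sketch using a smaller version of my_list shows the difference between $ / [[ ]] (which return the element itself) and [ ] (which returns a sub-list):

```r
my_list <- list(
  name = "John Doe",
  age = 30,
  hobbies = c("Reading", "Hiking", "Cooking")
)

# $ and [[ ]] extract a single element
print(my_list$name)          # [1] "John Doe"
print(my_list[["age"]])      # [1] 30

# [ ] returns a sub-list, not the element itself
print(class(my_list["age"])) # [1] "list"

# An element can be indexed further, e.g. the second hobby
print(my_list$hobbies[2])    # [1] "Hiking"

# A list can be extended by assigning to a new name
my_list$city <- "New York"
print(names(my_list))
```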
Matrices
A matrix is a two-dimensional data structure that stores elements of the same data
type in rows and columns. Matrices are commonly used for various mathematical
and statistical operations.
Creation of Matrix in R
You can create a matrix in R using the matrix() function. It takes several arguments,
such as the data, the number of rows (nrow), and the number of columns (ncol).
Example
# Create a 3x3 matrix; by default it is filled column by column
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
print(my_matrix)
Output
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
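A few common matrix operations, as a sketch in base R: filling by row, indexing with [row, column], transposing, and the difference between element-wise * and matrix multiplication %*%. The matrices m, a and b are illustrative.

```r
# byrow = TRUE fills the matrix row by row instead of column by column
m <- matrix(1:6, nrow = 2, byrow = TRUE)
print(dim(m))    # [1] 2 3
print(m[2, 3])   # [1] 6  (row 2, column 3)
print(t(m))      # transpose: a 3 x 2 matrix

# Element-wise multiplication (*) versus matrix multiplication (%*%)
a <- matrix(1:4, nrow = 2)
b <- diag(2)     # 2 x 2 identity matrix
print(a * a)     # squares each element
print(a %*% b)   # equals a, because b is the identity
```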
Factors
In R, a factor is a data structure used to represent categorical data.
Categorical data consists of distinct categories or levels, and factors are used
to store and manipulate this type of data. Factors are particularly useful for
data analysis and statistical modeling, as they enable R to treat categorical
variables appropriately.
Creating Factors:
You can create a factor using the factor() function. This function takes a
vector of categorical data as its main argument and, optionally, other
arguments like levels to specify the categories and ordered to indicate
whether the factor represents an ordered categorical variable.
# Create a factor
gender <- factor(c("Male", "Female", "Male", "Male", "Female"))
print(gender)
Output:
[1] Male   Female Male   Male   Female
Levels: Female Male
Levels:
A factor consists of a set of levels, which are the distinct categories or values
in the categorical variable. You can access the levels of a factor using the
levels() function.
factor_levels <- levels(gender)
print(factor_levels)
Output: [1] "Female" "Male"    (levels are sorted alphabetically by default)
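The levels and ordered arguments of factor() mentioned above can be sketched as follows; the sizes data is illustrative. Explicit levels fix the category order, and ordered = TRUE makes comparisons between categories meaningful.

```r
# Explicit levels fix the category order; ordered = TRUE makes
# comparisons such as < and > meaningful
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
print(levels(sizes))
print(sizes[1] < sizes[2])   # [1] TRUE  (small < large)

# table() counts how often each level occurs
print(table(sizes))

# Internally, a factor stores integer codes that index its levels
print(as.integer(sizes))     # [1] 1 3 2 1
```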
Data Frames:
In R, a data frame is a fundamental data structure used to store and
manipulate data in a tabular format, similar to a spreadsheet or database
table.
Data frames are a common way to organize and work with structured data,
making them one of the most important data structures in R.
Creating Data Frames:
You can create a data frame using the data.frame() function. This function
allows you to combine vectors of different types into a data frame, with each
vector representing a column.
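Example Data Frames:
# Creating a simple data frame
df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(28, 24, 32),
  City = c("New York", "Los Angeles", "Chicago")
)
str(df)
Output
'data.frame':   3 obs. of  3 variables:
 $ Name: chr  "John" "Alice" "Bob"
 $ Age : num  28 24 32
 $ City: chr  "New York" "Los Angeles" "Chicago"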
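A sketch of everyday data frame access, assuming R 4.0 or later (where strings are not converted to factors by default): selecting a column, a single cell, filtering rows with a logical condition, and adding a derived column. The AgeNextYear column is illustrative.

```r
df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age  = c(28, 24, 32),
  City = c("New York", "Los Angeles", "Chicago")
)

print(nrow(df))        # [1] 3
print(df$Age)          # [1] 28 24 32  (a column is a vector)
print(df[2, "Name"])   # [1] "Alice"   (row 2, column Name)

# Logical indexing keeps the rows where the condition is TRUE
older <- df[df$Age > 25, ]
print(older$Name)

# Add a new column derived from an existing one
df$AgeNextYear <- df$Age + 1
print(ncol(df))        # [1] 4
```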
Typical Functions Used in R
R offers a wide range of functions and operators for manipulating objects, including
containers like vectors, lists, and data frames. Here are some useful functions and
operators for common data manipulation tasks in R, grouped by category:
• Data Manipulation
• String Manipulation
• Data Aggregation and Summarization
• Data Sorting
• Missing Data Handling
• Data Reshaping
• Data Sampling
• Statistical Analysis
• Function Application
• Data I/O
Data Manipulation:
• subset(): Filter rows based on conditions and select specific columns.
• mutate() (from dplyr): Create or modify columns in a data frame.
• aggregate(): Aggregate data using a summary function.
• merge() (base R) and left_join(), inner_join(), etc. (from dplyr): Perform joins between data frames.
• split(): Split data into a list based on a factor or a grouping variable.
• stack() and unstack(): Reshape data from wide to long and vice versa.
• rbind() and cbind(): Combine data frames by rows or columns.
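The base R functions in this list can be combined on a small data frame. This is a sketch with made-up employee data (the names, departments and scores are illustrative); mutate() and the dplyr joins are left out because they need an extra package.

```r
df <- data.frame(
  name  = c("Ann", "Ben", "Cid", "Dee"),
  dept  = c("HR", "IT", "IT", "HR"),
  score = c(80, 92, 75, 88)
)

# subset(): keep rows with score > 78 and only two columns
high <- subset(df, score > 78, select = c(name, score))
print(high)

# split(): a list of data frames, one per department
by_dept <- split(df, df$dept)
print(names(by_dept))   # [1] "HR" "IT"

# merge(): join with a second data frame on the common "dept" column
heads <- data.frame(dept = c("HR", "IT"), head = c("Eve", "Fay"))
joined <- merge(df, heads, by = "dept")

# rbind(): append a row with the same column names
df2 <- rbind(df, data.frame(name = "Eli", dept = "IT", score = 70))
print(nrow(df2))        # [1] 5
```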
String Manipulation:
• paste(), paste0(): Concatenate strings.
• grep(), sub(), gsub(): Perform regular expression-based text manipulation.
• strsplit(): Split strings into substrings.
• tolower(), toupper(): Convert character data to lowercase or uppercase.
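A quick tour of the string functions listed above, as a base R sketch with illustrative inputs:

```r
first <- "data"
second <- "science"

print(paste(first, second))         # [1] "data science"
print(paste0(first, "_", second))   # [1] "data_science"
print(paste("x", 1:3, sep = "-"))   # [1] "x-1" "x-2" "x-3"

# grep() returns the indices of matching elements
words <- c("apple", "banana", "cherry")
print(grep("an", words))            # [1] 2

# sub() replaces the first match, gsub() replaces all matches
print(sub("a", "A", "banana"))      # [1] "bAnana"
print(gsub("a", "A", "banana"))     # [1] "bAnAnA"

# strsplit() returns a list with one character vector per input string
parts <- strsplit("a,b,c", ",")[[1]]
print(parts)                        # [1] "a" "b" "c"
print(toupper(first))               # [1] "DATA"
```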
Data Aggregation and Summarization:
• tapply(): Apply a function to subsets of data based on a factor.
• aggregate(): Compute summary statistics for different levels of a factor.
• by(): Apply a function to subsets of a data frame based on a factor.
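A small sketch comparing tapply() and aggregate() on the same illustrative scores: both compute a per-group mean, but tapply() returns a named array while aggregate() returns a data frame.

```r
scores <- c(80, 92, 75, 88, 95)
group  <- factor(c("A", "B", "B", "A", "B"))

# tapply(): mean score for each level of the factor
means <- tapply(scores, group, mean)
print(means)   # A: 84, B: about 87.33

# aggregate(): the same summary, returned as a data frame
df <- data.frame(score = scores, group = group)
agg <- aggregate(score ~ group, data = df, FUN = mean)
print(agg)
```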
Data Sorting:
• order(): Get the index that would sort a vector.
• sort(): Sort a vector (to sort a data frame by one or more columns, index its rows with order()).
• arrange() (from dplyr): Sort data frames by one or more columns.
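The difference between sort() and order() is easiest to see side by side. This base R sketch also shows the usual way to sort a data frame, by indexing its rows with order(); the data is illustrative.

```r
x <- c(30, 10, 20)
print(sort(x))                      # [1] 10 20 30
print(sort(x, decreasing = TRUE))   # [1] 30 20 10
print(order(x))                     # [1] 2 3 1  (positions of smallest to largest)

# Sort a data frame by age, breaking ties by name
df <- data.frame(name = c("Cid", "Ann", "Ben"), age = c(30, 25, 30))
sorted <- df[order(df$age, df$name), ]
print(sorted$name)                  # [1] "Ann" "Ben" "Cid"
```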
Missing Data Handling:
• is.na(): Check for missing values.
• na.omit(): Remove rows containing missing values from a data frame (or NA elements from a vector).
• complete.cases(): Identify complete cases in a data frame.
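A sketch of the three missing-data functions on illustrative data, plus the na.rm argument that many summary functions accept:

```r
x <- c(4, NA, 7, NA)
print(is.na(x))                # [1] FALSE  TRUE FALSE  TRUE
print(sum(is.na(x)))           # [1] 2
print(mean(x))                 # [1] NA  (any NA propagates)
print(mean(x, na.rm = TRUE))   # [1] 5.5

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
print(complete.cases(df))      # [1]  TRUE FALSE FALSE
clean <- na.omit(df)           # keeps only row 1
print(nrow(clean))             # [1] 1
```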
Data Reshaping:
• melt() and dcast() (from reshape2): Reshape data for analysis.
• gather() and spread() (from tidyr): Reshape data from wide to long and vice
versa; newer tidyr code uses pivot_longer() and pivot_wider() instead.
Data Sampling:
• sample(): Randomly sample elements from a vector (to sample data frame rows, sample the row indices).
• sample_n() and sample_frac() (from dplyr): Randomly sample rows from a data
frame.
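A base R sampling sketch. set.seed() makes the draws reproducible; the exact values drawn depend on the R version's random number generator, so no outputs are shown. The coin-toss and df data are illustrative.

```r
set.seed(123)                 # makes the random draws reproducible
picks <- sample(1:10, 3)      # 3 values without replacement
print(picks)

# Sampling with replacement, e.g. simulating 5 coin tosses
tosses <- sample(c("H", "T"), 5, replace = TRUE)
print(tosses)

# Sampling data frame rows: sample the row indices, then index
df <- data.frame(id = 1:6, value = c(5, 3, 8, 1, 9, 2))
rows <- df[sample(nrow(df), 2), ]
print(rows)
```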
Statistical Analysis:
• lm(): Fit linear regression models.
• glm(): Fit generalized linear models.
• t.test(), wilcox.test(): Perform statistical tests.
• cor(), cov(): Calculate correlation and covariance.
• sum(), min(), max(), mean(), median(): Compute basic summary statistics.
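A sketch of a few of these functions on illustrative data that lies exactly on the line y = 2x + 1, so the results are easy to check by hand:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)   # exactly y = 2x + 1

print(mean(x))     # [1] 3
print(median(y))   # [1] 7
print(cor(x, y))   # [1] 1  (perfect positive linear relationship)

# lm() fits y ~ x by least squares; coef() returns intercept and slope
fit <- lm(y ~ x)
print(coef(fit))   # intercept 1, slope 2 (up to floating-point rounding)
```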
Function Application:
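• lapply(), sapply(): Apply a function to each element of a list or vector; apply(): apply a function over the rows or columns of a matrix or array.
• mapply(): Apply a function element-wise to multiple lists or vectors; map() (from purrr) is a tidyverse counterpart of lapply().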
• do.call(): Call a function with a list of arguments.
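A base R sketch of the apply family and do.call(); the inputs are illustrative:

```r
nums <- list(a = 1:3, b = 4:6)

# lapply() always returns a list; sapply() simplifies to a vector when it can
print(lapply(nums, sum))   # a list: a = 6, b = 15
print(sapply(nums, sum))   # a named vector: a = 6, b = 15

# apply() works over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix
m <- matrix(1:6, nrow = 2)
print(apply(m, 1, sum))    # [1]  9 12
print(apply(m, 2, sum))    # [1]  3  7 11

# mapply() walks several vectors in parallel
print(mapply(function(x, y) x * y, 1:3, 4:6))     # [1]  4 10 18

# do.call() calls a function with its arguments supplied as a list
print(do.call(paste, list("a", "b", sep = "-")))  # [1] "a-b"
```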
Data I/O:
• read.csv(), read.table(): Read data from CSV files or other delimited text files.
• write.csv(), write.table(): Write data frames to CSV or text files.
• readRDS(), saveRDS(): Read and save R objects.
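A round-trip sketch: write a small data frame to temporary files and read it back. tempfile() is used so the example leaves nothing in the working directory; the data is illustrative.

```r
df <- data.frame(id = 1:3, name = c("Ann", "Ben", "Cid"))

# Write to a temporary CSV file; row.names = FALSE omits the row-number column
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
back <- read.csv(csv_path)
print(all.equal(back, df))       # [1] TRUE

# saveRDS()/readRDS() store a single R object in R's binary format
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
restored <- readRDS(rds_path)
print(identical(restored, df))   # [1] TRUE
```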
Machine Learning and Data Mining:
• caret and mlr packages: Streamline the process of building and evaluating
machine learning models.
• randomForest() (randomForest package) and xgboost() (xgboost package): Fit specific machine learning algorithms.
Data Visualization:
• plot(), hist(), barplot(), etc.: Create basic plots.
• ggplot2 package: Create complex and customized plots.
• lattice package: Create conditioned plots.
• heatmap(), boxplot(), qqnorm(), etc.: Generate specialized plots.
• plotly and shiny packages: Create interactive plots and web applications.
Control Structures
Control structures in R are fundamental programming constructs that
allow you to control the flow of your code and make decisions based
on conditions.
R supports a variety of control structures, including conditional
statements, loops, and function calls.
Conditional Statements: if Statement
The basic conditional statement in R. It allows you to execute a block
of code if a condition is true.
Syntax:
if (condition) {
  # Code to execute if condition is true
}
Conditional Statements: if-else Statement
Allows you to execute one block of code if a condition is true and
another block if the condition is false.
Syntax:
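if (condition) {
  # Code to execute if condition is true
} else {
  # Code to execute if condition is false
}
Note: at the top level, else must appear on the same line as the closing brace of
the if block; otherwise R treats the if statement as complete and reports an
unexpected 'else' error.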
Conditional Statements: if-else if-else Statement
Allows you to test multiple conditions and execute different code blocks
based on which condition is true.
Syntax:
if (condition1) {
  # Code to execute if condition1 is true
} else if (condition2) {
  # Code to execute if condition2 is true
} else {
  # Code to execute if no conditions are true
}
Note: R has no switch statement keyword as in C or Java, but the built-in switch()
function provides similar multi-way selection.
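A short sketch of both ways to branch among several cases: an if / else if / else chain (note each else on the same line as the preceding closing brace) and the built-in switch() function. The grade thresholds and animal names are illustrative.

```r
grade <- function(score) {
  if (score >= 90) {
    "A"
  } else if (score >= 75) {
    "B"
  } else {
    "C"
  }
}
print(grade(95))   # [1] "A"
print(grade(80))   # [1] "B"
print(grade(50))   # [1] "C"

# switch() selects a branch by matching a character string
describe <- function(animal) {
  switch(animal,
         dog = "barks",
         cat = "meows",
         "unknown")   # the unnamed last argument is the default
}
print(describe("cat"))   # [1] "meows"
print(describe("cow"))   # [1] "unknown"
```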
Loop Statements: for loop
Iterates over a sequence (e.g., a vector) and executes a block of code
for each element in the sequence.
Syntax:
for (variable in sequence) {
  # Code to execute for each element in the sequence
}
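Two common for-loop patterns, as a sketch with an illustrative fruits vector: looping over the values directly, and looping over positions with seq_along(), which also behaves correctly for an empty vector.

```r
fruits <- c("apple", "banana", "cherry")

# Loop over the values themselves
for (fruit in fruits) {
  print(toupper(fruit))
}

# Loop over positions 1, 2, 3 with seq_along()
total <- 0
for (i in seq_along(fruits)) {
  total <- total + nchar(fruits[i])
}
print(total)   # [1] 17  (5 + 6 + 6 characters)
```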
Loop Statements: while loop
Repeats a block of code as long as a condition is true.
Syntax:
while (condition) {
  # Code to execute as long as the condition is true
}
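A minimal while-loop sketch: keep doubling a number until it reaches at least 100, counting the iterations. The condition is checked before every pass, so the body stops running as soon as it becomes false.

```r
n <- 1
steps <- 0
while (n < 100) {
  n <- n * 2          # 1, 2, 4, ..., 128
  steps <- steps + 1
}
print(n)       # [1] 128
print(steps)   # [1] 7
```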
Loop Statements: repeat loop and break statement
Creates an infinite loop, which must be exited using the break statement,
usually placed inside a conditional statement.
Syntax:
repeat {
  # Code to execute
  if (condition) {
    break  # exit the loop
  }
}
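A minimal repeat sketch: add 3 on each pass and break out once the value exceeds 10. Without the break, the loop would never end.

```r
x <- 0
repeat {
  x <- x + 3       # 3, 6, 9, 12
  if (x > 10) {
    break          # exit the loop
  }
}
print(x)   # [1] 12
```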
Functions in R
A function is a block of code which only runs when it is called. You can pass
data, known as parameters, into a function. A function can return data as a
result.
In R, you define a function with the function keyword, specifying its arguments
and body; the function returns the value passed to return() or, if return() is
not used, the value of the last evaluated expression.
Call Function
To call a function, use the function name followed by parentheses, like
function_name(arg1, arg2):
function_name <- function(arg1, arg2, ...) {
  # Function body: perform operations
  return(result)
}
function_name(arg1, arg2)
Example
square <- function(x) {
result <- x^2
return(result)
}
square(5)
Output
[1] 25
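Building on square(), a sketch of two further features of R functions: default argument values and implicit return of the last expression. Returning several values at once is done with a named list. The names raise and summarize_vec are illustrative.

```r
# 'power' has a default value, so it can be omitted when calling
raise <- function(x, power = 2) {
  x^power          # the last evaluated expression is returned implicitly
}
print(raise(5))              # [1] 25
print(raise(2, power = 3))   # [1] 8

# To return several values, return a named list
summarize_vec <- function(v) {
  list(mean = mean(v), range = max(v) - min(v))
}
res <- summarize_vec(c(2, 4, 9))
print(res$mean)    # [1] 5
print(res$range)   # [1] 7
```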
SELF-ASSESSMENT QUESTIONS
1. Point out the correct statement?
(a) Blocks are evaluated until a new line is entered after the closing brace
(b) Break will execute a loop while a condition is true
(c) The if/else statement conditionally evaluates two statements
2. Point out the wrong statement?
(a) if and else test a condition and act on it
(b) for will execute a loop a fixed number of times
(c) break will execute a loop while a condition is true
TERMINAL QUESTIONS
1. Describe the control structures in R.
2. Discuss the different categories of functions that can be used in R programming.
3. Evaluate the functionality of R objects.
Team – BDO