
DSR Block 2 All

The document provides a comprehensive guide on data import and export in R programming, covering various file formats such as CSV, Excel, and JSON. It details functions for reading, modifying, and writing these files, including examples of querying data and performing operations like merging and creating new columns. Additionally, it discusses methods for exporting data to text and CSV files, emphasizing the importance of data preservation and accessibility.

Uploaded by

Jeya preetha
Copyright
© All Rights Reserved

Data Science using R

Data Import and Export

© Kalasalingam Academy of Research and Education


Data Import and Export
➢ In R programming, data import and export operations are essential for reading
data from external sources and saving processed data for later use or sharing.

➢ R provides various functions to handle different file formats, including CSV, Excel, JSON, and more.

1. Working with CSV files in R Programming

2. Working with Excel Files in R Programming

3. Working with JSON Files in R Programming


Working with CSV files in R Programming
R CSV Files
CSV (comma-separated values) files are text files in which the values in each row are separated by a delimiter, such as a comma or a tab.
Getting and Setting the Working Directory with R CSV Files
# Get the current working directory.
print(getwd())

# Set the current working directory.
setwd("D:\\MBA_R")

# Get and print the current working directory.
print(getwd())
The getwd() function returns the current working directory, and the setwd() function sets a new one. Inside an R string, each backslash in a Windows path must be doubled (as in "D:\\MBA_R"); forward slashes ("D:/MBA_R") also work.
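A minimal sketch of a safe pattern, assuming you only want to switch directories temporarily: setwd() invisibly returns the previous working directory, so saving that value lets a script restore it afterwards. R's per-session temporary directory (tempdir()) stands in for D:\\MBA_R so the example runs on any machine, and the file name wd_demo.csv is hypothetical.

```r
# setwd() returns the previous directory (invisibly), so keep it for later
old_dir <- setwd(tempdir())
print(getwd())  # now R's temporary directory for this session

# Relative file names are now resolved against tempdir()
# (wd_demo.csv is a hypothetical file name)
write.csv(data.frame(id = 1:2), "wd_demo.csv", row.names = FALSE)
print(file.exists(file.path(tempdir(), "wd_demo.csv")))

# Restore the original working directory
setwd(old_dir)
print(getwd())
```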
Reading R CSV Files
The contents of a CSV file can be read into a data frame in R using the read.csv() function. The CSV file must either be in the current working directory (set it with setwd() if necessary) or be given with its full path. read.csv() can also read a CSV file directly from a URL.
csv_data <- read.csv(file = "D:\\MBA_R\\[Link]")
print(csv_data)

# Print the number of columns
print(ncol(csv_data))

# Print the number of rows
print(nrow(csv_data))
We read the CSV file by passing its path to read.csv(). The header argument is TRUE by default, so the first line supplies the column names and is not counted as a row; this sample CSV therefore has 7 rows and 4 columns.
Querying with R CSV Files
SQL-like queries can be run on the CSV content with the subset() function, in the form subset(csv_data, condition). Several conditions can be applied in one call by joining them with logical operators such as & (AND) and | (OR). The result is returned as a data frame.
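A minimal sketch of subset() with two combined conditions. The data frame below is hypothetical sample data standing in for the CSV file (its column names follow the examples in this section); in practice csv_data would come from read.csv().

```r
# Hypothetical sample data; normally csv_data <- read.csv(...)
csv_data <- data.frame(
  name       = c("Asha", "Ravi", "Meena", "Karthik"),
  salary     = c(55000, 72000, 64000, 48000),
  department = c("HR", "IT", "IT", "Sales"),
  projects   = c(2, 8, 5, 3)
)

# Rows where both conditions hold (& is logical AND)
it_high_paid <- subset(csv_data, department == "IT" & salary > 60000)
print(it_high_paid)

# select= keeps only the listed columns of the matching rows
busy <- subset(csv_data, projects >= 3, select = c(name, projects))
print(busy)
```

Inside subset(), column names can be used directly, without the csv_data$ prefix.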
csv_data <- read.csv(file = "D:\\MBA_R\\[Link]")
print(csv_data)
min_pro <- min(csv_data$projects)
print (min_pro)
max_pro <- max(csv_data$projects)
print (max_pro)
Aggregate functions such as min() and max() can be applied to the CSV data. Here min() is applied to the projects column, selected with the $ operator, and returns 2, the smallest number of projects; max() returns 8, the largest.
# Selecting 'name' and 'salary' columns for employees with salary greater than 60000
result <- csv_data[csv_data$salary > 60000, c("name", "salary")]

# Print the result


print(result)

Indexing with csv_data[rows, columns] returns a data frame containing only the rows that satisfy the condition and only the listed columns: here, the name and salary of every employee whose salary is greater than 60000.
# Calculating the average salary for each department
result <- tapply(csv_data$salary, csv_data$department, mean)

# Print the result


print(result)

tapply() splits the salary column by department and applies mean() to each group, giving the average salary for each department as a named vector.
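A minimal sketch comparing tapply() with aggregate(), another base R function that returns the same group means as a data frame. The four-row data set is hypothetical.

```r
# Hypothetical sample data standing in for the CSV file
csv_data <- data.frame(
  name       = c("Asha", "Ravi", "Meena", "Karthik"),
  salary     = c(50000, 70000, 60000, 40000),
  department = c("HR", "IT", "IT", "HR")
)

# tapply(): a named vector of means, one per department
avg_vec <- tapply(csv_data$salary, csv_data$department, mean)
print(avg_vec)  # HR 45000, IT 65000

# aggregate(): the same summary as a data frame (formula interface)
avg_df <- aggregate(salary ~ department, data = csv_data, FUN = mean)
print(avg_df)
```

tapply() is handy for a quick look; aggregate() is easier to keep working with, because its result is a data frame.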


Working with Excel Files in R Programming
➢ Excel workbooks usually have the extension .xls or .xlsx; Excel can also open .csv (comma-separated values) files.
➢ To start working with Excel files in R, we first need to import them into RStudio or any other R-supporting IDE (integrated development environment).

Reading Excel Files in R Programming Language
First, install the readxl package, which provides the read_excel() function for loading Excel files. Reading, modifying, deleting, merging and writing Excel data are demonstrated below.
Sample_data1.xlsx
Reading Excel Files
The two Excel files Sample_data1.xlsx and Sample_data2.xlsx are read from the working directory.
# Working with Excel Files
# Installing required package
install.packages("readxl")

# Loading the package
library(readxl)

# Importing the Excel files
Data1 <- read_excel("Sample_data1.xlsx")
Data2 <- read_excel("Sample_data2.xlsx")

# Printing the data
head(Data1)
head(Data2)
The Excel files are loaded into the variables Data1 and Data2 as data frames (read_excel() returns tibbles), and head() prints the first rows of each dataset.
Modifying Excel Files
The datasets read from Sample_data1.xlsx and Sample_data2.xlsx are modified in R; the files themselves are unchanged until the data is written back to disk.

# Modifying the files


Data1$Pclass <- 0

Data2$Embarked <- "S"

# Printing the data


head(Data1)
head(Data2)

Every value of the Pclass variable in Data1 is set to 0, and every value of the Embarked variable in Data2 is set to "S".
Deleting Content from Excel files
One variable (column) is deleted from each of the Data1 and Data2 datasets, which hold the contents of Sample_data1.xlsx and Sample_data2.xlsx.

# Deleting from files


Data1 <- Data1[-2]

Data2 <- Data2[-3]

# Printing the data


Data1
Data2

A negative index (the minus sign) drops columns from a dataset: Data1[-2] removes column 2 from Data1, and Data2[-3] removes column 3 from Data2.
Merging Excel Files
The two Excel datasets Data1 and Data2 are merged using the merge() function, which is part of base R and comes pre-installed.

# Merging Files
Data3 <- merge(Data1, Data2, all.x = TRUE, all.y = TRUE)

# Displaying the data


head(Data3)

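Data1 and Data2 are merged and the result is stored in the Data3 variable. By default, merge() joins on all columns that the two datasets have in common; all.x = TRUE and all.y = TRUE also keep the rows that have no match on the other side (a full outer join), filling the missing values with NA.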
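A minimal sketch of the difference all.x and all.y make, using two small hypothetical data frames that share the key column id.

```r
# Two hypothetical data frames with the common column "id"
left  <- data.frame(id = c(1, 2, 3), score = c(90, 75, 60))
right <- data.frame(id = c(2, 3, 4), grade = c("B", "C", "A"))

# Default (inner join): only ids present in both data frames
inner <- merge(left, right)
print(inner)

# Full outer join: every id from both sides, NA where there is no match
full <- merge(left, right, all.x = TRUE, all.y = TRUE)
print(full)
```

Setting only all.x = TRUE would keep every row of the first data frame (a left join).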
Creating new columns
New columns or features can be easily created in Data1 and Data2 datasets.

# Creating feature in Data1 dataset


Data1$Num <- 0

# Creating feature in Data2 dataset


Data2$Code <- "Mission"

# Printing the data


head(Data1)
head(Data2)

Num is a new feature created in Data1 with the default value 0. Code is a new feature created in Data2 with the default string "Mission".
Writing Excel Files
After performing all the operations, Data1 and Data2 are written into new files using the write_xlsx() function from the writexl package.
# Installing the package
install.packages("writexl")

# Loading package
library(writexl)

# Writing Data1
write_xlsx(Data1, "New_Data1.xlsx")

# Writing Data2
write_xlsx(Data2, "New_Data2.xlsx")
The Data1 dataset is written to the New_Data1.xlsx file and the Data2 dataset to the New_Data2.xlsx file. Both files are saved in the current working directory.
Working with JSON Files in R Programming
JSON stands for JavaScript Object Notation. JSON files store data in a human-readable text format. As with any other file, one can both read from and write to JSON files. To work with JSON files in R, one needs to install the "rjson" package. The most common tasks done with JSON files using the rjson package are as follows:
➢ Install and load the rjson package in the R console
➢ Create a JSON file
➢ Reading data from JSON file
➢ Write into JSON file
➢ Converting the JSON data into Dataframes
Installing and loading the rjson package
One can install rjson from the R console using the install.packages() command in the following way:
install.packages("rjson")
After installing the rjson package, one has to load it using the library() function as follows:
library("rjson")
Creating a JSON file
To create a JSON file, one can do the following steps:
•Copy the data given below into Notepad or any other text editor. You can also create your own data in the same format.
•Choose "All types" as the file type and save the file with the .json extension (example: [Link]).
•Make sure that the data is contained within a pair of curly braces { }.
Reading a JSON file
In R, reading a JSON file is quite a simple task. The fromJSON() function extracts the data from a JSON file and, by default, returns it in list format.
Example:
Suppose the above data is stored in a file named [Link] on the E drive. To read the file, we write the following code.
# Read a JSON file
# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "E:\\[Link]")

# Print the result.
print(result)
Writing into a JSON file
Before writing data to a JSON file, convert it into a JSON string with the toJSON() function. Then write the string to the file with the write() function.
# Load the package required to read JSON files.
library("rjson")
# creating the list
list1 <- vector(mode="list", length=2)
list1[[1]] <- c("sunflower", "guava", "hibiscus")
list1[[2]] <- c("flower", "fruit", "flower")
# creating the data for JSON file
jsonData <- toJSON(list1)
# writing into JSON file
write(jsonData, "[Link]")
# Give the created file name to the function
result <- fromJSON(file = "[Link]")
# Print the result
print(result)
Converting the JSON data into Dataframes
In R, the data extracted from a JSON file can be converted into a data frame with the as.data.frame() function, provided the elements of the list have compatible lengths.
# Convert the JSON data into a data frame
# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "E:\\[Link]")

# Convert the JSON data to a data frame.
json_data_frame <- as.data.frame(result)

print(json_data_frame)
Importing Data in R Script
We can read external datasets and operate with them in our R
environment by importing data into an R script. R offers a number of
functions for importing data from various file formats.

Importing Data in R
First, let's consider a dataset that we can use for the demonstration. We will use two versions of the same dataset, one in [Link] form and the other in [Link] form.
Reading a Comma-Separated Value (CSV) File
Method 1: Using the read.csv() Function to Read CSV Files into R
Two arguments are used here:
file.choose(): opens a dialog to choose a CSV file from the computer.
header: indicates whether the first row of the dataset contains the variable names. Use TRUE (or T) if it does, otherwise FALSE (or F).

# import and store the dataset in data1
data1 <- read.csv(file.choose(), header = TRUE)

# display the data
data1
Method 2: Using the read.table() Function
read.table() lets us specify how the fields in the dataset are separated through the sep argument; for a CSV file we use sep = ",".

Example:
# import and store the dataset in data2
data2 <- read.table(file.choose(), header = TRUE, sep = ",")

# display data
data2
Reading a Tab-Delimited (txt) File
Method 1: Using the read.delim() Function
Two arguments are used here:
•file.choose(): opens a dialog to choose a file from the computer.
•header: indicates whether the first row of the dataset contains the variable names. Use TRUE (or T) if it does, otherwise FALSE (or F).
# import and store the dataset in data3
data3 <- read.delim(file.choose(), header = TRUE)

# display the data
data3
Method 2: Using the read.table() Function
Here we pass sep = "\t" to indicate that the fields are separated by tabs.

# import and store the dataset in data4
data4 <- read.table(file.choose(), header = TRUE, sep = "\t")

# display the data
data4
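Because file.choose() needs an interactive dialog, the following minimal sketch instead writes two tiny hypothetical files to temporary paths (tempfile()) and reads each with both methods, to show that they produce the same data frame.

```r
# Hypothetical tiny files: one comma-separated, one tab-separated
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
writeLines(c("name,age", "Amiya,22", "Raj,25"), csv_path)
writeLines(c("name\tage", "Amiya\t22", "Raj\t25"), txt_path)

# CSV: read.csv() and read.table(sep = ",")
d1 <- read.csv(csv_path, header = TRUE)
d2 <- read.table(csv_path, header = TRUE, sep = ",")

# Tab-delimited: read.delim() and read.table(sep = "\t")
d3 <- read.delim(txt_path, header = TRUE)
d4 <- read.table(txt_path, header = TRUE, sep = "\t")

print(d1)
print(isTRUE(all.equal(d1, d2)))
print(isTRUE(all.equal(d3, d4)))
```

read.csv() and read.delim() are wrappers around read.table() with defaults preset for commas and tabs, which is why the results agree on simple files like these.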
Exporting Data from scripts in R Programming
➢ When a program terminates, any data held only in memory is lost.
➢ Storing data in a file preserves it even after the program terminates.
➢ Entering a large amount of data by hand takes a lot of time.
➢ If the data is already in a file, its contents can be accessed easily with a few R commands.
➢ Files can also be moved from one computer to another without any changes, and they can be stored in various formats.
➢ Data may be stored in a .txt (tab-separated values) file, in a tabular .csv (comma-separated values) file, or on the internet or in the cloud.
➢ R provides simple functions for exporting data to these files.
Exporting data to a text file
One common format for storing data is a plain text file. R provides several ways to export data to a text file.
write.table()
The base R function write.table() can be used to export a data frame or a matrix to a text file.
Syntax:
write.table(x, file, append = FALSE, sep = " ", dec = ".", row.names = TRUE, col.names = TRUE)
Parameters:
x: a matrix or a data frame to be written.
file: a character string giving the name of the output file.
append: if TRUE, the output is added to the end of an existing file instead of overwriting it.
sep: the field separator string, e.g., sep = "\t" (for tab-separated values).
dec: the string used as the decimal separator. Default is ".".
row.names: either a logical value indicating whether the row names of x are written along with x, or a character vector of row names to be written.
col.names: either a logical value indicating whether the column names of x are written along with x, or a character vector of column names to be written.
# Creating a dataframe

df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)

# Export a data frame to a text file using write.table()
# (row.names = TRUE with col.names = NA adds an empty header cell above the row names)
write.table(df,
file = "[Link]",
sep = "\t",
row.names = TRUE,
col.names = NA)
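A minimal round-trip sketch: the same data frame is written with write.table(), using row.names = TRUE and col.names = NA, to a temporary file (a stand-in for the elided file name) and then read back with read.table(). Because the first column of the file holds the row names, row.names = 1 restores them on reading.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Temporary file used in place of a real output path
out_path <- tempfile(fileext = ".txt")
write.table(df, file = out_path, sep = "\t",
            row.names = TRUE, col.names = NA)

# Raw file contents: the header line starts with an empty (quoted) cell
print(readLines(out_path))

# Read it back, using the first column as row names
back <- read.table(out_path, header = TRUE, sep = "\t", row.names = 1)
print(back)
```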
write_tsv():
The write_tsv() function from the readr package also exports data as tab-separated ("\t") values.
Syntax:
write_tsv(x, file)
Parameters:
x: a data frame to be written
file: the path to the result file (readr versions before 1.4.0 named this argument path)
# Importing readr library
library(readr)

# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)

# Export a data frame using write_tsv()
write_tsv(df, file = "[Link]")
Exporting data to a csv file
Another popular format for storing data is CSV (comma-separated values). R provides several ways to export data to a CSV file.
write.table():
The base R function write.table() can also export a data frame or a matrix to a CSV file when the separator is set to a comma (sep = ",").
Syntax: write.table(x, file, append = FALSE, sep = " ", dec = ".", row.names = TRUE, col.names = TRUE)
Parameters:
x: a matrix or a data frame to be written.
file: a character string giving the name of the output file.
sep: the field separator string, e.g., sep = "," for comma-separated values.
dec: the string used as the decimal separator. Default is ".".
row.names: either a logical value indicating whether the row names of x are written along with x, or a character vector of row names to be written.
col.names: either a logical value indicating whether the column names of x are written along with x, or a character vector of column names to be written.
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)

# Export a data frame to a CSV file using write.table()
write.table(df,
file = "[Link]",
sep = ",",
row.names = FALSE)
write.csv():
write.csv() is the recommended function for exporting data to a CSV file. It uses "." for the decimal point and a comma (",") as the separator.

# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)

# Export a data frame to a CSV file using write.csv()
write.csv(df, file = "my_data.csv")
write.csv2():
write.csv2() is very similar to write.csv(), but it uses a comma (",") for the decimal point and a semicolon (";") as the separator, the convention used in many European locales.

# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)

# Export a data frame to a CSV file using write.csv2()
write.csv2(df, file = "my_data_1.csv")
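A minimal sketch that makes the difference visible: the same hypothetical data frame is written with write.csv() and write.csv2() to temporary files, and readLines() shows the raw text. row.names = FALSE leaves out the row-name column.

```r
# Hypothetical data with decimal values
df <- data.frame(Name = c("Amiya", "Raj"), Score = c(7.5, 8.25))

path1 <- tempfile(fileext = ".csv")
path2 <- tempfile(fileext = ".csv")
write.csv(df, file = path1, row.names = FALSE)
write.csv2(df, file = path2, row.names = FALSE)

print(readLines(path1))  # comma separator, "." decimal point
print(readLines(path2))  # semicolon separator, "," decimal point

# Read each file back with the matching reader
print(read.csv(path1))
print(read.csv2(path2))
```

Reading a write.csv2() file with read.csv() would split 7,5 into two fields, so the reader should match the writer.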
write_csv():
The write_csv() function from the readr package also exports data as comma-separated (",") values.
Syntax:
write_csv(x, file)
Parameters:
x: a data frame to be written
file: the path to the result file (readr versions before 1.4.0 named this argument path)

# Importing readr library
library(readr)
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
# Export a data frame using write_csv()
write_csv(df, file = "[Link]")
Type conversions



Type conversions in R programming
➢ In R programming, type conversion refers to the process of changing an object's
data type from one form to another, such as converting a numeric value to a
character string or a factor.
➢ R provides several functions for explicit type conversions, which are essential for
ensuring that your data is in the correct format for processing or analysis.
Basic Data Types in R
1) Numeric: Decimal numbers (e.g., 3.14, 42)
2) Integer: Whole numbers stored as integers, written with an L suffix (e.g., 1L, 100L)
3) Character: Text or string data (e.g., "apple", "42")
4) Factor: Categorical data with levels (e.g., male, female)
5) Logical: Boolean values (TRUE or FALSE)
6) Date/Datetime: Representing dates and times
Data Type Conversion in R
Data type conversion is the process of converting data from one type to another. The examples below focus on the three most common atomic types: numeric, logical and character. Because R is dynamically typed, it infers a variable's type from the value assigned to it. We see a glimpse of this in the code below:

# value assignment
name <- "Kalasalingam"
age <- 50
pwd <- FALSE
# type checking
typeof(name)
typeof(age)
typeof(pwd)
Numeric or Character to Logical type
Any nonzero number becomes TRUE when converted to logical type, and 0 becomes FALSE. Here 50 is a nonzero numeric value, so it is converted to TRUE. "FALSE" is a character value that becomes FALSE of logical type; the double quotes disappear because the result is no longer a string.
Syntax: as.logical(value)
# value assignment
age <- 50
pwd <- "FALSE"

# Converting type
as.logical(age)
as.logical(pwd)
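A short sketch of the edge cases of as.logical(): numbers follow the zero/nonzero rule, but for strings only a few recognised spellings convert, and everything else, including "0" and "yes", becomes NA.

```r
# Numbers: 0 -> FALSE, any other number -> TRUE
num_flags <- as.logical(c(0, 50, -1.5))
print(num_flags)  # FALSE TRUE TRUE

# Strings: "TRUE"/"true"/"T" and "FALSE"/"F" convert; other text gives NA
chr_flags <- as.logical(c("TRUE", "true", "T", "FALSE", "F", "0", "yes"))
print(chr_flags)  # TRUE TRUE TRUE FALSE FALSE NA NA
```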
Numeric or Logical to Character type:
Character values are printed enclosed in double quotes (" "). Hence the number 50 is converted into the character string "50". Similarly, converting the logical value FALSE to character type gives "FALSE".
Syntax: as.character(value)

# value assignment
age <- 50
pwd <- FALSE

# Converting type
as.character(age)
as.character(pwd)
Character or Logical to Numeric type

The character value "50" becomes the number 50 when converted to numeric type; only the double quotes disappear. When logical values are converted to numeric, FALSE becomes 0 and TRUE becomes 1.
Syntax: as.numeric(value)

# value assignment
age <- "50"
pwd <- FALSE

# Converting type
as.numeric(age)
as.numeric(pwd)
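Two common pitfalls of as.numeric(), shown in a short sketch: a string that is not a number becomes NA (with a warning, suppressed here), and a factor converts to its internal integer codes rather than to the numbers its labels show. Going through as.character() first recovers the labelled values.

```r
# Not a number: NA (R would normally also print a coercion warning)
bad <- suppressWarnings(as.numeric("fifty"))
print(bad)  # NA

# A factor stores integer codes that point to its levels ("10", "20")
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                # 1 2 1  (the codes)
print(as.numeric(as.character(f)))  # 10 20 10 (the labels)
```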
Vectors to Matrix
rbind() combines vectors as the rows of a matrix: each vector becomes one row.
Syntax: rbind(vector1, vector2, vector3, ..., vectorN)
cbind() combines vectors as the columns of a matrix: each vector becomes one column.
Syntax: cbind(vector1, vector2, vector3, ..., vectorN)
# sample vectors
vector1 <- c('red','green',"blue","yellow")
vector2 <- c(1,2,3,4)

print("Vectors as rows (rbind)")
rbind(vector1,vector2)
print("Vectors as columns (cbind)")
cbind(vector1,vector2)
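A short sketch that checks the shapes and the resulting type, using the same two vectors: rbind() gives one row per vector, cbind() one column per vector, and because a matrix holds a single type, the numbers are coerced to character.

```r
vector1 <- c("red", "green", "blue", "yellow")
vector2 <- c(1, 2, 3, 4)

rows <- rbind(vector1, vector2)  # 2 x 4, rows named after the vectors
cols <- cbind(vector1, vector2)  # 4 x 2, columns named after the vectors

print(dim(rows))          # 2 4
print(dim(cols))          # 4 2
print(typeof(rows))       # "character"
print(rows["vector2", ])  # "1" "2" "3" "4"
```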
Vectors to Dataframe:
When the sample vectors are converted into a data frame, each vector becomes a column: the first vector becomes the 1st column and the second vector the 2nd column. Unlike a matrix, a data frame lets each column keep its own type.
Syntax: data.frame(vector1, vector2, vector3, ..., vectorN)

# sample vectors
vector1 <- c('red', 'green', "blue", "yellow")
vector2 <- c(1, 2, 3, 4)
data.frame(vector1, vector2)
Matrix to Vector:
The sample matrix contains the elements 1 to 6 arranged in 2 rows (nrow = 2). as.vector() flattens the matrix into one long vector, reading the elements in column-major order (down the first column, then the second, and so on).
Syntax: as.vector(matrix_name)
# sample matrix
mat <- matrix(c(1:6), nrow = 2)
print("Sample Matrix")
mat

print("After conversion into vector")
as.vector(mat)
Matrix to Dataframe:
In the code below, the sample matrix contains the elements 1 to 6 in 2 rows (nrow = 2). as.data.frame() converts it into a data frame in which each matrix column becomes a data frame column (named V1, V2 and V3 by default, because the matrix has no column names).
Syntax: as.data.frame(matrix_name)
# sample matrix
mat <- matrix(c(1:6), nrow = 2)
print("Sample Matrix")
mat

print("After conversion into Dataframe")
as.data.frame(mat)
Dataframe to Matrix:
In the code below, we create a sample data frame whose columns have different types. as.matrix() converts it into a matrix; since a matrix can hold only one type, every element becomes a character string.
Syntax: as.matrix(dataframe_name)
# sample dataframe
df <- data.frame(
serial = c(1:5),
name = c("Raja","Mani","Selvam","Muthu","Govind"),
stipend = c(2000,3000.5,5000,4000,500.2),
stringsAsFactors = FALSE)
print("Sample Dataframe")
df
print("After conversion into Matrix")
as.matrix(df)
Automatic Type Conversions in R

R sometimes performs implicit type conversion, depending on the context of an operation. For example:

# Mixing numeric and character results in character

>result <- c(10, "apple") # Result: "10" "apple"


>print(result)

When you mix types in vectors, R will automatically convert everything to the most
general type (in this case, character).
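For the four types used here, R coerces towards the more general one in the order logical, then integer, then double, then character, so the result takes the most general type present. A short sketch:

```r
print(typeof(c(TRUE, 2L)))  # "integer": logical + integer
print(typeof(c(2L, 3.5)))   # "double": integer + double
print(typeof(c(3.5, "a")))  # "character": double + character
print(TRUE + TRUE)          # 2: arithmetic treats TRUE as 1
```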
Checking Data Types
Before or after performing a type conversion, it’s often helpful to check the type of your
data using functions like class(), typeof(), or is.* functions.
1) class()
Returns the class (or type) of an object.
>x <- 42
>class(x) # Result: "numeric"
2) typeof()
Returns the internal type or storage mode of an object.
>typeof(x) # Result: "double"
3) is.numeric(), is.character(), etc.
Functions to check whether an object is of a specific type; each returns TRUE or FALSE.
>is.numeric(x) # Check if numeric
>is.character(x) # Check if character
>is.factor(x) # Check if factor
Packages - Installation and libraries



Packages - Installation and libraries in R programming

➢ In R programming, packages are collections of functions, data, and compiled code in a well-defined format.
➢ They extend R’s capabilities by providing additional functionality for statistical
analysis, data manipulation, plotting, machine learning, and more.
➢ The core installation of R comes with several standard packages, but there are
thousands of other packages available through repositories like CRAN
(Comprehensive R Archive Network), Bioconductor (for bioinformatics), and
GitHub.
1. Installing Packages
You can install a package from CRAN using the install.packages() function. This needs to be done only once per R installation; after that, the package only has to be loaded in each session with library().

1.1 Installing a Package from CRAN


# Install the "dplyr" package from CRAN
>[Link]("dplyr")
1.2 Installing Multiple Packages
You can also install multiple packages in one go by passing a vector of package names.
# Install multiple packages
>install.packages(c("ggplot2", "tidyverse", "[Link]"))
1.3 Installing Packages from Bioconductor
Bioconductor is a repository for bioinformatics-related packages. You first need to install
the BiocManager package and then use it to install other packages.
# Install BiocManager
>install.packages("BiocManager")
# Use BiocManager to install Bioconductor packages
>BiocManager::install("GenomicRanges")
1.4 Installing Packages from GitHub
You can install packages directly from GitHub using the devtools package.
# Install devtools
>install.packages("devtools")
# Use devtools to install packages from GitHub
>devtools::install_github("author/repo")
2. Loading and Using Installed Packages
2.1 Loading a Package
# Load the "ggplot2" package into the R session
>library(ggplot2)
After loading, all the functions from that package are available for use.
2.2 Loading Multiple Packages
# Load multiple packages at once
>library(dplyr)
>library(tidyr)
2.3 Checking Installed Packages
To see which packages are installed on your system, use the installed.packages() function.
# View all installed packages
>installed.packages()
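A common pattern, sketched here as a hypothetical helper named ensure_package(): install a package only when it is missing, then load it. requireNamespace() checks for the package without attaching it, and library(..., character.only = TRUE) accepts the name as a string. The example call uses stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install only if missing, then attach the package
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "stats" is part of base R, so this only attaches it
ensure_package("stats")

# installed.packages() returns a matrix whose row names are package names
print("stats" %in% rownames(installed.packages()))
```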
3. Updating and Removing Packages
3.1 Updating Installed Packages
To make sure that you have the latest versions of your packages, use the update.packages() function.
# Update all installed packages
>update.packages()
# Update a specific package
>update.packages(oldPkgs = "ggplot2")
3.2 Removing a Package
You can remove a package from your system with the remove.packages() function.
# Remove the "dplyr" package
>remove.packages("dplyr")
4. Package Repositories

CRAN is the default repository for R packages, but you can also install packages from other
sources like Bioconductor, GitHub, or custom repositories.
4.1 Setting a Different CRAN Mirror
When installing a package, you may want to use a faster or closer CRAN mirror.
# Set a specific CRAN mirror
>options(repos = c(CRAN = "[Link]"))
4.2 Specifying Repositories During Installation
# Install from a specific CRAN mirror
>install.packages("ggplot2", repos = "[Link]")
5. Discovering and Searching for Packages
With thousands of packages available, finding the right one for your task can be
challenging. Here are some ways to search for packages.
5.1 Searching for Packages by Name or Keyword
You can search for packages related to a specific topic on CRAN by using the
[Link]() function or online at CRAN Task Views.
# Search for packages related to "time series"
>[Link]("time series")
5.2 CRAN Task Views
CRAN Task Views are curated lists of packages in different fields, such as machine
learning, econometrics, and bioinformatics.
Visit: [Link]
Conditionals



Conditionals in R programming
In R programming, conditional statements allow you to execute certain blocks of
code based on specific conditions. These are essential for controlling the flow of your
program and making decisions. The most common conditional statements in R include if,
else if, and else.
1. Basic Syntax of Conditionals
1.1 if Statement
The if statement is used to execute a block of code if a specified condition evaluates to TRUE.
# Example of an if statement
x <- 5
if (x > 0) {
  print("x is positive")
}
1.2 if...else Statement
You can use else to provide an alternative block of code to
execute if the condition is FALSE.

# Example of if...else statement

x <- -10
if (x > 0) {
  print("x is positive")
} else {
  print("x is not positive")
}

1.3 if...else if...else Statement


For multiple conditions, you can chain several else if statements.

# Example of if...else if...else statement

x <- -10
if (x > 0) {
print("x is positive")
} else if (x < 0) {
print("x is negative")
} else {
print("x is zero")
}
2. Nested Conditionals
You can also nest if statements within each other to handle more
complex conditions.
# Nested if statement
x <- 10
if (x >= 0) {
print("x is non-negative")
if (x == 0) {
print("x is zero")
} else {
print("x is positive")
}
} else {
print("x is negative")
}
3. Vectorized Conditionals with ifelse()
R provides the ifelse() function, which is a vectorized form of if statements. This allows
you to apply a condition to each element of a vector and return values accordingly.
# Using ifelse() function
x <- c(-2, -1, 0, 1, 2)
# Classify values as "Negative", "Zero", or "Positive"
result <- ifelse(x > 0, "Positive", ifelse(x < 0, "Negative", "Zero"))
print(result)
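if() needs a single TRUE or FALSE (since R 4.2.0, a longer condition is an error), whereas ifelse() works element by element. A short sketch of both, including how ifelse() passes NA through:

```r
x <- c(-2, -1, 0, 1, 2)

# if(): reduce the vector to one TRUE/FALSE first, e.g. with any() or all()
if (any(x > 0)) {
  print("At least one value is positive")
}

# ifelse(): one result per element, same length as x
labels <- ifelse(x > 0, "Positive", "Not positive")
print(labels)

# A missing value in the condition gives NA in the result, not an error
with_na <- ifelse(c(1, NA, -1) > 0, "Positive", "Not positive")
print(with_na)
```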
4. Logical Operators

You can combine conditions using logical operators like & (AND), | (OR), and ! (NOT).

4.1 Using AND (&)


x <- 7
y <- 5
if (x > 0 & y > 0) {
print("Both x and y are positive")
}
4.2 Using OR (|)
x <- 7
y <- 15
if (x > 10 | y > 10) {
print("At least one of x or y is greater than 10")
}

4.3 Using NOT (!)


x<-10
if (!(x < 0)) {
print("x is not negative")
}
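R also has the scalar forms && and ||. They look at single values and stop as soon as the answer is known (short-circuiting), which makes them the usual choice inside if(); since R 4.3.0, giving them a longer vector is an error. The vectorised & and | return one result per element. A short sketch:

```r
x <- c(5, -3, 8)

# & is vectorised: one TRUE/FALSE per element
in_range <- x > 0 & x < 6
print(in_range)  # TRUE FALSE FALSE

# && short-circuits: y > 0 is never evaluated when y is NULL
y <- NULL
if (!is.null(y) && y > 0) {
  print("y is positive")
} else {
  print("y is missing or not positive")
}
```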

5. Switch Statement
The switch statement is useful when you have a variable with multiple possible values
and want to execute different code blocks based on that variable's value.
# Example of switch statement

option <- 2
result <- switch(option,
"Option 1" = "You selected option 1",
"Option 2" = "You selected option 2",
"Option 3" = "You selected option 3",
"Invalid option")
print(result)
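switch() behaves differently depending on the type of its first argument. With a number, as in the example above, it simply returns the alternative at that position and ignores the names, so the unnamed "Invalid option" is only reached when option is 4. With a character string, it matches the names and uses the unnamed last argument as the default. A short sketch, using a hypothetical helper pick():

```r
# Numeric first argument: selects by position
print(switch(2, "a", "b", "c"))  # "b"

# Hypothetical helper: character first argument selects by name,
# and the unnamed last value is the default
pick <- function(option) {
  switch(option,
         "Option 1" = "You selected option 1",
         "Option 2" = "You selected option 2",
         "Option 3" = "You selected option 3",
         "Invalid option")
}
print(pick("Option 2"))  # "You selected option 2"
print(pick("Option 9"))  # "Invalid option"

# A number outside the range returns NULL (invisibly), not the default
print(is.null(switch(5, "a", "b", "c")))  # TRUE
```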
6. Using Conditionals in Functions
Conditionals are often used in functions to perform different actions based on input values.
# Function with conditionals
categorize_number <- function(num) {

if (num > 0) {
return("Positive")
} else if (num < 0) {
return("Negative")
} else {
return("Zero")
}
}
# Test the function
result <- categorize_number(-5)
print(result) # Output: "Negative"
7. Example: Using Conditionals with Data Frames

Conditionals can also be applied to data frames, enabling you to filter or modify data based on specific conditions.

# Create a sample data frame
data <- data.frame(
  id = 1:5,
  value = c(10, -5, 0, 7, -3)
)
# Add a new column based on conditions
data$status <- ifelse(data$value > 0, "Positive",
ifelse(data$value < 0, "Negative", "Zero"))
print(data)
Looping



Looping in R programming
In R programming, loops allow you to execute a block of code repeatedly, which is
useful for automating tasks that involve repetition. The most commonly used loops in R are
for, while, and repeat. Additionally, R provides functions such as apply() that iterate over data structures without an explicit loop.
1. Types of Loops in R
1.1 for Loop
A for loop is used to iterate over a sequence (e.g., a vector or list) and execute a block
of code for each element in that sequence.
Syntax:
for (variable in sequence) {
  # Code to execute in each iteration
}
Example: Looping Through a Vector
# A vector of numbers
numbers <- c(1, 2, 3, 4, 5)
# Print each number
for (num in numbers) {
print(num)
}
Example: Using a for Loop with Indexing
You can also use indexing within a for loop to access elements by their index position.
# A vector of numbers
numbers <- c(10, 20, 30, 40, 50)
# Print each element with its index; seq_along(numbers) yields 1, 2, ..., length(numbers)
for (i in seq_along(numbers)) {
  print(paste("Element", i, "is", numbers[i]))
}
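Index-based loops are most useful when results must be written back into a vector by position. A minimal sketch, assuming you want the square of each element: create the output vector at its final length first, then fill it inside the loop. seq_along() is preferred over 1:length() because it yields no iterations for an empty input, whereas 1:length() on an empty vector gives 1:0.

```r
numbers <- c(10, 20, 30, 40, 50)

# Preallocate a numeric vector of the same length as the input
squares <- numeric(length(numbers))

for (i in seq_along(numbers)) {
  squares[i] <- numbers[i]^2
}
print(squares)
# Output: 100 400 900 1600 2500

# With an empty input, 1:length(x) is 1:0 (two iterations),
# while seq_along(x) produces no iterations at all
empty <- numeric(0)
print(1:length(empty))   # 1 0
print(seq_along(empty))  # integer(0)
```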
1.2 while Loop
A while loop repeats a block of code as long as a specified condition is TRUE.
Syntax:
while (condition) {
  # Code to execute while the condition is TRUE
}
Example: while Loop to Print Numbers
# Initialize a variable
x <- 1
# Loop while x is less than or equal to 5
while (x <= 5) {
  print(x)
  x <- x + 1
}
In this example, the loop continues as long as x is less than or equal to 5. Each time through the loop, x is incremented by 1.
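A while loop fits best when the number of iterations is not known in advance. A minimal sketch, with an arbitrary threshold of 100: keep doubling a value until it exceeds the threshold, counting how many doublings were needed.

```r
x <- 1
steps <- 0

# Double x until it is greater than 100; how many iterations
# this takes is not known before the loop starts
while (x <= 100) {
  x <- x * 2
  steps <- steps + 1
}
print(x)      # 128
print(steps)  # 7
```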
1.3 repeat Loop
The repeat loop in R does not have a built-in exit condition and continues indefinitely unless a break statement is encountered.
Syntax:
repeat {
  # Code to execute repeatedly
  if (condition) {
    break # Exit the loop
  }
}
Example: repeat Loop to Break After a Condition
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) {
    break # Exit the loop when x exceeds 5
  }
}
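Because repeat has no exit condition of its own, it suits loops whose body must run at least once before there is anything to test. A minimal sketch, with illustrative numbers: search upward from 50 for the first multiple of 7.

```r
n <- 50

# Increment n first, then test it; break as soon as n is divisible by 7
repeat {
  n <- n + 1
  if (n %% 7 == 0) {
    break
  }
}
print(n)  # 56
```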
2. Controlling Loop Behavior
2.1 break Statement
You can use the break statement to exit a loop before it has run its natural course.
# Example of using break in a for loop
for (i in 1:10) {
  if (i == 6) {
    break # Exit the loop when i is 6
  }
  print(i)
}
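A common use of break is stopping a search as soon as the answer is found. A minimal sketch, with an illustrative vector: find the position of the first negative value; first_negative stays NA if there is none.

```r
values <- c(4, 8, -2, 5, -7)
first_negative <- NA  # stays NA if no negative value is found

for (i in seq_along(values)) {
  if (values[i] < 0) {
    first_negative <- i
    break  # stop scanning; later elements are never examined
  }
}
print(first_negative)  # 3
```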
2.2 next Statement
The next statement is used to skip the current iteration and move to the next iteration of the loop.
# Example of using next in a for loop
for (i in 1:10) {
  if (i %% 2 == 0) {
    next # Skip even numbers
  }
  print(i)
}
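next is handy for skipping values that should not be processed. A minimal sketch, with illustrative readings: add up a vector while skipping missing values, then confirm the result against sum() with na.rm = TRUE.

```r
readings <- c(3, NA, 5, NA, 2)
total <- 0

for (r in readings) {
  if (is.na(r)) {
    next  # skip missing readings
  }
  total <- total + r
}
print(total)                         # 10
print(sum(readings, na.rm = TRUE))   # 10, same result without a loop
```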
3. Nested Loops
Loops can be nested inside other loops, allowing you to handle more complex tasks, such as iterating over matrices or data frames.
# Nested loop example (named mat to avoid confusion with the matrix() function)
mat <- matrix(1:9, nrow = 3, ncol = 3)
# Outer loop iterates over rows
for (i in 1:nrow(mat)) {
  # Inner loop iterates over columns
  for (j in 1:ncol(mat)) {
    print(sprintf("Element at [%d, %d] is %d", i, j, mat[i, j]))
  }
}
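Nested loops can also fill a matrix cell by cell. A minimal sketch, assuming a 3 x 3 multiplication table as the target: the outer loop picks the row, the inner loop the column, and outer() from base R is shown building the same table without explicit loops.

```r
n <- 3

# Preallocate an n x n integer matrix
tbl <- matrix(0L, nrow = n, ncol = n)

# Outer loop over rows, inner loop over columns
for (i in 1:n) {
  for (j in 1:n) {
    tbl[i, j] <- i * j
  }
}
print(tbl)

# outer() builds the same table without explicit loops
print(outer(1:n, 1:n))
```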
4. Looping Through Lists and Data Frames
You can loop through more complex data structures like lists and data frames in a similar
manner to vectors.
4.1 Looping Through a List
# A list with different types of elements
my_list <- list(name = "John", age = 25, scores = c(85, 90, 78))
# Loop through each element of the list
for (item in my_list) {
  print(item)
}
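Looping directly over a list gives each element's value but not its name. A minimal sketch, reusing my_list from the example above: loop over positions, take each name from names(my_list), and extract the element with [[ ]]. toString() collapses a multi-element vector such as scores into one comma-separated string.

```r
my_list <- list(name = "John", age = 25, scores = c(85, 90, 78))

# Preallocate one description per list element
descriptions <- character(length(my_list))

for (k in seq_along(my_list)) {
  nm <- names(my_list)[k]
  # [[ ]] extracts the element itself; toString() joins vectors with ", "
  descriptions[k] <- paste(nm, "=", toString(my_list[[nm]]))
}
print(descriptions)
# Output: "name = John" "age = 25" "scores = 85, 90, 78"
```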
4.2 Looping Through a Data Frame
# Create a sample data frame
data <- data.frame(name = c("Alice", "Bob", "Charlie"), score = c(85, 92, 88))
# Loop through each row of the data frame
for (i in seq_len(nrow(data))) {
  print(paste(data$name[i], "scored", data$score[i]))
}
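Many row-by-row loops over a data frame can be replaced by one vectorized call, because functions such as paste() already work element-wise on whole columns. A minimal sketch comparing the loop version with the vectorized version on the same data:

```r
data <- data.frame(name = c("Alice", "Bob", "Charlie"), score = c(85, 92, 88))

# Loop version: build one message per row
messages <- character(nrow(data))
for (i in seq_len(nrow(data))) {
  messages[i] <- paste(data$name[i], "scored", data$score[i])
}

# Vectorized version: paste() combines whole columns element by element
messages_vec <- paste(data$name, "scored", data$score)

print(messages_vec)
# Output: "Alice scored 85" "Bob scored 92" "Charlie scored 88"
```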
Family of Functions
Apply family of functions in R programming
➢ The apply family of functions in R provides a concise way to perform operations on data structures like vectors, matrices, and data frames without explicitly writing loops.
➢ These functions mainly simplify your code. They are not necessarily faster than a well-written for loop, because they still call the function once per element; large speed gains usually come from truly vectorized functions such as rowSums() or colMeans().
➢ Here’s an overview of the main functions in the apply family, along with their usage and examples.
Overview of Apply Family Functions
The primary functions in the apply family include:
1) apply(): Used for applying a function to the rows or columns of a matrix or array.
2) lapply(): Used for applying a function to each element of a list or vector, returning a list.
3) sapply(): Similar to lapply(), but attempts to simplify the output to a vector or matrix if possible.
4) tapply(): Used for applying a function to subsets of a vector, defined by factors.
5) mapply(): A multivariate version of sapply(), allowing you to apply a function to multiple arguments.
1. apply() Function
The apply() function is used primarily with matrices and arrays. It applies a function to the rows or columns of a matrix.
Syntax
apply(X, MARGIN, FUN, ...)
• X: The array or matrix.
• MARGIN: 1 for rows, 2 for columns.
• FUN: The function to apply.
• ...: Additional arguments to FUN.
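The ... argument forwards extra arguments to FUN on every call. A minimal sketch, with a small illustrative matrix containing one missing value: passing na.rm = TRUE through apply() makes mean() ignore the NA in each column.

```r
# 3 x 2 matrix (filled column by column) with one missing value
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)

# Without na.rm, the first column's mean is NA
print(apply(m, 2, mean))                  # NA 5

# na.rm = TRUE is passed on to mean() for each column
col_means <- apply(m, 2, mean, na.rm = TRUE)
print(col_means)                          # 2 5
```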


Example
# Create a matrix (filled column by column: 1 4 7 / 2 5 8 / 3 6 9)
mat <- matrix(1:9, nrow = 3)
# Print the matrix
print(mat)
# Calculate the row sums
row_sums <- apply(mat, 1, sum)
print(row_sums)
# Output: 12, 15, 18
# Calculate the column means
col_means <- apply(mat, 2, mean)
print(col_means)
# Output: 2, 5, 8
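For sums and means specifically, base R also provides rowSums(), colSums(), rowMeans(), and colMeans(), which return the same values as the apply() calls above. A minimal sketch comparing them on the same matrix, plus an anonymous function passed as FUN for a case no built-in covers directly (here, max minus min of each column):

```r
mat <- matrix(1:9, nrow = 3)

# Same values as apply(mat, 1, sum) and apply(mat, 2, mean)
print(rowSums(mat))   # 12 15 18
print(colMeans(mat))  # 2 5 8

# An anonymous function as FUN: spread (max - min) of each column
col_spread <- apply(mat, 2, function(col) max(col) - min(col))
print(col_spread)     # 2 2 2
```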
2. lapply() Function
The lapply() function applies a function to each element of a list (or vector) and returns a list.
Syntax
lapply(X, FUN, ...)
• X: A list or vector.
• FUN: The function to apply.
• ...: Additional arguments to FUN.
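Because lapply() always returns a list, it can hold results of different lengths or types. A minimal sketch, with illustrative inputs: each call below returns a vector of a different length, and the second call shows an extra argument (digits) forwarded through ... to round().

```r
# Each call returns a vector of a different length
sequences <- lapply(1:3, function(n) seq_len(n))
print(sequences)
# [[1]] 1   [[2]] 1 2   [[3]] 1 2 3

# Arguments after FUN are passed on to it on every call
rounded <- lapply(list(a = 3.14159, b = 2.71828), round, digits = 2)
print(rounded)
# $a is 3.14, $b is 2.72
```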


Example
# Create a list of numbers
num_list <- list(a = 1:5, b = 6:10, c = 11:15)
# Apply the sum function to each element of the list
sums <- lapply(num_list, sum)
print(sums)
# Output: a list with one sum per element
# $a [1] 15
# $b [1] 40
# $c [1] 65
3. sapply() Function
The sapply() function is similar to lapply(), but it simplifies the output to a vector or matrix when possible.
Syntax
sapply(X, FUN, ...)
Example
# Apply the mean function to each element of num_list (from the lapply() example)
means <- sapply(num_list, mean)
print(means) # Output: a named numeric vector: a = 3, b = 8, c = 13
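The shape of sapply()'s result depends on what FUN returns: one value per element gives a vector, several values per element give a matrix. A minimal sketch on the same num_list; vapply() is included as a related base R function in which you declare the expected result type and length, so an unexpected result raises an error instead of silently changing shape.

```r
num_list <- list(a = 1:5, b = 6:10, c = 11:15)

# mean() returns one value per element, so sapply() gives a named vector
means <- sapply(num_list, mean)
print(means)

# range() returns two values per element, so sapply() gives a 2 x 3 matrix
ranges <- sapply(num_list, range)
print(ranges)

# vapply() with a declared result template: one number per element
means_checked <- vapply(num_list, mean, numeric(1))
print(means_checked)
```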
4. tapply() Function

The tapply() function applies a function to subsets of a vector, defined by a factor (or grouping variable).
Syntax
tapply(X, INDEX, FUN, ...)
• X: A vector.
• INDEX: A factor or a list of factors that define the subsets.
• FUN: The function to apply.
• ...: Additional arguments to FUN.


Example
# Create a vector of values and a corresponding grouping vector
values <- c(10, 20, 30, 40, 50)
group <- c("A", "A", "B", "B", "A")
# Calculate the sum of values for each group
group_sums <- tapply(values, group, sum)
print(group_sums) # Output: A = 80, B = 70
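When INDEX is a list of several grouping variables, tapply() computes one result per combination of groups and returns a table. A minimal sketch with illustrative sales data (sales, region, and quarter are hypothetical): rows of the result are regions, columns are quarters.

```r
sales   <- c(10, 20, 30, 40, 50, 60)
region  <- c("N", "N", "S", "S", "N", "S")
quarter <- c("Q1", "Q2", "Q1", "Q2", "Q1", "Q2")

# A list of two grouping vectors gives one cell per region/quarter pair
totals <- tapply(sales, list(region, quarter), sum)
print(totals)
#   Q1  Q2
# N 60  20
# S 30 100
```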
5. mapply() Function

The mapply() function is a multivariate version of sapply(). It applies a function to the corresponding elements of several vectors or lists: the first elements together, then the second elements, and so on.
Syntax
mapply(FUN, ..., MoreArgs = NULL)
• FUN: The function to apply.
• ...: Vectors or lists to which FUN will be applied.
• MoreArgs: A list of additional arguments to pass to FUN.
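MoreArgs supplies arguments that stay the same on every call, while the vectors in ... vary element by element. A minimal sketch with a hypothetical helper, scaled_sum(), whose scale argument is fixed through MoreArgs:

```r
# Hypothetical helper: (x + y) * scale
scaled_sum <- function(x, y, scale) {
  (x + y) * scale
}

# x and y are taken element by element; MoreArgs passes scale = 10 to every call
result <- mapply(scaled_sum, 1:3, 4:6, MoreArgs = list(scale = 10))
print(result)  # 50 70 90
```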
Example
# Create two vectors
x <- 1:5
y <- 6:10
# Define a function to add two numbers
add <- function(a, b) {
  return(a + b)
}
# Apply the add function to corresponding elements of x and y
result <- mapply(add, x, y)
print(result) # Output: 7 9 11 13 15
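For simple arithmetic like add(), the + operator is already vectorized, so x + y gives the same result without mapply(). mapply() is more useful when the results differ in length. A minimal sketch using rep(): the calls rep(1, 3), rep(2, 2), and rep(3, 1) return vectors of unequal length, so mapply() cannot simplify them and returns a list.

```r
x <- 1:5
y <- 6:10

# + already works element-wise, so no apply function is needed here
print(x + y)  # 7 9 11 13 15

# rep(1, 3), rep(2, 2), rep(3, 1): unequal lengths, so the result is a list
reps <- mapply(rep, 1:3, 3:1)
print(reps)
```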
Using the Apply Family Functions in Data Frames
You can also use the apply family of functions with data frames, particularly apply(), lapply(), and sapply().
Example with Data Frames
# Create a data frame
df <- data.frame(
  Name = c("A", "B", "C"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000)
)
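# Use sapply to calculate the mean of each numeric column
means <- sapply(df[, c("Age", "Salary")], mean)
print(means) # Output: Age 30, Salary 60000
# Use apply to calculate row sums of the numeric columns only; apply() first
# converts the data frame to a matrix, so including Name would produce a
# character matrix and sum() would fail
row_sums <- apply(df[, c("Age", "Salary")], 1, sum)
print(row_sums) # Output: 50025 60030 70035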
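Instead of listing the numeric columns by hand, sapply() can find them: applied to a data frame, it visits each column, so sapply(df, is.numeric) returns one TRUE/FALSE per column. A minimal sketch on the same df that uses this logical vector to select the columns before averaging:

```r
df <- data.frame(
  Name = c("A", "B", "C"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000)
)

# One logical value per column: is it numeric?
is_num <- sapply(df, is.numeric)
print(is_num)  # Name FALSE, Age TRUE, Salary TRUE

# Keep only the numeric columns, then compute each column's mean
means <- sapply(df[is_num], mean)
print(means)   # Age 30, Salary 60000
```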
