DSR Block 2 All
The subset of the data is stored as a data frame containing only the rows and columns that
satisfy the conditions passed as arguments to the function. For example, selecting the ‘name’
and ‘salary’ columns for employees with a salary greater than 60000.
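The selection described above can be sketched with base R's subset() function. This is a minimal sketch only: the csv_data values below are hypothetical stand-ins for the CSV contents, and high_earners is an illustrative name that does not come from the original notes.

```r
# Hypothetical employee data standing in for the contents of the CSV file
csv_data <- data.frame(
  name = c("Asha", "Ravi", "Meena", "John"),
  salary = c(72000, 58000, 64000, 60000),
  department = c("IT", "HR", "IT", "Finance")
)

# Keep rows with salary above 60000 and only the name and salary columns
high_earners <- subset(csv_data, salary > 60000, select = c(name, salary))
print(high_earners)
```

The first argument is the data frame, the second is the row condition, and select names the columns to keep.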
Querying with R CSV Files
# Calculating the average salary for each department
result <- tapply(csv_data$salary, csv_data$department, mean)
The value of the P-class attribute or variable of Data1 data is modified to 0. The
value of Embarked attribute or variable of Data2 is modified to S.
Deleting Content from Excel files
A variable (attribute) is deleted from each of the Data1 and Data2 datasets, which hold the
contents of the Sample_data1.xlsx and Sample_data2.xlsx files.
The minus sign (-) is used to delete columns or attributes from a dataset. Column 2 is deleted
from the Data1 dataset and Column 3 is deleted from the Data2 dataset.
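A minimal sketch of deleting columns with a negative index is given below. The two small data frames are hypothetical stand-ins for the Excel contents, so the column names are assumptions for illustration only.

```r
# Hypothetical stand-ins for the data read from the two Excel files
Data1 <- data.frame(A = 1:3, B = c("x", "y", "z"), C = c(TRUE, FALSE, TRUE))
Data2 <- data.frame(P = 4:6, Q = c(1.5, 2.5, 3.5), R = c("u", "v", "w"))

# A negative index drops the column at that position
Data1 <- Data1[-2]  # delete column 2 from Data1
Data2 <- Data2[-3]  # delete column 3 from Data2

print(names(Data1))
print(names(Data2))
```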
Merging Excel Files
The two Excel datasets Data1 and Data2 are merged using the merge() function, which is part of
the base package and comes pre-installed with R.
# Merging Files
Data3 <- merge(Data1, Data2, all.x = TRUE, all.y = TRUE)
Data1 and Data2 are merged with each other and the resultant file is stored in the
Data3 variable.
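To see what all.x = TRUE together with all.y = TRUE does, here is a small sketch with hypothetical data frames (the id, name and score columns are assumptions, not the columns of the original Excel files). Rows that appear in only one dataset are kept and the missing values are filled with NA.

```r
# Hypothetical datasets sharing an "id" column
Data1 <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
Data2 <- data.frame(id = c(2, 3, 4), score = c(80, 90, 70))

# Keep all rows from both sides (a full outer join on the common column)
Data3 <- merge(Data1, Data2, all.x = TRUE, all.y = TRUE)
print(Data3)
```

By default merge() joins on the columns that the two data frames have in common, here id.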
Creating new columns
New columns or features can be easily created in the Data1 and Data2 datasets.
Num is a new feature created with a default value of 0 in the Data1 dataset. Code is a
new feature created with the string "mission" as its default value in the Data2 dataset.
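One way to create such columns is plain assignment with the $ operator. The sketch below uses hypothetical stand-in data frames and assumes the default string is "mission".

```r
# Hypothetical stand-ins for the two datasets
Data1 <- data.frame(id = 1:3)
Data2 <- data.frame(id = 4:5)

# Assigning a single value fills every row of the new column
Data1$Num <- 0
Data2$Code <- "mission"

print(Data1)
print(Data2)
```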
Writing Excel Files
After performing all operations, Data1 and Data2 are written into new files
using the write_xlsx() function from the writexl package.
# Installing the package
install.packages("writexl")
# Loading package
library(writexl)
# Writing Data1
write_xlsx(Data1, "New_Data1.xlsx")
# Writing Data2
write_xlsx(Data2, "New_Data2.xlsx")
The Data1 dataset is written to the New_Data1.xlsx file and the Data2 dataset is written
to the New_Data2.xlsx file. Both files are saved in the present working directory.
Working with JSON Files in R Programming
JSON stands for JavaScript Object Notation. These files contain the data in human
readable format, i.e. as text. Like any other file, one can read as well as write into the
JSON files. In order to work with JSON files in R, one needs to install
the “rjson” package. The most common tasks done using JSON files under rjson
packages are as follows:
➢ Install and load the rjson package in R console
➢ Create a JSON file
➢ Reading data from JSON file
➢ Write into JSON file
➢ Converting the JSON data into Dataframes
Install and load the rjson package
One can install rjson from the R console using the install.packages() command in
the following way:
install.packages("rjson")
After installing rjson package one has to load the package using the library() function as
follows:
library("rjson")
Creating a JSON file
To create a JSON file, one can do the following steps:
•Copy the data given below into a Notepad file or any other text editor. One can also create
one's own data in the given format.
•Choose “all types” as the file type and save the file with a .json extension (Example: [Link]).
•One must make sure that the data is contained within a pair of curly braces { }.
Reading a JSON file
In R, reading a JSON file is quite a simple task. One can extract and read the data of a JSON
file very efficiently using the fromJSON() function. The fromJSON() function takes the JSON file
and returns the extracted data in list format by default.
Example:
Suppose the above data is stored in a file named [Link] in the E drive. To read the
file we must write the following code.
# Read a JSON file
# Load the package required to read JSON files.
library("rjson")
# Give the input file name to the function.
result <- fromJSON(file = "E:\\[Link]")
# Print the result.
print(result)
Writing into a JSON file
One needs to create a JSON string using the toJSON() function before writing the data
to a JSON file. To write into a JSON file, use the write() function.
# Load the package required to read JSON files.
library("rjson")
# creating the list
list1 <- vector(mode="list", length=2)
list1[[1]] <- c("sunflower", "guava", "hibiscus")
list1[[2]] <- c("flower", "fruit", "flower")
# creating the data for JSON file
jsonData <- toJSON(list1)
# writing into JSON file
write(jsonData, "[Link]")
# Give the created file name to the function
result <- fromJSON(file = "[Link]")
# Print the result
print(result)
Converting the JSON data into Dataframes
In R, to convert the data extracted from a JSON file into a data frame one can use
the as.data.frame() function.
# Load the package required to read JSON files.
library("rjson")
# Convert the list returned by fromJSON() (stored in result above) into a data frame
json_data_frame <- as.data.frame(result)
print(json_data_frame)
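The conversion step can also be tried without a JSON file. The sketch below assumes the parsed data is a named list of equal-length vectors; the list contents are hypothetical and only mimic the shape of what fromJSON() might return for column-oriented JSON.

```r
# Hypothetical list shaped like parsed JSON: one vector per field
result <- list(
  ID = c("1", "2", "3"),
  Name = c("Rick", "Dan", "Tusar")
)

# Each list element becomes one column of the data frame
json_data_frame <- as.data.frame(result)
print(json_data_frame)
```

This works only when every element of the list has the same length; otherwise as.data.frame() stops with an error.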
Importing Data in R Script
We can read external datasets and operate with them in our R
environment by importing data into an R script. R offers a number of
functions for importing data from various file formats.
Importing Data in R
First, let’s consider a dataset that we can use for the demonstration.
We will use two versions of a single dataset, one in CSV form and
another in tab-delimited text (txt) form.
Reading a Comma-Separated Value(CSV) File
Method 1: Using read.csv() Function to Read CSV Files into R
The function has two parameters:
file.choose(): It opens a menu to choose a CSV file from
the desktop.
header: It indicates whether the first row of the dataset
contains variable names. Use T/TRUE if variable names
are present, else use F/FALSE.
Example:
# import and store the dataset in data2
data2 <- read.csv(file.choose(), header=T, sep=",")
# display data
data2
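Because file.choose() needs an interactive session, it can help to test read.csv() with an explicit path instead. The sketch below writes a small hypothetical dataset to a temporary file and reads it back; tmp_csv and sample_df are illustrative names only.

```r
# Write a small hypothetical dataset to a temporary CSV file
tmp_csv <- tempfile(fileext = ".csv")
sample_df <- data.frame(Name = c("Amiya", "Raj"), Age = c(22, 25))
write.csv(sample_df, tmp_csv, row.names = FALSE)

# Read it back, treating the first row as column names
data2 <- read.csv(tmp_csv, header = TRUE, sep = ",")
print(data2)
```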
Reading a Tab-Delimited(txt) File
Method 1: Using read.delim() Function
The function has two parameters:
•file.choose(): It opens a menu to choose a tab-delimited text file from the
desktop.
•header: It indicates whether the first row of the dataset
contains variable names. Use T/TRUE if variable names
are present, else use F/FALSE.
# import and store the dataset in data3
data3 <- read.delim(file.choose(), header=T)
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
# value assignment
name <- "Kalasalingam"
age <- 50
pwd <- FALSE
# type checking
typeof(name)
typeof(age)
typeof(pwd)
Data Type Conversion in R
Numeric or Character to Logical type
Any numeric value which is not 0, on conversion to Logical type gets converted
to TRUE. Here “50” is a non-zero numeric value. Hence, it gets converted to TRUE.
“FALSE” is a character type which on conversion becomes FALSE of logical type. The
double quotes are removed on conversion from character type to Logical type.
Syntax: as.logical(value)
# value assignment
age <- 50
pwd <- "FALSE"
# Converting type
as.logical(age)
as.logical(pwd)
Numeric or Logical to Character type:
Character type values are always enclosed within double quotes (" "). Hence,
50 of numeric type is converted into "50" of character type.
Similarly, converting FALSE of logical type into character type gives "FALSE".
Syntax: as.character(value)
# value assignment
age <- 50
pwd <- FALSE
# Converting type
as.character(age)
as.character(pwd)
Character or Logical to Numeric type
"50" of character type becomes 50 when converted to numeric type; only the
double quotes are removed. When a logical value is converted to numeric, FALSE
becomes 0 and TRUE becomes 1.
Syntax: as.numeric(value)
# value assignment
age <- "50"
pwd <- FALSE
# Converting type
as.numeric(age)
as.numeric(pwd)
Vectors to Matrix
rbind() combines the 2 sample vectors into a single matrix, with each vector forming one row,
so the elements are filled in row-major order.
Syntax: rbind(vector1, vector2, vector3…..vectorN)
cbind() combines the same vectors with each vector forming one column, so the elements
are filled in column-major order.
Syntax: cbind(vector1, vector2, vector3…..vectorN)
# sample vectors
vector1 <- c('red', 'green', "blue", "yellow")
vector2 <- c(1, 2, 3, 4)
# each vector becomes one row
rbind(vector1, vector2)
# each vector becomes one column
cbind(vector1, vector2)
# sample vectors
vector1 <- c('red', 'green', "blue", "yellow")
vector2 <- c(1, 2, 3, 4)
# combine the vectors as columns of a data frame
data.frame(vector1, vector2)
Matrix to Vector:
The sample matrix contains the elements 1 to 6. We have specified nrow = 2, so the
sample matrix has 2 rows. The syntax below then converts the matrix into one long
vector. The elements of the matrix are accessed in column-major order.
Syntax: as.vector(matrix_name)
# sample matrix
mat <- matrix(c(1:6), nrow = 2)
print("Sample Matrix")
mat
print("After conversion into vector")
as.vector(mat)
When you mix types in vectors, R will automatically convert everything to the most
general type (in this case, character).
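A short sketch of this automatic conversion follows. The variable names mixed and num_lgl are illustrative only.

```r
# Numeric, character and logical values in one vector all become character
mixed <- c(1, "two", TRUE)
print(class(mixed))

# Numeric and logical values together become numeric (TRUE -> 1, FALSE -> 0)
num_lgl <- c(1, TRUE, FALSE)
print(class(num_lgl))
```

The order of generality is logical, then numeric, then character, so a single character element is enough to turn the whole vector into character.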
Checking Data Types
Before or after performing a type conversion, it’s often helpful to check the type of your
data using functions like class(), typeof(), or is.* functions.
1) class()
Returns the class (or type) of an object.
>x <- 42
>class(x) # Result: "numeric"
2) typeof()
Returns the internal type or storage mode of an object.
>typeof(x) # Result: "double"
3) is.numeric(), is.character(), etc.
Functions to check if an object is of a specific type.
>is.numeric(x) # Check if numeric
>is.character(x) # Check if character
>is.factor(x) # Check if factor
Data Science using R
Packages - Installation and libraries
CRAN is the default repository for R packages, but you can also install packages from other
sources like Bioconductor, GitHub, or custom repositories.
4.1 Setting a Different CRAN Mirror
When installing a package, you may want to use a faster or closer CRAN mirror.
# Set a specific CRAN mirror
>options(repos = c(CRAN = "[Link]
4.2 Specifying Repositories During Installation
# Install from a specific CRAN mirror
>[Link]("ggplot2", repos = "[Link]
5. Discovering and Searching for Packages
With thousands of packages available, finding the right one for your task can be
challenging. Here are some ways to search for packages.
5.1 Searching for Packages by Name or Keyword
You can search for packages related to a specific topic on CRAN by using the
[Link]() function or online at CRAN Task Views.
# Search for packages related to "time series"
>[Link]("time series")
5.2 CRAN Task Views
CRAN Task Views are curated lists of packages in different fields, such as machine
learning, econometrics, and bioinformatics.
Visit: [Link]
Conditionals
x <- -10
if (x > 0) {
  print("x is positive")
} else {
  print("x is not positive")
}
Conditionals in R programming
x <- -10
if (x > 0) {
  print("x is positive")
} else if (x < 0) {
  print("x is negative")
} else {
  print("x is zero")
}
2. Nested Conditionals
You can also nest if statements within each other to handle more
complex conditions.
# Nested if statement
x <- 10
if (x >= 0) {
  print("x is non-negative")
  if (x == 0) {
    print("x is zero")
  } else {
    print("x is positive")
  }
} else {
  print("x is negative")
}
3. Vectorized Conditionals with ifelse()
R provides the ifelse() function, which is a vectorized form of if statements. This allows
you to apply a condition to each element of a vector and return values accordingly.
# Using ifelse() function
x <- c(-2, -1, 0, 1, 2)
# Classify values as "Negative", "Zero", or "Positive"
result <- ifelse(x > 0, "Positive", ifelse(x < 0, "Negative", "Zero"))
print(result)
4. Logical Operators
You can combine conditions using logical operators like & (AND), | (OR), and ! (NOT).
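A small sketch of these operators follows; all variable names are illustrative. It also uses &&, the single-value form of AND that is commonly used inside if statements.

```r
x <- 7

in_range <- x > 0 & x < 10   # AND: both conditions must hold
outside  <- x < 0 | x > 10   # OR: at least one condition must hold
not_zero <- !(x == 0)        # NOT: reverses the condition

# & and | work element by element on vectors
v <- c(-2, 5, 12)
v_in_range <- v > 0 & v < 10

# && compares single values, which suits if statements
msg <- if (x > 0 && x %% 2 == 1) "x is positive and odd" else "x is something else"
print(msg)
```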
5. Switch Statement
The switch statement is useful when you have a variable with multiple possible values
and want to execute different code blocks based on that variable's value.
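# Example of switch statement
# A character value is matched against the names below;
# the last, unnamed value is the default when nothing matches
option <- "Option 2"
result <- switch(option,
  "Option 1" = "You selected option 1",
  "Option 2" = "You selected option 2",
  "Option 3" = "You selected option 3",
  "Invalid option")
print(result)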
6. Using Conditionals in Functions
Conditionals are often used in functions to perform different actions based on input values.
# Function with conditionals
categorize_number <- function(num) {
  if (num > 0) {
    return("Positive")
  } else if (num < 0) {
    return("Negative")
  } else {
    return("Zero")
  }
}
# Test the function
result <- categorize_number(-5)
print(result) # Output: "Negative"
8. Example: Using Conditionals with Data Frames
Conditionals can also be applied to data frames, enabling you to filter or modify data
based on specific conditions.
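A minimal sketch of both uses, filtering and modifying, is shown below. The students data frame and the pass mark of 50 are hypothetical.

```r
# Hypothetical data frame of student scores
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  score = c(85, 42, 67)
)

# Filter: keep only the rows where the condition is TRUE
passed <- students[students$score >= 50, ]

# Modify: add a column whose value depends on a condition
students$result <- ifelse(students$score >= 50, "Pass", "Fail")

print(passed)
print(students)
```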
1.1 for Loop
# A vector of numbers
numbers <- c(10, 20, 30, 40, 50)
# Use a for loop to print elements with their index
for (i in seq_along(numbers)) {
  print(paste("Element", i, "is", numbers[i]))
}
1.2 while Loop
A while loop repeats a block of code as long as a specified condition is TRUE.
Syntax
while (condition) {
  # Code to execute while the condition is TRUE
}
Example: while Loop to Print Numbers
# Initialize a variable
x <- 1
# Loop while x is less than or equal to 5
while (x <= 5) {
  print(x)
  x <- x + 1
}
In this example, the loop continues as long as x is less than or equal to 5.
Each time through the loop, x is incremented by 1.
1.3 repeat Loop
The repeat loop in R does not have a built-in exit condition and continues
indefinitely unless a break statement is encountered.
Syntax
repeat {
  # Code to execute repeatedly
  if (condition) {
    break # Exit the loop
  }
}
Example: repeat Loop to Break After a Condition
x <- 1
repeat {
  print(x)
  x <- x + 1
  # Exit the loop when x exceeds 5
  if (x > 5) {
    break
  }
}
2. Controlling Loop Behavior
2.1 break Statement
You can use the break statement to exit a loop before it has run its natural course.
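As a small illustration, the sketch below stops a for loop early. The cut-off value 3 and the name visited are arbitrary choices for this example.

```r
visited <- c()

for (i in 1:10) {
  # Leave the loop as soon as i goes past 3
  if (i > 3) {
    break
  }
  visited <- c(visited, i)
}

print(visited)
```

Although the loop is written to run over 1 to 10, only the first three values are processed because break ends the loop at i = 4.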
3. Nested Loops
Loops can be nested inside other loops, allowing you to handle more complex tasks,
such as iterating over matrices or data frames.
# Nested loop example
mat <- matrix(1:9, nrow = 3, ncol = 3)
# Outer loop iterates over rows
for (i in 1:nrow(mat)) {
  # Inner loop iterates over columns
  for (j in 1:ncol(mat)) {
    print(paste("Element at [", i, ",", j, "] is", mat[i, j]))
  }
}
4. Looping Through Lists and Data Frames
You can loop through more complex data structures like lists and data frames in a similar
manner to vectors.
4.1 Looping Through a List
# A list with different types of elements
my_list <- list(name = "John", age = 25, scores = c(85, 90, 78))
# Loop through each element of the list
for (item in my_list) {
print(item)
}
4.2 Looping Through a Data Frame
# Create a sample data frame
data <- data.frame(name = c("Alice", "Bob", "Charlie"), score = c(85, 92, 88))
# Loop through each row of the data frame
for (i in 1:nrow(data)) {
  print(paste(data$name[i], "scored", data$score[i]))
}
Apply Family of Functions
➢ These functions are particularly useful for simplifying your code and improving
performance by utilizing R's vectorized operations.
➢ Here’s an overview of the main functions in the apply family, along with their usage
and examples.
Overview of Apply Family Functions
The primary functions in the apply family include:
1) apply(): Used for applying a function to the rows or columns of a matrix or array.
2) lapply(): Used for applying a function to each element of a list or vector, returning a
list.
3) sapply(): Similar to lapply(), but attempts to simplify the output to a vector or matrix if
possible.
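1. apply() Function
Syntax
apply(X, MARGIN, FUN, ...)
Example
# Create a matrix (assumed here to be 3 x 3, filled column by column)
mat <- matrix(1:9, nrow = 3)
print(mat)
# Calculate the row sums
row_sums <- apply(mat, 1, sum)
print(row_sums)
# Output: 12, 15, 18
# Calculate the column means
col_means <- apply(mat, 2, mean)
print(col_means)
# Output: 2, 5, 8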
2. lapply() Function
The lapply() function applies a function to each element of a list (or vector) and returns a
list.
Syntax
lapply(X, FUN, ...)
3. sapply() Function
Syntax
sapply(X, FUN, ...)
Example
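The difference between lapply() and sapply() is easiest to see by applying the same function with both. The sketch below uses a hypothetical list of scores; as_list and as_vec are illustrative names.

```r
# Hypothetical list of score vectors of different lengths
scores <- list(math = c(80, 90), art = c(70, 75, 95))

# lapply() always returns a list, one element per input element
as_list <- lapply(scores, mean)
print(as_list)

# sapply() simplifies the same result to a named numeric vector
as_vec <- sapply(scores, mean)
print(as_vec)
```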
The tapply() function applies a function to subsets of a vector, defined by a factor (or
grouping variable).
Syntax