Here’s a concise set of notes for Unit-2: R Programming Basics:
1. Overview of R
R: A programming language and environment for statistical computing and
graphics.
Open-source, supports data manipulation, statistical modeling, and
visualization.
Popular IDE: RStudio.
Features:
Comprehensive statistical analysis
Extensive graphical capabilities
Integration with other languages (C, Python)
2. R Data Types and Objects
Basic Data Types:
Numeric: Decimal numbers (e.g., 3.14, 10).
Integer: Whole numbers (e.g., as.integer(4)).
Character: Strings (e.g., "Hello").
Logical: Boolean values (TRUE, FALSE).
Complex: Complex numbers (e.g., 1+2i).
Objects in R:
Vectors: One-dimensional data (e.g., c(1, 2, 3)).
Matrices: Two-dimensional arrays (e.g., matrix(1:9, nrow=3)).
Lists: Collection of different types of elements (e.g., list(1, "a", TRUE)).
Data Frames: Tabular data (e.g., data.frame(a=1:3, b=c("x", "y", "z"))).
Factors: Categorical data (e.g., factor(c("male", "female"))).
3. Reading and Writing Data
Reading Data:
read.table(): Read tabular data from a file.
read.csv(): Read CSV files.
readLines(): Read text files line-by-line.
scan(): Read raw values into a vector from the console or a file.
Writing Data:
write.table(): Write tabular data to a file.
write.csv(): Write data to CSV files.
cat(): Write data to console or files.
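A minimal round-trip sketch of the functions above. It assumes a temporary file (created with tempfile()) stands in for a real path, and the data frame scores with its columns is made up for illustration.

```r
# Hypothetical data; the column names are illustrative only.
scores <- data.frame(name = c("A", "B", "C"), score = c(90, 85, 77))

# tempfile() gives a throwaway path so the sketch does not touch real files.
path <- tempfile(fileext = ".csv")

# row.names = FALSE stops write.csv() from adding a column of row numbers.
write.csv(scores, path, row.names = FALSE)

# Read the file back as a data frame and as raw lines of text.
scores_back <- read.csv(path)
raw_lines <- readLines(path)

# cat() writes to the console by default.
cat("Rows read:", nrow(scores_back), "\n")

unlink(path)  # remove the temporary file
```

Without row.names = FALSE, the file would gain an extra unnamed column that shows up again when it is read back.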
4. Control Structures
Conditional Statements:
if, else, and the vectorized ifelse(test, yes, no).
Loops:
for: Iteration over a sequence.
while: Loop that runs while a condition is TRUE.
repeat: Infinite loop (use break to exit).
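A short sketch of how these constructs differ in practice. The vector x and the threshold of 10 are arbitrary values chosen for illustration.

```r
x <- c(3, 8, 5, 10)

# ifelse() is vectorized: it tests every element and returns a vector.
labels <- ifelse(x > 5, "Greater", "Lesser")

# if/else expects a single TRUE or FALSE, so test one value at a time.
first_label <- if (x[1] > 5) "Greater" else "Lesser"

# A for loop that stops early with break once the running total passes 10.
total <- 0
for (value in x) {
  total <- total + value
  if (total > 10) break
}

print(labels)
print(total)
```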
5. Functions
User-defined reusable blocks of code.
Syntax:
my_function <- function(arg1, arg2) {
  # Code
  return(result)
}
Example:
add <- function(x, y) {
  return(x + y)
}
add(3, 5) # Output: 8
6. Scoping Rules
Lexical Scoping:
Variables are looked up in the environment where the function is defined.
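Dynamic Scoping:
Variables would be looked up in the environment where the function is called. R does not do this; it uses lexical scoping.
Use <<- to assign to an existing variable in a parent environment (a new global variable is created if none is found).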
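A sketch of lexical scoping and <<- working together, assuming a hypothetical helper make_counter() (not a base R function). The inner function remembers the environment it was defined in, so count survives between calls.

```r
# make_counter() is a hypothetical helper, not a base R function.
make_counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # modifies count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()
third <- counter()

# A global variable with the same name is not touched by the counter.
count <- 100
fourth <- counter()
```

Here <<- finds count in the enclosing function's environment first, which is why the global count stays at 100.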
7. Dates and Times
Date Class:
Sys.Date(): Current date.
as.Date("2025-01-01"): Convert string to date.
Time Class:
Sys.time(): Current date and time.
POSIXct and POSIXlt: Classes for time representation.
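A brief sketch of how the two time classes differ. The timestamp is arbitrary, and tz = "UTC" is set only so the result does not depend on the local time zone.

```r
# A fixed time in UTC so the result does not depend on the local time zone.
t_ct <- as.POSIXct("2025-01-01 12:00:00", tz = "UTC")
t_lt <- as.POSIXlt(t_ct)

# POSIXct is stored as one number: seconds since 1970-01-01 00:00:00 UTC.
seconds <- as.numeric(t_ct)

# POSIXlt is a list-like structure with named fields.
hour <- t_lt$hour          # hour of the day
year <- t_lt$year + 1900   # years are stored as an offset from 1900
month <- t_lt$mon + 1      # months are stored as 0-11
```

POSIXct is usually the more convenient choice for data frame columns, while POSIXlt is handy when individual components such as the hour are needed.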
8. Loop Functions
apply(): Apply a function to rows/columns of a matrix.
lapply(): Apply a function to elements of a list.
sapply(): Same as lapply(), but returns a simplified result.
tapply(): Apply a function to subsets of data.
mapply(): Multivariate version of sapply().
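The differences between these functions are mostly about what goes in and what shape comes out. A small sketch, with all input values invented for illustration:

```r
values <- list(a = 1:3, b = 4:6)

# lapply() always returns a list, one element per input.
as_list <- lapply(values, sum)

# sapply() simplifies the same result to a named vector when it can.
as_vector <- sapply(values, sum)

# apply() works over a matrix margin: 1 = rows, 2 = columns.
m <- matrix(1:6, nrow = 2)
col_sums <- apply(m, 2, sum)

# tapply() groups a vector by a second vector before applying the function.
group_means <- tapply(c(10, 20, 30, 40), c("x", "y", "x", "y"), mean)

# mapply() walks over several vectors in parallel.
pair_sums <- mapply(function(p, q) p + q, 1:3, 4:6)
```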
9. Debugging Tools
traceback(): View the call stack after an error.
debug(): Step through a function line by line.
browser(): Pause execution to inspect variables.
trace(): Insert debugging code into a function.
recover(): Interactive debugging on errors.
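Most of these tools open an interactive prompt, so they are hard to show in a script. A small non-interactive sketch of how the debug flag is set and cleared, assuming a toy function square(); isdebugged() and undebug() are base R functions that are not listed in the notes above.

```r
square <- function(x) {
  x^2
}

debug(square)                 # flag the function: the next call would open the browser
flagged <- isdebugged(square)

undebug(square)               # remove the flag so calls run normally again
cleared <- !isdebugged(square)

result <- square(4)           # runs without stopping
```

A function stays flagged until undebug() is called, so forgetting that step is a common reason a script keeps dropping into the browser.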
10. Simulation
Generating Random Numbers:
rnorm(n, mean, sd): Normal distribution.
runif(n, min, max): Uniform distribution.
rbinom(n, size, prob): Binomial distribution.
Random Sampling:
sample(x, size, replace, prob): Draw samples.
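A sketch of these generators in use. It also calls set.seed(), a base R function not mentioned above, on the assumption that repeatable results are wanted; the seed value 42 and the weights are arbitrary.

```r
# set.seed() fixes the random number generator state so runs are repeatable.
set.seed(42)
first_draw <- rnorm(3)

set.seed(42)
second_draw <- rnorm(3)   # same seed, same numbers

# runif() values always fall between min and max.
u <- runif(100, min = 0, max = 10)

# rbinom() counts successes out of `size` trials.
heads <- rbinom(5, size = 10, prob = 0.5)

# sample() with prob draws with unequal weights; here "a" is the most likely pick.
picks <- sample(c("a", "b", "c"), size = 20, replace = TRUE,
                prob = c(0.7, 0.2, 0.1))
```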
11. Code Profiling
Tools:
Rprof(): Profile R code.
summaryRprof(): Summarize profiling output.
Best Practices:
Avoid loops where vectorized operations are possible.
Use efficient data structures.
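A rough way to see the first best practice is to time both versions. This sketch uses system.time(), a base R function not listed above, as a lighter alternative to Rprof(); the vector length is arbitrary and the actual timings will vary by machine.

```r
n <- 100000
x <- seq_len(n)

# Loop version: fill a preallocated vector one element at a time.
loop_time <- system.time({
  squares_loop <- numeric(n)
  for (i in x) {
    squares_loop[i] <- x[i]^2
  }
})

# Vectorized version: one call operates on the whole vector.
vec_time <- system.time({
  squares_vec <- x^2
})

# Both approaches give the same values; only the running time differs.
all.equal(squares_loop, squares_vec)
print(loop_time["elapsed"])
print(vec_time["elapsed"])
```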
1. Overview of R
What is R?
R is a statistical computing language designed for data analysis, statistical
modeling, and graphical representation.
Features:
Rich library of statistical and graphical functions.
Supports vectorized operations for efficiency.
Extensible through packages (CRAN, Bioconductor).
RStudio IDE:
User-friendly interface with script editor, console, and visualization panels.
Popular tools: Syntax highlighting, debugging, and package management.
2. R Data Types and Objects
Data Types
Numeric: Numbers (e.g., 3.5, -2).
num <- 3.14
typeof(num) # Output: "double"
Integer: Whole numbers.
int <- as.integer(5)
typeof(int) # Output: "integer"
Character: Text or string.
str <- "Hello"
typeof(str) # Output: "character"
Logical: Boolean values (TRUE, FALSE).
log_val <- TRUE
typeof(log_val) # Output: "logical"
Complex: Numbers with imaginary parts.
comp <- 2 + 3i
typeof(comp) # Output: "complex"
Objects
1. Vector: Homogeneous data.
vec <- c(1, 2, 3, 4)
2. Matrix: 2D, homogeneous.
mat <- matrix(1:9, nrow = 3, ncol = 3)
3. List: Heterogeneous data.
lst <- list(1, "a", TRUE)
4. Data Frame: Tabular data.
df <- data.frame(Name = c("A", "B"), Age = c(25, 30))
5. Factor: Categorical data.
gender <- factor(c("male", "female", "male"))
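A factor is less obvious than the other objects, so here is a short sketch of what it stores, reusing the gender example:

```r
gender <- factor(c("male", "female", "male"))

# levels() lists the distinct categories, sorted alphabetically by default.
lv <- levels(gender)

# Internally each value is stored as an integer index into the levels.
codes <- as.integer(gender)

# table() counts how many times each level occurs.
counts <- table(gender)
print(counts)
```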
3. Reading and Writing Data
Reading Data
From CSV File:
data <- read.csv("file.csv")
From Text File:
data <- read.table("file.txt", header = TRUE, sep = "\t")
Raw Input:
values <- scan(what = integer())
Writing Data
To CSV:
write.csv(data, "output.csv")
To Text File:
write.table(data, "output.txt", sep = "\t")
4. Control Structures
Conditional Statements
if and else:
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}
ifelse:
result <- ifelse(x > 5, "Greater", "Lesser")
Loops
For loop:
for (i in 1:5) {
  print(i)
}
While loop:
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}
Repeat loop:
i <- 1
repeat {
  if (i > 5) break
  print(i)
  i <- i + 1
}
5. Functions
Defining Functions
Syntax:
my_function <- function(arg1, arg2) {
  # Function body
  return(result)
}
Example
square <- function(x) {
  return(x^2)
}
square(4) # Output: 16
6. Scoping Rules
Lexical Scoping:
Variables are searched in the environment where the function is defined.
x <- 10
my_func <- function() {
  return(x)
}
my_func() # Output: 10
Dynamic Scoping: R does not use this approach.
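The difference between the two rules only shows up when a function is called from a place that has its own variable of the same name. A minimal sketch, with show_x() and caller() as made-up function names:

```r
x <- 10

# show_x() is defined at the top level, so its free variable x
# is looked up in the global environment.
show_x <- function() {
  x
}

# caller() has its own local x, but that does not affect show_x().
caller <- function() {
  x <- 99
  show_x()
}

result <- caller()   # 10 under lexical scoping; dynamic scoping would give 99
print(result)
```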
7. Dates and Times
Current Date and Time:
today <- Sys.Date()
now <- Sys.time()
Conversion:
date <- as.Date("2025-01-01")
time <- as.POSIXct("2025-01-01 12:00:00")
8. Loop Functions
apply:
mat <- matrix(1:9, nrow = 3)
apply(mat, 1, sum) # Row sums
lapply:
lapply(1:5, function(x) x^2)
sapply:
sapply(1:5, function(x) x^2)
tapply:
tapply(1:10, c(1,1,2,2,3,3,4,4,5,5), sum)
mapply:
mapply(rep, 1:3, 3)
9. Debugging Tools
traceback:
traceback()
debug:
debug(my_function)
browser:
browser()
recover:
options(error = recover)
10. Simulation
Random Numbers:
rnorm(5, mean = 0, sd = 1) # Normal distribution
runif(5, min = 0, max = 1) # Uniform distribution
rbinom(5, size = 10, prob = 0.5) # Binomial distribution
11. Code Profiling
Profiling with Rprof():
Rprof("profile.out")
for (i in 1:1000) sqrt(i)
Rprof(NULL)
summaryRprof("profile.out")
Optimization Tips:
Replace loops with vectorized functions.
Use efficient data structures like matrices and data frames.
1. Overview of R
What is R?
R is like a calculator but much more powerful. It can handle complex data
analysis, generate beautiful graphs, and automate tasks.
Example: Plotting a simple graph.
x <- 1:10
y <- x^2
plot(x, y, main = "Graph of y = x^2")
2. R Data Types and Objects
Data Types
1. Numeric:
x <- 10.5
typeof(x) # Output: "double"
2. Integer:
y <- as.integer(7)
typeof(y) # Output: "integer"
3. Character:
z <- "Hello"
typeof(z) # Output: "character"
4. Logical:
flag <- TRUE
typeof(flag) # Output: "logical"
5. Complex:
comp <- 2 + 3i
typeof(comp) # Output: "complex"
Objects
1. Vectors: Store multiple values of the same type.
vec <- c(1, 2, 3, 4)
print(vec) # Output: 1 2 3 4
2. Matrices: 2D array where all elements are of the same type.
mat <- matrix(1:6, nrow = 2, ncol = 3)
print(mat)
3. Lists: Collection of different data types.
lst <- list(num = 10, str = "Hello", vec = c(1, 2))
print(lst)
4. Data Frames: Tabular data.
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
print(df)
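Once a data frame exists, most work involves pulling columns out or filtering rows. A short sketch that reuses the same df; the age cut-offs and the Senior column are arbitrary choices for illustration.

```r
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# $ extracts one column as a vector.
ages <- df$Age

# Rows are filtered with a logical condition placed before the comma.
older <- df[df$Age > 26, ]

# New columns can be added by assignment.
df$Senior <- df$Age >= 30

str(df)   # compact summary of column names and types
```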
3. Reading and Writing Data
Reading Data
1. From CSV:
data <- read.csv("file.csv") # Replace "file.csv" with your file path.
2. Raw Input:
values <- scan(what = integer(), nmax = 5) # Reads up to 5 integers from the console.
Writing Data
1. To CSV:
write.csv(data, "output.csv") # Save data to a CSV file.
3. Control Structures
Conditional Statements
1. If and else:
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}
Loops
1. For Loop:
for (i in 1:5) {
  print(i)
}
2. While Loop:
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}
3. Repeat Loop:
i <- 1
repeat {
  print(i)
  if (i == 5) break
  i <- i + 1
}
4. Functions
1. Defining a Function:
square <- function(x) {
  return(x^2)
}
square(4) # Output: 16
2. Functions with Default Arguments:
greet <- function(name = "User") {
  return(paste("Hello,", name))
}
greet() # Output: "Hello, User"
greet("Alice") # Output: "Hello, Alice"
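Two further details of R functions are worth a sketch: arguments can be matched by name in any order, and the value of the last evaluated expression is returned even without return(). The function area() below is a made-up example.

```r
# area() is a hypothetical example function.
area <- function(width, height = 1) {
  width * height   # the last evaluated expression is returned
}

a1 <- area(3)                       # height falls back to its default of 1
a2 <- area(3, 4)                    # arguments matched by position
a3 <- area(height = 2, width = 5)   # arguments matched by name, in any order
```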
5. Scoping Rules
Lexical Scoping:
R searches for a variable in the environment where the function was defined,
not where it’s called.
x <- 10
my_function <- function() {
  x <- 20
  return(x)
}
my_function() # Output: 20
print(x) # Output: 10 (Global x remains unchanged)
Global Assignment:
Use <<- to assign a value to a variable outside the function.
x <- 5
my_function <- function() {
  x <<- 10
}
my_function()
print(x) # Output: 10
6. Dates and Times
1. Get Current Date and Time:
Sys.Date() # Example Output: "2025-01-06"
Sys.time() # Example Output: "2025-01-06 10:15:00" (followed by the time zone)
2. Convert Strings to Dates:
as.Date("2025-01-01") # Output: "2025-01-01"
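Strings that are not in YYYY-MM-DD form need a format argument, and once converted, dates support arithmetic. A sketch with made-up dates:

```r
# ISO format (YYYY-MM-DD) is parsed without extra arguments.
start <- as.Date("2025-01-01")

# Other layouts need a format string: %d = day, %m = month, %Y = 4-digit year.
end <- as.Date("15/02/2025", format = "%d/%m/%Y")

# Subtracting two dates gives a time difference measured in days.
gap <- as.numeric(end - start)

# Adding a number to a date moves it forward by that many days.
next_week <- start + 7

# format() turns a date back into text.
label <- format(end, "%Y-%m-%d")
```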
7. Loop Functions
1. Apply:
mat <- matrix(1:6, nrow = 2)
apply(mat, 1, sum) # Row sums: Output: 9 12
2. Lapply:
lapply(1:3, function(x) x^2) # Output: a list containing 1, 4, 9
3. Sapply:
sapply(1:3, function(x) x^2) # Output: 1 4 9
8. Debugging Tools
1. Traceback:
Shows the sequence of calls before an error occurred.
traceback()
2. Debug:
Steps through a function.
debug(square)
square(4)
3. Browser:
Pauses execution at a specific line.
my_function <- function(x) {
  browser()
  return(x^2)
}
my_function(4)
9. Simulation
1. Generate Random Numbers:
rnorm(5, mean = 0, sd = 1) # Generate 5 random numbers from a normal distribution.
runif(5, min = 0, max = 10) # 5 random numbers from a uniform distribution.
2. Random Sampling:
sample(1:10, 5, replace = TRUE) # Sample 5 numbers with replacement.
10. Code Profiling
1. Profiling with Rprof:
Rprof("profile.out")
for (i in 1:10000) sqrt(i)
Rprof(NULL)
summaryRprof("profile.out")
2. Optimization:
Replace loops with vectorized operations.
x <- 1:10000
y <- x^2 # Faster than using a for loop.