R Programming

These documents provide examples of basic R programs for performing tasks like printing text, taking user input, checking for leap years, prime numbers, and more. The programs demonstrate using functions like print, readline, and conditionals. Common operations covered include arithmetic, finding mean, sum and product of vectors.

Uploaded by

raftaarrafeeq619
Copyright
© © All Rights Reserved

R Program

R Program to Print Hello World!

Source Code
# We can use the print() function
print("Hello World!")

[1] "Hello World!"

# Quotes can be suppressed in the output


print("Hello World!", quote=FALSE)

[1] Hello World!

# If there are more than 1 item, we can concatenate using paste()


> print(paste("How","are","you?"))
[1] "How are you?"

In this program, we have used the built-in print() function to print the string
Hello World! The quotes are printed by default. To avoid this we can pass the
argument quote=FALSE.

If there is more than one item, we can join the strings with paste() before
printing, or print them directly with cat().

1. R Program to Take Input from User

Use readline() function to take input from the user (terminal). This function will
return a single element character vector. So, if we want numbers, we need to do
appropriate conversions.

my.name <- readline(prompt="Enter name: ")


my.age <- readline(prompt="Enter age: ")

my.age <- as.integer(my.age) # convert character into integer
print (paste("Hi,",my.name,"next year you will be",my.age+1,"years old."))

Output

Enter name: Doss


Enter age: 36
[1] "Hi, Doss next year you will be 37 years old."

2. R Program to Check for Leap Year

Source Code
# Program to check if
# the input year is
# a leap year or not

year = as.integer(readline(prompt="Enter a year: "))


if((year %% 4) == 0) {
  if((year %% 100) == 0) {
    if((year %% 400) == 0) {
      print(paste(year,"is a leap year"))
    } else {
      print(paste(year,"is not a leap year"))
    }
  } else {
    print(paste(year,"is a leap year"))
  }
} else {
  print(paste(year,"is not a leap year"))
}

Output 1

Enter a year: 1900


[1] "1900 is not a leap year"

Output 2

Enter a year: 2000
[1] "2000 is a leap year"

In this program, we ask the user to input a year and check whether it is a leap
year. A year is a leap year if it is divisible by 4, except that a year divisible
by 100 is a leap year only if it is also divisible by 400. Thus 1900 is not a leap
year, because it is divisible by 100 but not by 400, whereas 2000 is a leap year
because it is divisible by 400.
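
The same rule can also be written as a single logical expression. The following is a minimal sketch, not part of the original program; the helper name is_leap is an assumption made for illustration. Because it uses the vectorized operators & and |, it can test several years at once.

```r
# Hypothetical helper: TRUE for leap years, FALSE otherwise.
# Divisible by 4 and not by 100, or divisible by 400.
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

# Works on a whole vector of years in one call
result <- is_leap(c(1900, 2000, 2023, 2024))
print(result)
```

This avoids the nested if statements, at the cost of not printing a message for each year.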

3. R Program to Check if a Number is Odd or Even

# Program to check if
# the input number is odd or even.
# A number is even if division
# by 2 gives a remainder of 0.
# If remainder is 1, it is odd.

num = as.integer(readline(prompt="Enter a number: "))


if((num %% 2) == 0) {
print(paste(num,"is Even"))
} else {
print(paste(num,"is Odd"))
}

Output 1

Enter a number: 89
[1] "89 is Odd"

Output 2

Enter a number: 0
[1] "0 is Even"

In this program, we ask the user for the input and check if the number is odd or
even. A number is even if it is perfectly divisible by 2. When the number is divided
by 2, we use the remainder operator %% to compute the remainder. If the
remainder is not zero, the number is odd.

4. R Program to check Armstrong Number

Source Code
# Program to check if the
# number provided by the
# user is an Armstrong number
# or not

# take input from the user


num = as.integer(readline(prompt="Enter a number: "))

# initialize sum
sum = 0

# find the sum of the cube of each digit


temp = num
while(temp > 0) {
digit = temp %% 10
sum = sum + (digit ^ 3)
temp = floor(temp/10)
}

# display the result


if(num == sum) {
print(paste(num,"is an Armstrong number"))
} else {
print(paste(num,"is not an Armstrong number"))
}

Output 1

Enter a number: 23
[1] "23 is not an Armstrong number"

Output 2

Enter a number: 370


[1] "370 is an Armstrong number"

Here, we ask the user for a number and check if it is an Armstrong number. The
check as written is meant for three-digit numbers, for which an Armstrong number
equals the sum of the cubes of its digits. We initialize the sum to 0 and obtain
each digit by using the modulus operator %%: the remainder of a number when it is
divided by 10 is its last digit. We take the cubes using the exponent operator ^.
Finally, we compare the sum with the original number and conclude that it is an
Armstrong number if they are equal.
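
For numbers with a different number of digits, the usual definition raises each digit to the power of the digit count instead of always cubing. The sketch below illustrates that generalization; the function name is_armstrong is hypothetical, and it assumes a modest positive whole number (very large values may be converted to scientific notation by as.character and would not split into digits correctly).

```r
# Hypothetical helper: generalized Armstrong check.
# Each digit is raised to the power of the number of digits.
is_armstrong <- function(num) {
  digits <- as.integer(strsplit(as.character(num), "")[[1]])
  sum(digits ^ length(digits)) == num
}

is_armstrong(370)    # 3^3 + 7^3 + 0^3 = 370
is_armstrong(1634)   # 1^4 + 6^4 + 3^4 + 4^4 = 1634
is_armstrong(23)     # 2^2 + 3^2 = 13, so not an Armstrong number
```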

5. R Program to Check if a Number is Prime

A positive integer greater than 1 which has no other factors except 1 and the
number itself is called a prime number. 2, 3, 5, 7 etc. are prime numbers as they do
not have any other factors. But 6 is not prime (it is composite) since, 2 x 3 = 6.

Source Code
# Program to check if
# the input number is
# prime or not

# take input from the user


num = as.integer(readline(prompt="Enter a number: "))

flag = 0
# prime numbers are greater than 1
if(num > 1) {
# check for factors
flag = 1
for(i in 2:(num-1)) {
if ((num %% i) == 0) {
flag = 0
break
}
}
}

if(num == 2) flag = 1
if(flag == 1) {
print(paste(num,"is a prime number"))
} else {
print(paste(num,"is not a prime number"))
}

Output 1

Enter a number: 25
[1] "25 is not a prime number"

Output 2

Enter a number: 19
[1] "19 is a prime number"

Here, we take an integer from the user and check whether it is prime or not.
Numbers less than or equal to 1 are not prime numbers. Hence, we only proceed if
the num is greater than 1. We check if num is exactly divisible by any number from
2 to num - 1. If we find a factor in that range, the number is not prime. Else the
number is prime.

We can decrease the range of numbers where we look for factors. In the above
program, our search range is from 2 to num - 1. We could have used the range [2,
num / 2] or [2, sqrt(num)]. The latter range is based on the fact that a composite
number must have a factor no greater than its square root. If no factor is found
in that range, the number is prime.
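
As an illustration of the smaller search range, here is a sketch that only tests divisors up to the square root. The function name is_prime is an assumption for this example, and the vectorized all() call replaces the explicit loop and flag used in the program above.

```r
# Hypothetical helper: primality test using divisors up to sqrt(num).
is_prime <- function(num) {
  if (num < 2) return(FALSE)    # 0, 1 and negatives are not prime
  if (num < 4) return(TRUE)     # 2 and 3 are prime
  limit <- floor(sqrt(num))
  # prime only if no candidate divisor leaves a zero remainder
  all(num %% (2:limit) != 0)
}

is_prime(25)   # FALSE, since 5 divides 25
is_prime(19)   # TRUE
```

Handling 2 and 3 separately avoids the descending sequence that 2:limit would produce when limit is 1.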

6. R Program to Find the Factorial of a Number

The factorial of a number is the product of all the integers from 1 to that number.
For example, the factorial of 6 (denoted as 6!) is 1*2*3*4*5*6 = 720. Factorial is
not defined for negative numbers and the factorial of zero is one, 0! = 1.

Source Code

# Program to find the
# factorial of a number
# provided by the user

# take input from the user


num = as.integer(readline(prompt="Enter a number: "))
factorial = 1

# check if the number is negative, positive or zero


if(num < 0) {
print("Sorry, factorial does not exist for negative numbers")
} else if(num == 0) {
print("The factorial of 0 is 1")
} else {
for(i in 1:num) {
factorial = factorial*i
}
print(paste("The factorial of",num,"is",factorial))
}

Output

Enter a number: 8
[1] "The factorial of 8 is 40320"

Here, we take input from the user and check if the number is negative, zero or
positive using if...else statement. If the number is positive, we use for loop to
calculate the factorial. We can also use the built-in function factorial() for this.

> factorial(8)
[1] 40320
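
Another loop-free option is prod() over a sequence. This is a small sketch with a hypothetical function name, fact. It uses seq_len(n) rather than 1:n because 1:0 would produce the vector (1, 0) and give a product of 0, which is why the loop version above treats zero as a special case.

```r
# Hypothetical helper: factorial as the product of 1..n.
fact <- function(n) {
  if (n < 0) stop("factorial is not defined for negative numbers")
  prod(seq_len(n))   # seq_len(0) is empty, and prod of an empty vector is 1
}

fact(8)
fact(0)
```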

7. R Program to Check if a Number is Positive, Negative or Zero

Source Code
# In this program, we input a number
# check if the number is positive or
# negative or zero and display
# an appropriate message

num = as.double(readline(prompt="Enter a number: "))


if(num > 0) {
print("Positive number")
} else {
if(num == 0) {
print("Zero")
} else {
print("Negative number")
}
}

Output 1

Enter a number: -9.6


[1] "Negative number"

Output 2

Enter a number: 2
[1] "Positive number"

A number is positive if it is greater than zero. We check this in the expression of if.
If it is FALSE, the number will either be zero or negative. This is also tested in
subsequent expression.

8. R Program to Find the Multiplication Table of a Number

Source Code
# Program to find the multiplication
# table (from 1 to 10)
# of a number input by the user

# take input from the user


num = as.integer(readline(prompt="Enter Number: "))

# use for loop to iterate 10 times

for(i in 1:10) {
print(paste(num,'x',i,'=',num*i))
}

Output

Enter Number: 7
[1] "7 x 1 = 7"
[1] "7 x 2 = 14"
[1] "7 x 3 = 21"
[1] "7 x 4 = 28"
[1] "7 x 5 = 35"
[1] "7 x 6 = 42"
[1] "7 x 7 = 49"
[1] "7 x 8 = 56"
[1] "7 x 9 = 63"
[1] "7 x 10 = 70"

Here, we ask the user for a number and display its multiplication table up to 10.
We use a for loop to iterate 10 times.

We can sum the elements of a vector using the sum() function. Similarly, the mean()
and prod() functions can be used to find the mean and the product of the elements.

Source Code
> sum(2,7,5)
[1] 14

> x
[1] 2 NA 3 1 4

> sum(x) # if any element is NA or NaN, result is NA or NaN


[1] NA

> sum(x, na.rm=TRUE) # this way we can ignore NA and NaN values
[1] 10

> mean(x, na.rm=TRUE)
[1] 2.5

> prod(x, na.rm=TRUE)


[1] 24

Whenever a vector contains NA (Not Available) or NaN (Not a Number),
functions such as sum(), mean(), prod() etc. produce NA or NaN respectively. In
order to ignore such values, we pass in the argument na.rm=TRUE.

Add two vectors together using the + operator. One thing to keep in mind when
adding two vectors (or applying any other arithmetic operation to them) is the
recycling rule. If the two vectors are of equal length, there is no issue. But if
the lengths differ, the shorter one is recycled (repeated) until its length equals
that of the longer one. Recycling gives a warning if the length of the longer
vector is not an integral multiple of the length of the shorter one.

Source Code
> x
[1] 3 6 8
> y
[1] 2 9 0

> x + y
[1] 5 15 8

> x + 1 # 1 is recycled to (1,1,1)


[1] 4 7 9

> x + c(1,4) # (1,4) is recycled to (1,4,1) but warning issued


[1] 4 10 9
Warning message:
In x + c(1, 4) :
longer object length is not a multiple of shorter object length

As we can see above, the two vectors x and y are of equal length, so they can be
added together without difficulty. The expression x + 1 also works fine because the
single 1 is recycled into a vector of three 1's. Similarly, in the last example, a
two-element vector is recycled into a three-element vector. But a warning is issued
in this case, as 3 is not an integral multiple of 2.

Find the minimum and the maximum of a vector using the min() or the max()
function. A function called range() is also available which returns the minimum
and maximum in a two element vector.

> x
[1] 5 8 3 9 2 7 4 6 10

> # find the minimum


> min(x)
[1] 2

> # find the maximum


> max(x)
[1] 10

> # find the range


> range(x)
[1] 2 10

If we want to find where the minimum or maximum is located, i.e. the index
instead of the actual value, we can use the which.min() and which.max()
functions. Note that these functions return the index of the first minimum or
maximum if there are several.

> x
[1] 5 8 3 9 2 7 4 6 10

> # find index of the minimum


> which.min(x)
[1] 5

> # find index of the maximum


> which.max(x)
[1] 9

> # alternate way to find the minimum

> x[which.min(x)]
[1] 2
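
When the minimum occurs more than once and every position is wanted, which() with a comparison is one option. The short sketch below uses a made-up vector with a repeated minimum; it is an illustration, not part of the original listing.

```r
# Example vector (made up for illustration) where the minimum 2 appears twice
x <- c(5, 2, 8, 2, 9)

first_min <- which.min(x)        # index of the first minimum only
all_min   <- which(x == min(x))  # indices of every minimum

print(first_min)
print(all_min)
```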

1. Write a program to illustrate basic Arithmetic in R

Aim: To write a program to illustrate basic arithmetic in R

a<-2-5 # subtraction
b <- 6/3 # division
c<-3+2*5 # note order of operations exists
d<-(3+2)*5 # use parentheses to force a different order of operations
e<-4^3 # raise to a power is ^
f<-exp(4) # e^4 = 54.598 (give or take)
g<-log(2.742) # natural log of 2.742
h<-log10(1000) # common log of 1000
pi # 3.14159...
print (a)
print (b)
print (c)
print (d)
print (e)
print (f)
print (g)
print (h)
print (pi)
OUTPUT:
[1] -3
[1] 2
[1] 13
[1] 25
[1] 64
[1] 54.59815
[1] 1.008688
[1] 3
[1] 3.141593

2. Aim: To write a program to illustrate basic arithmetic in R


Program:
v <- c ( 2,5.5,6)
t <- c (8, 3, 4)
print (v-t)
print (v+t)
OUTPUT:
[1] -6.0 2.5 2.0
[1] 10.0 8.5 10.0

Result: The basic Arithmetic in R Program Executed Successfully.

3. Write a program to illustrate Variable assignment in R


Aim: To write a Program to illustrate variable assignment in R

Assignment Operators in R
Operator      Description
<-, <<-, =    Leftwards assignment
->, ->>       Rightwards assignment

The operators <- and = can be used almost interchangeably to assign to a variable
in the current environment. <<- assigns to a variable in a parent environment
(more like a global assignment). The rightward assignments, although available,
are rarely used.
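
The difference between <- and <<- only shows up inside a function. The following sketch illustrates it; the variable counter and the function names bump_local and bump_outer are hypothetical names chosen for this example.

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1    # creates a local copy; the outer counter is untouched
  counter
}

bump_outer <- function() {
  counter <<- counter + 1   # modifies counter in the enclosing environment
  counter
}

bump_local()
after_local <- counter      # the outer counter has not changed

bump_outer()
after_outer <- counter      # the outer counter has been incremented

print(c(after_local, after_outer))
```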

Program:

A<-5
B<<-10
15->C
20->>D
E=45
print(A)
print(B)
print(C)
print(D)
print(E)

OUTPUT:
[1] 5
[1] 10
[1] 15
[1] 20
[1] 45
Result: The Variable assignment in R Program Executed Successfully.

4. Write a program to illustrate data types in R


R - Data Types
Data Type     Example
Numeric       12.3, 5, 999
Integer       2L, 34L, 0L
Complex       3 + 2i
Character     'a', "good", "TRUE", '23.4'
Logical       TRUE, FALSE
Raw           Raw values
Program:
v <- TRUE
print(class(v))
v <- 23.5
print(class(v))
v <- 2L
print(class(v))
v <- 2+5i
print(class(v))
v <- "TRUE"
print(class(v))
v<-charToRaw("SACET")
print(v)
print (class(v))

OUTPUT:
[1] "logical"
[1] "numeric"
[1] "integer"
[1] "complex"
[1] "character"
[1] 53 41 43 45 54
[1] "raw"
Result: The Data Types in R Program Executed Successfully.

5. Write a program to illustrate creating and naming a vector in R

Vectors are the most basic R data objects and there are six types of atomic vectors.

They are logical, integer, double, complex, character and raw.

Vector Creation

Single Element Vector

Even when you write just one value in R, it becomes a vector of length 1 and
belongs to one of the above vector types.

# Atomic vector of type character.


print("abc");

# Atomic vector of type double.


print(12.5)

# Atomic vector of type integer.


print(63L)

# Atomic vector of type logical.


print(TRUE)

# Atomic vector of type complex.


print(2+3i)

# Atomic vector of type raw.


print(charToRaw('hello'))

When we execute the above code, it produces the following result −

[1] "abc"
[1] 12.5
[1] 63
[1] TRUE
[1] 2+3i
[1] 68 65 6c 6c 6f

Multiple Elements Vector

Using colon operator with numeric data

# Creating a sequence from 5 to 13.


v <- 5:13
print(v)
[1] 5 6 7 8 9 10 11 12 13

# Creating a sequence from 6.6 to 12.6.


v <- 6.6:12.6
print(v)
[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6

# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)
[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8

Using the seq() function

# Create vector with elements from 5 to 9 incrementing by 0.4.


print(seq(5, 9, by = 0.4))

[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0

Using the c() function

The non-character values are coerced to character type if one of the elements is a
character.

# The logical and numeric values are converted to characters.


s <- c('apple','red',5,TRUE)
print(s)

When we execute the above code, it produces the following result −

[1] "apple" "red" "5" "TRUE"
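
The section title also mentions naming a vector, which the listings above do not show. A minimal sketch follows, assuming made-up data; the vector names marks and temps and their values are illustrative only.

```r
# Name the elements after creating the vector
marks <- c(72, 85, 90)
names(marks) <- c("maths", "physics", "chemistry")
print(marks)

# Elements can then be selected by name
physics_mark <- marks["physics"]
print(physics_mark)

# Names can also be given directly inside c()
temps <- c(mon = 21.5, tue = 23.0)
print(names(temps))
```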

6. Write a program to illustrate creating and naming a matrix in R

Matrices are R objects in which the elements are arranged in a two-dimensional
rectangular layout. They contain elements of the same atomic type. Though we
can create a matrix containing only characters or only logical values, such
matrices are not of much use. We mostly use matrices containing numeric elements
in mathematical calculations.

A Matrix is created using the matrix() function.

Syntax

The basic syntax for creating a matrix in R is −

matrix(data, nrow, ncol, byrow, dimnames)

Following is the description of the parameters used −

• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical value. If TRUE, the input vector elements are arranged
  by row.
• dimnames is the names assigned to the rows and columns.

Create a matrix taking a vector of numbers as input

# Elements are arranged sequentially by row.


M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14

# Elements are arranged sequentially by column.


N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

[,1] [,2] [,3]
[1,] 3 7 11
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14

# Define the column and row names.


rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

col1 col2 col3


row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14

7. Write a program to illustrate Add column and Add a Row in Matrix in R


1) Add Column using cbind
2) Add Row using rbind

1) Add Column using cbind


B = matrix(c(2, 4, 3, 1, 5, 7), nrow=3, ncol=2)

B # B has 3 rows and 2 columns


[,1] [,2]
[1,] 2 1
[2,] 4 5
[3,] 3 7

Transpose

We construct the transpose of a matrix by interchanging its columns and rows with
the function t().

> t(B) # transpose of B


[,1] [,2] [,3]
[1,] 2 4 3
[2,] 1 5 7

Combining Matrices

The columns of two matrices having the same number of rows can be combined
into a larger matrix. For example, suppose we have another matrix C also with 3
rows.

C = matrix(c(7, 4, 2), nrow=3, ncol=1)

C # C has 3 rows
[,1]
[1,] 7
[2,] 4
[3,] 2

Then we can combine the columns of B and C with cbind.

cbind(B, C)
[,1] [,2] [,3]
[1,] 2 1 7
[2,] 4 5 4
[3,] 3 7 2

2) Add Row using rbind

Similarly, we can combine the rows of two matrices if they have the same number
of columns with the rbind function.

D = matrix( c(6, 2), nrow=1, ncol=2)

D # D has 2 columns
[,1] [,2]
[1,] 6 2

rbind(B, D)
[,1] [,2]
[1,] 2 1
[2,] 4 5
[3,] 3 7
[4,] 6 2

8. Write a program to illustrate Selection of elements in Matrices in R

Accessing Elements of a Matrix

Elements of a matrix can be accessed by using the column and row index of the
element. We consider the matrix P above to find the specific elements below.

# Define the column and row names.


rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

# Create the matrix.


P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames,
colnames))
P
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14

# Access the element at 3rd column and 1st row.


print(P[1,3])

OUTPUT:

[1] 5

# Access the element at 2nd column and 4th row.


print(P[4,2])
OUTPUT:
[1] 13

# Access only the 2nd row.


print(P[2,])
OUTPUT:
col1 col2 col3
6 7 8

# Access only the 3rd column.


print(P[,3])
OUTPUT:

row1 row2 row3 row4
   5    8   11   14

9. Write a program to illustrate Performing Arithmetic of Matrices

Matrix Computations

Various mathematical operations are performed on matrices using the R
operators. The result of the operation is also a matrix.

The dimensions (number of rows and columns) should be the same for the
matrices involved in the operation.

Matrix Addition & Subtraction

# Create two 2x3 matrices.


matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)


print(matrix2)
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result)

# Subtract the matrices


result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)

When we execute the above code, it produces the following result −

[,1] [,2] [,3]


[1,] 3 -1 2
[2,] 9 4 6
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
Result of addition
[,1] [,2] [,3]
[1,] 8 -1 5
[2,] 11 13 10
Result of subtraction
[,1] [,2] [,3]
[1,] -2 -1 -1
[2,] 7 -5 2

Matrix Multiplication & Division

# Create two 2x3 matrices.


matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)


print(matrix2)

# Multiply the matrices.


result <- matrix1 * matrix2
cat("Result of multiplication","\n")

print(result)

# Divide the matrices


result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)

When we execute the above code, it produces the following result −

[,1] [,2] [,3]


[1,] 3 -1 2
[2,] 9 4 6
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
Result of multiplication
[,1] [,2] [,3]
[1,] 15 0 6
[2,] 18 36 24
Result of division
[,1] [,2] [,3]
[1,] 0.6 -Inf 0.6666667
[2,] 4.5 0.4444444 1.5000000
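
Note that * and / work element by element, which is why the results above have the same 2x3 shape as the inputs. The matrix product from linear algebra uses the %*% operator instead and needs the column count of the first matrix to equal the row count of the second. The sketch below reuses matrix1 and matrix2 and transposes the second one so the shapes agree; the variable name product is an assumption for this example.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)   # 2 x 3
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)    # 2 x 3

# (2 x 3) times (3 x 2) gives a 2 x 2 matrix
product <- matrix1 %*% t(matrix2)
print(product)
```

For example, the top-left entry is 3*5 + (-1)*0 + 2*3 = 21.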

10. Write a program to illustrate Factors in R

Factor variables are categorical variables that can be either numeric or string
variables. There are a number of advantages to converting categorical variables to
factor variables.

Factors are the data objects which are used to categorize data and store it as
levels. They can store both strings and integers. They are useful in columns
which have a limited number of unique values, such as "Male" and "Female" or
TRUE and FALSE. They are useful in data analysis for statistical modeling.

Factors are created using the factor() function by taking a vector as input.

# Create a vector as input.

data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)

print(is.factor(data))

OUTPUT:
[1] "East" "West" "East" "North" "North" "East" "West" "West" "West"
[10] "East" "North"
[1] FALSE

# Apply the factor function.

factor_data <- factor(data)

print(factor_data)

print(is.factor(factor_data))

OUTPUT:

[1] East West East North North East West West West East North
Levels: East North West
[1] TRUE

Factors in Data Frame

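In versions of R before 4.0.0, creating a data frame with a column of text data
makes R treat the text column as categorical data and create a factor from it.
From R 4.0.0 onwards, this conversion happens only when stringsAsFactors = TRUE
is passed to data.frame().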

# Create the vectors for data frame.


height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame (stringsAsFactors = TRUE is needed from R 4.0.0 onwards).
input_data <- data.frame(height,weight,gender, stringsAsFactors = TRUE)
print(input_data)

# Test if the gender column is a factor.


print(is.factor(input_data$gender))

# Print the gender column to see the levels.


print(input_data$gender)

When we execute the above code, it produces the following result −

height weight gender


1 132 48 male
2 151 49 male
3 162 66 female
4 139 53 female
5 166 67 male
6 147 52 female
7 122 40 male
[1] TRUE
[1] male male female female male female male
Levels: female male

11. Case study of why you need to use a Factor in R


Factors are variables in R which take on a limited number of different values; such
variables are often referred to as categorical variables. One of the most important
uses of factors is in statistical modeling; since categorical variables enter into
statistical models differently than continuous variables, storing data as factors
ensures that the modeling functions will treat such data correctly.

Factors in R are stored as a vector of integer values with a corresponding set of
character values to use when the factor is displayed. The factor function is used to
create a factor. The only required argument to factor is a vector of values which
will be returned as a vector of factor values. Both numeric and character variables
can be made into factors, but a factor's levels will always be character values. You
can see the possible levels for a factor through the levels command.

To change the order in which the levels will be displayed from their default sorted
order, the levels= argument can be given a vector of all the possible values of the
variable in the order you desire. If the ordering should also be used when
performing comparisons, use the optional ordered=TRUE argument. In this case,
the factor is known as an ordered factor.
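
The points above (integer storage, character levels, custom level order and ordered comparisons) can be seen together in a short sketch. The data and the variable name sizes are made up for illustration.

```r
# Made-up data: an ordered factor with an explicit level order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"), ordered = TRUE)

lev   <- levels(sizes)       # the character labels, in the order given
codes <- as.integer(sizes)   # the integer codes stored internally
cmp   <- sizes[1] < sizes[2] # comparison is allowed because the factor is ordered

print(lev)
print(codes)
print(cmp)
```

Here "small" is stored as 1, "medium" as 2 and "large" as 3, so the codes are 1 3 2 1.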

The levels of a factor are used when displaying the factor's values. You can change
these levels at the time you create a factor by passing a vector with the new values
through the labels= argument. Note that this actually changes the internal levels of
the factor, and to change the labels of a factor after it has been created, the
assignment form of the levels function is used. To illustrate this point, consider a
factor taking on integer values which we want to display as roman numerals.

> data = c(1,2,2,3,1,2,3,3,1,2,3,3,1)


> fdata = factor(data)
> fdata
[1] 1 2 2 3 1 2 3 3 1 2 3 3 1
Levels: 1 2 3
> rdata = factor(data,labels=c("I","II","III"))
> rdata
[1] I II II III I II III III I II III III I
Levels: I II III

To convert the default factor fdata to roman numerals, we use the assignment form
of the levels function:

> levels(fdata) = c('I','II','III')


> fdata
[1] I II II III I II III III I II III III I
Levels: I II III

Factors represent a very efficient way to store character values, because each
unique character value is stored only once, and the data itself is stored as a vector
of integers. Because of this, versions of R before 4.0.0 have read.table
automatically convert character variables to factors unless the as.is= argument is
specified; from R 4.0.0 onwards the conversion must be requested with
stringsAsFactors=TRUE. See Section for details.

As an example of an ordered factor, consider data consisting of the names of
months:

> mons = c("March","April","January","November","January",
+ "September","October","September","November","August",
+ "January","November","November","February","May","August",
+ "July","December","August","August","September","November",
+ "February","April")
> mons = factor(mons)
> table(mons)
mons
April August December February January July
2 4 1 2 3 1
March May November October September
1 1 5 1 3

Although the months clearly have an ordering, this is not reflected in the output of
the table function. Additionally, comparison operators are not supported for
unordered factors. Creating an ordered factor solves these problems:

> mons = factor(mons,levels=c("January","February","March",
+ "April","May","June","July","August","September",
+ "October","November","December"),ordered=TRUE)
> mons[1] < mons[2]
[1] TRUE
> table(mons)
mons
January February March April May June
3 2 1 2 1 0
July August September October November December
1 4 3 1 5 1

While it may be necessary to convert a numeric variable to a factor for a particular
application, it is often very useful to convert the factor back to its original numeric
values, since even simple arithmetic operations will fail when using factors. Since
the as.numeric function will simply return the internal integer values of the factor,
the conversion must be done using the levels attribute of the factor.

Suppose we are studying the effects of several levels of a fertilizer on the growth
of a plant. For some analyses, it might be useful to convert the fertilizer levels to
an ordered factor:

> fert = c(10,20,20,50,10,20,10,50,20)
> fert = factor(fert,levels=c(10,20,50),ordered=TRUE)
> fert
[1] 10 20 20 50 10 20 10 50 20
Levels: 10 < 20 < 50

If we wished to calculate the mean of the original numeric values of the fert
variable, we would have to convert the values using the levels function:

> mean(fert)
[1] NA
Warning message:
argument is not numeric or logical:
returning NA in: mean.default(fert)
> mean(as.numeric(levels(fert)[fert]))
[1] 23.33333

Indexing the return value from the levels function is the most reliable way to
convert numeric factors to their original numeric values.
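
To make the pitfall concrete, the sketch below compares as.numeric applied directly to a factor with the levels-based conversion. It reuses the fert values from above (as an unordered factor, for brevity); the variable names codes, via_levels and via_character are assumptions for this example.

```r
fert <- factor(c(10, 20, 20, 50, 10, 20, 10, 50, 20))

codes         <- as.numeric(fert)                 # internal codes, not the doses
via_levels    <- as.numeric(levels(fert))[fert]   # original values, via the levels
via_character <- as.numeric(as.character(fert))   # same result, converts every element

print(codes)
print(via_levels)
print(mean(via_levels))
```

The first result is 1 2 2 3 1 2 1 3 2, so taking its mean would silently give a wrong answer rather than an error.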

When a factor is first created, all of its levels are stored along with the factor, and
if subsets of the factor are extracted, they will retain all of the original levels. This
can create problems when constructing model matrices and may or may not be
useful when displaying the data using, say, the table function. As an example,
consider a random sample from the letters vector, which is part of the base R
distribution.

> lets = sample(letters,size=100,replace=TRUE)


> lets = factor(lets)
> table(lets[1:5])

a b c d e f g h i j k l m n o p q r s t u v w x y z
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1

Even though only five of the levels were actually represented, the table function
shows the frequencies for all of the levels of the original factors. To change this,
we can simply use another call to factor

> table(factor(lets[1:5]))

a k q s z
1 1 1 1 1

To exclude certain levels from appearing in a factor, the exclude= argument can be
passed to factor. By default, the missing value (NA) is excluded from factor levels;
to create a factor that includes missing values from a numeric variable, use
exclude=NULL.

Care must be taken when combining variables which are factors, because in versions
of R before 4.1.0 the c function will interpret the factors as integers. To combine
factors reliably, they should first be converted back to their original values
(through the levels function), then concatenated and converted to a new factor:

> l1 = factor(sample(letters,size=10,replace=TRUE))
> l2 = factor(sample(letters,size=10,replace=TRUE))
> l1
[1] o b i v q n q w e z
Levels: b e i n o q v w z
> l2
[1] b a s b l r g m z o
Levels: a b g l m o r s z
> l12 = factor(c(levels(l1)[l1],levels(l2)[l2]))
> l12
[1] o b i v q n q w e z b a s b l r g m z o
Levels: a b e g i l m n o q r s v w z

The cut function is used to convert a numeric variable into a factor. The breaks=
argument to cut is used to describe how ranges of numbers will be converted to
factor values. If a number is provided through the breaks= argument, the resulting
factor will be created by dividing the range of the variable into that number of
equal length intervals; if a vector of values is provided, the values in the vector are
used to determine the breakpoint. Note that if a vector of values is provided, the
number of levels of the resultant factor will be one less than the number of values
in the vector.
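
A small sketch of the vector form of breaks= follows, using made-up scores. Four break values produce three intervals, which illustrates the "one less" rule; the names scores and bands are hypothetical.

```r
# Made-up data for illustration
scores <- c(12, 35, 58, 71, 94)

# Four break points define three intervals: (0,40], (40,70], (70,100]
bands <- cut(scores, breaks = c(0, 40, 70, 100))

print(levels(bands))
print(table(bands))
```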

For example, consider the women data set, which contains heights and weights for a
sample of women. If we wanted to create a factor corresponding to weight, with
three equally-spaced levels, we could use the following:

> wfact = cut(women$weight,3)


> table(wfact)
wfact
(115,131] (131,148] (148,164]
6 5 4

Notice that the default label for factors produced by cut contains the actual range
of values that were used to divide the variable into factors. The pretty function can
be used to make nicer default labels, but it may not return the number of levels
that's actually desired:

> wfact = cut(women$weight,pretty(women$weight,3))


> wfact
[1] (100,120] (100,120] (100,120] (120,140] (120,140] (120,140] (120,140]
[8] (120,140] (120,140] (140,160] (140,160] (140,160] (140,160] (140,160]
[15] (160,180]
Levels: (100,120] (120,140] (140,160] (160,180]
> table(wfact)
wfact
(100,120] (120,140] (140,160] (160,180]
3 6 5 1

The labels= argument to cut allows you to specify the levels of the factors:

> wfact = cut(women$weight,3,labels=c('Low','Medium','High'))


> table(wfact)
wfact
Low Medium High
6 5 4

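To produce factors based on percentiles of your data (for example quartiles or
deciles), the quantile function can be used to generate the breaks= argument,
ensuring nearly equal numbers of observations in each of the levels of the factor.
Because the intervals are open on the left, the smallest observation falls outside
the first interval and becomes NA unless include.lowest=TRUE is passed to cut,
which is why the counts in this example sum to 14 rather than 15: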

> wfact = cut(women$weight,quantile(women$weight,(0:4)/4))


> table(wfact)
wfact
(115,124] (124,135] (135,148] (148,164]
3 4 3 4

As mentioned in Section , there are a number of ways to create factors from
date/time objects. If you wish to create a factor based on one of the components of
that date, you can extract it with strftime and convert it to a factor directly. For
example, we can use the seq function to create a vector of dates representing each
day of the year:

> everyday = seq(from=as.Date('2005-1-1'),to=as.Date('2005-12-31'),by='day')

To create a factor based on the month of the year in which each date falls, we can
extract the month name (full or abbreviated) using format:

> cmonth = format(everyday,'%b')


> months = factor(cmonth,levels=unique(cmonth),ordered=TRUE)
> table(months)
months
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
31 28 31 30 31 30 31 31 30 31 30 31

Since unique returns unique values in the order they are encountered, the levels
argument will provide the month abbreviations in the correct order to produce a
properly ordered factor.

For more details on formatting dates, see Section

Sometimes more flexibility can be achieved by using the cut function, which
understands time units of months, days, weeks and years through the breaks=
argument. (For date/time values, units of hours, minutes, and seconds can also be
used.) For example, to format the days of the year based on the week in which they
fall, we could use cut as follows:

> wks = cut(everyday,breaks='week')


> head(wks)
[1] 2004-12-27 2004-12-27 2005-01-03 2005-01-03 2005-01-03 2005-01-03
53 Levels: 2004-12-27 2005-01-03 2005-01-10 2005-01-17 ... 2005-12-26

Note that the first level is a date earlier than any of the dates in the
everyday vector, since the first date was in the middle of the week. By default, cut
starts weeks on Mondays; to use Sundays instead, pass the
start.on.monday=FALSE argument to cut.

Multiples of units can also be specified through the breaks= argument. For
example, to create a factor based on the quarter of the year an observation is in,
we could use cut as follows:

> qtrs = cut(everyday,"3 months",labels=paste('Q',1:4,sep=''))


> head(qtrs)
[1] Q1 Q1 Q1 Q1 Q1 Q1
Levels: Q1 Q2 Q3 Q4

12. Write a program to illustrate Ordered Factors in R

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor function
again with new order of the levels.

data <- c("East","West","East","North","North","East","West","West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with required order of the level.


new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)

When we execute the above code, it produces the following result −

[1] East West East North North East West West West East North
Levels: East North West
[1] East West East North North East West West West East North
Levels: East West North

Generating Factor Levels

We can generate factor levels by using the gl() function. It takes two integers as
input, indicating how many levels there are and how many times each level is
repeated.

Syntax

gl(n, k, labels)

Following is the description of the parameters used −

• n is an integer giving the number of levels.
• k is an integer giving the number of replications.
• labels is a vector of labels for the resulting factor levels.

Example

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))


print(v)

When we execute the above code, it produces the following result −

 [1] Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston
[10] Boston  Boston  Boston
Levels: Tampa Seattle Boston

13. Write a program to illustrate Data Frame Selection of elements in a Data frame
14. Write a program to illustrate Sorting a Data frame
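
The source gives no listing for exercises 13 and 14, so the following is only a sketch of one possible approach, not the original program. It assumes the height, weight and gender data from the factor example above; the variable names second_row, tall, by_height and by_weight_desc are hypothetical.

```r
# Data reused from the earlier data frame example
input_data <- data.frame(
  height = c(132, 151, 162, 139, 166, 147, 122),
  weight = c(48, 49, 66, 53, 67, 52, 40),
  gender = c("male", "male", "female", "female", "male", "female", "male")
)

# Selection: one row, one column, and rows matching a condition
second_row <- input_data[2, ]
weights    <- input_data$weight
tall       <- input_data[input_data$height > 150, c("height", "gender")]
print(tall)

# Sorting: order() returns the row positions in sorted order
by_height      <- input_data[order(input_data$height), ]
by_weight_desc <- input_data[order(-input_data$weight), ]
print(by_height)
```

Indexing with [rows, columns] mirrors the matrix selection shown earlier, and negating a numeric column inside order() is one way to sort in descending order.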
