R Programming
Source Code
# We can use the print() function
print("Hello World!")
In this program, we have used the built-in print() function to print the string
Hello World! The quotes are printed by default. To avoid this we can pass the
argument quote=FALSE.
If there is more than one item, we can use the paste() or cat() function to
concatenate the strings together.
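A minimal sketch of these options follows. The variable names greeting and msg are illustrative and are not part of the original program.

```r
# Illustrative values; not part of the original program
greeting <- "Hello World!"
print(greeting)                 # [1] "Hello World!"
print(greeting, quote = FALSE)  # [1] Hello World!

# paste() returns a single string, joining its arguments with a space by default
msg <- paste("Hello", "World!")
print(msg)

# cat() writes its arguments straight to the console, without [1] or quotes
cat("Hello", "World!", "\n")
```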
Use the readline() function to take input from the user (terminal). This function will
return a single element character vector. So, if we want numbers, we need to do
appropriate conversions.
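my.name <- readline(prompt="Enter name: ") # prompt text is illustrative
my.age <- readline(prompt="Enter age: ") # prompt text is illustrative
my.age <- as.integer(my.age) # convert character into integer
print(paste("Hi,", my.name, "next year you will be", my.age+1, "years old."))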
Output
Source Code
# Program to check if
# the input year is
# a leap year or not
Output 1
Output 2
Enter a year: 2000
[1] "2000 is a leap year"
In this program, we ask the user to input a year and check if it is a leap year or not.
Leap years are those divisible by 4, except those that are divisible by 100 but not
by 400. Thus 1900 is not a leap year, as it is divisible by 100 but not by 400. But
2000 is a leap year because it is divisible by 400 as well.
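The rule described above can be sketched as follows. The helper name is_leap_year is hypothetical, and the year is hard-coded here instead of being read with readline() so that the sketch runs non-interactively.

```r
# Hypothetical helper: TRUE if 'year' is a leap year
is_leap_year <- function(year) {
  (year %% 4 == 0 && year %% 100 != 0) || year %% 400 == 0
}

# In an interactive session the year could be read with:
# year <- as.integer(readline(prompt = "Enter a year: "))
year <- 2000
if (is_leap_year(year)) {
  print(paste(year, "is a leap year"))
} else {
  print(paste(year, "is not a leap year"))
}
```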
# Program to check if
# the input number is odd or even.
# A number is even if division
# by 2 gives a remainder of 0.
# If remainder is 1, it is odd.
Output 1
Enter a number: 89
[1] "89 is Odd"
Output 2
Enter a number: 0
[1] "0 is Even"
In this program, we ask the user for the input and check if the number is odd or
even. A number is even if it is perfectly divisible by 2. When the number is divided
by 2, we use the remainder operator %% to compute the remainder. If the
remainder is not zero, the number is odd.
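A minimal sketch of this check is shown below. The helper name odd_or_even is hypothetical, and the inputs are hard-coded rather than read from the user.

```r
# Hypothetical helper: returns "Even" or "Odd" for a whole number
odd_or_even <- function(num) {
  if (num %% 2 == 0) "Even" else "Odd"
}

print(paste(89, "is", odd_or_even(89)))
print(paste(0, "is", odd_or_even(0)))
```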
Source Code
# Program to check if the
# number provided by the
# user is an Armstrong number
# or not
# initialize sum
sum = 0
Output 1
Enter a number: 23
[1] "23 is not an Armstrong number"
Output 2
Here, we ask the user for a number and check if it is an Armstrong number. For a
three-digit number, we need to calculate the sum of the cubes of its digits. So, we
initialize the sum to 0 and obtain each digit by using the modulus operator %%. The
remainder of a number when it is divided by 10 is the last digit of that number. We
take the cubes using the exponent operator. Finally, we compare the sum with the
original number and conclude that it is an Armstrong number if they are equal.
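The digit-by-digit procedure described above could look like the sketch below. The function name is_armstrong and the variable names are hypothetical, and the sketch assumes the cube-based definition used in this section.

```r
# Hypothetical helper: TRUE if the sum of the cubes of the digits equals num
is_armstrong <- function(num) {
  total <- 0                  # running sum of cubed digits
  temp <- num
  while (temp > 0) {
    digit <- temp %% 10       # last digit of temp
    total <- total + digit^3
    temp <- temp %/% 10       # integer division drops the last digit
  }
  total == num
}

print(is_armstrong(23))
print(is_armstrong(407))
```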
A positive integer greater than 1 which has no other factors except 1 and the
number itself is called a prime number. 2, 3, 5, 7 etc. are prime numbers as they do
not have any other factors. But 6 is not prime (it is composite) since 2 x 3 = 6.
Source Code
# Program to check if
# the input number is
# prime or not
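# take input from the user (prompt text taken from the sample output)
num = as.integer(readline(prompt="Enter a number: "))
flag = 0
# prime numbers are greater than 1
if(num > 1) {
    # check for factors
    flag = 1
    for(i in 2:(num-1)) {
        if ((num %% i) == 0) {
            flag = 0
            break
        }
    }
}
# 2:(num-1) counts down to 1 when num is 2, so handle 2 separately
if(num == 2) flag = 1
if(flag == 1) {
    print(paste(num,"is a prime number"))
} else {
    print(paste(num,"is not a prime number"))
}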
Output 1
Enter a number: 25
[1] "25 is not a prime number"
Output 2
Enter a number: 19
[1] "19 is a prime number"
Here, we take an integer from the user and check whether it is prime or not.
Numbers less than or equal to 1 are not prime numbers. Hence, we only proceed if
the num is greater than 1. We check if num is exactly divisible by any number from
2 to num - 1. If we find a factor in that range, the number is not prime. Else the
number is prime.
We can decrease the range of numbers where we look for factors. In the above
program, our search range is from 2 to num - 1. We could have used the range [2,
num / 2] or [2, num ^ 0.5]. The latter range is based on the fact that a composite
number must have a factor no greater than the square root of that number. Otherwise
the number is prime.
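A sketch of the reduced search range is given below. The function name is_prime is hypothetical; the loop condition i * i <= num is equivalent to testing i against the square root of num.

```r
# Hypothetical helper: trial division up to the square root of num
is_prime <- function(num) {
  if (num < 2) return(FALSE)   # numbers below 2 are not prime
  i <- 2
  while (i * i <= num) {       # same as i <= sqrt(num)
    if (num %% i == 0) return(FALSE)
    i <- i + 1
  }
  TRUE
}

print(is_prime(25))
print(is_prime(19))
```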
The factorial of a number is the product of all the integers from 1 to that number.
For example, the factorial of 6 (denoted as 6!) is 1*2*3*4*5*6 = 720. Factorial is
not defined for negative numbers and the factorial of zero is one, 0! = 1.
Source Code
# Program to find the
# factorial of a number
# provided by the user
Output
Enter a number: 8
[1] "The factorial of 8 is 40320"
Here, we take input from the user and check if the number is negative, zero or
positive using an if...else statement. If the number is positive, we use a for loop to
calculate the factorial. We can also use the built-in function factorial() for this.
> factorial(8)
[1] 40320
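The loop-based calculation described above might be written as in this sketch. The function name factorial_loop is hypothetical, and returning NA for negative input is an assumption made here for illustration.

```r
# Hypothetical helper: factorial computed with a for loop
factorial_loop <- function(num) {
  if (num < 0) {
    return(NA)               # factorial is not defined for negative numbers
  }
  result <- 1                # also covers 0! = 1
  if (num > 0) {
    for (i in 1:num) {
      result <- result * i
    }
  }
  result
}

print(paste("The factorial of", 8, "is", factorial_loop(8)))
```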
Source Code
# In this program, we input a number
# check if the number is positive or
# negative or zero and display
# an appropriate message
Output 1
Output 2
Enter a number: 2
[1] "Positive number"
A number is positive if it is greater than zero. We check this in the expression of if.
If it is FALSE, the number will be either zero or negative. This is also tested in
the subsequent expression.
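The chain of tests described above can be sketched as follows. The function name classify_sign and the messages for zero and negative input are assumptions, since only the "Positive number" output appears in this section.

```r
# Hypothetical helper: classify a number as positive, zero or negative
classify_sign <- function(num) {
  if (num > 0) {
    "Positive number"
  } else if (num == 0) {
    "Zero"
  } else {
    "Negative number"
  }
}

print(classify_sign(2))
```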
Source Code
# Program to find the multiplication
# table (from 1 to 10)
# of a number input by the user
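# take input from the user (prompt text taken from the sample output)
num = as.integer(readline(prompt = "Enter Number: "))
for(i in 1:10) {
    print(paste(num,'x',i,'=',num*i))
}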
Output
Enter Number: 7
[1] "7 x 1 = 7"
[1] "7 x 2 = 14"
[1] "7 x 3 = 21"
[1] "7 x 4 = 28"
[1] "7 x 5 = 35"
[1] "7 x 6 = 42"
[1] "7 x 7 = 49"
[1] "7 x 8 = 56"
[1] "7 x 9 = 63"
[1] "7 x 10 = 70"
Here, we ask the user for a number and display the multiplication table up to 10.
We use a for loop to iterate 10 times.
We can sum the elements of a vector using the sum() function. Similarly, the mean()
and prod() functions can be used to find the mean and product of the terms.
Source Code
> sum(2,7,5)
[1] 14
> x
[1] 2 NA 3 1 4
> sum(x, na.rm=TRUE) # this way we can ignore NA and NaN values
[1] 10
> mean(x, na.rm=TRUE)
[1] 2.5
Add two vectors together using the + operator. One thing to keep in mind when
adding two vectors (or applying any other arithmetic operation to them) is the
recycling rule. If the two vectors are of equal length then there is no issue. But if
the lengths are different, the shorter one is recycled (repeated) until its length is
equal to that of the longer one. This recycling process will give a warning if the
length of the longer vector is not an integral multiple of the length of the shorter one.
Source Code
> x
[1] 3 6 8
> y
[1] 2 9 0
> x + y
[1]  5 15  8
As we can see above the two vectors x and y are of equal length so they can be
added together without difficulty. The expression x + 1 also works fine because the
single 1 is recycled into a vector of three 1's. Similarly, in the last example, a two
element vector is recycled into a three element vector. But a warning is issued in
this case as 3 is not an integral multiple of 2.
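The recycling cases discussed above can be reproduced with the sketch below. The two-element vector c(1, 3) is an illustrative choice, not a value from the original listing.

```r
x <- c(3, 6, 8)
y <- c(2, 9, 0)

print(x + y)    # equal lengths: element-wise sum, 5 15 8
print(x + 1)    # the single 1 is recycled to 1 1 1, giving 4 7 9

# Lengths 3 and 2: c(1, 3) is recycled to 1 3 1 and R issues a warning.
# suppressWarnings() is used here only to keep the output clean.
z <- suppressWarnings(x + c(1, 3))
print(z)        # 4 9 9
```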
Find the minimum and the maximum of a vector using the min() or the max()
function. A function called range() is also available which returns the minimum
and maximum in a two element vector.
> x
[1] 5 8 3 9 2 7 4 6 10
If we want to find where the minimum or maximum is located, i.e. the index
instead of the actual value, then we can use the which.min() and which.max()
functions. Note that these functions return the index of the first minimum or
maximum if several of them exist.
> x
[1] 5 8 3 9 2 7 4 6 10
> x[which.min(x)]
[1] 2
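A short sketch gathering these functions in one place follows. The vector x is the one shown above; the vector ties is an illustrative addition to show the first-match behaviour.

```r
x <- c(5, 8, 3, 9, 2, 7, 4, 6, 10)

print(min(x))          # smallest value: 2
print(max(x))          # largest value: 10
print(range(x))        # both at once: 2 10
print(which.min(x))    # index of the minimum: 5
print(which.max(x))    # index of the maximum: 9

# With repeated minima, only the first index is returned
ties <- c(4, 1, 7, 1)
print(which.min(ties)) # 2
```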
a <- 2 - 5        # subtraction
b <- 6 / 3        # division
c <- 3 + 2 * 5    # note that order of operations applies
d <- (3 + 2) * 5  # use parentheses to force a different order
e <- 4^3          # raise to a power with ^
f <- exp(4)       # e^4 = 54.598 (give or take)
g <- log(2.742)   # natural log of 2.742
h <- log10(1000)  # common log of 1000
pi                # 3.14159...
print(a)
print(b)
print(c)
print(d)
print(e)
print(f)
print(g)
print(h)
print(pi)
OUTPUT:
[1] -3
[1] 2
[1] 13
[1] 25
[1] 64
[1] 54.59815
[1] 1.008688
[1] 3
[1] 3.141593
Assignment Operators in R
Operator       Description
<-, <<-, =     Leftwards assignment
->, ->>        Rightwards assignment
The operators <- and = can be used, almost interchangeably, to assign to a variable
in the same environment. <<- is used for assigning to variables in the parent
environments (more like global assignments). The rightward assignments, although
available, are rarely used.
Program:
A <- 5
B <<- 10
15 -> C     # rightwards assignment: the value is on the left
20 ->> D
E = 45
print(A)
print(B)
print(C)
print(D)
print(E)
OUTPUT:
[1] 5
[1] 10
[1] 15
[1] 20
[1] 45
Result: The Variable assignment in R Program Executed Successfully.
OUTPUT:
[1] "logical"
[1] "numeric"
[1] "integer"
[1] "complex"
[1] "character"
[1] 53 41 43 45 54
[1] "raw"
Result: The Data Types in R Program Executed Successfully.
Vectors are the most basic R data objects and there are six types of atomic vectors.
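The six atomic types can be inspected with class(). The sketch below uses illustrative values and hypothetical variable names; it is not the original program.

```r
# One illustrative value per atomic type
v_logical   <- TRUE
v_numeric   <- 23.5
v_integer   <- 2L                   # the L suffix makes an integer
v_complex   <- 2 + 5i
v_character <- "TRUE"               # quoted, so this is text
v_raw       <- charToRaw("Hello")   # bytes of the string

print(class(v_logical))
print(class(v_numeric))
print(class(v_integer))
print(class(v_complex))
print(class(v_character))
print(class(v_raw))
```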
Vector Creation
Even when you write just one value in R, it becomes a vector of length 1 and
belongs to one of the above vector types.
[1] "abc"
[1] 12.5
[1] 63
[1] TRUE
[1] 2+3i
[1] 68 65 6c 6c 6f
# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)
[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8
[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
The non-character values are coerced to character type if one of the elements is a
character.
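A small sketch of this coercion is shown below, using made-up values.

```r
# A number and a logical are mixed with character strings
s <- c("apple", "red", 5, TRUE)
print(s)          # every element is now a character string
print(class(s))   # "character"
```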
Matrices are the R objects in which the elements are arranged in a two-dimensional
rectangular layout. They contain elements of the same atomic types. Though we
can create a matrix containing only characters or only logical values, they are not
of much use. We mostly use matrices containing numeric elements, which can be
used in mathematical calculations.
Syntax
matrix(data, nrow, ncol, byrow, dimnames)
data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical flag. If TRUE then the input vector elements are arranged
by row.
dimnames is a list of the names assigned to the rows and columns.
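The effect of these arguments can be seen in the sketch below. The values 3:14 and the row and column names are illustrative assumptions.

```r
# Elements are filled row by row
M <- matrix(3:14, nrow = 4, byrow = TRUE)
print(M)

# Elements are filled column by column (the default)
N <- matrix(3:14, nrow = 4, byrow = FALSE)
print(N)

# Row and column names are passed as a list through dimnames
rownames_p <- c("row1", "row2", "row3", "row4")
colnames_p <- c("col1", "col2", "col3")
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(rownames_p, colnames_p))
print(P)
```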
[,1] [,2] [,3]
[1,] 3 7 11
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14
Transpose
We construct the transpose of a matrix by interchanging its columns and rows with
the function t().
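A minimal sketch of t() follows. The matrix B is assumed to hold the values that appear in the cbind output later in this section.

```r
# B has 3 rows and 2 columns (values assumed from the cbind output)
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
tB <- t(B)       # rows become columns and columns become rows
print(B)
print(tB)        # 2 rows and 3 columns
```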
Combining Matrices
The columns of two matrices having the same number of rows can be combined
into a larger matrix. For example, suppose we have another matrix C also with 3
rows.
C = matrix(c(7, 4, 2), nrow=3, ncol=1)
C # C has 3 rows
[,1]
[1,] 7
[2,] 4
[3,] 2
cbind(B, C)
[,1] [,2] [,3]
[1,] 2 1 7
[2,] 4 5 4
[3,] 3 7 2
Similarly, we can combine the rows of two matrices if they have the same number
of columns with the rbind function.
D = matrix( c(6, 2), nrow=1, ncol=2)
D # D has 2 columns
[,1] [,2]
[1,] 6 2
rbind(B, D)
[,1] [,2]
[1,] 2 1
[2,] 4 5
[3,] 3 7
[4,] 6 2
Elements of a matrix can be accessed by using the row and column index of the
element. We consider the matrix P above to find the specific elements below.
OUTPUT:
[1] 5
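Element access by row and column index can be sketched as follows. The matrix P here is an assumption (the values 3:14 filled by row), chosen for illustration.

```r
# Assumed example matrix: 4 rows, 3 columns, filled by row
P <- matrix(3:14, nrow = 4, byrow = TRUE)

print(P[1, 3])   # element in row 1, column 3: 5
print(P[4, 2])   # element in row 4, column 2: 13
print(P[2, ])    # the whole of row 2: 6 7 8
print(P[, 3])    # the whole of column 3: 5 8 11 14
```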
Matrix Computations
The dimensions (number of rows and columns) should be the same for the
matrices involved in the operation.
print(result)
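Element-wise matrix arithmetic on two matrices of the same dimensions might look like the sketch below. The names matrix1 and matrix2 and their values are illustrative assumptions.

```r
# Two 2 x 3 matrices with made-up values
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Addition and multiplication act on corresponding elements
result <- matrix1 + matrix2
print(result)

product <- matrix1 * matrix2   # element-wise, not the matrix product %*%
print(product)
```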
Factor variables are categorical variables that can be either numeric or string
variables. There are a number of advantages to converting categorical variables to
factor variables.
Factors are the data objects which are used to categorize the data and store it as
levels. They can store both strings and integers. They are useful in columns
which have a limited number of unique values, such as "Male", "Female" or TRUE,
FALSE. They are useful in data analysis for statistical modeling.
Factors are created using the factor() function by taking a vector as input.
data <- c("East","West","East","North","North","East","West","West","West",
          "East","North")
print(data)
print(is.factor(data))
OUTPUT:
[1] "East" "West" "East" "North" "North" "East" "West" "West" "West"
[10] "East" "North"
[1] FALSE
# Apply the factor function.
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))
OUTPUT:
[1] East West East North North East West West West East North
Levels: East North West
[1] TRUE
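In versions of R before 4.0.0, creating a data frame with a column of text data makes
R treat that column as categorical data and create a factor from it. From R 4.0.0
onward, text columns stay as character vectors unless stringsAsFactors=TRUE is
passed to data.frame().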
# Create the data frame.
input_data <- data.frame(height,weight,gender)
print(input_data)
To change the order in which the levels will be displayed from their default sorted
order, the levels= argument can be given a vector of all the possible values of the
variable in the order you desire. If the ordering should also be used when
performing comparisons, use the optional ordered=TRUE argument. In this case,
the factor is known as an ordered factor.
The levels of a factor are used when displaying the factor's values. You can change
these levels at the time you create a factor by passing a vector with the new values
through the labels= argument. Note that this actually changes the internal levels of
the factor, and to change the labels of a factor after it has been created, the
assignment form of the levels function is used. To illustrate this point, consider a
factor taking on integer values which we want to display as Roman numerals.
To convert the default factor fdata to Roman numerals, we use the assignment form
of the levels function:
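Both routes can be sketched as follows. The integer vector data and the name rdata are illustrative assumptions; only fdata is named in the text.

```r
# Made-up integer data taking the values 1, 2 and 3
data <- c(1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1)

# Default factor: the levels are "1", "2", "3"
fdata <- factor(data)

# Labels supplied when the factor is created
rdata <- factor(data, labels = c("I", "II", "III"))

# Labels changed afterwards with the assignment form of levels
levels(fdata) <- c("I", "II", "III")
print(fdata)
```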
Factors represent a very efficient way to store character values, because each
unique character value is stored only once, and the data itself is stored as a vector
of integers. Because of this, in versions of R before 4.0.0, read.table automatically
converts character variables to factors unless the as.is= argument is specified; from
R 4.0.0 onward, the conversion happens only when stringsAsFactors=TRUE is
passed. See Section for details.
> mons = c("March","April","January","November","January",
+ "September","October","September","November","August",
+ "January","November","November","February","May","August",
+ "July","December","August","August","September","November",
+ "February","April")
> mons = factor(mons)
> table(mons)
mons
    April    August  December  February   January      July
        2         4         1         2         3         1
    March       May  November   October September
        1         1         5         1         3
Although the months clearly have an ordering, this is not reflected in the output of
the table function. Additionally, comparison operators are not supported for
unordered factors. Creating an ordered factor solves these problems:
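One way to build such an ordered factor is sketched below. It relies on the built-in constant month.name for the level order, which is a choice made here for illustration.

```r
mons <- c("March","April","January","November","January",
          "September","October","September","November","August",
          "January","November","November","February","May","August",
          "July","December","August","August","September","November",
          "February","April")

# month.name is a base R constant holding January ... December in order
mons <- factor(mons, levels = month.name, ordered = TRUE)

print(table(mons))        # counts now appear in calendar order
print(mons[1] < mons[2])  # March < April, so comparisons work
```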
Suppose we are studying the effects of several levels of a fertilizer on the growth
of a plant. For some analyses, it might be useful to convert the fertilizer levels to
an ordered factor:
> fert = c(10,20,20,50,10,20,10,50,20)
> fert = factor(fert,levels=c(10,20,50),ordered=TRUE)
> fert
[1] 10 20 20 50 10 20 10 50 20
Levels: 10 < 20 < 50
If we wished to calculate the mean of the original numeric values of the fert
variable, we would have to convert the values using the levels function:
> mean(fert)
[1] NA
Warning message:
argument is not numeric or logical:
returning NA in: mean.default(fert)
> mean(as.numeric(levels(fert)[fert]))
[1] 23.33333
Indexing the return value from the levels function is the most reliable way to
convert numeric factors to their original numeric values.
When a factor is first created, all of its levels are stored along with the factor, and
if subsets of the factor are extracted, they will retain all of the original levels. This
can create problems when constructing model matrices and may or may not be
useful when displaying the data using, say, the table function. As an example,
consider a random sample from the letters vector, which is part of the base R
distribution.
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1
Even though only five of the levels were actually represented, the table function
shows the frequencies for all of the levels of the original factors. To change this,
we can simply use another call to factor:
> table(factor(lets[1:5]))
a k q s z
1 1 1 1 1
To exclude certain levels from appearing in a factor, the exclude= argument can be
passed to factor. By default, the missing value (NA) is excluded from factor levels;
to create a factor that includes missing values from a numeric variable, use
exclude=NULL.
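The difference can be seen in the following sketch, which uses a made-up numeric vector containing one missing value.

```r
x <- c(1, 2, NA, 2, 1)

f_default <- factor(x)                  # NA is not a level
f_with_na <- factor(x, exclude = NULL)  # NA becomes a level

print(levels(f_default))
print(levels(f_with_na))
```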
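Care must be taken when combining variables which are factors, because in
versions of R before 4.1.0 the c function interprets the factors as integers (from
R 4.1.0 onward, c returns a factor whose levels are the union of the inputs' levels).
To combine factors in a way that works in any version, they should first be
converted back to their original values (through the levels function), then
concatenated and converted to a new factor: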
> l1 = factor(sample(letters,size=10,replace=TRUE))
> l2 = factor(sample(letters,size=10,replace=TRUE))
> l1
[1] o b i v q n q w e z
Levels: b e i n o q v w z
> l2
[1] b a s b l r g m z o
Levels: a b g l m o r s z
> l12 = factor(c(levels(l1)[l1],levels(l2)[l2]))
> l12
[1] o b i v q n q w e z b a s b l r g m z o
Levels: a b e g i l m n o q r s v w z
The cut function is used to convert a numeric variable into a factor. The breaks=
argument to cut is used to describe how ranges of numbers will be converted to
factor values. If a number is provided through the breaks= argument, the resulting
factor will be created by dividing the range of the variable into that number of
equal length intervals; if a vector of values is provided, the values in the vector are
used to determine the breakpoints. Note that if a vector of values is provided, the
number of levels of the resultant factor will be one less than the number of values
in the vector.
For example, consider the women data set, which contains heights and weights for a
sample of women. If we wanted to create a factor corresponding to weight, with
three equally-spaced levels, we could use the following:
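A sketch of this call is shown below. The name wfact is an illustrative choice; women is the data set shipped with base R.

```r
# Divide the range of weights into three equal-length intervals
wfact <- cut(women$weight, 3)

print(levels(wfact))   # interval labels generated by cut
print(table(wfact))    # number of observations in each interval
```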
Notice that the default label for factors produced by cut contains the actual range
of values that were used to divide the variable into factors. The pretty function can
be used to make nicer default labels, but it may not return the number of levels
that's actually desired:
The labels= argument to cut allows you to specify the levels of the factors:
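For instance, custom level names could be supplied as in this sketch. The labels "Low", "Medium" and "High" and the name wfact_named are illustrative assumptions.

```r
# Same three intervals, but with hand-picked level names
wfact_named <- cut(women$weight, 3, labels = c("Low", "Medium", "High"))

print(levels(wfact_named))
print(table(wfact_named))
```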
To create a factor based on the month of the year in which each date falls, we can
extract the month name (full or abbreviated) using format:
Since unique returns unique values in the order they are encountered, the levels
argument will provide the month abbreviations in the correct order to produce a
properly ordered factor.
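The approach can be sketched as follows. The date range (every day of 2005) and the names everyday, cmonth and month_factor are assumptions made for illustration; the abbreviations produced by format depend on the locale.

```r
# Every day of 2005 (an assumed example range)
everyday <- seq(from = as.Date("2005-01-01"),
                to = as.Date("2005-12-31"), by = "day")

# Abbreviated month name for each date
cmonth <- format(everyday, "%b")

# unique() keeps first-seen order, so the levels run January to December
month_factor <- factor(cmonth, levels = unique(cmonth), ordered = TRUE)
print(table(month_factor))
```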
Sometimes more flexibility can be achieved by using the cut function, which
understands time units of months, days, weeks and years through the breaks=
argument. (For date/time values, units of hours, minutes, and seconds can also be
used.) For example, to format the days of the year based on the week in which they
fall, we could use cut as follows:
Note that the first level is a date earlier than any of the dates in the
everyday vector, since the first date was in the middle of the week. By default, cut
starts weeks on Mondays; to use Sundays instead, pass the
start.on.monday=FALSE argument to cut.
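A sketch of cutting dates by week is given below. The everyday vector (every day of 2005) and the names wks and wks_sun are assumptions made for illustration.

```r
# Every day of 2005 (an assumed example range); 2005-01-01 is a Saturday
everyday <- seq(from = as.Date("2005-01-01"),
                to = as.Date("2005-12-31"), by = "day")

# Weeks starting on Monday (the default); each level is a week's first day
wks <- cut(everyday, breaks = "week")
print(head(levels(wks), 3))

# Weeks starting on Sunday instead
wks_sun <- cut(everyday, breaks = "week", start.on.monday = FALSE)
print(head(levels(wks_sun), 3))
```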
Multiples of units can also be specified through the breaks= argument. For
example, to create a factor based on the quarter of the year an observation is in, we
could use cut as follows:
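One possible version is sketched below. The everyday vector and the name qtrs are assumptions; droplevels() is added here as a safeguard in case cut produces a trailing empty interval, and the quarter names are assigned afterwards through levels.

```r
# Every day of 2005 (an assumed example range)
everyday <- seq(from = as.Date("2005-01-01"),
                to = as.Date("2005-12-31"), by = "day")

# Breaks every 3 months; any unused level is dropped before renaming
qtrs <- droplevels(cut(everyday, breaks = "3 months"))
levels(qtrs) <- paste0("Q", 1:4)

print(table(qtrs))
```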
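The order of the levels in a factor can be changed by applying the factor function
again with the new order of the levels.
data <- c("East","West","East","North","North","East","West","West","West",
          "East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)
# Apply the factor function again with the required order of the levels
# (the variable name new_order_data is illustrative)
new_order_data <- factor(factor_data, levels = c("East","West","North"))
print(new_order_data)
OUTPUT:
[1] East West East North North East West West West East North
Levels: East North West
[1] East West East North North East West West West East North
Levels: East West North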
Generating Factor Levels
We can generate factor levels by using the gl() function. It takes two integers as
input: the number of levels and the number of times each level is repeated.
Syntax
gl(n, k, labels)
Example
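A minimal sketch of gl() follows. The city names used as labels are illustrative assumptions.

```r
# 3 levels, each repeated 4 times, with illustrative labels
v <- gl(3, 4, labels = c("Tampa", "Seattle", "Boston"))
print(v)
```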