Introduction To R

The document provides an introduction to R, a programming language and statistical package developed as an open-source alternative to the S language. It covers the history of R, its capabilities, and its limitations, including data handling, statistical functions, and session management. Additionally, it explains basic data types, operations on vectors and matrices, and the structure of lists and data frames.

Uploaded by

Deepali Naglot

R – a brief introduction

Original material

 Johannes Freudenberg
 Cincinnati Children’s Hospital Medical Center
 Marcel Baumgartner
 Nestec S.A.
 Jaeyong Lee
 Penn State University
 Jennifer Urbano Blackford, Ph.D
 Department of Psychiatry, Kennedy Center
 Wolfgang Huber
History of R

 Statistical programming language S developed at Bell Labs since 1976 (at the same time as UNIX)
 Intended to interactively support research and data analysis projects
 Exclusively licensed to Insightful (“S-Plus”)
 R: open-source platform similar to S, developed by R. Gentleman and R. Ihaka (U of Auckland, NZ) during the 1990s
 Since 1997: international “R-core” development team
 Updated versions available every couple of months
What R is and what it is not

 R is
 a programming language
 a statistical package
 an interpreter
 Open Source
 R is not
 a database
 a collection of “black boxes”
 a spreadsheet software package
 commercially supported
What R is

 data handling and storage: numeric, textual

 matrix algebra
 hash tables and regular expressions
 high-level data analytic and statistical functions
 classes (“OO”)
 graphics
 programming language: loops, branching,
subroutines
What R is not

 is not a database, but connects to DBMSs
 has no point-and-click user interface, but connects to Java, Tcl/Tk
 language interpreter can be very slow, but lets you call your own C/C++ code
 no spreadsheet view of data, but connects to Excel/MS Office
 no professional/commercial support
R and statistics

 Packaging: a crucial infrastructure to efficiently

produce, load and keep consistent software
libraries from (many) different sources / authors

 Statistics: most packages deal with statistics and

data analysis

 State of the art: many statistical researchers

provide their methods as R packages
Getting started

 To obtain and install R on your computer

 Go to [Link] to choose a
mirror near you
 Click on your favorite operating system (Linux, Mac, or
Windows)
 Download and install the “base”
 To install additional packages
 Start R on your computer
 Choose the appropriate item from the “Packages” menu
R: session management

 Your R objects are stored in a workspace

 To list the objects in your workspace: > ls()
 To remove objects you no longer need:
> rm(weight, height, bmi)
 To remove ALL objects in your workspace:
> rm(list=ls()) or use Remove all objects in the
Misc menu
 To save your workspace to a file, you may type
> [Link]() or use Save Workspace… in the
File menu
 The default workspace file is called .RData
Basic data types
Objects

 names
 types of objects: vector, factor, array, matrix,
[Link], ts, list
 attributes
 mode: numeric, character, complex, logical
 length: number of elements in object
 creation
 assign a value
 create a blank object
Naming Convention

 must start with a letter (A-Z or a-z)

 can contain letters, digits (0-9), and/or periods “.”

 case-sensitive
 mydata different from MyData

 avoid underscore “_” (old versions of R treated it as an assignment operator; current versions allow it in names)

Assignment

 “<-” used to indicate assignment

x<-c(1,2,3,4,5,6,7)
x<-c(1:7)
x<-1:4

 note: as of version 1.4, “=” is also a valid assignment operator
R as a calculator

> 5 + (6 + 7) * pi^2
[1] 133.3049
> log(exp(1))
[1] 1
> log(1000, 10)
[1] 3
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
> Sin(pi/3)^2 + cos(pi/3)^2
Error: couldn't find function "Sin"
R as a calculator

> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
[Figure: sin(seq(0, 2 * pi, length = 100)) plotted against Index]

Basic (atomic) data types

 Logical
> x <- T; y <- F
> x; y
[1] TRUE
[1] FALSE
 Numerical
> a <- 5; b <- sqrt(2)
> a; b
[1] 5
[1] 1.414214
 Character
> a <- "1"; b <- 1
> a; b
[1] "1"
[1] 1
> a <- "character"
> b <- "a"; c <- a
> a; b; c
[1] "character"
[1] "a"
[1] "character"
Vectors, Matrices, Arrays

 Vector
 Ordered collection of data of the same data type
 Example:
 last names of all students in this class

 Mean intensities of all genes on an oligonucleotide

microarray
 In R, single number is a vector of length 1
 Matrix
 Rectangular table of data of the same type
 Example
 Mean intensities of all genes measured during a microarray

experiment
 Array
 Higher dimensional matrix
Vectors

 Vector: Ordered collection of data of the same

data type
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
>y+z
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> mean(y + z)
[1] 4.2
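
Arithmetic on vectors works element by element, and a shorter vector is recycled to match the longer one. The following is a minimal sketch of both behaviours; the names `long`, `short`, `total` and `recycled` are illustrative and do not come from the slides.

```r
# Element-wise arithmetic on two vectors of equal length
y <- 1:5
z <- seq(1, 1.4, by = 0.1)
total <- y + z             # same values as y + z on the slide

# Recycling: the shorter vector is repeated to the length of the longer one
long  <- 1:6
short <- c(10, 20)
recycled <- long + short   # 10 is added to positions 1, 3, 5 and 20 to 2, 4, 6

print(total)
print(recycled)
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign of a mistake.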
Vectors

> Mydata <- c(2,3.5,-0.2) # numeric vector (c = “concatenate”)
> Colors <- c("Red","Green","Red") # character vector
> x1 <- 25:30 # number sequence
> x1
[1] 25 26 27 28 29 30
> Colors[2] # one element
[1] "Green"
> x1[3:5] # several elements
[1] 27 28 29
Operation on vector elements

> Mydata
[1] 2 3.5 -0.2
> Mydata > 0 # test on the elements
[1] TRUE TRUE FALSE
> Mydata[Mydata>0] # extract the positive elements
[1] 2 3.5
> Mydata[-c(1,3)] # remove elements
[1] 3.5
Vector operations
> x <- c(5,-2,3,-7)
> y <- c(1,2,3,4)*10 # operation on all the elements
> y
[1] 10 20 30 40
> sort(x) # sorting a vector
[1] -7 -2 3 5
> order(x) # element order for sorting
[1] 4 2 3 1
> y[order(x)] # reorder y by the sort order of x
[1] 40 20 30 10
> rev(x) # reverse a vector
[1] -7 3 -2 5
Matrices
 Matrix: Rectangular table of data of the same type
> m <- matrix(1:12, 4, byrow = T); m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
> y <- -1:2
> [Link] <- m + y
> t([Link])
[,1] [,2] [,3] [,4]
[1,] 0 4 8 12
[2,] 1 5 9 13
[3,] 2 6 10 14
> dim(m)
[1] 4 3
> dim(t([Link]))
[1] 3 4
Matrices
Matrix: Rectangular table of data of the same type

> x <- c(3,-1,2,0,-3,6)

> [Link] <- matrix(x,ncol=2) Matrix with 2 cols
> [Link]
[,1] [,2]
[1,] 3 0
[2,] -1 -3
[3,] 2 6

> [Link] <- matrix(x,ncol=2,byrow=T) # by row creation
> [Link]
[,1] [,2]
[1,] 3 -1
[2,] 2 0
[3,] -3 6
Dealing with matrices

> [Link][,2] 2nd col

[1] -1 0 6

> [Link][c(1,3),] 1st and 3rd lines

[,1] [,2]
[1,] 3 -1
[2,] -3 6

> [Link] %*% t([Link]) Multiplication

[,1] [,2] [,3]

[1,] 10 6 -15
[2,] 6 4 -6
[3,] -15 -6 45

> solve() Inverse of a square matrix

> eigen() Eigenvectors and eigenvalues
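
The slide lists solve() and eigen() without showing a call. The sketch below applies both to a small symmetric matrix; the matrix values and the names `A`, `A_inv`, `check`, `e`, `lhs` and `rhs` are illustrative assumptions.

```r
# A small symmetric matrix (illustrative values)
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

A_inv <- solve(A)        # inverse of A
check <- A %*% A_inv     # the 2 x 2 identity, up to rounding error

e <- eigen(A)            # a list with components $values and $vectors

# For an eigenpair (lambda, v), A %*% v equals lambda * v
lhs <- A %*% e$vectors[, 1]
rhs <- e$values[1] * e$vectors[, 1]

print(round(check, 10))
print(e$values)
```

solve(A, b) with a second argument solves the linear system A x = b directly, which is generally preferable to forming the inverse first.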
Missing values

 R is designed to handle statistical data and is therefore well suited to dealing with missing values
 Numbers that are “not available”
> x <- c(1, 2, 3, NA)
>x+3
[1] 4 5 6 NA
 “Not a number”
> log(c(0, 1, 2))
[1] -Inf 0.0000000 0.6931472
> 0/0
[1] NaN
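
Because NA propagates through arithmetic, summaries need to be told how to treat it. A short sketch, using only base R functions; the variable names are illustrative.

```r
x <- c(1, 2, 3, NA)

missing_flags <- is.na(x)          # TRUE only for the last element
m_all  <- mean(x)                  # NA, because one value is unknown
m_seen <- mean(x, na.rm = TRUE)    # NA dropped before averaging

# NaN counts as missing for is.na(), but NA is not NaN
nan_is_na <- is.na(0/0)
na_is_nan <- is.nan(NA)

print(missing_flags)
print(m_seen)
```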
Subsetting

 It is often necessary to extract a subset of a

vector or matrix
 R offers a couple of neat ways to do that
> x <- c("a", "b", "c", "d", "e", "f", "g", "h")
> x[1]
> x[3:5]
> x[-(3:5)]
> x[c(T, F, T, F, T, F, T, F)]
> x[x <= "d"]
> m[,2]
> m[3,]
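
The slide shows the subsetting calls without their results. The sketch below stores each result under an illustrative name; `m` is assumed to be the 4 x 3 matrix built with matrix(1:12, 4, byrow = T) on the earlier Matrices slide.

```r
x <- c("a", "b", "c", "d", "e", "f", "g", "h")

first   <- x[1]                # a single element
middle  <- x[3:5]              # positions 3 to 5
dropped <- x[-(3:5)]           # everything except positions 3 to 5
odd     <- x[c(TRUE, FALSE)]   # a logical index is recycled: every other element
early   <- x[x <= "d"]         # string comparison (ordering can depend on the locale)

m <- matrix(1:12, 4, byrow = TRUE)
col2 <- m[, 2]                 # second column as a vector
row3 <- m[3, ]                 # third row as a vector

print(dropped)
print(col2)
```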
Lists, data frames,
and factors
Lists
vector: an ordered collection of data of the same
type.
> a = c(7,5,1)
> a[2]
[1] 5

list: an ordered collection of data of arbitrary types.

> doe = list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28

Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string).
Lists 1

 A list is an object consisting of objects called

components.
 The components of a list don’t need to be of the
same mode or type and they can be a numeric
vector, a logical value and a function and so on.
 A component of a list can be referred to as aa[[i]] or aa$times, where aa is the name of the list, i is the position of the component, and times is the name of a component of aa.
Lists 2

 The names of components may be abbreviated

down to the minimum number of letters needed
to identify them uniquely.
 aa[[1]] is the first component of aa, while aa[1] is
the sublist consisting of the first component of aa
only.
 There are functions whose return value is a List.
We have seen some of them, eigen, svd, …
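
The difference between aa[[1]] and aa[1], and the abbreviation of component names, can be sketched as follows. The list contents are made up for illustration.

```r
aa <- list(times = c(1.5, 2.0), label = "run A", ok = TRUE)

component <- aa[[1]]     # the numeric vector itself
sublist   <- aa[1]       # a list of length 1 that contains the vector
by_name   <- aa$times    # same component, accessed by name
abbrev    <- aa$ti       # $ allows a unique abbreviation of the name

cls_component <- class(component)
cls_sublist   <- class(sublist)

print(cls_component)
print(cls_sublist)
```

Abbreviated names are convenient at the console but fragile in scripts, since adding a component with a similar name can make the abbreviation ambiguous.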
Lists are very flexible
> [Link] <- list(c(5,4,-1),c("X1","X2","X3"))
> [Link]
[[1]]:
[1] 5 4 -1

[[2]]:
[1] "X1" "X2" "X3"

> [Link][[1]]
[1] 5 4 -1

> [Link] <- list(c1=c(5,4,-1),c2=c("X1","X2","X3"))

> [Link]$c2[2:3]
[1] "X2" "X3"
Lists: Session

Empl <- list(employee="Anna", spouse="Fred", children=3,
[Link]=c(4,7,9))
Empl[[4]]
Empl$child.a
Empl[4] # a sublist consisting of the 4th component of Empl
names(Empl) <- letters[1:4]
Empl <- c(Empl, service=8)
unlist(Empl) # converts it to a vector. Mixed types will be converted to
# character, giving a character vector.
More lists

> [Link]
[,1] [,2]
[1,] 3 -1
[2,] 2 0
[3,] -3 6

> dimnames([Link]) <- list(c("L1","L2","L3"),

c("R1","R2"))

> [Link]
R1 R2
L1 3 -1
L2 2 0
L3 -3 6
Data frames
data frame: represents a spreadsheet.
Rectangular table with rows and columns; data
within each column has the same type (e.g.
number, text, logical), but different columns may
have different types.
Example:
> cw = chickwts
> cw
weight feed
1 179 horsebean
11 309 linseed
23 243 soybean
37 423 sunflower
...
Data Frames 1

 A data frame is a list with class “[Link]”. There are

restrictions on lists that may be made into data frames.
 a. The components must be vectors (numeric,
character, or logical), factors, numeric matrices, lists, or
other data frames.
 b. Matrices, lists, and data frames provide as many
variables to the new data frame as they have columns,
elements, or variables, respectively.
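 c. Numeric vectors and factors are included as is. In R versions before 4.0.0, non-numeric vectors are coerced to factors, whose levels are the unique values appearing in the vector; from 4.0.0 on, character vectors stay character unless stringsAsFactors = TRUE is given.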
 d. Vector structures appearing as variables of the data
frame must all have the same length, and matrix structures
must all have the same row size.
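
The rules above can be seen in a small example. This is a sketch with made-up columns; stringsAsFactors = TRUE is passed explicitly so that the text column becomes a factor regardless of the R version in use.

```r
df <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  score = c(2.5, 3.1, 4.0),
  stringsAsFactors = TRUE    # explicit, since the default differs between R versions
)

is_list   <- is.list(df)        # a data frame is a list of equal-length columns
grp_class <- class(df$group)    # the character column was converted to a factor
grp_lev   <- levels(df$group)   # levels are the unique values
dims      <- dim(df)            # rows and columns

str(df)
```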
Subsetting
Individual elements of a vector, matrix, array or data frame
are accessed with “[ ]” by specifying their index, or their
name
> cw = chickwts
> cw
weight feed
1 179 horsebean
11 309 linseed
23 243 soybean
37 423 sunflower
...

> cw [3,2]
[1] horsebean
6 Levels: casein horsebean linseed ... sunflower
> cw [3,]
weight feed
3 136 horsebean
Subsetting in data frames

> library(MASS) # the Animals data set ships with the MASS package
> an = Animals
> an
body brain
Mountain beaver 1.350 8.1
Cow 465.000 423.0
Grey wolf 36.330 119.5
...
> an [3,]
body brain
Grey wolf 36.33 119.5
Labels in data frames
> labels (an)
[[1]]
[1] "Mountain beaver" "Cow"
[3] "Grey wolf" "Goat"
[5] "Guinea pig" "Dipliodocus"
[7] "Asian elephant" "Donkey"
[9] "Horse" "Potar monkey"
[11] "Cat" "Giraffe"
[13] "Gorilla" "Human"
[15] "African elephant" "Triceratops"
[17] "Rhesus monkey" "Kangaroo"
[19] "Golden hamster" "Mouse"
[21] "Rabbit" "Sheep"
[23] "Jaguar" "Chimpanzee"
[25] "Rat" "Brachiosaurus"
[27] "Mole" "Pig"

[[2]]
[1] "body" "brain"
Control structures and
functions
Grouped expressions in R

x = 1:9

if (length(x) <= 10) {
x <- c(x,10:20)
print(x)
} else {
print(x[1])
}
Loops in R

> for(i in 1:10) {
x[i] <- rnorm(1)
}
j=1
while( j < 10) {
print(j)
j <- j + 2
}
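
The for loop above fills x one element at a time. The sketch below shows the same loop with the result vector allocated first, next to the vectorized call that draws all values at once. The seed and the names `x_loop`, `x_vec` and `seen` are illustrative assumptions.

```r
set.seed(1)                 # illustrative seed so the run is repeatable

# Loop version: allocate the result first, then fill it
n <- 10
x_loop <- numeric(n)
for (i in seq_len(n)) {
  x_loop[i] <- rnorm(1)
}

# Vectorized version: one call draws all n values
set.seed(1)
x_vec <- rnorm(n)

# The while loop from the slide, collecting the values it would print
j <- 1
seen <- numeric(0)
while (j < 10) {
  seen <- c(seen, j)
  j <- j + 2
}

print(seen)
```

For simple element-wise work the vectorized form is usually shorter and faster; explicit loops remain useful when each step depends on the previous one.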
Functions
 Functions do things with data
 “Input”: function arguments (0,1,2,…)
 “Output”: function result (exactly one)

 Example:
add = function(a,b)
{ result = a+b
return(result) }

 Operators:
 Short-cut writing for frequently used functions of one or two
arguments.
 Examples: + - * / ! & | %%
General Form of Functions

function(arguments) {
expression
}

larger <- function(x,y) {

if(any(x < 0)) return(NA)
[Link] <- y > x
x[[Link]] <- y[[Link]]
x
}
Functions inside functions

> attach(cars)
> plot(speed,dist)
> x <- seq(min(speed),max(speed),length=1000)
> newdata=[Link](speed=x)
> lines(x,predict(lm(dist~speed),newdata))
> rm(x) ; detach()
[Figure: scatterplot of dist against speed with the fitted regression line]

If you are in doubt...

> help (predict)
'predict' is a generic function for predictions from the results of various model fitting functions.

> help ([Link])
'[Link]' produces predicted values, obtained by evaluating the regression function in the frame 'newdata'

> predict(lm(dist~speed),newdata)
Calling Conventions for Functions

 Arguments may be specified in the same order in

which they occur in function definition, in which
case the values are supplied in order.
 Arguments may be specified as name=value, when
the order in which the arguments appear is
irrelevant.
 Above two rules can be mixed.

> [Link](x1, y1, [Link]=F, [Link]=.99)

> [Link]([Link]=F, [Link]=.99, x1, y1)
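
Positional, named and mixed argument matching can be tried on any function. The sketch below uses a hypothetical function `power`, defined here only to illustrate the three calling styles.

```r
# Hypothetical function, used only to illustrate argument matching
power <- function(base, exponent = 2, negate = FALSE) {
  out <- base^exponent
  if (negate) -out else out
}

a  <- power(3, 2)                             # positional: base = 3, exponent = 2
b  <- power(exponent = 2, base = 3)           # named: order does not matter
c1 <- power(negate = TRUE, 3, exponent = 3)   # mixed: the unnamed 3 fills 'base'
d  <- power(3)                                # 'exponent' falls back to its default

print(c(a, b, c1, d))
```

Named arguments are matched first; the remaining unnamed values are then assigned to the unfilled parameters in order.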
Missing Arguments

 R function can handle missing arguments two

ways:

 either by providing a default expression in the

argument list of definition, or

 by testing explicitly for missing arguments.

Missing Arguments in Functions

> add <- function(x,y=0){x + y}

> add(4)

> add <- function(x,y){

if(missing(y)) x
else x+y
}
> add(4)
Variable Number of Arguments

 The special argument name “...” in the function definition will match any number of arguments in the call.

 nargs() returns the number of arguments in the current call.
Variable Number of Arguments

> [Link] <- function(...) mean(c(...))

> [Link](1:10,20:100,12:14)

> [Link] <- function(...)
{
means <- numeric()
for(x in list(...)) means <- c(means,mean(x))
mean(means)
}
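
The two definitions above do not compute the same thing: the first averages all values pooled together, the second averages the per-argument means. The sketch below makes this concrete. The function names `mean_of_all`, `mean_of_means` and `count_args` are hypothetical stand-ins, not the names used on the slide.

```r
# Pool every value from every argument, then average
mean_of_all <- function(...) mean(c(...))

# Average each argument separately, then average those means
mean_of_means <- function(...) {
  means <- numeric()
  for (x in list(...)) means <- c(means, mean(x))
  mean(means)
}

# nargs() reports how many arguments the current call received
count_args <- function(...) nargs()

r1 <- mean_of_all(1:10, 20:100, 12:14)
r2 <- mean_of_means(1:10, 20:100, 12:14)
n  <- count_args(1:10, 20:100, 12:14)

print(c(r1, r2, n))
```

The results differ because the three arguments have different lengths, so the pooled mean weights the long middle vector more heavily.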
Useful functions

> seq(2,12,by=2)
[1] 2 4 6 8 10 12
> seq(4,5,length=5)
[1] 4.00 4.25 4.50 4.75 5.00
> rep(4,10)
[1] 4 4 4 4 4 4 4 4 4 4

> paste("V",1:5,sep="")
[1] "V1" "V2" "V3" "V4" "V5"

> LETTERS[1:7]
[1] "A" "B" "C" "D" "E" "F" "G"
Mathematical operations

Usual operations: + - * /
Powers: 2^5 or 2**5
Integer division: %/%
Modulus: %% (7%%5 gives 2)

Standard functions: abs(), sign(), log(), log10(), sqrt(),
exp(), sin(), cos(), tan()
gamma(), lgamma(), choose()

To round: round(x,3) rounds to 3 decimal places

And also: floor(2.5) gives 2, ceiling(2.5) gives 3

Vector functions
> vec <- c(5,4,6,11,14,19)
> sum(vec)
[1] 59
> prod(vec)
[1] 351120
> mean(vec)
[1] 9.833333
> median(vec)
[1] 8.5
> var(vec)
[1] 34.96667
> sd(vec)
[1] 5.913262
> summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.000 5.250 8.500 9.833 13.250 19.000

And also: min() max() cummin() cummax() range()
Logical functions

R has two logical values: TRUE (or T) and FALSE (or F).

Comparison and logical operators:
== exactly equal
< less than
> greater than
<= less than or equal
>= greater than or equal
!= not equal
& “and”
| “or”

Example:
> 3 == 4
[1] FALSE
> 4 > 3
[1] TRUE
> x <- -4:3
> x > 1
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
> sum(x[x>1])
[1] 5
Note the difference!
> sum(x>1)
[1] 2
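
Logical vectors can also be combined and summarised. The following sketch uses the base R functions any(), all() and which() on the same x; the result names are illustrative.

```r
x <- -4:3

any_big <- any(x > 1)            # is at least one element greater than 1?
all_big <- all(x > 1)            # are all elements greater than 1?
idx     <- which(x > 1)          # positions of the TRUE values
near0   <- x[x > -2 & x < 2]     # element-wise "and" of two conditions

n_big   <- sum(x > 1)            # TRUE counts as 1, so this counts matches
sum_big <- sum(x[x > 1])         # this adds up the matching values

print(idx)
print(near0)
```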
Graphics in R
plot()

 If x and y are vectors, plot(x,y) produces a

scatterplot of x against y.
 plot(x) produces a time series plot if x is a
numeric vector or time series object.
 plot(df), plot(~ expr), plot(y ~ expr), where df is a
data frame, y is any object, expr is a list of object
names separated by `+' (e.g. a + b + c).
 The first two forms produce distributional plots of
the variables in a data frame (first form) or of a
number of named objects (second form). The
third form plots y against every object named in
expr.
Graphics with plot()

> plot(rnorm(100),rnorm(100))

The function rnorm() simulates random draws from a normal distribution.

Help: ?rnorm, and ?runif, ?rexp, ?rbinom, ...
[Figure: scatterplot of rnorm(100) against rnorm(100)]
Graphics with plot()
> x <- seq(-2*pi,2*pi,length=100)
> y <- sin(x)

> par(mfrow=c(2,2))
> plot(x,y,xlab="x",
ylab="Sin x")
> plot(x,y,type= "l",
main="A Line")
> plot(x[seq(5,100,by=5)],
y[seq(5,100,by=5)],
type= "b",axes=F)
> plot(x,y,type="n",
ylim=c(-2,1))
> par(mfrow=c(1,1))
[Figure: 2 x 2 panel of the four plots: sin x against x as points; the same curve as a line, titled "A Line"; every fifth point drawn with type "b" and no axes; an empty plot of y against x with ylim from -2 to 1]
Graphical Parameters of plot()

type = "c": c = p (default), l, b, s, o, h, n
pch = "+": a character, or a number from 0 to 25 selecting a plotting symbol
lty = 1: numbers
lwd = 2: numbers
axes = L: L = F, T
xlab = "string", ylab = "string"
sub = "string", main = "string"
xlim = c(lo,hi), ylim = c(lo,hi)
And some more.
Graphical Parameters of plot()

x <- 1:10
y <- 2*x + rnorm(10,0,1)
plot(x,y,type="p") # Try l,b,s,o,h,n
# axes=T, F
# xlab="age", ylab="weight"
# sub="sub title", main="main title"
# xlim=c(0,12), ylim=c(-1,12)
Other graphical functions

 locator(n, type="p"): waits for the user to select locations on the current plot using the left mouse button. This continues until n (default 512) points have been selected.

 identify(x, y, labels): lets the user highlight any of the points defined by x and y.

 text(x, y, "Hey"): writes text at coordinate (x, y).

Plots for Multivariate Data

pairs(stack.x)
x <- 1:20/20
y <- 1:20/20
z <- outer(x,y,function(a,b){cos(10*a*b)/(1+a*b^2)})
contour(x,y,z)
persp(x,y,z)
image(x,y,z)
Other graphical functions

> axis(1,at=c(2,4,5), labels=c("A","B","C")) # axis details (ticks, labels, …); use xaxt="n" or yaxt="n" inside plot()

> lines(x,y,…) # line plots

> abline(lsfit(x,y)) # add the fitted least-squares line

> abline(0,1) # add a line of slope 1 and intercept 0

> legend(locator(1),…) # legends: very flexible

Histogram

 A histogram is a special kind of bar plot

 It allows you to visualize the distribution of values for a
numerical variable
 When drawn with a density scale:
 the AREA (NOT height) of each bar is the proportion of
observations in the interval
 the TOTAL AREA is 100% (or 1)
R: making a histogram

 Type ?hist to view the help file

 Note some important arguments, esp breaks
 Simulate some data, make histograms varying the number of
bars (also called ‘bins’ or ‘cells’), e.g.
> par(mfrow=c(2,2)) # set up multiple plots
> simdata <-rchisq(100,8)
> hist(simdata) # default number of bins
> hist(simdata,breaks=2) # etc,4,20
R: setting your own breakpoints

> bps <- c(0,2,4,6,8,10,15,25)

> hist(simdata,breaks=bps)
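
With unequal breakpoints, hist() switches to a density scale, so the statement that bar areas sum to 1 can be checked numerically. This is a sketch: the seed is arbitrary, plot = FALSE is used so nothing is drawn, and the last breakpoint is widened if a simulated value happens to exceed 25.

```r
set.seed(42)                          # illustrative seed
simdata <- rchisq(100, 8)

# Breakpoints from the slide; the upper limit is raised if the data exceed it
bps <- c(0, 2, 4, 6, 8, 10, 15, max(25, ceiling(max(simdata))))

h <- hist(simdata, breaks = bps, plot = FALSE)

widths <- diff(h$breaks)              # width of each bar
areas  <- h$density * widths          # area = height x width
total_area <- sum(areas)              # should be 1

print(round(areas, 3))
print(total_area)
```

Each entry of `areas` equals the proportion of observations in that interval, which is why the heights alone are not comparable when the bars have different widths.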
Scatterplot

 A scatterplot is a standard two-dimensional (X,Y)

plot
 Used to examine the relationship between two
(continuous) variables
 It is often useful to plot values for a single
variable against the order or time the values were
obtained
R: making a scatterplot

 Type ?plot to view the help file

 For now we will focus on simple plots, but R allows
extensive user control for highly customized plots
 Simulate a bivariate data set:
> z1 <- rnorm(50)
> z2 <- rnorm(50)
> rho <- .75 # (or any number between –1 and 1)
> x2<- rho*z1+sqrt(1-rho^2)*z2
> plot(z1,x2)
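
The construction above gives x2 a correlation of rho with z1, because x2 is a weighted sum of z1 and an independent z2 with weights chosen so that its variance stays 1. A quick numerical check, as a sketch with an arbitrary seed and a larger sample than on the slide so that the estimate is stable:

```r
set.seed(123)            # illustrative seed
n   <- 5000              # larger than the slide's 50, to reduce sampling noise
rho <- 0.75

z1 <- rnorm(n)
z2 <- rnorm(n)
x2 <- rho * z1 + sqrt(1 - rho^2) * z2

r     <- cor(z1, x2)     # sample correlation, close to rho
v_x2  <- var(x2)         # sample variance, close to 1

print(r)
print(v_x2)
```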
Statistical functions
Lots of statistical functions

Normal distribution, density:
$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
[Figure: dnorm(x) plotted for x from -4 to 4]

> dnorm(2,mean=1,sd=2) # PDF at the point 2 for X ~ N(1,4)
[1] 0.1760327

> qnorm(0.975) # 0.975 quantile of N(0,1)
[1] 1.959964

> pnorm(c(2,3),mean=2) # P(X<2) and P(X<3), where X ~ N(2,1)
[1] 0.5000000 0.8413447
> [Link] <- rnorm(1000) # pseudo-random normally distributed numbers
> summary([Link])
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.418 -0.6625 -0.0429 -0.01797 0.6377 3.153
> sd([Link])
[1] 0.9881418
How to remember functions
For a normal distribution, the root is norm. Then add the letters
d density ( dnorm() )
p probability ( pnorm() )
q quantiles ( qnorm() )
r pseudo-random ( rnorm() )

Distribution    Root    Arguments
normal          norm    mean, sd, log
t (Student)     t       df, log
uniform         unif    min, max, log
F (Fisher)      f       df1, df2
χ²              chisq   df, ncp, log
Binomial        binom   size, prob, log
exponential     exp     rate, log
Poisson         pois    lambda, log
...
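
The d/p/q prefixes are related: p and q are inverses of each other, and d integrates to 1. The sketch below checks this for the normal distribution and applies the same naming pattern to the binomial root; the variable names are illustrative.

```r
p <- pnorm(1.959964)     # probability below the 0.975 quantile
q <- qnorm(0.975)        # the 0.975 quantile itself

# q undoes p when both use the same parameters
roundtrip <- qnorm(pnorm(1.2, mean = 1, sd = 2), mean = 1, sd = 2)

# The density integrates to 1 over the real line
area <- integrate(dnorm, -Inf, Inf)$value

# Same pattern with another root: P(X <= 2) for X ~ Binomial(10, 0.5)
pb <- pbinom(2, size = 10, prob = 0.5)

print(c(p, q, roundtrip, area, pb))
```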
Hypothesis tests

[Link]()
Student's t test: determines whether the means of two populations are statistically different.

[Link]()
hypothesis tests for proportions

Non-parametric tests

[Link]() Kruskal-Wallis test (rank-based analysis of variance)
[Link]() χ² test for convergence
[Link]() Kolmogorov-Smirnov test
...
Statistical models
Linear regression: lsfit() and lm()

lsfit() fits regression models by least squares

Example: inclination of the Pisa tower

> year <- 75:87
> inclin <- c(642,644,656,667,673,688,
696,698,713,717,725,742,757)
# 642 corresponds to 2.9642 m, the distance between a point's current
# position and its position if the tower were vertical
> plot(year,inclin)
> [Link] <- lsfit(year,inclin)
> [Link]([Link])
Residual Standard Error=4.181
R-Square=0.988
F-statistic (df=1, 11)=904.1198
p-value=0

Estimate [Link] t-value Pr(>|t|)

Intercept -61.1209 25.1298 -2.4322 0.0333
X 9.3187 0.3099 30.0686 0.0000

> abline([Link])
Data and models for Pisa tower

[Figure: inclination plotted against year (76 to 86), with the fitted regression line]
Using lm() instead of lsfit()
> [Link] <- lm(inclin ~ year)
> [Link]
> summary([Link])
Call:
lm(formula = inclin ~ year)

Residuals:
Min 1Q Median 3Q Max
-5.9670 -3.0989 0.6703 2.3077 7.3956

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -61.1209 25.1298 -2.432 0.0333 *
year 9.3187 0.3099 30.069 6.5e-12 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 4.181 on 11 degrees of freedom

Multiple R-Squared: 0.988, Adjusted R-squared: 0.9869
F-statistic: 904.1 on 1 and 11 DF, p-value: 6.503e-012
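
The coefficients in this summary can be reproduced from the data on the earlier slide. The sketch below refits the model, compares the slope with the closed-form value cov(x, y) / var(x), and predicts one further year. The object names `fit`, `b`, `slope_by_hand` and `pred` are illustrative, and the prediction for year 88 is an extrapolation shown only to demonstrate predict().

```r
year   <- 75:87
inclin <- c(642, 644, 656, 667, 673, 688,
            696, 698, 713, 717, 725, 742, 757)

fit <- lm(inclin ~ year)
b   <- coef(fit)                      # named vector: (Intercept) and year

# For simple linear regression the slope is cov(x, y) / var(x)
slope_by_hand <- cov(year, inclin) / var(year)

# Prediction for a new year (extrapolation, for illustration only)
pred <- predict(fit, newdata = data.frame(year = 88))

print(b)
print(slope_by_hand)
print(pred)
```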
Generic functions ...
> residuals([Link])
1 2 3 4 ...
4.21978 -3.098901 -0.4175824 1.263736 ...

> fitted([Link])
1 2 3 4 ...
637.7802 647.0989 656.4176 665.7363 ...

> coef([Link])
(Intercept) year
-61.12088 9.318681

> par(mfrow=c(2,3))
> plot([Link]) # see next slide
Diagnostics

[Figure: four diagnostic panels for the Pisa regression. Residuals vs Fitted (residuals against fitted values), Normal Q-Q plot (standardized residuals against theoretical quantiles), Scale-Location plot (standardized residuals against fitted values), Cook's distance plot (Cook's distance against observation number)]


Multiple regression
> wheat <- [Link](yield=c(210,110,103,103,1,76,73,70,68,53,45,31),
temp=c(16.7,17.4,18.4,16.8,18.9,17.1,17.3,
18.2,21.3,21.2,20.7,18.5),
sunexp=c(30,42,47,47,43,41,48,44,43,50,56,60))

> [Link] <- lm(yield~temp+sunexp,
data=wheat)
> summary([Link],cor=F)
Residuals:
Min 1Q Median 3Q Max
-85.733 -11.117 6.411 16.476 53.375

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 420.660 131.292 3.204 0.0108 *
temp -8.840 7.731 -1.143 0.2824
sunexp -3.880 1.704 -2.277 0.0488 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
...
Multiple regression diagnostics

> par(mfrow=c(2,2))
> plot([Link])

[Figure: four diagnostic panels for the multiple regression. Residuals vs Fitted (residuals against fitted values), Normal Q-Q plot (standardized residuals against theoretical quantiles), Scale-Location plot (standardized residuals against fitted values), Cook's distance plot (Cook's distance against observation number)]


Hash tables
hash tables

 In vectors, lists, dataframes, arrays, elements are stored

one after another, and are accessed in that order by their
offset (or: index), which is an integer number.

 Sometimes, consecutive integer numbers are not the

“natural” way to access: e.g., gene names, oligo sequences

 E.g., if we want to look for a particular gene name in a long

list or data frame with tens of thousands of genes, the
linear search may be very slow.

 Solution: instead of list, use a hash table. It sorts, stores

and accesses its elements in a way similar to a telephone
book.
hash tables
In R, a hash table is the same as a workspace for
variables, which is the same as an environment.

> tab = [Link](hash=T)

> assign("btk", list(cloneid=682638,

fullname="Bruton agammaglobulinemia tyrosine kinase"),
env=tab)

> ls(env=tab)
[1] "btk"

> get("btk", env=tab)

$cloneid
[1] 682638
$fullname
[1] "Bruton agammaglobulinemia tyrosine kinase"
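
Beyond assign() and get(), a few more base R calls are useful when an environment serves as a lookup table. The sketch below assumes base R's new.env() for creating the table; the second entry "abc1" is a made-up record added only to show a lookup with more than one key.

```r
tab <- new.env(hash = TRUE)

assign("btk", list(cloneid = 682638,
                   fullname = "Bruton agammaglobulinemia tyrosine kinase"),
       envir = tab)
tab[["abc1"]] <- list(cloneid = 1L, fullname = "hypothetical entry")  # made-up record

# Test for a key without triggering an error; inherits = FALSE keeps the
# search inside 'tab' instead of also looking in enclosing environments
has_btk <- exists("btk", envir = tab, inherits = FALSE)
has_xyz <- exists("xyz", envir = tab, inherits = FALSE)

keys <- ls(envir = tab)                  # all keys, sorted
id   <- get("btk", envir = tab)$cloneid  # look up one field of one record

rm("abc1", envir = tab)                  # delete a key
n_left <- length(ls(envir = tab))

print(keys)
print(id)
```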
Object orientation
Object orientation
 primitive (or: atomic) data types in R are:

 numeric (integer, double, complex)

 character
 logical
 function
 out of these, vectors, arrays, lists can be built
Object orientation

 Object: a collection of atomic variables and/or

other objects that belong together
 Example: a microarray experiment
 probe intensities
 patient data (tissue location, diagnosis, follow-up)
 gene data (sequence, IDs, annotation)

 Parlance:
 class: the “abstract” definition of it
 object: a concrete instance
 method: other word for ‘function’
 slot: a component of an object
Object orientation advantages

 Encapsulation (can use the objects and methods

someone else has written without having to care
about the internals)

 Generic functions (e.g. plot, print)

 Inheritance (hierarchical organization of

complexity)
Object orientation
library('methods')
setClass('microarray', ## the class definition
representation( ## its slots
qua = 'matrix',
samples = 'character',
probes = 'vector'),
prototype = list( ## and default values
qua = matrix(nrow=0, ncol=0),
samples = character(0),
probes = character(0)))

dat = [Link]('../data/alizadeh/[Link]')
z = cbind(dat$CH1I, dat$CH2I)

setMethod('plot', ## overload generic function ‘plot’

signature(x='microarray'), ## for this new class
function(x, ...)
plot(x@qua, xlab=x@samples[1], ylab=x@samples[2], pch='.', log='xy'))

ma = new('microarray', ## instantiate (construct)

qua = z,
samples = c('brain','foot'))
plot(ma)
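
The example above depends on a data file that is not part of these slides. The following self-contained sketch shows the same class, slot and method mechanics with simulated intensities instead of real data. The generic `nProbes` is hypothetical and introduced only for illustration.

```r
library(methods)

# Class definition with two slots
setClass("microarray",
         representation(qua = "matrix", samples = "character"))

# Hypothetical generic and its method for the new class:
# number of probes = number of rows of the intensity matrix
setGeneric("nProbes", function(x) standardGeneric("nProbes"))
setMethod("nProbes", "microarray", function(x) nrow(x@qua))

set.seed(1)                              # illustrative seed
z <- cbind(rexp(5), rexp(5))             # simulated intensities, not real data

ma <- new("microarray", qua = z, samples = c("brain", "foot"))

n     <- nProbes(ma)                     # dispatches on the class of 'ma'
is_ma <- is(ma, "microarray")

print(n)
print(ma@samples)
```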
Object orientation in R

The plot([Link]) command is different from

plot(year,inclin) .

plot([Link])
R recognizes that [Link] is a “lm” object.
Uses [Link]() .

Most R functions are object-oriented.

For more details see ?methods and ?class
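
Dispatch on the class attribute can be sketched with a small class of one's own. The class name `reading` and its constructor are hypothetical; format() and print() are existing base R generics, and a method is picked simply because its name has the form generic.class.

```r
# Constructor for a hypothetical S3 class 'reading'
new_reading <- function(value, unit) {
  structure(list(value = value, unit = unit), class = "reading")
}

# Methods for two existing generics
format.reading <- function(x, ...) paste(x$value, x$unit)
print.reading  <- function(x, ...) {
  cat("<reading>", format(x), "\n")
  invisible(x)
}

r   <- new_reading(2.96, "m")
txt <- format(r)                 # dispatches to format.reading
print(r)                         # dispatches to print.reading

# A fitted model carries its class in the same way
fit <- lm(dist ~ speed, data = cars)
fit_class <- class(fit)
print(fit_class)
```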

Importing/Exporting Data

 Importing data
 R can import data from other applications
 Packages are available to import microarray data, Excel
spreadsheets etc.
 The easiest way is to import tab delimited files
> [Link]<-[Link]("file",sep=",") *)
> SimpleData <- [Link](file =
"[Link] header = TRUE,
quote = "", sep = "\t", [Link]="")
 Exporting data
 R can also export data in various formats
 Tab delimited is the most common
> [Link](x, "filename") *)

*) make sure to include the path or

to first change the working directory
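
A round trip through a tab-delimited file can be sketched as follows. This assumes base R's write.table() and read.table(); the data frame is made up, and a temporary file is used so that no path or working directory needs to be set.

```r
df <- data.frame(gene = c("g1", "g2"),
                 intensity = c(10.5, 3.2),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".txt")      # temporary file, removed below

# Export: tab-delimited, no quotes, no row names
write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Import: the first line holds the column names
back <- read.table(path, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)

unlink(path)                            # clean up

print(back)
```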
Getting help… and quitting

 Getting information about a specific command

> help(rnorm)
> ?rnorm
 Finding functions related to a key word
> [Link]("boxplot")
 Starting the R installation help pages
> [Link]()
 Quitting R
> q()
Getting help
Details about a specific command whose name you
know (input arguments, options, algorithm, results):

>? [Link]

>help([Link])
Resources

 Books
 Assigned text book
 For an extended list visit [Link]
 Mailing lists
 R-help ([Link])
 Bioconductor ([Link])
 However, first
 read the posting guide/general instructions and
 search archives
 Online documentation
 R Project documentation ([Link])
 Manuals
 FAQs
 …
 Bioconductor documentation ([Link])
 Vignettes
 Short Courses
 …
 Google

RTD Isolated Safety Barrier Overview
1 page
Diggity SEO On Site SEO Guide v1.17
No ratings yet
Diggity SEO On Site SEO Guide v1.17
38 pages
File Installation Key Matlab R2011a 134 PDF
No ratings yet
File Installation Key Matlab R2011a 134 PDF
3 pages
MG2100 MG3100 MG4100series Simplified Manual E QY8-13DL-000
No ratings yet
MG2100 MG3100 MG4100series Simplified Manual E QY8-13DL-000
31 pages
All Fonts
No ratings yet
All Fonts
55 pages

Introduction To R

Uploaded by

Introduction To R

Uploaded by
