Introduction To R
Original material
Johannes Freudenberg
Cincinnati Children’s Hospital Medical Center
Marcel Baumgartner
Nestec S.A.
Jaeyong Lee
Penn State University
Jennifer Urbano Blackford, Ph.D
Department of Psychiatry, Kennedy Center
Wolfgang Huber
History of R
R is
a programming language
a statistical package
an interpreter
Open Source
R is not
a database
a collection of “black boxes”
a spreadsheet software package
commercially supported
What R is
names
types of objects: vector, factor, array, matrix,
[Link], ts, list
attributes
mode: numeric, character, complex, logical
length: number of elements in object
creation
assign a value
create a blank object
Naming Convention
case-sensitive
mydata different from MyData
x<-c(1,2,3,4,5,6,7)
x<-c(1:7)
x<-1:4
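The points above about creating objects, their attributes, and case-sensitive names can be tried out with a minimal sketch like the following. The object names (`mydata`, `MyData`, `blank`) and their values are illustrative assumptions, not taken from the original slides.

```r
# Create objects by assignment; names are case-sensitive.
mydata <- c(1, 2, 3)        # a numeric vector
MyData <- c("a", "b")       # a different object: note the capital letters

mode(mydata)                # "numeric"
mode(MyData)                # "character"
length(mydata)              # 3

# Create a "blank" object of a given mode and length, then fill it in.
blank <- numeric(5)         # five zeros
blank[2] <- 10

# 1:7 and c(1, 2, 3, 4, 5, 6, 7) hold the same values.
same <- all(1:7 == c(1, 2, 3, 4, 5, 6, 7))
same                        # TRUE
```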
> 5 + (6 + 7) * pi^2
[1] 133.3049
> log(exp(1))
[1] 1
> log(1000, 10)
[1] 3
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
> Sin(pi/3)^2 + cos(pi/3)^2
Error: couldn't find function "Sin"
R as a calculator
> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
[Figure: sin(seq(0, 2 * pi, length = 100)) plotted against Index]
Logical
> x <- T; y <- F
> x; y
[1] TRUE
[1] FALSE
Numerical
> a <- 5; b <- sqrt(2)
> a; b
[1] 5
[1] 1.414214
Character
> a <- "1"; b <- 1
> a; b
[1] "1"
[1] 1
> a <- "character"
> b <- "a"; c <- a
> a; b; c
[1] "character"
[1] "a"
[1] "character"
Vectors, Matrices, Arrays
Vector
Ordered collection of data of the same data type
Example:
last names of all students in this class
microarray
In R, a single number is a vector of length 1
Matrix
Rectangular table of data of the same type
Example
Mean intensities of all genes measured during a microarray
experiment
Array
Higher dimensional matrix
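As a small illustration of the three containers described above, the following sketch builds one of each. The last names and the numbers are made up for the example.

```r
# Vector: ordered collection of values of one type (hypothetical last names).
v <- c("Smith", "Jones", "Lee")
length(v)                    # 3

# A single number is itself a vector of length 1.
length(5)                    # 1

# Matrix: rectangular table of one type, filled column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)                       # 2 3
m[2, 3]                      # 6

# Array: the same idea with more than two dimensions.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                       # 2 3 4
a[2, 3, 4]                   # 24
```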
Vectors
> Colors[2]                    # one element
[1] "Green"
> x1[3:5]                      # several elements
[1] 27 28 29
Operation on vector elements
> Mydata
[1]  2.0  3.5 -0.2
> Mydata[-c(1,3)]              # remove elements
[1] 3.5
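The slides do not show how `Colors` and `x1` were created, so the sketch below assumes values that reproduce the printed results. It also adds logical indexing as one more way to pick elements.

```r
# Assumed definitions (not shown in the slides) that match the printed output.
Colors <- c("Red", "Green", "Blue")
x1 <- 25:30
Mydata <- c(2, 3.5, -0.2)

Colors[2]            # one element: "Green"
x1[3:5]              # several elements: 27 28 29
Mydata[-c(1, 3)]     # negative indices drop elements: 3.5

# Logical indexing keeps the elements for which the condition is TRUE.
positive <- Mydata[Mydata > 0]
positive             # 2.0 3.5
```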
Vector operations
> x <- c(5,-2,3,-7)
> y <- c(1,2,3,4)*10           # operation on all the elements
> y
[1] 10 20 30 40
> order(x)                     # element order for sorting
[1] 4 2 3 1
> y[order(x)]                  # operation on all the components
[1] 40 20 30 10
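To make the relationship between `order()` and sorting explicit, here is a short sketch using the same `x` and `y`. The only additions are `sort()` and the element-wise sum.

```r
x <- c(5, -2, 3, -7)
y <- c(1, 2, 3, 4) * 10

# Arithmetic is applied element by element.
x + y                        # 15 18 33 33

# order(x) returns the positions that would sort x ...
idx <- order(x)              # 4 2 3 1

# ... so indexing x with them gives the same result as sort(x).
identical(x[idx], sort(x))   # TRUE

# The same positions can rearrange a second vector in parallel.
y[idx]                       # 40 20 30 10
```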
[,1] [,2]
[1,] 3 -1
[2,] -3 6
Dealing with matrices
> dim([Link])                  # dimension
[1] 3 2
> t([Link])                    # transpose
[,1] [,2] [,3]
[1,] 3 2 -3
[2,] -1 0 6
[[2]]:
[1] "X1" "X2" "X3"
> [Link][[1]]
[1] 5 4 -1
> [Link]
[,1] [,2]
[1,] 3 -1
[2,] 2 0
[3,] -3 6
> [Link]
R1 R2
L1 3 -1
L2 2 0
L3 -3 6
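The matrix outputs above can be reproduced with a sketch like this one. The variable name `m` is a stand-in, because the original object names are garbled in the source; the values are those printed on the slides.

```r
# Build the 3 x 2 matrix shown above (filled column by column).
m <- matrix(c(3, 2, -3, -1, 0, 6), ncol = 2)

dim(m)                 # 3 2
tm <- t(m)             # transpose: 2 rows, 3 columns
dim(tm)                # 2 3

# Dropping row 2 leaves rows 1 and 3.
m13 <- m[-2, ]

# Row and column labels are set through dimnames().
dimnames(m) <- list(c("L1", "L2", "L3"), c("R1", "R2"))
m["L3", "R2"]          # 6
```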
Data frames
data frame: represents a spreadsheet.
Rectangular table with rows and columns; data
within each column has the same type (e.g.
number, text, logical), but different columns may
have different types.
Example:
> cw = chickwts
> cw
weight feed
1 179 horsebean
11 309 linseed
23 243 soybean
37 423 sunflower
...
Data Frames 1
> cw [3,2]
[1] horsebean
6 Levels: casein horsebean linseed ... sunflower
> cw [3,]
weight feed
3 136 horsebean
Subsetting in data frames
> library(MASS)                # the Animals data set ships with the MASS package
> an = Animals
> an
body brain
Mountain beaver 1.350 8.1
Cow 465.000 423.0
Grey wolf 36.330 119.5
> an [3,]
body brain
Grey wolf 36.33 119.5
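Row and column subsetting works the same way on any data frame. The following sketch uses the built-in `chickwts` data from the earlier slide; the threshold of 400 is an arbitrary choice for illustration.

```r
cw <- chickwts

# One row, all columns.
cw[3, ]

# One cell, addressed by row number and column name.
w3 <- cw[3, "weight"]          # 136

# A whole column with $, then one element of it.
f3 <- as.character(cw$feed[3]) # "horsebean"

# Rows that satisfy a condition (arbitrary threshold).
heavy <- cw[cw$weight > 400, ]
nrow(heavy)

# Columns may have different types.
sapply(cw, class)              # weight: numeric, feed: factor
```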
Labels in data frames
> labels (an)
[[1]]
[1] "Mountain beaver" "Cow"
[3] "Grey wolf" "Goat"
[5] "Guinea pig" "Dipliodocus"
[7] "Asian elephant" "Donkey"
[9] "Horse" "Potar monkey"
[11] "Cat" "Giraffe"
[13] "Gorilla" "Human"
[15] "African elephant" "Triceratops"
[17] "Rhesus monkey" "Kangaroo"
[19] "Golden hamster" "Mouse"
[21] "Rabbit" "Sheep"
[23] "Jaguar" "Chimpanzee"
[25] "Rat" "Brachiosaurus"
[27] "Mole" "Pig"
[[2]]
[1] "body" "brain"
Control structures and functions
Grouped expressions in R
> x = 1:9
> for (i in 1:10) {
    x[i] <- rnorm(1)
  }
> j = 1
> while (j < 10) {
    print(j)
    j <- j + 2
  }
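A slightly more defensive variant of the two loops is sketched below. Using `seq_along()` and collecting the `while` results in a vector are choices made for this example, not part of the original slides.

```r
set.seed(1)                      # make the random numbers reproducible

# Pre-allocate the result and loop over its own indices.
x <- numeric(10)
for (i in seq_along(x)) {
  x[i] <- rnorm(1)
}

# Many loops can be replaced by one vectorized call.
x2 <- rnorm(10)

# while loop: collect the odd numbers below 10 instead of printing them.
j <- 1
odd <- c()
while (j < 10) {
  odd <- c(odd, j)
  j <- j + 2
}
odd                              # 1 3 5 7 9
```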
Functions
Functions do things with data
“Input”: function arguments (0,1,2,…)
“Output”: function result (exactly one)
Example:
add = function(a,b) {
  result = a+b
  return(result)
}
Operators:
Short-cut writing for frequently used functions of one or two
arguments.
Examples: + - * / ! & | %%
General Form of Functions
function(arguments) {
expression
}
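The general form can be fleshed out as follows. The default argument and the `%+%` operator are hypothetical additions for illustration; the point is that operators are ordinary functions of one or two arguments.

```r
# A function with a default value for its second argument.
add <- function(a, b = 1) {
  a + b                      # the value of the last expression is returned
}
add(2, 3)                    # 5
add(2)                       # 3

# Operators are functions too and can be called by name.
`+`(2, 3)                    # 5

# A user-defined binary operator (hypothetical name).
`%+%` <- function(a, b) paste(a, b)
"a" %+% "b"                  # "a b"
```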
> attach(cars)
> plot(speed,dist)
> x <- seq(min(speed),max(speed),length=1000)
> newdata=[Link](speed=x)
> lines(x,predict(lm(dist~speed),newdata))
[Figure: dist plotted against speed for the cars data, with the fitted line]
> seq(2,12,by=2)
[1] 2 4 6 8 10 12
> seq(4,5,length=5)
[1] 4.00 4.25 4.50 4.75 5.00
> rep(4,10)
[1] 4 4 4 4 4 4 4 4 4 4
> paste("V",1:5,sep="")
[1] "V1" "V2" "V3" "V4" "V5"
> LETTERS[1:7]
[1] "A" "B" "C" "D" "E" "F" "G"
Mathematical operation
Usual operations: + - * /
Powers: 2^5 or 2**5
Integer division: %/%
Modulus: %% (7%%5 gives 2)
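A quick check of these operators, including how integer division and modulus behave for a negative number (that last case is an addition to the slide):

```r
2^5          # 32
2**5         # 32, ** is accepted as a synonym for ^
7 %/% 5      # 1, integer division
7 %% 5       # 2, remainder

# For negative numbers the quotient is rounded down,
# so the remainder takes the sign of the divisor.
-7 %/% 5     # -2
-7 %% 5      # 3
```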
> plot(rnorm(100),rnorm(100))
Help: ?rnorm, and ?runif, ?rexp, ?rbinom, ...
[Figure: scatter plot of rnorm(100) against rnorm(100)]
Graphics with plot()
> x <- seq(-2*pi,2*pi,length=100)
> y <- sin(x)
> plot(x,y,xlab="x",
       ylab="Sin x")
> plot(x,y,type="l",
       main="A Line")
> plot(x[seq(5,100,by=5)],
       y[seq(5,100,by=5)],
       type="b",axes=F)
> plot(x,y,type="n",
       ylim=c(-2,1))
> par(mfrow=c(1,1))
[Figure: four panels of y = sin(x): points with y axis labeled "Sin x"; a line titled "A Line"; every fifth point drawn with type "b" and no axes; an empty plot with ylim from -2 to 1]
Graphical Parameters of plot()
x <- 1:10
y <- 2*x + rnorm(10,0,1)
plot(x,y,type="p") # Try l,b,s,o,h,n
# axes=T, F
# xlab="age", ylab="weight"
# sub="sub title", main="main title"
# xlim=c(0,12), ylim=c(-1,12)
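The parameters listed above can be combined in a single call. The sketch below is one possible combination; it opens a null PDF device (`pdf(NULL)`) so that it runs without a display, which is an assumption of this example. Drop the `pdf()` and `dev.off()` lines to see the plots on screen.

```r
pdf(NULL)                          # assumption: draw to a throw-away device
set.seed(42)
x <- 1:10
y <- 2 * x + rnorm(10, 0, 1)

old <- par(mfrow = c(1, 2))        # two panels side by side

# Points with labels, titles and explicit axis limits.
plot(x, y, type = "p",
     xlab = "age", ylab = "weight",
     main = "main title", sub = "sub title",
     xlim = c(0, 12), ylim = c(0, 25))

# Points joined by lines, axes suppressed and then drawn by hand.
plot(x, y, type = "b", axes = FALSE)
axis(1)
axis(2)
box()

par(old)                           # restore the previous layout
closed <- dev.off()
```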
Other graphical functions
See also:
barplot()
image()
hist()
pairs()
persp()
piechart() (called pie() in current versions of R)
polygon()
library(modreg) (merged into the stats package in current versions of R)
[Link]()
[Figure: example plots, including dist plotted against speed]
Interactive Graphics Functions
pairs(stack.x)
x <- 1:20/20
y <- 1:20/20
z <- outer(x,y,function(a,b){cos(10*a*b)/(1+a*b^2)})
contour(x,y,z)
persp(x,y,z)
image(x,y,z)
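Because `contour()`, `persp()` and `image()` all expect a matrix of heights, it may help to see what `outer()` produces. The sketch below first uses small integer vectors (an illustrative choice) and then the function from the slide.

```r
# With no function given, outer() multiplies every pair: z[i, j] = x[i] * y[j].
x <- 1:3
y <- 1:4
z <- outer(x, y)
dim(z)                 # 3 4
z[2, 3]                # 6

# With a function, z[i, j] = f(x[i], y[j]); f must be vectorized.
f <- function(a, b) cos(10 * a * b) / (1 + a * b^2)
g <- 1:20 / 20
z2 <- outer(g, g, f)
dim(z2)                # 20 20
all.equal(z2[20, 20], f(1, 1))   # TRUE
```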
Normal distribution
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
PDF at the point 2 for X ~ N(1,4):
> dnorm(2,mean=1,sd=2)
[1] 0.1760327
[Figure: dnorm(x) plotted for x from -4 to 4]
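The value returned by `dnorm()` can be checked against the normal density formula written out by hand. In the sketch below, `normal_pdf` is a helper defined only for this comparison.

```r
# Normal density with mean mu and standard deviation sigma, written out.
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

by_hand  <- normal_pdf(2, mu = 1, sigma = 2)
built_in <- dnorm(2, mean = 1, sd = 2)     # 0.1760327

all.equal(by_hand, built_in)               # TRUE

# Note that sd is the standard deviation: N(1, 4) has variance 4, so sd = 2.
```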
[Link]()
Student's t-test: determines whether the means of two populations are
statistically different.
[Link]()
Hypothesis tests for proportions
Nonparametric tests
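The function names on this slide are garbled in the source. Assuming they refer to R's standard `t.test()` and `prop.test()`, a minimal sketch on simulated data could look like this. The sample sizes, means, and counts are made up, and `wilcox.test()` is shown as one example of a nonparametric test.

```r
set.seed(123)
a <- rnorm(30, mean = 0, sd = 1)     # simulated sample 1
b <- rnorm(30, mean = 2, sd = 1)     # simulated sample 2, shifted mean

# Two-sample t-test (Welch version by default).
tt <- t.test(a, b)
tt$p.value                           # small: the means differ

# Test for equality of two proportions: 15/50 versus 25/50 successes.
pt <- prop.test(x = c(15, 25), n = c(50, 50))
pt$p.value

# A nonparametric alternative to the t-test.
wt <- wilcox.test(a, b)
wt$p.value
```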
> abline([Link])
Data and models for Pisa tower
[Figure: inclination plotted against year for the Pisa tower data]
Using lm() instead of lsfit()
> [Link] <- lm(inclin ~ year)
> [Link]
> summary([Link])
Call:
lm(formula = inclin ~ year)
Residuals:
Min 1Q Median 3Q Max
-5.9670 -3.0989 0.6703 2.3077 7.3956
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -61.1209 25.1298 -2.432 0.0333 *
year 9.3187 0.3099 30.069 6.5e-12 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> fitted([Link])
1 2 3 4 ...
637.7802 647.0989 656.4176 665.7363 ...
> coef([Link])
(Intercept) year
-61.12088 9.318681
> par(mfrow=c(2,3))
> plot([Link]) # see next page
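The Pisa data are not included in the slides, so the same workflow is sketched here on the built-in `cars` data set. The object name `fit` and the new speeds passed to `predict()` are choices made for this example.

```r
# Fit a simple linear regression of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)

coef(fit)                    # intercept about -17.58, slope about 3.93
head(fitted(fit), 4)         # fitted values for the first four rows
r2 <- summary(fit)$r.squared

# Predictions for new speeds need a data frame with the same column name.
new <- data.frame(speed = c(10, 20))
pred <- predict(fit, new)

# Residuals and fitted values add up to the observed response.
all.equal(unname(fitted(fit) + resid(fit)), cars$dist)   # TRUE
```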
Diagnostics
[Figure: diagnostic plots for the fit (Residuals, Standardized residuals, Cook's distance); observations 1, 8 and 11 are labeled]
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 420.660 131.292 3.204 0.0108 *
temperature -8.840 7.731 -1.143 0.2824
ensoleillement -3.880 1.704 -2.277 0.0488 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
...
Multiple regression diagnostics
[Figure: diagnostic plots for the multiple regression (Residuals, Standardized residuals, Cook's distance); observations 1, 5 and 6 are labeled]
> ls(env=tab)
[1] "btk"
Parlance:
class: the “abstract” definition of it
object: a concrete instance
method: another word for ‘function’
slot: a component of an object
Object orientation advantages
dat = [Link]('../data/alizadeh/[Link]')
z = cbind(dat$CH1I, dat$CH2I)
plot([Link])
R recognizes that [Link] is an “lm” object
and uses [Link]().
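The dispatch mechanism described here can be observed directly. The sketch below assumes the method in question is the standard `plot` method for class "lm", and then defines a small generic, `describe`, with a hypothetical class `temperature` to show the same mechanism with user code.

```r
fit <- lm(dist ~ speed, data = cars)
class(fit)                               # "lm"

# plot(fit) looks for a plot method registered for class "lm".
is.function(getS3method("plot", "lm"))   # TRUE

# The same mechanism with a hypothetical generic and class.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an object"
describe.temperature <- function(x, ...) {
  paste(unclass(x), "degrees")
}

t1 <- structure(21.5, class = "temperature")
describe(t1)                             # "21.5 degrees"
describe(1:3)                            # "an object"
```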
Importing data
R can import data from other applications
Packages are available to import microarray data, Excel
spreadsheets etc.
The easiest way is to import tab delimited files
> [Link]<-[Link]("file",sep=",") *)
> SimpleData <- [Link](file = "[Link]", header = TRUE,
      quote = "", sep = "\t", [Link]="")
Exporting data
R can also export data in various formats
Tab delimited is the most common
> [Link](x, "filename") *)
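The function names in the import and export examples are garbled in the source. Assuming they correspond to R's standard `read.table()` and `write.table()`, a tab-delimited round trip can be sketched as follows. The data frame and the temporary file are made up for the example.

```r
# A small data frame to export (hypothetical content).
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".txt")       # temporary file, removed below

# Export as tab-delimited text with a header row and no row names.
write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Import it again; header = TRUE reads the first line as column names.
back <- read.table(path, header = TRUE, sep = "\t",
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

same_names <- identical(names(df), names(back))
same_values <- all(df$id == back$id) && all(df$name == back$name)

unlink(path)                             # clean up
```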
> ?[Link]
or
> help([Link])
Resources
Mailing lists
R-help ([Link])
Bioconductor ([Link])
However, first read the posting guide
Bioconductor documentation ([Link])
Vignettes
Short Courses
…
Google
…