R Intro
Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011
Don’t just listen to me! Other intros to R:
  • http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
  • http://www.cyclismo.org/tutorial/R/
  • http://www.r-tutor.com/r-introduction
  • Quick R: http://www.statmethods.net/
  • http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
R user frameworks
  • R from the command line (OS X and PC): just type "R" into the command line, and have fun!
  • R itself: http://www.r-project.org/
  • RStudio, a good choice: http://www.rstudio.org/
  • RevolutionR [free academic version], sort of the SAS-ised version of R: http://www.revolutionanalytics.com/downloads/free-academic.php
      • Uses a proprietary .xdf file format that speeds up computation times
  • Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
  • If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R
      • With these interfaces you can learn which code does what after pressing buttons
R user frameworks, cont.
  • R from Python: RPy: http://rpy.sourceforge.net/
  • C/C++ from R: Rcpp package
      • http://cran.r-project.org/web/packages/Rcpp/index.html
      • http://dirk.eddelbuettel.com/code/rcpp.html
      • Can hugely speed up computation by writing functions in C++. The R function then calls the compiled code instead of running in R.
      • E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html & http://dirk.eddelbuettel.com/code/rcpp.examples.html
  • Excel from R: XLConnect package: http://cran.r-project.org/web/packages/XLConnect/index.html
  • And more… see for yourself
R Tips
  • R can crash
      • Do not use R’s built-in text editor, and do not write code solely in the R console. Instead use any text editor that integrates with R. See here for links: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
  • When asking for help on mailing lists or help websites, use BRIEF and REPRODUCIBLE examples
      • Not doing this makes people not want to help you!
  • R overwrites files with the same file name without warning!!!!
      • Make sure you want to overwrite a file before doing so
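One way to guard against the silent-overwrite problem is to check for the file before writing. The sketch below is only an illustration: `safe_write_csv` is a hypothetical helper name (not from the slides or any package), and it writes to a temporary directory so nothing real is touched.

```r
# Hypothetical helper: write a CSV only if the target file does not exist yet.
safe_write_csv <- function(df, path) {
  if (file.exists(path)) {
    stop("Refusing to overwrite existing file: ", path)
  }
  write.csv(df, path, row.names = FALSE)
  invisible(path)
}

# Demo in a temporary directory (assumed location, chosen to be harmless).
out_file <- file.path(tempdir(), "safe_write_demo.csv")
if (file.exists(out_file)) file.remove(out_file)  # start clean for the demo

safe_write_csv(head(iris), out_file)              # first write succeeds

# A second write to the same path raises an error; capture its message.
second_try <- tryCatch(
  safe_write_csv(head(iris), out_file),
  error = function(e) conditionMessage(e)
)
second_try
```

Plain `write.csv(head(iris), out_file)` would have replaced the file without comment, which is the behaviour the tip warns about.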
Style
Not this kind of style…
This kind of style!!!
Style
  • Style is important so YOU and OTHERS can read your code and actually use it
  • Google style guide: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
  • Henrik Bengtsson style guide: http://www1.maths.lth.se/help/R/RCC/
  • Hadley Wickham's style guide: https://github.com/hadley/devtools/wiki/Style
Preparing your data for R
What makes clean data?
  • Correct spelling
  • Identical capitalization (e.g. Premna vs. premna)
      • R is case-sensitive: if myvector <- c(3, 4, 5), calling Myvector does not work!
  • No spaces between words (R turns spaces in column names into ".")
      • Generally try to avoid spaces; use underscores instead
  • NA or blank (if using csv) for missing values
  • Use find and replace to get rid of spaces after words
  • I generally keep an .xls and a .csv file, so I can always recreate work in R from the .csv file and still modify the .xls file
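The points above can be seen in a few lines of R. This is a minimal sketch with made-up data (the column names and values are assumptions for illustration only); lowercasing the species names is just one possible way to make capitalization identical.

```r
# R is case-sensitive: myvector and Myvector are different names.
myvector <- c(3, 4, 5)
has_lower <- exists("myvector")
has_upper <- exists("Myvector")

# A small, deliberately messy CSV (hypothetical data): a space in a column
# name, inconsistent capitalization, a trailing space, and a blank value.
csv_text <- "site name,species,count
A,Premna,3
A,premna,4
B,Premna ,
B,Aglaia,2"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(raw)            # the space in "site name" has been turned into "."
unique(raw$species)   # spelling variants show up as separate values
raw$count             # the blank cell is read as NA

# One possible clean-up: underscores in names, trimmed and lowercased values.
clean <- raw
names(clean) <- gsub(".", "_", names(clean), fixed = TRUE)
clean$species <- tolower(trimws(clean$species))
unique(clean$species)
```

Fixing these problems in the spreadsheet before export is usually easier than repairing them in R afterwards.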
Bringing data into R
  • Create a csv file
      • One worksheet only
      • No special formatting, filters, comments, etc.
      • Copy only the columns and rows with your data to the CSV, as R will sometimes read in columns without data
      • Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
  • In R, set the working directory
      • setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
      • What is the working directory? getwd()
      • What is in the working directory? dir()
  • Read in data
      • CSV files: iris.df <- read.csv("iris_df.csv", header=TRUE)
      • Clipboard: read.csv("clipboard") reads in copied data as if it were a file (on OS X, use read.csv(pipe("pbpaste")) instead)
      • From web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
      • From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet="Sheet1")
  • Write data
      • write.csv(dataframe, "dataframename.csv"), OR
      • save(iris, file="iris.RData") [and load("iris.RData") to open in R]
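A round trip through both file formats might look like the sketch below. It uses R's built-in iris data and a temporary directory instead of the working directory from the slide, so the paths are assumptions made to keep the example self-contained.

```r
# Temporary locations stand in for files in your working directory.
csv_path   <- file.path(tempdir(), "iris_df.csv")
rdata_path <- file.path(tempdir(), "iris.RData")

# Write a CSV, then read it back; header = TRUE treats row 1 as column names.
write.csv(iris, csv_path, row.names = FALSE)
iris.df <- read.csv(csv_path, header = TRUE)
str(iris.df)

# save() needs the file= argument named; load() restores the object
# under the name it had when it was saved.
save(iris.df, file = rdata_path)
rm(iris.df)
load(rdata_path)
dim(iris.df)
```

Unlike a CSV, an .RData file keeps column classes exactly as they were in the session, at the cost of being readable only by R.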
R data structures
  • Scalar: object with a single value, either numeric or character (in R, this is simply a vector of length 1)
  • Vector: sequence of values of one type (e.g., all numeric or all character); can include NA
  • List: arbitrary collection of objects, a very useful R object
  • Character: text, e.g., "this is some text"
  • Factor: like character vectors, but only with values in predefined "levels"
  • Matrix: two-dimensional; all values must be of the same type (usually numeric)
  • Dataframe: each column can be of a different class
  • Immutable dataframe: special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe instead of copying it
  • Function
  • Environment
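To make the list above concrete, here is a small sketch that builds one object of each basic kind. All object names (`x_scalar`, `x_list`, and so on) are invented for the illustration; the immutable dataframe is left out because it needs the plyr package.

```r
x_scalar <- 42                          # a "scalar" is a vector of length 1
x_vector <- c(1.5, NA, 3)               # numeric vector with a missing value
x_mixed  <- c(1, "a")                   # mixing types coerces to character
x_list   <- list(n = 1:3, label = "abc", flag = TRUE)  # elements may differ
x_factor <- factor(c("low", "high", "low"), levels = c("low", "high"))
x_matrix <- matrix(1:6, nrow = 2)       # 2 x 3, a single type throughout
x_df     <- data.frame(id = 1:3, name = c("a", "b", "c"),
                       stringsAsFactors = FALSE)       # columns may differ
x_fun    <- function(v) mean(v, na.rm = TRUE)          # functions are objects
x_env    <- new.env()                   # an environment holds named objects
assign("answer", x_scalar, envir = x_env)

# First class attribute of each object, for a quick overview.
objs <- list(scalar = x_scalar, vector = x_vector, mixed = x_mixed,
             list = x_list, factor = x_factor, matrix = x_matrix,
             dataframe = x_df, fun = x_fun, env = x_env)
vapply(objs, function(o) class(o)[1], character(1))
```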
Exploring dataframes
  • str(dataframe) gives column formats and dimensions
  • head(dataframe) and tail(dataframe) give the first and last 6 rows
  • names(dataframe) gives column names
  • row.names(dataframe) gives row names
  • attributes(dataframe) gives column and row names and object class
  • summary(dataframe) gives a lot of good information
  • Make sure variables are in the appropriate form
      • Character/string, numeric, factor, integer, logical
  • Make sure mins, maxs, means, etc. seem right
  • Make sure you don’t have typing errors that turn Premna and premna into two separate factor levels
      • Use unique(iris$Species) to see all the unique values of a column
      • Or use levels(spider$species) to see the different levels
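The checks above might be run like this on a small dataframe. The `survey` data is invented for the illustration, with a typo and an implausible height planted on purpose.

```r
# Hypothetical data: one capitalization typo, one suspicious height,
# and a 0/1 column that is really a category rather than a number.
survey <- data.frame(
  species = c("Premna", "premna", "Aglaia", "Premna"),
  alive   = c(1, 0, 1, 1),
  height  = c(2.1, 1.8, 30, 2.4),
  stringsAsFactors = TRUE
)

str(survey)       # column classes and dimensions
head(survey)      # first rows (up to 6)
summary(survey)   # the max of height stands out as a possible entry error

# The typo produces an extra factor level.
levels(survey$species)
n_levels_before <- nlevels(survey$species)

# Recode the 0/1 column as a factor so it is not averaged by accident.
survey$alive <- factor(survey$alive, levels = c(0, 1),
                       labels = c("dead", "alive"))
table(survey$alive)
```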
To attach or not to attach… that is the question
  • Some like to use ‘attach’ to make dataframe variables accessible by name within the R session
  • Generally, ‘attach’ is frowned upon by R junkies. Use dataframe$y, or data=dataframe, or dataframe[, "y"], or dataframe[, 2]
  • To detach the object, use: detach()
  • I recommend: do not use attach, but do what you want
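The alternatives to attach named above all reach the same column. A minimal sketch, using a made-up dataframe `dat`; `with()` is one further base R option that the slide does not mention.

```r
dat <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

mean_dollar <- mean(dat$y)         # dataframe$y
mean_name   <- mean(dat[, "y"])    # dataframe[, "y"]
mean_index  <- mean(dat[, 2])      # dataframe[, 2]
mean_with   <- with(dat, mean(y))  # evaluate an expression inside the dataframe

# Modelling functions usually take a data= argument, so no attach is needed.
fit <- lm(y ~ x, data = dat)
coef(fit)
```

None of these changes the search path, so there is nothing to detach afterwards and no risk of a column name masking another object.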
R Packages
  • 3,262 packages!!!!
  • Packages are extensions written by anyone for any purpose, usually installed and loaded by:
      • install.packages("packagename"), then
      • require(packagename) or library(packagename)
  • Use ?functionname for help on any function in base R or in a loaded package
  • In RStudio, just press tab inside the parentheses after the function name to see function options!!!
  • Explore packages at the CRAN site: http://cran.r-project.org/web/packages/
  • Inside-R package reference: http://www.inside-r.org/packages
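Installing only needs to happen once per machine, while loading happens in every session. One common pattern is sketched below; `use_package` is a hypothetical helper name, and the CRAN mirror URL is an assumption you may want to change.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call loads it without downloading anything.
loaded_ok <- use_package("stats")

# Help on a function, in an interactive session:
# ?median
# help("median")
```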
Data manipulation
  • Packages: plyr, data.table, doBy, sqldf, reshape2, and more
  • Comparison of packages
      • Modified from code from the Recipes, scripts and Genomics blog: https://gist.github.com/878919
      • data.table is by far the fastest!!! BUT plyr may win on ease of use and flexibility? See for yourself…
  • Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
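What these packages have in common is the split-apply-combine pattern: split the data by group, apply a function to each piece, and combine the results. The sketch below shows the pattern in base R, which is a stand-in and not the comparison code from the gist; the plyr call only runs if plyr happens to be installed.

```r
# Base R: mean sepal length per species.
mean_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
mean_by_species

# The same summary as a named vector.
tapply(iris$Sepal.Length, iris$Species, mean)

# The plyr version of the same summary, run only if plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  plyr_means <- plyr::ddply(iris, "Species", plyr::summarise,
                            Sepal.Length = mean(Sepal.Length))
  print(plyr_means)
}
```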
Visualizations
A few different approaches:
  • Base graphics
  • Lattice graphics
  • Grid graphics
  • ggplot2 graphics
Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
An example: [figure not included in this transcript]
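Since the slide's figure is not available, the sketch below draws one comparable scatterplot with base graphics and builds the ggplot2 counterpart if that package is installed. The choice of variables and the output file name are assumptions, not a reconstruction of the original figure.

```r
# Draw to a PDF in a temporary directory so the example runs anywhere.
plot_file <- file.path(tempdir(), "iris_base.pdf")
pdf(plot_file)

# Base graphics: one plot() call plus a legend, coloured by species.
plot(iris$Sepal.Length, iris$Petal.Length,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Sepal length", ylab = "Petal length",
     main = "iris: base graphics")
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)

dev.off()

# ggplot2: the same plot built from data, aesthetics, and a geom layer.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                        colour = Species)) +
    geom_point()
  # print(p) would draw the plot on the active graphics device
}
```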
More on ggplot2 graphics
  • There are classes taught by Hadley Wickham here at Rice if you want to learn more!
      • Data visualization (Stat645): http://had.co.nz/stat645/
      • Statistical computing (Stat405): http://had.co.nz/stat405/
  • Hadley’s website is really helpful: http://had.co.nz/ggplot2/
  • The ggplot2 Google Groups site: https://groups.google.com/forum/#!forum/ggplot2
QUICK RSTUDIO RUN THROUGH
  • Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
USE CASE HERE
[see intro_usecase.R file]

Editor's Notes

  • #12 header=TRUE (often abbreviated header=T) means the first row contains variable names
  • #14 Some numbers are actually factors: think of 0/1 for dead/alive, or zip codes (what would an average zip code mean?)