Revolution Confidential 
Revolution Analytics 
R and Data Science
Joseph B Rickert 
September 25, 2014
What is R?
• Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
• Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
• Platform for beautiful and unique data visualizations
• As seen in the New York Times, Twitter and Flowing Data
• Thriving open-source community
• Leading edge of analytics research
www.revolutionanalytics.com/what-r
OPEN SOURCE R
R’s popularity is growing rapidly
Chart: R Usage Growth (Rexer Data Miner Survey, 2007-2013)
Chart: Language Popularity, IEEE Spectrum Top Programming Languages: R ranked #9
Sources: Rexer Data Miner Survey; IEEE Spectrum, July 2014
Poll Question #1
What are the statistical programming languages/platforms you are most familiar with? (choose all that apply)
A) R
B) SAS
C) SPSS
D) KXEN
E) Statistica
Tools for Data Science
Source: O’Reilly Data Science Survey
R is among the highest-paid IT skills in the US
Sources: Dice Tech Salary Survey, January 2014; O’Reilly Strata 2013 Data Science Salary Survey
Photo by Ksayer1 on flickr.
Why R for Data Science? Algorithms
# Excerpt from the body of stats::glm(); mf (the model frame), mt (its terms),
# Y (the response) and the remaining arguments are set up earlier in the function.
X <- if (!is.empty.model(mt))
    model.matrix(mt, mf, contrasts)
else matrix(, NROW(Y), 0L)
# Validate optional prior weights and offsets
weights <- as.vector(model.weights(mf))
if (!is.null(weights) && !is.numeric(weights))
    stop("'weights' must be a numeric vector")
if (!is.null(weights) && any(weights < 0))
    stop("negative weights not allowed")
offset <- as.vector(model.offset(mf))
if (!is.null(offset)) {
    if (length(offset) != NROW(Y))
        stop(gettextf("number of offsets is %d should equal %d (number of observations)",
            length(offset), NROW(Y)), domain = NA)
}
mustart <- model.extract(mf, "mustart")
etastart <- model.extract(mf, "etastart")
# Dispatch to the fitting function (method = "glm.fit" by default)
fit <- eval(call(if (is.function(method)) "method" else method,
    x = X, y = Y, weights = weights, start = start, etastart = etastart,
    mustart = mustart, offset = offset, family = family,
    control = control, intercept = attr(mt, "intercept") > 0L))
# With both an offset and an intercept, refit an intercept-only model
# to obtain the null deviance
if (length(offset) && attr(mt, "intercept") > 0L) {
    fit2 <- eval(call(if (is.function(method)) "method" else method,
        x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights,
        offset = offset, family = family, control = control,
        intercept = TRUE))
    if (!fit2$converged)
        warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?")
    fit$null.deviance <- fit2$deviance
}
# Attach optional components and bookkeeping, then set the class
if (model)
    fit$model <- mf
fit$na.action <- attr(mf, "na.action")
if (x)
    fit$x <- X
if (!y)
    fit$y <- NULL
fit <- c(fit, list(call = call, formula = formula, terms = mt,
    data = data, offset = offset, control = control, method = method,
    contrasts = attr(X, "contrasts"), xlevels = .getXlevels(mt, mf)))
class(fit) <- c(fit$class, c("glm", "lm"))
fit
Task Views
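To connect the excerpt to everyday use, here is a minimal sketch that calls glm() on R's built-in mtcars data. The model (transmission type against car weight) is an illustrative choice, not one taken from the talk; the point is that the class set in the last lines of the excerpt is visible on the fitted object.

```r
# Logistic regression (binomial GLM): probability of a manual transmission
# (am = 1) as a function of car weight, using the built-in mtcars data.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

class(fit)      # set by the last lines of the excerpt: "glm" "lm"
fit$converged   # convergence flag returned by the fitting function
coef(fit)       # intercept and slope on the logit scale

# Typing the function name without parentheses prints its full source,
# including the excerpt above:
# stats::glm
```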
R Growth
Put this astonishing growth in perspective:
• SAS 9.3 contains ~1,200 commands that are roughly equivalent to R functions
• R packages contain a median of 5 functions
• Therefore R has ~36,820 functions
• During 2013 alone, R added more functions than SAS Institute has written in its entire history!
Source: Bob Muenchen
Chart: 5,882 packages (as of 9/25/14)
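The estimate is a product of a package count and a per-package median, so a few lines of R make the arithmetic explicit. Note that ~36,820 functions corresponds to about 7,364 packages, more than the 5,882 shown here, so the estimate presumably rests on a different package count (an assumption; the slide does not say which). Multiplying a count by a median, rather than a mean, also makes this a rough order-of-magnitude figure.

```r
# Back-of-the-envelope estimate: total functions ~ packages x median functions per package
median_functions_per_package <- 5
sas_commands <- 1200

# Package count implied by the ~36,820-function estimate
implied_packages <- 36820 / median_functions_per_package   # 7364

# The same estimate using the 5,882-package count on this slide
slide_estimate <- 5882 * median_functions_per_package      # 29410

# Ratio of estimated R functions to SAS commands
ratio <- round(36820 / sas_commands, 1)                    # 30.7
```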
Why R for Data Science? Visualizations
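A minimal base-graphics sketch of the kind of plot this slide refers to. It uses only the graphics packages that ship with R (no add-on packages) and writes to a temporary PDF so it runs without a display; the data set, colours and labels are illustrative choices.

```r
# Write to a temporary PDF so the sketch runs without a screen device
out <- tempfile(fileext = ".pdf")
pdf(out, width = 6, height = 4)

# Scatterplot of fuel economy against weight, coloured by transmission type
plot(mpg ~ wt, data = mtcars,
     col = ifelse(mtcars$am == 1, "steelblue", "darkorange"), pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)   # least-squares fit line
legend("topright", legend = c("manual", "automatic"),
       col = c("steelblue", "darkorange"), pch = 19)

invisible(dev.off())                            # close the PDF device
```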
Why R for Data Science? Programming
• Scripting
• Functional programming
• Parallel programming
• Data structures
• Objects
• Data types
• Regular expressions
• Data connections
• Interfaces to other languages
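Several of the language features listed above fit in a few lines of base R. The sketch below is illustrative only: the record, the `point` class and the CSV text are hypothetical examples, and only functions from R's base and utils packages are used.

```r
# Data structures: a list holding vectors of different types
rec <- list(id = 1:3, name = c("alpha42", "beta", "gamma7"), ok = c(TRUE, FALSE, TRUE))

# Regular expressions: which names end in one or more digits?
ends_in_digit <- grepl("[0-9]+$", rec$name)

# Functional programming: map, filter and reduce with higher-order functions
name_lengths <- vapply(rec$name, nchar, integer(1), USE.NAMES = FALSE)
evens <- Filter(function(i) i %% 2 == 0, 1:10)
total <- Reduce(`+`, 1:10)

# Objects: a minimal S3 class (hypothetical) with its own print method
new_point <- function(x, y) structure(list(x = x, y = y), class = "point")
print.point <- function(p, ...) {
  cat("point(", p$x, ", ", p$y, ")\n", sep = "")
  invisible(p)
}

# Data connections: read CSV text through a connection as if it were a file
con <- textConnection("x,y\n1,2\n3,4")
df <- read.csv(con)
close(con)
```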
Why R for Data Science? Data Manipulation
“It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light.”
Hadley Wickham, Tidy Data
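A small base-R sketch of the cleaning steps the quote describes. The messy data frame is hypothetical, and the fixes (normalising labels, converting text to numbers, turning a -99 sentinel into NA) stand in for the kinds of problems that come to light during an analysis.

```r
# Hypothetical messy data: inconsistent labels, numbers stored as text,
# and -99 used as a missing-value code
raw <- data.frame(
  site  = c("A", "a", "B", "B", "C"),
  value = c("1.5", "2.0", "-99", "3.5", "4.0"),
  stringsAsFactors = FALSE
)

clean <- raw
clean$site  <- toupper(clean$site)        # normalise labels
clean$value <- as.numeric(clean$value)    # text -> numeric
clean$value[clean$value == -99] <- NA     # sentinel -> NA

# Summarise by site; the formula method drops rows with NA by default
agg <- aggregate(value ~ site, data = clean, FUN = mean)
```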
Why R for Data Science? R Integrates
• Web applications
• Internet graphics
  - D3
  - Plotly
• Other languages
  - C, C++
  - Java
• BI tools
• Databases
  - SQL
  - MongoDB
Poll Question #2
What are the data platforms that you are connecting to regularly? (choose all that apply)
A) Hadoop
B) Spark
C) Cloud-based (Azure/AWS/Google)
D) Data Warehouses
E) Servers (Grid or Cluster)
Why R for Data Science? R Scales
Diagram: Hadoop; Servers & Clusters; Data Warehouses
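The slide points to Hadoop, clusters and data warehouses; the sketch below covers only the smallest case, spreading independent bootstrap fits across local worker processes with the parallel package that ships with R. The bootstrap workload and the choice of two workers are assumptions for illustration.

```r
library(parallel)

# Hypothetical workload: slope of mpg on wt in one bootstrap resample.
# The data are passed explicitly so each worker receives its own copy.
boot_slope <- function(seed, data) {
  set.seed(seed)                                  # reproducible resample
  idx <- sample(nrow(data), replace = TRUE)
  coef(lm(mpg ~ wt, data = data[idx, ]))[["wt"]]
}

cl <- makeCluster(2)                              # two local worker processes
par_slopes <- unlist(parLapply(cl, 1:200, boot_slope, data = mtcars))
stopCluster(cl)

# The same computation run sequentially, for comparison
seq_slopes <- vapply(1:200, boot_slope, numeric(1), data = mtcars)

# Percentile bootstrap interval for the slope
quantile(par_slopes, c(0.025, 0.975))
```

Because each task seeds its own resample, the parallel and sequential runs should agree, which makes the parallel version easy to check.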
Poll Question #3
What are the types of models that you are working with most? (choose all that apply)
A) Linear models / Regression / GLM
B) Decision Trees / Random Forests
C) Survival Models
D) GBM
E) Time Series models
Let’s look at some code.
www.revolutionanalytics.com 
1.855.GET.REVO 
Twitter: @RevolutionR
Why is R Right for Data Science?
• R is open source
• R is a powerful language
  - Data manipulation
  - Computational statistics
  - Machine learning
• R is an innovation engine
• R has a rich and expanding ecosystem
Q&A / Resources
R Code and Markdown Files
https://github.com/joseph-rickert/DataScienceRWebinar
What is R? 
revolutionanalytics.com/what-is-r 
Companies using R 
revolutionanalytics.com/companies-using-r 
AcademyR training 
revolutionanalytics.com/AcademyR 
AcademyR Certification 
revolutionanalytics.com/AcademyR-certification 
Contact Revolution Analytics 
revolutionanalytics.com/contact-us
Thank you 
Revolution Analytics is the leading commercial 
provider of software and support for the 
popular open source R statistics language. 

Editor's Notes

  • #3: Image reference: http://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
  • #8: Dice Tech Salary Survey, January 2014 O’Reilly Strata 2013 Data Science Salary Survey