Data Analytics Using R Environment
Prabhat Mittal
University of Delhi
Introduction Getting Started with R Program Exercises
Table of contents
1 Introduction
About R
The R Foundation
The R Environment
Characteristics of R
Why R?
Installation of R?
Help Facility
2 Getting Started with R Program
Introduction
3 Exercises
Prabhat Mittal Data Analytics Using R Environment 2 / 22
Introduction
About R
R is a language and environment for statistical computing and
graphics. It is a GNU project created by Ross Ihaka and Robert
Gentleman at the University of Auckland, New Zealand, and is currently
developed by the R Development Core Team, of which John Chambers is
a member.
R is an implementation of the S programming language combined
with lexical scoping. The S statistical programming language, developed
by John Chambers and colleagues at Bell Laboratories (formerly
AT&T, now Lucent Technologies), is often the vehicle of choice for
research in statistical methodology, and R provides an Open Source
route to participation in that activity.
R is available as Free Software under the terms of the GNU General
Public License in source code form (the GNU project, a Unix-like
operating system, was started in 1984). It compiles and runs on a wide
variety of UNIX platforms and similar systems, including FreeBSD
(Berkeley Software Distribution) and Linux, as well as Windows and macOS.
The R Foundation
The R Foundation is seated in Vienna, Austria and currently hosted
by the Vienna University of Economics and Business. The R
Foundation can be contacted at
The R Foundation for Statistical Computing
c/o Institute for Statistics and Mathematics
Wirtschaftsuniversität Wien
Welthandelsplatz 1
1020 Vienna, Austria
Tel: (+43 1) 31336 4754
The R Environment
R is a programming language and software environment. Commands
can be submitted to R through the command prompt. Alternatively,
mark the code in a script and press "Control-R" to execute it.
Characteristics of R
R is an integrated suite of software facilities for data
manipulation, calculation and graphical display. It includes
an effective data handling and storage facility,
a suite of operators for calculations on arrays, in particular
matrices,
a large, coherent, integrated collection of intermediate
tools for data analysis,
graphical facilities for data analysis and display either
on-screen or on hardcopy, and
a well-developed, simple and effective programming
language which includes conditionals, loops, user-defined
recursive functions and input and output facilities.
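A few of the facilities listed above can be illustrated with a minimal sketch. The object names (m, factorial_rec, total) are chosen here for illustration only and do not come from the original slides.

```r
# Operators for calculations on matrices
m <- matrix(1:4, nrow = 2)   # filled column-wise: rows are (1, 3) and (2, 4)
m_sq <- m %*% m              # matrix product
m_t <- t(m)                  # transpose

# A user-defined recursive function using a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that accumulates the sum 1 + 2 + ... + 10
total <- 0
for (i in 1:10) {
  total <- total + i
}

print(m_sq)
print(factorial_rec(5))  # 120
print(total)             # 55
```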
Why R?
The term environment is intended to characterize it as a fully
planned and coherent system, rather than an incremental
accretion of very specific and inflexible tools, as is frequently
the case with other data analysis software.
R, like S, is designed around a true computer language, and it
allows users to add additional functionality by defining new
functions. Much of the system is itself written in the R dialect
of S, which makes it easy for users to follow the algorithmic
choices made. For computationally-intensive tasks, C, C++
and Fortran code can be linked and called at run time.
Advanced users can write C code to manipulate R objects
directly.
Many users think of R as a statistics system. We prefer to
think of it as an environment within which statistical
techniques are implemented.
R can be extended (easily) via packages. About eight packages
are supplied with the R distribution, and many more, covering
a very wide range of modern statistics, are available through
the Comprehensive R Archive Network (CRAN) family of
Internet sites.
R has its own LaTeX-like documentation format, which is
used to supply comprehensive documentation, both on-line in
a number of formats and in hardcopy.
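As a rough sketch of how packages are used in practice, the snippet below lists the packages attached to the current session and loads a contributed package only if it is already installed. The package name "readxl" is used purely as an example; nothing is downloaded unless install.packages() is run explicitly.

```r
# Packages currently attached to the session (base packages included)
attached <- search()
print(attached)

# Load a contributed package only if it is already installed.
pkg <- "readxl"  # example package name
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed; ",
          "install.packages(\"", pkg, "\") would fetch it from CRAN.")
}

# Package documentation can be browsed interactively, for example:
# help(package = "stats")
```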
Installation of R?
R Console: R (the programming language) and its console allow
you to write and execute code, but the console is not an
elegant way to develop code.
RStudio: an integrated development environment (IDE) that
provides comprehensive facilities to programmers for software
development. It consists of a source code editor, build
automation tools and a debugger.
Characteristics of RStudio?
Syntax highlighting, code completion, and smart indentation
Execute R code directly from the source editor
Easily manage multiple working directories using projects
Interactive debugger to diagnose and fix errors quickly
Workspace browser and data viewer
Extensive package development tools
Microsoft R Open with RStudio?
Microsoft R Open (MRO) complements the RStudio environment.
MRO supports multiple operating systems and provides features
that enhance performance and support reproducible code,
sharing, and collaboration in the R language.
Install RStudio
Install Microsoft R Open
install.packages("readxl")
After you have installed MRO on your system, open RStudio,
go to the "Tools" menu at the top, and select "Global Options".
You should see a couple of pop-up windows. If RStudio is not
already pointing to MRO, browse to it and click "OK".
The Interface of RStudio
R Console: This area shows the output of the code you run.
You can also write code directly in the console.
R Script: As the name suggests, this is the space where you
write code. To run it, select the line(s) of code and press
Ctrl + Enter.
R Environment: This space displays the objects (such as data
sets and variables) created or loaded in the session. To check
whether data has been loaded properly in R, always look at this area.
Graphical Output: This space displays the graphs created
during exploratory data analysis.
Help Facility
The R Installation and Administration manual can be accessed from a
Web browser at
https://cran.r-project.org/doc/manuals/R-admin.html
The help facility can be accessed from a Web browser at
https://cran.r-project.org/doc/manuals/R-intro.html
Getting Started with R Program
Basic Commands
getwd() returns an absolute filepath representing the current
working directory of the R process
setwd("c:/docs/mydir") is used to set the working directory
to mydir
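A minimal sketch of the two commands used together. The target folder here is R's temporary directory, used only so the example runs on any machine; in practice it would be replaced by a project folder such as the one shown above.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()
print(old_dir)

# Switch to another directory (tempdir() is a stand-in for your own folder)
new_dir <- tempdir()
setwd(new_dir)
print(getwd())

# Restore the original working directory
setwd(old_dir)
```

Restoring the previous directory at the end keeps the rest of a session unaffected by the change.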
R-Program Example I
Simulate 100 normally distributed random numbers and store them in
the object x which is stored in the R memory.
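The code for this example did not survive in the slide text, so the following is one possible sketch, not necessarily the author's original. It assumes a standard normal distribution (mean 0, standard deviation 1), since no parameters are given, and fixes a seed only to make the run reproducible.

```r
set.seed(1)        # assumption: fixed seed for reproducibility
x <- rnorm(100)    # 100 draws from the standard normal distribution

print(length(x))   # number of simulated values
print(head(x))     # first six values
print(ls())        # x now appears among the objects in memory
```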
R-Program Example II
Consider the general polynomial equation $AX^2 + BX + C = 0$, which has
the solutions
$$X = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}.$$
Solve the following equation: $X^2 + 3X + 1 = 0$. Construct a vector of
length 2 that contains the solutions.
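One way to approach this exercise is to apply the quadratic formula directly. This is a sketch, with the variable names (coef_a, coef_b, coef_c, disc, roots) introduced here for illustration.

```r
# Coefficients of X^2 + 3X + 1 = 0
coef_a <- 1
coef_b <- 3
coef_c <- 1

# Discriminant B^2 - 4AC
disc <- coef_b^2 - 4 * coef_a * coef_c

# c(1, -1) applies the plus and the minus branch in one vectorised step
roots <- (-coef_b + c(1, -1) * sqrt(disc)) / (2 * coef_a)
print(roots)   # approximately -0.382 and -2.618

# Check: substituting the roots back should give values close to zero
print(roots^2 + 3 * roots + 1)
```

This sketch assumes a non-negative discriminant. For complex roots, base R's polyroot(c(coef_c, coef_b, coef_a)) could be used instead.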
R-Program Example III
Construct the object x containing 100 random numbers following
$N(\mu = 0.32, \sigma^2 = 0.01)$. Calculate the mean and standard
deviation of x and draw its histogram (with a second axis on the right side).
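A possible solution sketch follows. Note that rnorm() takes the standard deviation, not the variance, so a variance of 0.01 corresponds to sd = 0.1. The seed and the plot labels are assumptions added for illustration.

```r
set.seed(42)   # assumption: fixed seed for reproducibility

# rnorm() expects the standard deviation: sqrt(0.01) = 0.1
x <- rnorm(100, mean = 0.32, sd = sqrt(0.01))

x_mean <- mean(x)
x_sd <- sd(x)
print(c(mean = x_mean, sd = x_sd))

# Histogram with an additional axis drawn on the right-hand side (side = 4)
hist(x, main = "Histogram of x", xlab = "x")
axis(side = 4)
```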
R-Program Example IV
Perform the following tasks:
Assign the first five positive odd numbers to a vector A.
A <- seq(1, 10, 2)
Assign the mean of vector A to variable B.
B <- mean(A)
Assign the first five positive even numbers (0 excluded) to X.
X <- seq(2, 10, 2)
Add vectors A and X and assign the result to vector Z.
Z <- A + X
Some useful syntax
help(tail) and ?tail are commands for help
x <- 5:6 assigns the vector 5 6 to x
ls() can be used to list all the R objects stored in the working
memory
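The commands from this example can be run together as a short script. This sketch only combines what is listed above and prints the intermediate results; the help call is guarded so it only opens in an interactive session.

```r
A <- seq(1, 10, 2)   # 1 3 5 7 9
B <- mean(A)         # 5
X <- seq(2, 10, 2)   # 2 4 6 8 10
Z <- A + X           # element-wise sum
print(Z)             # 3 7 11 15 19

x <- 5:6             # the colon operator builds the sequence 5 6
print(x)

print(ls())          # lists A, B, X, Z, x and any other objects in memory

if (interactive()) {
  help(tail)         # same as ?tail
}
```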
Exercises
Perform statistical analysis of data using R programming:
Generate descriptive statistics of the data
Summarize Samples and Tables
Test of hypotheses
Perform Students t-test and Analysis of Variance
Develop Plots, Correlograms and Line Charts
Linear and Logistic Regression models
Cluster Analysis
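As a starting point for these exercises, the sketch below names one base R function for most of the tasks. The built-in mtcars data set and the chosen variables are assumptions made purely for illustration; the exercises do not prescribe a data set.

```r
data(mtcars)   # assumption: built-in example data set

# Descriptive statistics and a frequency table
print(summary(mtcars$mpg))
print(table(mtcars$cyl))

# Student's t-test (mpg by transmission type) and one-way ANOVA (mpg by cylinders)
tt <- t.test(mpg ~ am, data = mtcars)
av <- aov(mpg ~ factor(cyl), data = mtcars)
print(tt$p.value)
print(summary(av))

# Correlation and a simple scatter plot
r <- cor(mtcars$mpg, mtcars$wt)
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")

# Linear and logistic regression
lin <- lm(mpg ~ wt, data = mtcars)
logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Cluster analysis with k-means on standardised variables (3 clusters assumed)
set.seed(1)
cl <- kmeans(scale(mtcars[, c("mpg", "wt")]), centers = 3)
print(table(cl$cluster))
```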
Thank You
Email: [email protected]
URL: http://people.du.ac.in/~pmittal/