Basics of R

Liana Harutyunyan
Programming for Data Science
April 8, 2024
American University of Armenia

Special Values in R — NAs

In R, NA represents a missing value (NA stands for "not
available").

vec <- c(1, 2, 3, NA)

The is.na() function checks each element for NA and returns a
vector of TRUE/FALSE values. Use the sum function on the result
to count the missing values, and the which function to get the
indices of the TRUE values.
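A minimal sketch of the three functions applied to the vector above. The variable names are illustrative, and the na.rm argument at the end is an addition not covered on the slide.

```r
vec <- c(1, 2, 3, NA)

is_missing <- is.na(vec)           # FALSE FALSE FALSE TRUE
n_missing <- sum(is_missing)       # TRUE counts as 1, so this is 1
na_positions <- which(is_missing)  # 4

# Many functions propagate NA unless told to drop it
mean(vec)                # NA
mean(vec, na.rm = TRUE)  # 2
```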

Type conversion

In Python, when we wanted to change the data type of an
object, we wrote, for example:

int("123")

In R, data type conversion is done with as.numeric, as.double
(identical to as.numeric), as.integer, as.character,
as.factor, etc.
When a value cannot be converted, R returns NA for it and
issues a warning.

as.numeric(c(1, 10, "a"))

Special Values in R — Infs

If a computation results in a number that is too big to
represent, R returns Inf for a positive number and -Inf for a
negative number. Dividing a nonzero number by zero gives the
same result.

50 / 0
-50 / 0
50^10000

Special Values in R — NaN

Sometimes a computation produces a result that makes little
sense. In these cases, R will often return NaN (meaning "not a
number").

0 / 0
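The special numeric values can be told apart with the base R predicates is.finite, is.nan and is.na. The sketch below uses illustrative variable names; the predicates themselves are not mentioned on the slides.

```r
too_big <- 50^10000   # overflows to Inf
pos_inf <- 50 / 0     # Inf
neg_inf <- -50 / 0    # -Inf
not_num <- 0 / 0      # NaN

is.finite(c(1, pos_inf, neg_inf, not_num))  # TRUE FALSE FALSE FALSE
is.nan(not_num)  # TRUE
is.na(not_num)   # TRUE: NaN also counts as missing
is.nan(NA)       # FALSE: NA is not NaN
```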

Special Values in R — NULL

Additionally, there is NULL in R.

• NULL is often used as a default argument value in
functions to mean that no value was assigned to the
argument.
• Some functions may return NULL.
• NULL mainly represents a list of zero length and is often
returned by expressions and functions (similar to
Python's None).
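A short sketch of these uses of NULL. The function describe and its label argument are hypothetical, made up only to illustrate a NULL default.

```r
nothing <- NULL
is.null(nothing)  # TRUE
length(nothing)   # 0

# Hypothetical function: label defaults to NULL, meaning "not supplied"
describe <- function(x, label = NULL) {
  if (is.null(label)) {
    label <- "value"
  }
  paste(label, "=", x)
}

describe(5)            # "value = 5"
describe(5, "height")  # "height = 5"

# NULL elements vanish when combined into a vector
c(1, NULL, 2)  # 1 2
```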

Reading data

In Data Science, everything revolves around the data.
If the data is given in tabular format in a CSV file, we can
easily read it into R (as we did in Python with pandas).

cities <- read.csv("cities.csv")

Use the str function to see information about each column.

• The stringsAsFactors argument of read.csv specifies
whether strings are read as strings or as factors.
• The sep argument of read.csv specifies the separator
used in the data, for example "," (comma) or "\t" (tab).
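The effect of both arguments can be tried without the course file. This sketch writes a small temporary CSV with made-up rows (the file contents and the names csv_path and cities_f are assumptions), then reads it back in two ways.

```r
# Write a small hypothetical file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("City,State,LatD", "Seattle,WA,47", "Dayton,OH,39"), csv_path)

# Strings stay character vectors
cities <- read.csv(csv_path, stringsAsFactors = FALSE, sep = ",")
str(cities)  # one line per column: name, type, first values

# Strings become factors
cities_f <- read.csv(csv_path, stringsAsFactors = TRUE)

class(cities$State)    # "character"
class(cities_f$State)  # "factor"
```

Since R 4.0.0, stringsAsFactors defaults to FALSE, so the first call behaves the same without the argument.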
Subsetting data

Exercises:

• Create a named vector of national football teams and
their goals scored, with at least 4 elements.
• Take elements 1 and 2.
• Take elements 1 and 3.
• Take elements by subsetting with names.

Why does named_vector[1, 2] give an error?

Subsetting data

Exercises solutions:

• Create a named vector of national football teams and
their goals scored, with at least 4 elements.
• Take elements 1 and 2: named_vector[1:2]
• Take elements 1 and 3: named_vector[c(1, 3)]
• Take elements by subsetting with names:
named_vector[c("name1", "name2")]

named_vector[1, 2] gives an error because, when there is a
comma, R interprets the second index as a second dimension:
vector[row, col].
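One possible worked version of these solutions, using made-up teams and goal counts (the vector goals and its values are hypothetical).

```r
# Hypothetical data: team names with goals scored
goals <- c(Armenia = 3, France = 5, Brazil = 4, Japan = 2)

first_two <- goals[1:2]                  # Armenia, France
first_third <- goals[c(1, 3)]            # Armenia, Brazil
by_name <- goals[c("France", "Japan")]   # France, Japan

# A comma asks for a second dimension, which a vector does not have
result <- tryCatch(goals[1, 2], error = function(e) conditionMessage(e))
result  # the error message text
```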

Subsetting of Data Frame

data[rows, cols]

What will these return?

• data[2, 5]
• data[1:10, 4:6]
• data[2, c(1, 5)]
• data[1:3, ]
• data[, c(1, 4:6)]

Subsetting of Data Frame

We can also subset by column names:

• data[, c("City", "State")]

We can exclude columns/rows with negative indexing, but this
does not work with name indexing.

• data[, -c(2, 5, 7)]
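The indexing forms from the last slides can be tried on a small made-up data frame. The rows below are hypothetical; the %in% workaround for excluding columns by name is an addition, not something the slides cover.

```r
# Hypothetical data frame with four rows
data <- data.frame(
  City = c("Seattle", "Dayton", "Tacoma", "Akron"),
  State = c("WA", "OH", "WA", "OH"),
  LatD = c(47, 39, 47, 41),
  LatM = c(36, 45, 14, 5)
)

single <- data[2, 3]                  # one value: 39
top_rows <- data[1:2, ]               # first two rows, all columns
named <- data[, c("City", "State")]   # two columns by name
dropped <- data[, -c(2, 4)]           # drops columns 2 and 4
no_first <- data[-1, ]                # drops row 1

# data[, -c("State")] raises an error: names cannot be negated.
# One workaround is a logical mask over the column names.
kept <- data[, !(names(data) %in% c("State", "LatM"))]
```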

Subsetting of Data Frame

Exercises:

• Include the first 100 rows and columns 2, 3, 5.
• Exclude rows 10, 20, 30 and exclude column 5.

Data Frames

You can access a specific column of a data frame using $.

data$City

• You can compute statistics on a column: mean(data$LatD)
for a numeric column, table(data$City) for counts of each
value.

Conditional Indexing

Create a new data frame that contains only the cities from
the "WA" state.
This means we want the ROWS that have "WA" in the "State"
column.

cities[cities$State == "WA", ]

We can check the result with the table or unique functions.

Conditional Indexing

Create a new data frame that contains cities from EITHER the
"WA" state or the "OH" state.
How would we do this in Python?
In R, instead of isin (Python), we have %in%:

cities[cities$State %in% c("WA", "OH"), ]

To combine multiple conditions, use "&" (and) or "|" (or).
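A small sketch of these filters on made-up rows (the data frame contents are hypothetical; the real cities.csv will differ).

```r
# Hypothetical stand-in for the cities data
cities <- data.frame(
  City = c("Seattle", "Dayton", "Tacoma", "Akron", "Houston"),
  State = c("WA", "OH", "WA", "OH", "TX"),
  LatD = c(47, 39, 47, 41, 29)
)

wa <- cities[cities$State == "WA", ]
wa_oh <- cities[cities$State %in% c("WA", "OH"), ]

# The same selection written with | (or)
either <- cities[cities$State == "WA" | cities$State == "OH", ]

# Two conditions that must both hold, written with & (and)
north_oh <- cities[cities$State == "OH" & cities$LatD > 40, ]

table(wa_oh$State)   # counts per state in the filtered data
unique(wa$State)     # "WA"
```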

Column Types

To change a column's type, we can use the as.DATATYPE
functions.

data$column <- as.numeric(data$column)

New Column

To add a new column to the data frame:

data$lat_diff <- data$LatD - data$LatM

Conditionals

The else if / else parts of an if statement can be omitted.
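The slide's syntax example did not survive extraction, so the following is a minimal sketch of the standard R form. The temperature variable and the thresholds are made up for illustration.

```r
temperature <- 12  # hypothetical input

# Full form: if, else if, else
if (temperature < 0) {
  status <- "freezing"
} else if (temperature < 20) {
  status <- "mild"
} else {
  status <- "warm"
}
status  # "mild"

# With the else if / else parts omitted
if (temperature > 30) {
  status <- "hot"
}
status  # still "mild", because the condition was FALSE
```

At the top level of a script, else has to start on the same line as the closing brace of the preceding block, as written here.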

Conditionals

Exercises:

• Given a year, display whether it is a leap year or not.
• If the given value is negative, change it to its square;
if it is 0, change it to 1; if it is positive, change it
to 2 * number.

For loops

• Iterating over an object.

for(i in x) { }

• Iterating over the indices of the object.

for(i in 1:length(x)) { }

Here, we also have break and next (instead of continue in
Python).
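A short sketch of both loop styles plus next and break, using a made-up vector. seq_along(x) is used here as an alternative to 1:length(x); it is an addition to the slide and is safer when x is empty, because 1:0 counts down instead of giving no iterations.

```r
x <- c(4, 7, 10, 13)  # hypothetical values

# Iterate over the values themselves
for (value in x) {
  print(value)
}

# Iterate over the indices
doubled <- numeric(length(x))
for (i in seq_along(x)) {
  doubled[i] <- x[i] * 2
}
doubled  # 8 14 20 26

# next skips to the following iteration, break leaves the loop
seen <- c()
for (value in x) {
  if (value == 7) next
  if (value > 10) break
  seen <- c(seen, value)
}
seen  # 4 10
```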

For loops

• Iterate over a vector and add 2 to each value. Do this
once in place, and once storing the changed values in a
new vector.
• Use a for loop to count the even numbers stored in a
vector of numbers.
• Use a for loop to get the indices of the even numbers in
the vector.
• Use for and if to find a solution to the equation
2x^2 − 20x − 48 = 0. Stop when you find the solution.
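One possible approach to the last three exercises. The input vector and the search range of -100 to 100 are assumptions; the equation 2x^2 - 20x - 48 = 0 has the integer roots -2 and 12, and the loop stops at the first one it meets.

```r
nums <- c(3, 8, 5, 12, 6)  # hypothetical input

# Count the even numbers and collect their indices
even_count <- 0
even_indices <- c()
for (i in seq_along(nums)) {
  if (nums[i] %% 2 == 0) {
    even_count <- even_count + 1
    even_indices <- c(even_indices, i)
  }
}
even_count    # 3
even_indices  # 2 4 5

# Search integers in an assumed range for a root, stopping at the first
solution <- NA
for (x in -100:100) {
  if (2 * x^2 - 20 * x - 48 == 0) {
    solution <- x
    break
  }
}
solution  # -2
```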

For loops

How to iterate over a dataset using a for loop:

for (i in 1:ncol(data)) { }

Exercise: Iterate over the numerical columns of the iris
dataset and add 1000 to each.
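A sketch of one way to solve this exercise. It works on a copy of the built-in iris dataset and uses is.numeric to skip the Species column, which is a factor.

```r
data <- iris  # copy, so the built-in dataset stays unchanged

for (i in 1:ncol(data)) {
  # Only touch numeric columns; Species is a factor
  if (is.numeric(data[[i]])) {
    data[[i]] <- data[[i]] + 1000
  }
}

head(data)
```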

While loops

while (test_expression) { statement }

Exercises:

• Given a number, print all the even numbers from 0 to
that number using while.
• Given a positive integer, calculate the sum of all the
integers from 1 to that number using while.
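A possible solution sketch for both exercises, with n = 10 as an assumed input. The even numbers are also collected in a vector so the result can be inspected after the loop.

```r
n <- 10  # hypothetical input

# Even numbers from 0 to n
i <- 0
evens <- c()
while (i <= n) {
  print(i)
  evens <- c(evens, i)
  i <- i + 2
}
evens  # 0 2 4 6 8 10

# Sum of the integers from 1 to n
total <- 0
k <- 1
while (k <= n) {
  total <- total + k
  k <- k + 1
}
total  # 55
```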

Functions

• There are built-in R functions, such as mean, sum,
length, nchar, runif, etc.
• You can write custom functions:

function_name <- function(x) { return(x) }

This is the same as

function_name <- function(x) { x }

In R, a function returns the value of the last expression
it evaluates.
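A brief sketch comparing the two styles. The function names are hypothetical, and the early-exit use of return() is an addition to the slide.

```r
# Explicit return
square_explicit <- function(x) {
  return(x^2)
}

# Implicit return: the last evaluated expression is the result
square_implicit <- function(x) {
  x^2
}

square_explicit(4)  # 16
square_implicit(4)  # 16

# return() remains useful for leaving a function early
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)
  }
  sqrt(x)
}

safe_sqrt(9)   # 3
safe_sqrt(-1)  # NA
```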

Functions

Functions can have default arguments and, unlike in Python,
arguments with default values can appear in any position.
For reference: in Python, an argument without a default
value cannot follow an argument with a default value
(def func(x=2, y) is a syntax error).
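A minimal sketch with a hypothetical function power whose first argument has a default and whose second does not.

```r
# The argument with a default comes first; R accepts this definition
power <- function(base = 2, exponent) {
  base^exponent
}

power(exponent = 3)             # 8: base falls back to its default
power(5, 2)                     # 25: matched by position
power(exponent = 2, base = 10)  # 100: named arguments in any order

# power(3) fails: 3 is matched to base, so exponent is missing
outcome <- tryCatch(power(3), error = function(e) "error")
outcome  # "error"
```

Naming the argument in the call, as in power(exponent = 3), is what makes this ordering practical.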

Functions

Examples:

• A function that calculates the area of a rectangle given
its length and width as arguments.
• A function that calculates the factorial of a given
number.
• A function that takes an array of numbers as input and
returns the largest number in the array, without using
the function max.
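These three examples could be written as follows. The function and argument names are illustrative, and the factorial sketch assumes a non-negative integer input.

```r
rectangle_area <- function(len, width) {
  len * width
}

# Assumes n is a non-negative integer; seq_len(0) is empty, so 0! is 1
factorial_of <- function(n) {
  result <- 1
  for (i in seq_len(n)) {
    result <- result * i
  }
  result
}

# Largest element without calling max
largest <- function(numbers) {
  biggest <- numbers[1]
  for (value in numbers) {
    if (value > biggest) {
      biggest <- value
    }
  }
  biggest
}

rectangle_area(3, 4)  # 12
factorial_of(5)       # 120
largest(c(3, 9, 2))   # 9
```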

apply family of functions

These functions allow traversing the data in a number of
ways and avoid the explicit use of loop constructs.
The most common functions: apply(), lapply(), sapply().

• apply(X, MARGIN, FUN, ...) - applies FUN to X along
MARGIN (the axis: 1 for rows, 2 for columns).
Example: Create a matrix object and calculate the sum of
each column.
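A minimal sketch of this example with a made-up 2 x 3 matrix.

```r
# matrix() fills column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

col_sums <- apply(m, 2, sum)  # MARGIN = 2: one result per column
row_sums <- apply(m, 1, sum)  # MARGIN = 1: one result per row

col_sums  # 3 7 11
row_sums  # 9 12
```

For this particular task the built-in colSums(m) gives the same numbers; apply is the general form that works with any function.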

apply family of functions

• lapply(X, FUN, ...) - applies the same function to each
element of an object (list, vector, data frame) and
always returns a list.

Example: Given a list containing a vector and a data frame,
calculate the sum of each element using lapply.
Example: Given a vector, calculate the sqrt of each element.
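Both examples sketched with made-up inputs. The data frame holds only numeric columns, which is what allows sum to be applied to it as a whole.

```r
# A list holding a vector and an all-numeric data frame
items <- list(
  numbers = c(1, 2, 3),
  frame = data.frame(a = 1:2, b = 3:4)
)

sums <- lapply(items, sum)
sums$numbers  # 6
sums$frame    # 10

# sqrt of each element of a vector; the result is a list
roots <- lapply(c(4, 9, 16), sqrt)
roots[[2]]  # 3
```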

apply family of functions

sapply() and lapply() work basically the same way.
The only difference is that lapply() always returns a list,
whereas sapply() tries to simplify the result into a vector
or matrix.

sapply(X, FUN, ...) - applies FUN to X and simplifies the
result if possible.
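A short comparison on made-up values, showing the three outcomes sapply can produce: a vector, a matrix, or a list when no simplification is possible.

```r
values <- c(4, 9, 16)

as_list <- lapply(values, sqrt)    # list of 2, 3, 4
as_vector <- sapply(values, sqrt)  # numeric vector 2 3 4

# Each call returns two numbers, so the result becomes a 2 x 3 matrix
as_matrix <- sapply(values, function(v) c(v, v^2))

# Results of different lengths cannot be simplified: a list comes back
not_simplified <- sapply(list(1:2, 1:3), function(v) v)

is.list(as_list)         # TRUE
is.list(as_vector)       # FALSE
dim(as_matrix)           # 2 3
is.list(not_simplified)  # TRUE
```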

Data Visualization

The most popular R package for plotting graphs is ggplot2.
As a reminder, to install a package, we write:

install.packages("ggplot2")

Run this either in the console or in the R script or Rmd
file, but once the installation completes, delete or comment
out the line.
Then you need to "import" the package:

library(ggplot2)
