Data Frames in R

The document provides an overview of data types and structures in R, focusing on data frames and vectors. It explains how to create and manipulate data frames, including adding rows and columns, and using the $ operator for data access. Additionally, it includes an assignment for analyzing airline passenger data using R functions.

Uploaded by

keneitarus

SDS3102: FOUNDATIONS OF DATA SCIENCE

Dataframes in R
CEMA

Recap

In R, data types define the kind of values that a variable can hold. The key data types
include:

• Double: Represents real numbers stored in floating point, such as 3.14 or 2.5. A number
typed without a suffix, such as 5, is also stored as a double.
• Integer: Represents whole numbers, such as 5L or 10L. In R, to explicitly define a number
as an integer, you must add the suffix “L” to it. For example, 10L represents the number
10 as an integer.
• Character: Represents text or string values, such as "Hello" or "R Language".
• Logical: Represents Boolean values, either TRUE or FALSE.

Both double and integer values are categorized as numeric in R. While they have different
internal representations, they both belong to the broader numeric category.

typeof(3.14) # "double"

[1] "double"

typeof(10L) # "integer"

[1] "integer"

is.numeric(3.14) # TRUE

[1] TRUE

is.numeric(10L) # TRUE

[1] TRUE
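
The effect of the L suffix can be checked directly. The following is a minimal sketch using only base R functions; the variable names x_plain and x_int are illustrative and do not come from the course notes.

```r
# A whole number typed without a suffix is stored as a double
x_plain <- 10

# The L suffix stores the value as an integer
x_int <- 10L

typeof(x_plain)      # "double"
typeof(x_int)        # "integer"

# Both values still count as numeric
is.numeric(x_plain)  # TRUE
is.numeric(x_int)    # TRUE

# is.integer() checks the storage type, not whether the value is whole
is.integer(x_plain)  # FALSE
is.integer(x_int)    # TRUE
```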

A vector is the most basic data structure in R. It is a one-dimensional collection of elements,
all of the same data type. Vectors can store numbers, characters, or logical values. You can
create a vector using the c() function:

# Numeric vector
num_vec <- c(1, 2, 3, 4.5)

# Integer vector
int_vec <- c(10L, 20L, 30L)

# Character vector
char_vec <- c("apple", "banana", "cherry")

# Logical vector
log_vec <- c(TRUE, FALSE, TRUE)

To check the type of a vector, use typeof():

typeof(num_vec) # "double"

[1] "double"

typeof(int_vec) # "integer"

[1] "integer"

typeof(char_vec) # "character"

[1] "character"

typeof(log_vec) # "logical"

[1] "logical"
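
Because a vector holds a single data type, c() converts mixed inputs to one common type instead of raising an error. The sketch below illustrates this with made-up values; the variable names are illustrative only.

```r
# Numbers mixed with text: every element becomes character
mixed_chr <- c("David", 88, 86)
typeof(mixed_chr)   # "character"
mixed_chr           # "David" "88" "86"

# Logical mixed with numbers: TRUE becomes 1 and FALSE becomes 0
mixed_num <- c(TRUE, 2.5, FALSE)
typeof(mixed_num)   # "double"

# Integer mixed with double: the result is double
mixed_dbl <- c(1L, 2.5)
typeof(mixed_dbl)   # "double"
```

This matters later when a row of mixed values is built with c(): the numbers in it are stored as text.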

Data frames in R

A data frame is a table-like data structure in R that stores data in rows and columns. It is
similar to a spreadsheet or database table, where each column represents a variable, and each
row represents an observation. Unlike matrices, a data frame can contain multiple data types
(e.g., numeric, character, logical) in different columns.
You can create a data frame in R using the data.frame() function. Below is an example of
creating a data frame using two vectors:

names_vec <- c("Alice", "Bob", "Charlie") # Character vector
ages_vec <- c(25, 30, 22) # Numeric vector

# Creating a data frame
people_df <- data.frame(Name = names_vec, Age = ages_vec)

# Print the data frame
print(people_df) # print() displays the object in the console

     Name Age
1   Alice  25
2     Bob  30
3 Charlie  22

Recall that we can also create a data frame by passing the vectors directly to data.frame().
Here the columns are named names_vec and ages_vec:

people_df <- data.frame(names_vec = c("Alice", "Bob", "Charlie"),
                        ages_vec = c(25, 30, 22))

R provides several functions to inspect the structure and properties of a data frame:

1. The str() function provides an overview of the data frame, showing column names, data
types, and a preview of the values.
2. The dim() function returns the number of rows and columns in the data frame.
3. The colnames() function returns the names of all columns.
4. The rownames() function returns the row names of the data frame.
5. The ncol() function returns the number of columns in the data frame.
6. The nrow() function returns the number of rows in the data frame.

str(people_df)

'data.frame': 3 obs. of 2 variables:
 $ names_vec: chr "Alice" "Bob" "Charlie"
 $ ages_vec : num 25 30 22

dim(people_df) # Returns (number of rows, number of columns)

[1] 3 2

colnames(people_df)

[1] "names_vec" "ages_vec"

rownames(people_df)

[1] "1" "2" "3"

ncol(people_df)

[1] 2
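
The list above includes nrow(), which the calls shown do not exercise. The sketch below fills that gap and also checks each column's class with sapply(), a base R function not used elsewhere in these notes. The data frame is recreated so the block runs on its own, and stringsAsFactors = FALSE is passed explicitly as a precaution for R versions before 4.0.

```r
# Recreate the example data frame so this block is self-contained
people_df <- data.frame(names_vec = c("Alice", "Bob", "Charlie"),
                        ages_vec = c(25, 30, 22),
                        stringsAsFactors = FALSE)

nrow(people_df)   # 3

# dim() is the pair (number of rows, number of columns)
identical(dim(people_df), c(nrow(people_df), ncol(people_df)))   # TRUE

# Each column keeps its own type: character for names, numeric for ages
col_classes <- sapply(people_df, class)
col_classes
```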

Adding a row and column in a data frame

Once a data frame is created, we can modify it by adding new rows and columns using rbind()
and cbind(). Let’s create a data frame of fictional students’ exam scores.

# Creating a data frame
students_df <- data.frame(
  student_names = c("Alice", "Bob", "Charlie"),
  math_scores = c(85, 90, 78),
  science_scores = c(88, 92, 80)
)

# Print the data frame
print(students_df)

  student_names math_scores science_scores
1         Alice          85             88
2           Bob          90             92
3       Charlie          78             80

Suppose we want to add a new column for English scores:

# New column
english_scores <- c(89, 85, 82)

# Adding the column using cbind()
students_df1 <- cbind(students_df, english_scores)

# Print updated data frame
print(students_df1)

  student_names math_scores science_scores english_scores
1         Alice          85             88             89
2           Bob          90             92             85
3       Charlie          78             80             82
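
In the example above, the new column takes its name from the variable passed to cbind(). A named argument sets the column name explicitly instead. The sketch below assumes a shortened copy of the students data; the names with_english and english are illustrative.

```r
# Shortened copy of the students data so this block runs on its own
students_df <- data.frame(
  student_names = c("Alice", "Bob", "Charlie"),
  math_scores = c(85, 90, 78)
)

# name = values sets the new column's name; one value is needed per row
with_english <- cbind(students_df, english = c(89, 85, 82))

colnames(with_english)   # "student_names" "math_scores" "english"
nrow(with_english)       # 3, the number of rows is unchanged
```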

Now, let’s add a new student’s record using rbind(). The new row is built as a one-row data
frame rather than with c(): a vector created with c() holds a single type, so mixing a name
with scores would store every value as text, and rbind() would then convert the score columns
to character.

# New student data (column names and types must match the existing data frame)
new_student <- data.frame(student_names = "David", math_scores = 88,
                          science_scores = 86, english_scores = 90)

# Adding the row using rbind()
students_df2 <- rbind(students_df1, new_student)

# Print updated data frame
print(students_df2)

  student_names math_scores science_scores english_scores
1         Alice          85             88             89
2           Bob          90             92             85
3       Charlie          78             80             82
4         David          88             86             90

The $ Operator in R

The $ operator is used to access or modify specific columns in a data frame. It allows us to
refer to a column by its name, making data manipulation easier. You can use the $ operator
to extract a specific column from a data frame, e.g.:

# Extracting the math_scores column
students_df2$math_scores

[1] 85 90 78 88
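
The type of an extracted column depends on how the rows were added. The sketch below contrasts a row built with c() against a row built as a one-row data frame. It uses a small hypothetical scores_df, not the course data.

```r
# Small hypothetical data frame
scores_df <- data.frame(student = c("Alice", "Bob"),
                        math = c(85, 90),
                        stringsAsFactors = FALSE)

# Row built with c(): all values are character, so math becomes character
bad_df <- rbind(scores_df, c("David", 88))
class(bad_df$math)    # "character"

# Row built as a one-row data frame: math stays numeric
good_row <- data.frame(student = "David", math = 88,
                       stringsAsFactors = FALSE)
good_df <- rbind(scores_df, good_row)
class(good_df$math)   # "numeric"

# Arithmetic only works on the numeric version
mean(good_df$math)    # about 87.67
```

Values printed inside quotes, such as "85", indicate that a column is stored as character.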

We can also use the $ operator to add a new column to an existing data frame. Let’s add
Gender and Age columns.

# Adding a new "Gender" column using the $ operator and c()
students_df2$Gender <- c("Female", "Male", "Male", "Male")

# Adding an "Age" column using the $ operator and c()
students_df2$Age <- c(20, 22, 21, 24)

# Print the updated data frame
print(students_df2)

  student_names math_scores science_scores english_scores Gender Age
1         Alice          85             88             89 Female  20
2           Bob          90             92             85   Male  22
3       Charlie          78             80             82   Male  21
4         David          88             86             90   Male  24

We can also compute a new column from existing columns. Here we return to students_df1 and
sum the three subject scores:

# Adding a new column using the $ operator
students_df1$total_score <- students_df1$math_scores +
  students_df1$science_scores +
  students_df1$english_scores

# Print updated data frame
print(students_df1)

  student_names math_scores science_scores english_scores total_score
1         Alice          85             88             89         262
2           Bob          90             92             85         267
3       Charlie          78             80             82         240
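
Arithmetic and comparisons on columns are applied element by element, so one expression fills a whole column. The sketch below extends the total score with an average and a pass flag. The column names average_score and passed, and the threshold of 85, are hypothetical and not part of the course material.

```r
# Recreate the three-student data so this block runs on its own
students_df <- data.frame(
  student_names = c("Alice", "Bob", "Charlie"),
  math_scores = c(85, 90, 78),
  science_scores = c(88, 92, 80),
  english_scores = c(89, 85, 82)
)

# Row-by-row sum of the three subjects
students_df$total_score <- students_df$math_scores +
  students_df$science_scores +
  students_df$english_scores

# Dividing a column by a number divides every element
students_df$average_score <- students_df$total_score / 3

# A comparison yields a logical column (hypothetical pass mark of 85)
students_df$passed <- students_df$average_score >= 85

students_df$total_score   # 262 267 240
students_df$passed        # TRUE TRUE FALSE
```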

Assignment: Airline Passenger Data Analysis in R

Creating the data frame

You are working as a data analyst for an airline company. Your task is to analyze passenger
data. Create a data frame named flights_df with the following columns:

1. passenger_id (integer)
2. name (character)
3. age (integer)
4. gender (character)
5. flight_number (character)
6. destination (character)
7. ticket_price (numeric)
8. class (character) - Can be “Economy”, “Business”, or “First Class”

Data Analysis

Answer the following questions using appropriate functions:

1. How much total revenue has been generated from ticket sales? (Hint: Use sum() on
ticket_price)
2. What is the average ticket price for passengers? (Hint: Use mean() on ticket_price)
3. What is the variance in ticket prices? (Hint: Use var() on ticket_price)
4. How many passengers are traveling to each destination? (Hint: Use table() and $
on destination)
5. Use the $ operator to:

• Extract and display only the name column.


• Extract and display the ticket prices of all passengers.
• Add a new column called discounted_price, where passengers aged below 18 or above
60 receive a discount on their ticket price.
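
The hinted functions work on any column extracted with $. As a warm-up, the sketch below applies them to an unrelated, made-up orders_df rather than to the assignment data. The column names, the values, and the 10% reduction are all hypothetical, and ifelse() is a base R function that the hints do not mention.

```r
# Made-up data, unrelated to the assignment
orders_df <- data.frame(
  item = c("pen", "book", "pen", "bag"),
  price = c(2, 10, 2, 30),
  quantity = c(5L, 1L, 10L, 1L),
  stringsAsFactors = FALSE
)

total_price <- sum(orders_df$price)      # 44
average_price <- mean(orders_df$price)   # 11
price_variance <- var(orders_df$price)   # sample variance (divides by n - 1)
item_counts <- table(orders_df$item)     # number of rows per item

# Conditional column: | means "or"; ifelse() picks a value row by row
# (hypothetical rule: 10% off when quantity is below 2 or above 8)
orders_df$reduced_price <- ifelse(
  orders_df$quantity < 2 | orders_df$quantity > 8,
  orders_df$price * 0.9,
  orders_df$price
)
```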
