Faculty of: FCE
Program: B.Tech
Class/Section: Sem V, Sec. A, B, C (AIDS)
Date:
Name of Faculty: Seema Kaloria
Name of Course: R Programming
Code: BADCCE5104
Descriptive Statistics:
Descriptive statistics summarize and describe the main features of a dataset. They
give insight into the central tendency, dispersion, and shape of a dataset’s distribution, which are
essential for understanding and interpreting data.
R provides several built-in functions and libraries for calculating descriptive statistics, including measures like
mean, median, standard deviation, variance, and more.
1. Basic Descriptive Statistics Functions in R
Here are some common functions for basic descriptive statistics in R:
a) Mean (mean())
The mean is the arithmetic average of the data: the sum of the values divided by the number of values.
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
mean(data)
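As a quick check of the definition, the mean can be reproduced by hand. This is only an illustrative sketch; manual_mean is a name introduced here and is not part of the original notes.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Mean by definition: sum of the values divided by the number of values
manual_mean <- sum(data) / length(data)

manual_mean                         # 5
all.equal(manual_mean, mean(data))  # TRUE
```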
b) Median (median())
The median is the middle value of the sorted data. When the number of values is even, it is the average of the two middle values.
median(data)
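The difference between an odd and an even number of values can be seen in a small sketch. The vectors odd_values and even_values are made up for illustration.

```r
odd_values  <- c(7, 1, 5)     # sorted: 1 5 7
even_values <- c(7, 1, 5, 3)  # sorted: 1 3 5 7

median(odd_values)   # 5, the single middle value
median(even_values)  # 4, the average of the two middle values 3 and 5
```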
c) Mode
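R does not have a built-in function for the statistical mode (the base function mode() returns an
object's storage type instead), but it can be calculated as follows:
# Named get_mode so that it does not mask base R's mode()
get_mode <- function(x) {
  uniq <- unique(x)
  # Count how often each unique value occurs and return the most frequent one
  uniq[which.max(tabulate(match(x, uniq)))]
}
# Every value in data occurs once, so the first value is returned
get_mode(data)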
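A mode function that returns a single value hides ties. One possible way to report every most frequent value is sketched below using table(). The vector scores and the variable names are assumptions made for this example.

```r
scores <- c(2, 3, 3, 5, 7, 7, 8)

# Frequency of each distinct value
counts <- table(scores)

# Keep every value whose frequency equals the highest frequency
modes <- as.numeric(names(counts)[counts == max(counts)])
modes  # 3 and 7 both occur twice
```

Because table() stores the values as names, as.numeric() is needed to convert them back to numbers.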
d) Standard Deviation (sd())
The standard deviation measures how spread out the values are around the mean. sd() computes the sample standard deviation, which uses n - 1 in the denominator.
sd(data)
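To make the calculation behind sd() concrete, the sample standard deviation can be rebuilt step by step. This is a sketch for checking understanding; manual_sd is a name introduced here.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
n <- length(data)

# Sum the squared deviations from the mean, divide by n - 1, take the square root
manual_sd <- sqrt(sum((data - mean(data))^2) / (n - 1))

all.equal(manual_sd, sd(data))  # TRUE
```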
e) Variance (var())
Variance is the square of the standard deviation: the average squared deviation of the data points from the
mean. var() divides by n - 1, which gives the sample variance.
var(data)
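The relationship between var() and sd(), and the conversion to a population variance (dividing by n instead of n - 1), can be sketched as follows. The names sample_var and population_var are introduced here for illustration.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
n <- length(data)

sample_var <- var(data)                     # divides by n - 1; 7.5 for this data
population_var <- sample_var * (n - 1) / n  # rescaled so that the divisor is n

all.equal(sample_var, sd(data)^2)  # TRUE
```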
Session 2024-25
f) Range (range())
range() returns the minimum and maximum values of the dataset as a vector of two numbers.
range(data)
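If the range is needed as a single number (maximum minus minimum), diff() can be applied to the result. A minimal sketch, with limits and spread as illustrative names:

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

limits <- range(data)   # 1 and 9
spread <- diff(limits)  # 9 - 1 = 8
spread
```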
g) Minimum and Maximum (min() and max())
To find the minimum and maximum values individually:
min(data)
max(data)
h) Quantiles (quantile())
Quantiles help you understand the distribution of the data. You can specify which quantiles to calculate.
quantile(data, probs = c(0.25, 0.5, 0.75)) # 25th, 50th (median), and 75th percentiles
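When a requested quantile falls between two data points, quantile() with its default method (type = 7) interpolates linearly between them. The small vector below is made up to show this.

```r
values <- c(10, 20, 30, 40)

q <- quantile(values, probs = c(0.25, 0.5, 0.75))
q  # 25% = 17.5, 50% = 25, 75% = 32.5

# The result is a named vector, so single quantiles can be extracted by name
q[["25%"]]  # 17.5
```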
i) Interquartile Range (IQR())
The interquartile range is the difference between the 75th and 25th percentiles and gives the spread of the
middle 50% of the data.
IQR(data)
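The definition can be verified by subtracting the two quartiles directly. This is only a sketch; manual_iqr is a name introduced here.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

q <- quantile(data, probs = c(0.25, 0.75))  # 3 and 7 for this data
manual_iqr <- unname(q[2] - q[1])

manual_iqr                        # 4
all.equal(manual_iqr, IQR(data))  # TRUE
```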
j) Summary Statistics (summary())
R provides a built-in summary() function that gives a quick overview of the dataset.
summary(data)
This function provides:
Minimum
1st Quartile (25%)
Median (50%)
Mean
3rd Quartile (75%)
Maximum
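For a numeric vector, the result of summary() is a named vector, so individual statistics can be pulled out by name. A short sketch, with s as an illustrative variable name:

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

s <- summary(data)
names(s)        # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

s[["Median"]]   # 5
s[["3rd Qu."]]  # 7
```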
2. Descriptive Statistics for Data Frames
If you are working with a data frame, you can apply the above functions to each column or use some of R's
packages to obtain more detailed descriptive statistics.
# Example data frame
df <- data.frame(
  age = c(23, 45, 31, 35, 28),
  weight = c(70, 80, 60, 75, 68),
  height = c(165, 170, 158, 172, 160)
)
# Summary of each column in the data frame
summary(df)
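Applying a single function to every column is commonly done with sapply(). The sketch below repeats the example data frame so that it runs on its own; col_means and col_sds are illustrative names.

```r
df <- data.frame(
  age = c(23, 45, 31, 35, 28),
  weight = c(70, 80, 60, 75, 68),
  height = c(165, 170, 158, 172, 160)
)

# sapply() calls the function once per column and returns a named vector
col_means <- sapply(df, mean)
col_sds <- sapply(df, sd)

col_means  # age 32.4, weight 70.6, height 165.0
col_sds
```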
3. Using Libraries for More Detailed Descriptive Statistics
a) psych package
The psych package provides a variety of functions for descriptive statistics. To install and load it:
install.packages("psych")
library(psych)
# Descriptive statistics for each column in a data frame
describe(df)
For each variable, the output includes, among other statistics:
Mean
Standard deviation
Median
Minimum
Maximum
Skewness
Kurtosis
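Skewness and kurtosis describe the shape of a distribution. The base R sketch below uses simple moment-based definitions to show what they measure. The helper names moment_skewness and moment_excess_kurtosis are hypothetical, and packages may apply small-sample corrections, so their reported values can differ slightly from these.

```r
# Hypothetical helpers: moment-based definitions without small-sample correction
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

moment_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3  # subtract 3 so a normal distribution scores 0
}

symmetric <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
right_tailed <- c(1, 1, 2, 2, 3, 10)  # made-up data with one large value

moment_skewness(symmetric)         # 0: deviations on both sides cancel
moment_skewness(right_tailed)      # positive: long right tail
moment_excess_kurtosis(symmetric)  # negative: flatter than a normal distribution
```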
b) Hmisc package
The Hmisc package also provides functions for more detailed statistical summaries.
install.packages("Hmisc")
library(Hmisc)
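# Summary statistics. Both psych and Hmisc define describe(), so the Hmisc:: prefix
# makes sure the Hmisc version is called when both packages are loaded
Hmisc::describe(df)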
c) summarytools package
For more flexible and comprehensive summaries, the summarytools package is useful.
install.packages("summarytools")
library(summarytools)
# Descriptive statistics for a data frame
dfSummary(df)
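Calling install.packages() every time a script runs is unnecessary once a package is installed. One possible pattern, sketched here with the illustrative names pkgs, installed and missing_pkgs, is to check first and install only what is missing.

```r
pkgs <- c("psych", "Hmisc", "summarytools")

# TRUE for each package that is already installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Uncomment to install only the packages that are missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```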
4. Visualizing Descriptive Statistics
To complement descriptive statistics, visualizing the data can help identify patterns or outliers. Here are
some commonly used plots:
a) Histogram (hist())
A histogram is useful to visualize the distribution of data.
hist(df$age, main = "Age Distribution", xlab = "Age", col = "lightblue")
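A histogram can also carry the summary statistics themselves. The sketch below marks the mean and median with vertical lines. The vector ages repeats the age column of the example data frame, and the colours and line styles are arbitrary choices.

```r
ages <- c(23, 45, 31, 35, 28)

hist(ages, main = "Age Distribution", xlab = "Age", col = "lightblue")

# Vertical reference lines at the mean (32.4) and the median (31)
abline(v = mean(ages), col = "red", lwd = 2)
abline(v = median(ages), col = "darkgreen", lwd = 2, lty = 2)
legend("topright", legend = c("Mean", "Median"),
       col = c("red", "darkgreen"), lwd = 2, lty = c(1, 2))
```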
b) Boxplot (boxplot())
A boxplot shows the distribution of the data along with the median, quartiles, and potential outliers.
boxplot(df$weight, main = "Boxplot of Weight", ylab = "Weight")
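The points a boxplot draws as potential outliers can be listed with boxplot.stats(), which by default flags values lying more than 1.5 times the box length beyond the box. In the sketch below, the value 120 is a hypothetical observation added only to show an outlier being flagged.

```r
weights <- c(70, 80, 60, 75, 68)
weights_with_extreme <- c(weights, 120)  # 120 is hypothetical, added for illustration

boxplot.stats(weights)$out               # numeric(0): no points flagged
boxplot.stats(weights_with_extreme)$out  # 120
```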