R Programming: Descriptive Stats Guide

Uploaded by

RAHUL SHARMA

Faculty of: FCE    Program: B.Tech    Class/Section: Sem V, Sec. A, B, C (AIDS)    Date:

Name of Faculty: Seema Kaloria    Name of Course: R Programming    Code: BADCCE5104

Descriptive Statistics:
Descriptive statistics summarize and describe the main features of a dataset. They give insight into the
central tendency, dispersion, and shape of a dataset's distribution, which are essential for understanding
and interpreting data.

R provides several built-in functions and libraries for calculating descriptive statistics, including measures like
mean, median, standard deviation, variance, and more.
1. Basic Descriptive Statistics Functions in R

Here are some common functions for basic descriptive statistics in R:

a) Mean (mean())

The mean is the arithmetic average of the data: the sum of the values divided by the number of values.

data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
mean(data)
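As a quick check of that definition, the sketch below computes the mean by hand and compares it with mean(). The vector with_na and the use of na.rm = TRUE are illustrative additions, not part of the original notes.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# The mean is the sum of the values divided by how many there are
manual_mean <- sum(data) / length(data)
manual_mean == mean(data)    # TRUE; both are 5

# A missing value makes mean() return NA unless it is dropped with na.rm = TRUE
with_na <- c(data, NA)       # hypothetical vector, for illustration only
mean(with_na)                # NA
mean(with_na, na.rm = TRUE)  # 5
```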
b) Median (median())

The median is the middle value of the sorted dataset. When the number of values is even, it is the average of the two middle values.

median(data)
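The odd and even cases can be compared side by side. The values in even_data below are made up for illustration.

```r
odd_data  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
even_data <- c(8, 1, 3, 10)   # hypothetical values; sorted they are 1, 3, 8, 10

median(odd_data)    # 5, the single middle value
median(even_data)   # 5.5, the average of the two middle values 3 and 8
```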
c) Mode

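R does not have a built-in function for the statistical mode (the base function mode() returns an object's
storage type, not its most frequent value), but it can be calculated as follows:

# Named get_mode so that it does not mask base R's mode()
get_mode <- function(x) {
  uniq <- unique(x)
  # Count the occurrences of each unique value and return the most frequent one
  uniq[which.max(tabulate(match(x, uniq)))]
}
get_mode(c(1, 2, 2, 3, 3, 3))  # 3
get_mode(data)                 # every value in data occurs once, so the first value (1) is returned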
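Because which.max() stops at the first maximum, the function above reports only one value when several values are tied for most frequent. A minimal sketch of a variant that returns every tied value is shown below; all_modes is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: return all values that share the highest frequency
all_modes <- function(x) {
  uniq <- unique(x)
  counts <- tabulate(match(x, uniq))  # occurrences of each unique value
  uniq[counts == max(counts)]
}

all_modes(c(4, 2, 2, 3, 3))   # 2 3 (both occur twice)
all_modes(c("a", "b", "a"))   # "a"
```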
d) Standard Deviation (sd())

The standard deviation measures how far the values in the dataset typically lie from the mean. sd() computes the sample standard deviation, which uses n - 1 in the denominator.

sd(data)
e) Variance (var())

Variance is the square of the standard deviation: the average squared deviation of the data points from the
mean. Like sd(), var() computes the sample version with n - 1 in the denominator.

var(data)
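The relationship between the two measures can be verified by hand. This is a sketch of the sample formulas; the names manual_var and manual_sd are introduced here for illustration.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
n <- length(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((data - mean(data))^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

manual_var                         # 7.5
all.equal(manual_var, var(data))   # TRUE
all.equal(manual_sd, sd(data))     # TRUE
```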

Session 2024-25
f) Range (range())

range() returns a vector holding the minimum and maximum values of the dataset. To get the range as a single number (maximum minus minimum), use diff(range(data)).

range(data)
g) Minimum and Maximum (min() and max())

To find the minimum and maximum values individually:

min(data)
max(data)
h) Quantiles (quantile())

Quantiles help you understand the distribution of the data. You can specify which quantiles to calculate.

quantile(data, probs = c(0.25, 0.5, 0.75)) # 25th, 50th (median), and 75th percentiles
i) Interquartile Range (IQR())

The interquartile range is the difference between the 75th and 25th percentiles and gives the spread of the
middle 50% of the data.

IQR(data)
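The link between quantile() and IQR() can be checked directly. The sketch below uses R's default quantile method; the variable q is introduced for illustration.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

q <- quantile(data, probs = c(0.25, 0.75))
q                      # 25% is 3, 75% is 7

unname(q[2] - q[1])    # 4
IQR(data)              # 4, the same value
```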
j) Summary Statistics (summary())

R provides a built-in summary() function that gives a quick overview of the dataset.

summary(data)

This function provides:

• Minimum
• 1st Quartile (25%)
• Median (50%)
• Mean
• 3rd Quartile (75%)
• Maximum
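
For a numeric vector, the result of summary() is a named vector, so individual statistics can be pulled out by name. A minimal sketch, with s as an illustrative variable name:

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

s <- summary(data)
names(s)         # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

s[["Median"]]                     # 5
s[["3rd Qu."]] - s[["1st Qu."]]   # 4, the interquartile range
```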

2. Descriptive Statistics for Data Frames

If you are working with a data frame, you can apply the above functions to each column or use some of R's
packages to obtain more detailed descriptive statistics.

# Example data frame
df <- data.frame(
  age = c(23, 45, 31, 35, 28),
  weight = c(70, 80, 60, 75, 68),
  height = c(165, 170, 158, 172, 160)
)

# Summary of each column in the data frame
summary(df)
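
One common way to apply a single function to every column is sapply(). The sketch below assumes all columns are numeric, as in the example data frame; col_means and stats are illustrative names.

```r
df <- data.frame(
  age = c(23, 45, 31, 35, 28),
  weight = c(70, 80, 60, 75, 68),
  height = c(165, 170, 158, 172, 160)
)

# One statistic per column
col_means <- sapply(df, mean)
col_means   # age 32.4, weight 70.6, height 165

# Several statistics per column; the result is a matrix with one column per variable
stats <- sapply(df, function(col) c(mean = mean(col), sd = sd(col), median = median(col)))
stats
```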

3. Using Libraries for More Detailed Descriptive Statistics

a) psych package

The psych package provides a variety of functions for descriptive statistics. To install and load it:

install.packages("psych")
library(psych)

# Descriptive statistics for each column in a data frame
describe(df)

For each column, this function reports, among other values:

• Mean
• Standard deviation
• Median
• Minimum
• Maximum
• Skewness
• Kurtosis
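
Base R has no built-in functions for skewness or kurtosis. The sketch below uses the plain moment-based definitions so the two quantities can be computed without a package; the function names are hypothetical, and packages may apply small-sample corrections, so their values can differ slightly from these.

```r
# Hypothetical helpers based on central moments (no small-sample correction)
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

moment_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # subtract 3 so a normal distribution scores 0
}

data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
moment_skewness(data)          # 0, because the values are symmetric around the mean
moment_excess_kurtosis(data)   # about -1.23, flatter than a normal distribution
```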

b) Hmisc package

The Hmisc package also provides functions for more detailed statistical summaries.

install.packages("Hmisc")
library(Hmisc)

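# Summary statistics. Both psych and Hmisc define describe(), and the package loaded
# last masks the other, so the Hmisc:: prefix makes the call unambiguous.
Hmisc::describe(df)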
c) summarytools package

For more flexible and comprehensive summaries, the summarytools package is useful.

install.packages("summarytools")
library(summarytools)

# Descriptive statistics for a data frame
dfSummary(df)

4. Visualizing Descriptive Statistics

To complement descriptive statistics, visualizing the data can help identify patterns or outliers. Here are
some commonly used plots:
a) Histogram (hist())

A histogram is useful to visualize the distribution of data.

hist(df$age, main = "Age Distribution", xlab = "Age", col = "lightblue")
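
The bins behind a histogram can also be inspected without drawing anything by passing plot = FALSE. A minimal sketch, using the age values from the example data frame; h is an illustrative name.

```r
age <- c(23, 45, 31, 35, 28)

h <- hist(age, plot = FALSE)   # compute the bins without plotting
h$breaks                       # bin boundaries chosen by hist()
h$counts                       # number of values falling in each bin

sum(h$counts) == length(age)   # TRUE; every value lands in exactly one bin
```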


b) Boxplot (boxplot())

A boxplot shows the distribution of the data along with the median, quartiles, and potential outliers.

boxplot(df$weight, main = "Boxplot of Weight", ylab = "Weight")
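
The numbers a boxplot draws can be obtained with boxplot.stats(), which by default flags points lying more than 1.5 times the box height beyond the box as outliers. In the sketch below the value 150 is a hypothetical addition to the weight column, included only so that an outlier appears.

```r
weight <- c(70, 80, 60, 75, 68, 150)   # 150 is a hypothetical extreme value

bs <- boxplot.stats(weight)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker: 60 68 72.5 80 80
bs$out     # 150, the point drawn individually as an outlier
```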
