Statistics Introduction

The document discusses the mtcars data set in R, which contains information on 32 cars. It describes that the data set has 32 observations and 11 variables, and provides definitions of the variables. Examples are given to find the dimensions, variable names, row names, and print, sort, and analyze values of variables.

Uploaded by

Rukmaninambi
Statistics Introduction

Statistics is the science of analyzing, reviewing, and drawing conclusions from data.

Some basic statistical numbers include:

• Mean, median and mode
• Minimum and maximum value
• Percentiles
• Variance and Standard Deviation
• Covariance and Correlation
• Probability distributions
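As a rough illustration, most of the numbers listed above can be computed with base R functions. The sketch below assumes R's built-in mtcars data set (described in the next chapter). The helper stat_mode() is a hypothetical name defined only for this example, because base R has no built-in function for the statistical mode. Probability distributions are not covered here.

```r
# Illustrative sketch using the built-in mtcars data set
mpg <- mtcars$mpg   # miles per gallon
wt  <- mtcars$wt    # weight in 1000 lbs

# Mean and median
mpg_mean   <- mean(mpg)
mpg_median <- median(mpg)

# Mode: stat_mode() is a hypothetical helper, not part of base R.
# It returns the most frequent value as a character string.
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}
cyl_mode <- stat_mode(mtcars$cyl)

# Minimum and maximum value
mpg_range <- c(min(mpg), max(mpg))

# Percentiles (here the 25th, 50th and 75th)
mpg_pct <- quantile(mpg, probs = c(0.25, 0.50, 0.75))

# Variance and standard deviation
mpg_var <- var(mpg)
mpg_sd  <- sd(mpg)

# Covariance and correlation between two variables
mpg_wt_cov <- cov(mpg, wt)
mpg_wt_cor <- cor(mpg, wt)

print(c(mean = mpg_mean, median = mpg_median, sd = mpg_sd))
```

Each result is stored in its own variable so it can be inspected or reused individually.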

The R language was developed by two statisticians (Ross Ihaka and Robert Gentleman). It has many built-in functions, in addition to libraries written specifically for statistical analysis.

R Data Set
Data Set
A data set is a collection of data, often presented in a table.

There is a popular built-in data set in R called "mtcars" (Motor Trend Car Road
Tests), whose data was extracted from the 1974 Motor Trend US magazine.

In the examples below (and for the next chapters), we will use the mtcars data set,
for statistical purposes:

Example
# Print the mtcars data set
mtcars

Result:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
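Printing all 32 rows is not always necessary. As a small sketch, the base R functions head() and tail() print only the first or last rows of a data set. The variable names first_rows and last_rows are chosen only for this illustration.

```r
# Show only the first three rows of the data set
first_rows <- head(mtcars, n = 3)
print(first_rows)

# Show only the last three rows of the data set
last_rows <- tail(mtcars, n = 3)
print(last_rows)
```

This is convenient for a quick look at the column layout before working with the full table.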
Information About the Data Set
You can use the question mark (?) to get information about the mtcars data set:

Example
# Use the question mark to get information about the data set

?mtcars

Result:

mtcars {datasets} R Documentation

Motor Trend Car Road Tests


Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises
fuel consumption and 10 aspects of automobile design and performance for 32
automobiles (1973-74 models).

Usage
mtcars

Format
A data frame with 32 observations on 11 (numeric) variables.

[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    Engine (0 = V-shaped, 1 = straight)
[, 9]  am    Transmission (0 = automatic, 1 = manual)
[,10]  gear  Number of forward gears
[,11]  carb  Number of carburetors

Note
Henderson and Velleman (1981) comment in a footnote to Table 1: 'Hocking
[original transcriber]'s noncrucial coding of the Mazda's rotary engine as a straight
six-cylinder engine and the Porsche's flat engine as a V engine, as well as the
inclusion of the diesel Mercedes 240D, have been retained to enable direct
comparisons to be made with previous analyses.'

Source
Henderson and Velleman (1981), Building multiple regression models
interactively. Biometrics, 37, 391-411.

Examples
require(graphics)
pairs(mtcars, main = "mtcars data", gap = 1/4)
coplot(mpg ~ disp | as.factor(cyl), data = mtcars,
       panel = panel.smooth, rows = 1)
## possibly more meaningful, e.g., for summary() or bivariate plots:
mtcars2 <- within(mtcars, {
   vs <- factor(vs, labels = c("V", "S"))
   am <- factor(am, labels = c("automatic", "manual"))
   cyl  <- ordered(cyl)
   gear <- ordered(gear)
   carb <- ordered(carb)
})
summary(mtcars2)
Get Information
Use the dim() function to find the dimensions of the data set, and
the names() function to view the names of the variables:

Example
# Create a variable of the mtcars data set for better organization
Data_Cars <- mtcars

# Use dim() to find the dimension of the data set
dim(Data_Cars)

# Use names() to find the names of the variables from the data set
names(Data_Cars)

Result:

[1] 32 11
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
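The two numbers returned by dim() are the row count followed by the column count. As a minimal sketch, they can also be obtained separately with nrow() and ncol(), and str() gives a compact overview of every variable. The names n_obs and n_vars are illustrative only.

```r
Data_Cars <- mtcars

# Number of observations (rows) and variables (columns)
n_obs  <- nrow(Data_Cars)
n_vars <- ncol(Data_Cars)
print(c(observations = n_obs, variables = n_vars))

# Compact overview: the type and first few values of each variable
str(Data_Cars)
```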

Use the rownames() function to get the name of each row, which is the name of each
car. Although the names are printed on the left like a first column, they are row
labels and not one of the 11 variables:

Example
Data_Cars <- mtcars

rownames(Data_Cars)

Result:

 [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"
 [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"
 [7] "Duster 360"          "Merc 240D"           "Merc 230"
[10] "Merc 280"            "Merc 280C"           "Merc 450SE"
[13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood"
[16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"
[19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"
[22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"
[25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"
[28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"
[31] "Maserati Bora"       "Volvo 142E"
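Row names can also be used to look up a single car. The sketch below selects one value by row name and column name, then shows one possible way to turn the row names into an ordinary column. The names rx4_mpg, Cars_With_Names and model are assumptions made for this illustration.

```r
Data_Cars <- mtcars

# Look up one value by row name and variable name
rx4_mpg <- Data_Cars["Mazda RX4", "mpg"]
print(rx4_mpg)

# Copy the row names into a regular column called "model"
Cars_With_Names <- data.frame(model = rownames(Data_Cars), Data_Cars)
rownames(Cars_With_Names) <- NULL   # replace the car names with 1, 2, 3, ...

# The new data frame has one more variable than the original
ncol(Data_Cars)
ncol(Cars_With_Names)
```

Keeping the car names in a real column can be useful when a function ignores or drops row names.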

From the examples above, we have found out that the data set
has 32 observations (Mazda RX4, Mazda RX4 Wag, Datsun 710, etc.)
and 11 variables (mpg, cyl, disp, etc.).

A variable is defined as something that can be measured or counted.

Here is a brief explanation of the variables from the mtcars data set:

Variable Name   Description
mpg             Miles/(US) gallon
cyl             Number of cylinders
disp            Displacement (cu.in.)
hp              Gross horsepower
drat            Rear axle ratio
wt              Weight (1000 lbs)
qsec            1/4 mile time
vs              Engine (0 = V-shaped, 1 = straight)
am              Transmission (0 = automatic, 1 = manual)
gear            Number of forward gears
carb            Number of carburetors

Print Variable Values


If you want to print all values that belong to a variable, access that variable with
the $ sign followed by its name (for example, cyl for cylinders):
Example
Data_Cars <- mtcars

Data_Cars$cyl

Result:

[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
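The $ sign is not the only way to reach a variable. As a hedged sketch, base R also allows double-bracket and matrix-style indexing, which return the same vector. The variable names below are illustrative.

```r
Data_Cars <- mtcars

# Three equivalent ways to extract the cyl variable as a vector
cyl_dollar  <- Data_Cars$cyl
cyl_bracket <- Data_Cars[["cyl"]]
cyl_matrix  <- Data_Cars[, "cyl"]

identical(cyl_dollar, cyl_bracket)
identical(cyl_dollar, cyl_matrix)

# A single bracket with only a name keeps the result as a one-column data frame
cyl_df <- Data_Cars["cyl"]
class(cyl_df)
```

The bracket forms are handy when the variable name is stored in another variable, since $ expects the name to be typed literally.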

Sort Variable Values


To sort the values, use the sort() function:

Example
Data_Cars <- mtcars

sort(Data_Cars$cyl)

Result:

[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8

From the examples above, we see that most cars have either 4 or 8 cylinders (11 and 14 cars respectively, compared with 7 cars that have 6 cylinders).
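Counting values by eye in a sorted list is error-prone. A minimal sketch of an alternative is the base R table() function, which counts how often each value occurs. The name cyl_counts is chosen only for this example.

```r
Data_Cars <- mtcars

# Count how many cars have each number of cylinders
cyl_counts <- table(Data_Cars$cyl)
print(cyl_counts)

# Order the counts from most to least frequent
sort(cyl_counts, decreasing = TRUE)

# sort() can also order the raw values from largest to smallest
sort(Data_Cars$cyl, decreasing = TRUE)
```

For this data set the counts are 11 cars with 4 cylinders, 7 with 6 cylinders, and 14 with 8 cylinders, which matches the sorted output shown earlier.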

Analyzing the Data


Now that we have some information about the data set, we can start to analyze it
with some statistical numbers.

For example, we can use the summary() function to get a statistical summary of the
data:

Example
Data_Cars <- mtcars

summary(Data_Cars)

The summary() function returns six statistical numbers for each variable:
• Min
• First quartile (25th percentile)
• Median
• Mean
• Third quartile (75th percentile)
• Max
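The same six numbers can be requested for a single variable. The sketch below, using illustrative variable names, summarizes only mpg and then retrieves the two quartiles directly with quantile() for comparison. In the printed summary, R labels the six values "Min.", "1st Qu.", "Median", "Mean", "3rd Qu." and "Max.".

```r
Data_Cars <- mtcars

# Summary of a single variable instead of the whole data set
mpg_summary <- summary(Data_Cars$mpg)
print(mpg_summary)

# The first and third quartiles are the 25th and 75th percentiles
mpg_quartiles <- quantile(Data_Cars$mpg, probs = c(0.25, 0.75))
print(mpg_quartiles)
```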
