Statistics Introduction
R Data Set
Data Set
A data set is a collection of data, often presented in a table.
R has a popular built-in data set called "mtcars" (Motor Trend Car Road
Tests), which was extracted from the 1974 Motor Trend US magazine.
The examples below (and in the next chapters) use the mtcars data set
to demonstrate statistical functions:
Example
# Print the mtcars data set
mtcars
Result:
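Printing all 32 rows at once can be hard to read in a console. As a minimal sketch, the base R functions head() and tail() show only the first or last rows. The variable names first_rows and last_rows are chosen here for illustration and are not part of the original example.

```r
# Show only the first three rows of the built-in mtcars data set
first_rows <- head(mtcars, 3)
print(first_rows)

# Show only the last three rows
last_rows <- tail(mtcars, 3)
print(last_rows)
```

Both functions return a smaller data frame with the same 11 columns, so the result can be used anywhere the full data set could.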
Example
# Use the question mark to get information about the data set
?mtcars
Result:
Usage
mtcars
Format
A data frame with 32 observations on 11 (numeric) variables.
[, 4] hp Gross horsepower
Note
Henderson and Velleman (1981) comment in a footnote to Table 1: 'Hocking
[original transcriber]'s noncrucial coding of the Mazda's rotary engine as a straight
six-cylinder engine and the Porsche's flat engine as a V engine, as well as the
inclusion of the diesel Mercedes 240D, have been retained to enable direct
comparisons to be made with previous analyses.'
Source
Henderson and Velleman (1981), Building multiple regression models
interactively. Biometrics, 37, 391-411.
Examples
require(graphics)
pairs(mtcars, main = "mtcars data", gap = 1/4)
coplot(mpg ~ disp | as.factor(cyl), data = mtcars,
       panel = panel.smooth, rows = 1)
## possibly more meaningful, e.g., for summary() or bivariate plots:
mtcars2 <- within(mtcars, {
   vs <- factor(vs, labels = c("V", "S"))
   am <- factor(am, labels = c("automatic", "manual"))
   cyl <- ordered(cyl)
   gear <- ordered(gear)
   carb <- ordered(carb)
})
summary(mtcars2)
Get Information
Use the dim() function to find the dimensions of the data set, and
the names() function to view the names of the variables:
Example
# Create a variable of the mtcars data set for better organization
Data_Cars <- mtcars
# Use dim() to find the dimensions of the data set
dim(Data_Cars)
# Use names() to find the names of the variables from the data set
names(Data_Cars)
Result:
[1] 32 11
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
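The two numbers returned by dim() are the row count followed by the column count. As a small sketch, the base R functions nrow() and ncol() return each number separately, and str() gives a compact overview of every variable. The names n_rows and n_cols are illustrative only.

```r
Data_Cars <- mtcars

# dim() returns rows first, then columns; these return each part on its own
n_rows <- nrow(Data_Cars)   # number of observations (cars)
n_cols <- ncol(Data_Cars)   # number of variables
print(c(n_rows, n_cols))

# str() prints each variable's type and its first few values
str(Data_Cars)
```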
Use the rownames() function to get the name of each row, which is
the name of each car:
Example
Data_Cars <- mtcars
rownames(Data_Cars)
Result:
From the examples above, we have found that the data set
has 32 observations (Mazda RX4, Mazda RX4 Wag, Datsun 710, etc.)
and 11 variables (mpg, cyl, disp, etc.).
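Because the car names are row names rather than a regular column, they can also be used to look up a single observation. The following is a minimal sketch. The names first_cars and mazda_mpg are illustrative and do not come from the original examples.

```r
Data_Cars <- mtcars

# The first three row names are the first three cars
first_cars <- head(rownames(Data_Cars), 3)
print(first_cars)

# Row names can select one car; here, the mpg value of the Mazda RX4
mazda_mpg <- Data_Cars["Mazda RX4", "mpg"]
print(mazda_mpg)
```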
Here is a brief explanation of the variables from the mtcars data set:
disp Displacement
hp Gross horsepower
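To print all the values of a single variable, access it with the $ operator and
the variable name (here cyl, the number of cylinders):
Example
Data_Cars <- mtcars
Data_Cars$cyl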
Result:
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
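The $ operator is one of several base R ways to pull a single variable out of a data frame. As a brief sketch, the three forms below return the same numeric vector. The names cyl_dollar, cyl_brackets and cyl_column are illustrative only.

```r
Data_Cars <- mtcars

cyl_dollar   <- Data_Cars$cyl        # $ operator
cyl_brackets <- Data_Cars[["cyl"]]   # double brackets with the name as a string
cyl_column   <- Data_Cars[, "cyl"]   # row/column indexing, all rows

# Check that all three forms return the same vector
print(identical(cyl_dollar, cyl_brackets))
print(identical(cyl_dollar, cyl_column))
```

The double-bracket form is convenient when the variable name is stored in another variable, since $ expects the name to be typed literally.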
To sort the values, use the sort() function:
Example
Data_Cars <- mtcars
sort(Data_Cars$cyl)
Result:
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8
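sort() only reorders the values of one vector. As a hedged sketch, the decreasing argument reverses the direction, and order() can be used to sort the whole data frame by one variable. The names cyl_desc and cars_by_cyl are illustrative only.

```r
Data_Cars <- mtcars

# Sort the cylinder counts from largest to smallest
cyl_desc <- sort(Data_Cars$cyl, decreasing = TRUE)
print(cyl_desc)

# order() returns row positions, so it can sort every column by cyl at once
cars_by_cyl <- Data_Cars[order(Data_Cars$cyl), ]
print(head(cars_by_cyl, 3))
```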
From the examples above, we see that most cars have either 4 or 8 cylinders
(11 and 14 of the 32 cars, respectively); only 7 cars have 6 cylinders.
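Counting values by eye in a sorted vector is error-prone. As a minimal sketch, the base R function table() counts how often each value occurs. The name cyl_counts is illustrative only.

```r
Data_Cars <- mtcars

# Count how many cars have each number of cylinders
cyl_counts <- table(Data_Cars$cyl)
print(cyl_counts)
```

The counts agree with the sorted output: 11 cars with 4 cylinders, 7 with 6, and 14 with 8.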
Use the summary() function to get a statistical summary of each variable in the
data set:
Example
Data_Cars <- mtcars
summary(Data_Cars)
The summary() function returns six statistical numbers for each variable:
Min
First quartile (25th percentile)
Median
Mean
Third quartile (75th percentile)
Max
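To make the six numbers concrete, the sketch below applies summary() to a single variable (mpg) and then computes the same values one at a time with base R functions. The names mpg_summary and mpg_by_hand are illustrative, and the choice of mpg is an assumption made for this example.

```r
Data_Cars <- mtcars

# Six-number summary of one variable (miles per gallon)
mpg_summary <- summary(Data_Cars$mpg)
print(mpg_summary)

# The same six numbers computed individually
mpg_by_hand <- c(
  min(Data_Cars$mpg),
  unname(quantile(Data_Cars$mpg, 0.25)),  # first quartile
  median(Data_Cars$mpg),
  mean(Data_Cars$mpg),
  unname(quantile(Data_Cars$mpg, 0.75)),  # third quartile
  max(Data_Cars$mpg)
)
print(mpg_by_hand)
```

The printed summary may show rounded values, so small differences in the last digits between the two outputs are expected.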
Result: