R Data Set


Data Set

A data set is a set of data, usually presented in a table.

R has a popular built-in data set called "mtcars" (Motor Trend Car Road Tests), which was extracted from the 1974 Motor Trend US magazine.

In the examples below (and in the following chapters), we will use the mtcars data set for statistical purposes:


Example
# Print the mtcars data set
mtcars

Result
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
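
Printing all 32 rows is manageable here, but for a quick look it is often enough to show only the first few rows. The sketch below uses the base R head() function for that; the variable name first_rows is just an illustrative choice and not part of the original example.

```r
# Keep only the first rows of the built-in mtcars data set
# (head() returns 6 rows by default)
first_rows <- head(mtcars)
first_rows

# Ask for a different number of rows with the n argument
head(mtcars, n = 3)
```

The tail() function works the same way for the last rows of the data set.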
    


Information About the Data Set

You can use the question mark (?) to get information about the mtcars data set:


Example
# Use the question mark to get information about the data set

?mtcars

Result
mtcars {datasets}    R Documentation

Motor Trend Car Road Tests

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Usage

mtcars

Format

A data frame with 32 observations on 11 (numeric) variables.

[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    Engine (0 = V-shaped, 1 = straight)
[, 9]  am    Transmission (0 = automatic, 1 = manual)
[,10]  gear  Number of forward gears
[,11]  carb  Number of carburetors

Note

Henderson and Velleman (1981) comment in a footnote to Table 1: 'Hocking [original transcriber]'s noncrucial coding of the Mazda's rotary engine as a straight six-cylinder engine and the Porsche's flat engine as a V engine, as well as the inclusion of the diesel Mercedes 240D, have been retained to enable direct comparisons to be made with previous analyses.'

Source

Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.

Examples

require(graphics)
pairs(mtcars, main = "mtcars data", gap = 1/4)
coplot(mpg ~ disp | as.factor(cyl), data = mtcars,
       panel = panel.smooth, rows = 1)
## possibly more meaningful, e.g., for summary() or bivariate plots:
mtcars2 <- within(mtcars, {
   vs <- factor(vs, labels = c("V", "S"))
   am <- factor(am, labels = c("automatic", "manual"))
   cyl  <- ordered(cyl)
   gear <- ordered(gear)
   carb <- ordered(carb)
})
summary(mtcars2)
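
The Format section of the help page says that mtcars is a data frame with 32 observations on 11 numeric variables. A minimal sketch of how this could be checked directly in R is shown below; the variable names is_data_frame and all_numeric are illustrative and do not come from the help page.

```r
# Check that mtcars is a data frame
is_data_frame <- is.data.frame(mtcars)
is_data_frame

# Check that every variable (column) is numeric
all_numeric <- all(sapply(mtcars, is.numeric))
all_numeric

# str() gives a compact overview: number of observations,
# number of variables, and the type of each variable
str(mtcars)
```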
    


Get Information

Use the dim() function to find the dimensions of the data set, and the names() function to view the names of the variables:


Example
Data_Cars <- mtcars # create a variable of the mtcars data set for better organization

# Use dim() to find the dimension of the data set
dim(Data_Cars)

# Use names() to find the names of the variables from the data set
names(Data_Cars)

Result
[1] 32 11
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
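
The dim() function returns both dimensions at once: first the number of rows, then the number of columns. If only one of them is needed, the base R functions nrow() and ncol() can be used instead, as in this small sketch (the names n_rows and n_cols are illustrative):

```r
Data_Cars <- mtcars

# Number of rows (one row per car)
n_rows <- nrow(Data_Cars)
n_rows

# Number of columns (one column per variable)
n_cols <- ncol(Data_Cars)
n_cols

# Together they match the result of dim()
c(n_rows, n_cols)
```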
   

Use the rownames() function to get the name of each row, which is the name of each car:


Example
Data_Cars <- mtcars

rownames(Data_Cars)

Result
 [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"
 [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"
 [7] "Duster 360"          "Merc 240D"           "Merc 230"
[10] "Merc 280"            "Merc 280C"           "Merc 450SE"
[13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood"
[16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"
[19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"
[22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"
[25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"
[28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"
[31] "Maserati Bora"       "Volvo 142E"

From the examples above, we have found out that the data set has 32 observations (Mazda RX4, Mazda RX4 Wag, Datsun 710, etc.) and 11 variables (mpg, cyl, disp, etc.).

A variable is defined as something that can be measured or counted.

Here is a brief description of the variables in the mtcars data set:

Variable Name    Description
mpg              Miles/(US) gallon
cyl              Number of cylinders
disp             Displacement (cu.in.)
hp               Gross horsepower
drat             Rear axle ratio
wt               Weight (1000 lbs)
qsec             1/4 mile time
vs               Engine (0 = V-shaped, 1 = straight)
am               Transmission (0 = automatic, 1 = manual)
gear             Number of forward gears
carb             Number of carburetors
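
Row names and variable names can be combined to look up values in the data frame. The sketch below uses square brackets with a row name and a variable name; the variable name rx4_mpg is only an illustrative choice.

```r
Data_Cars <- mtcars

# Select one observation (row) by its row name;
# leaving the column position empty returns all 11 variables
Data_Cars["Mazda RX4", ]

# Select a single value by row name and variable name
rx4_mpg <- Data_Cars["Mazda RX4", "mpg"]
rx4_mpg
```

The single value matches the mpg entry in the first row of the printed data set.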


Print Variable Value

If you want to print all the values that belong to a variable, access the data frame with the $ sign followed by the name of the variable (for example, cyl (cylinders)):


Example
Data_Cars <- mtcars

Data_Cars$cyl

Result
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
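
The $ sign is not the only way to reach a variable. As a small sketch, the same values can also be accessed with double square brackets and the variable name as a string, which is handy when the name is stored in another variable. The names cyl_dollar, cyl_brackets, and wanted are illustrative.

```r
Data_Cars <- mtcars

# Access the variable with the $ sign
cyl_dollar <- Data_Cars$cyl

# Access the same variable with double square brackets
cyl_brackets <- Data_Cars[["cyl"]]

# The variable name can also come from another variable
wanted <- "cyl"
Data_Cars[[wanted]]

# Both approaches return the same vector, one value per car
identical(cyl_dollar, cyl_brackets)
length(cyl_dollar)
```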


Sort Variable Values

To sort values, use the sort() function:


Example
Data_Cars <- mtcars

sort(Data_Cars$cyl)

Result
 [1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8
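
By default, sort() orders the values from lowest to highest. A minimal sketch of sorting from highest to lowest, using the decreasing argument of sort(), could look like this (the name cyl_desc is illustrative):

```r
Data_Cars <- mtcars

# Sort the cylinder values from highest to lowest
cyl_desc <- sort(Data_Cars$cyl, decreasing = TRUE)
cyl_desc

# sort() returns a new vector; the data frame itself is not changed
Data_Cars$cyl
```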

From the examples above, we see that most cars have 4 or 8 cylinders: 11 cars have 4 cylinders, 7 have 6, and 14 have 8.
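
Instead of counting the sorted values by eye, the number of cars per cylinder value can be counted with the base R table() function. This is a small sketch; the name cyl_counts is illustrative.

```r
Data_Cars <- mtcars

# Count how many cars there are for each number of cylinders
cyl_counts <- table(Data_Cars$cyl)
cyl_counts
```

The first line of the output lists the distinct cylinder values (4, 6, and 8), and the second line shows how many cars have each value.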



Analyzing the Data

Now that we have some information about the data set, we can start to analyze it with some statistical numbers.

For example, we can use the summary() function to get a statistical summary of the data:


Example
Data_Cars <- mtcars

summary(Data_Cars)

Do not worry if you do not understand the numbers in the output yet. They are explained in the following chapters.

The summary() function returns six statistical numbers for each variable:

  • Min
  • First quartile (25th percentile)
  • Median
  • Mean
  • Third quartile (75th percentile)
  • Max

We will cover all of them, along with other statistical numbers, in the following chapters.
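
To see where the six numbers come from, they can also be computed one by one for a single variable. The sketch below does this for mpg with the base R functions min(), quantile(), median(), mean(), and max(). The names mpg and six_numbers are illustrative, and quantile() is used with its default method.

```r
Data_Cars <- mtcars
mpg <- Data_Cars$mpg

# Compute the six numbers one by one
six_numbers <- c(
  Min    = min(mpg),
  Q1     = unname(quantile(mpg, 0.25)),  # first quartile
  Median = median(mpg),
  Mean   = mean(mpg),
  Q3     = unname(quantile(mpg, 0.75)),  # third quartile
  Max    = max(mpg)
)
six_numbers

# summary() reports the same six numbers for one variable
summary(mpg)
```

Note that summary() may print the numbers rounded to fewer digits than the individual functions.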