Practice Set: Objects Revisited

Data.frames

A data.frame is basically a spreadsheet. It’s 2-dimensional (rows & columns). Remember our starwars dataset?

name            height   mass  sex     homeworld  species
Luke Skywalker     172   77.0  male    Tatooine   Human
C-3PO              167   75.0  none    Tatooine   Droid
R2-D2               96   32.0  none    Naboo      Droid
Darth Vader        202  136.0  male    Tatooine   Human
Leia Organa        150   49.0  female  Alderaan   Human
Obi-Wan Kenobi     182   77.0  male    Stewjon    Human
Chewbacca          228  112.0  male    Kashyyyk   Wookiee
Han Solo           180   80.0  male    Corellia   Human
Yoda                66   17.0  male    NA         Yoda’s species
Boba Fett          183   78.2  male    Kamino     Human

Each row has 6 pieces of information and each column has 10 observations.

Let’s make our own data.frame!

For the most part, you will be importing your data from somewhere else. We’ll cover how to import data shortly in topic #7 of “The Basics”.

Making a data.frame has the following formatting (or syntax):

newDf <- data.frame(vector1,
                    vector2,
                    vector3)

☝️ Here we have:

  1. What we want to name our data.frame: newDf
  2. The actual function name: data.frame
  3. The vectors we want included: vector1, vector2, and vector3

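To build a data.frame from multiple vectors, each vector should have the same number of items! If the lengths differ, R does not fill the gaps with NA on its own: data.frame() recycles (repeats) a shorter vector when its length divides evenly into the longest one, and otherwise stops with an error. If a value really is missing, put the missing-value marker NA in that spot yourself.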
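A quick way to see how data.frame() treats vectors of different lengths is to try it. The sketch below uses made-up vectors (letters_vec, flags, short, and padded are hypothetical names, not part of this practice set) and wraps the failing call in tryCatch() so the script keeps running.

```r
# Hypothetical example vectors, only for illustration
letters_vec <- c("a", "b", "c", "d")   # length 4
flags <- c(TRUE, FALSE)                # length 2, divides evenly into 4

# The shorter vector is recycled (repeated) to fill all 4 rows
recycled <- data.frame(letters_vec, flags)
recycled$flags
## [1]  TRUE FALSE  TRUE FALSE

# Length 3 does not divide evenly into 4, so data.frame() stops with an error
short <- c(1, 2, 3)
result <- tryCatch(
  data.frame(letters_vec, short),
  error = function(e) "error"
)
result
## [1] "error"

# To get an NA in the gap, add it yourself so the lengths match
padded <- c(short, NA)
fixed <- data.frame(letters_vec, padded)
nrow(fixed)
## [1] 4
```

Padding with NA by hand keeps the gap visible, instead of letting recycling silently repeat values.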

Remember these variables?

ids <- c("ID1", "ID2", "ID3", "ID4", "ID5") # character
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine") # character/factor
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE) # logical
age <- c(20, 18, 32, 25, 25) # numeric

Let’s turn them into a data.frame.

demographics <- data.frame(ids,
                           state,
                           smoker,
                           age)

Now that it’s a data.frame called demographics, it looks like this:

ids  state     smoker  age
ID1  Missouri  TRUE     20
ID2  Iowa      FALSE    18
ID3  Missouri  FALSE    32
ID4  Idaho     TRUE     25
ID5  Maine     FALSE    25
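Once the data.frame exists, a few base R functions let you check that it has the shape you expect. This is a minimal sketch that rebuilds demographics from the same four vectors so it can run on its own. The str() output is left out because how the state column is reported depends on your R version (character in R 4.0 and later, factor in older versions).

```r
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5")
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
age <- c(20, 18, 32, 25, 25)

demographics <- data.frame(ids, state, smoker, age)

# Number of rows (observations) and columns (variables)
nrow(demographics)
## [1] 5
ncol(demographics)
## [1] 4

# Both at once: rows first, then columns
dim(demographics)
## [1] 5 4

# Column names are taken from the vector names
names(demographics)
## [1] "ids"    "state"  "smoker" "age"

# Compact summary of each column and its class
str(demographics)
```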

If you’re creating a new data.frame from vectors, you can easily adjust the column names.

demographics <- data.frame(SubjectID = ids,
                           Location = state,
                           SmokerTF = smoker,
                           SubjectAge = age)

SubjectID  Location  SmokerTF  SubjectAge
ID1        Missouri  TRUE              20
ID2        Iowa      FALSE             18
ID3        Missouri  FALSE             32
ID4        Idaho     TRUE              25
ID5        Maine     FALSE             25
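Column names can also be changed after the data.frame already exists, which is handy for imported data. The sketch below is one way to do that with base R's names(). It starts from the version of demographics that still has the original vector names.

```r
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5")
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
age <- c(20, 18, 32, 25, 25)

# Built without custom names, so the columns are ids, state, smoker, age
demographics <- data.frame(ids, state, smoker, age)

# Replace all column names at once (order must match the columns)
names(demographics) <- c("SubjectID", "Location", "SmokerTF", "SubjectAge")
names(demographics)
## [1] "SubjectID"  "Location"   "SmokerTF"   "SubjectAge"
```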

You try! Make a data.frame from the following SAT/ACT variables we’ve been working with. Call the entire data.frame scores. Include the following variables and make sure they have the following as their column names:

  • Age
  • SAT_Verbal
  • SAT_Quant
  • SAT_Total
  • SAT_Scaled
  • ACT
  • Avg_Scores

Indexing Data.frames

In our last practice session, we went over indexing a 1-dimensional vector. For data.frames, the process is very similar, but now we have to account for 2 dimensions!

Let’s take a look back at demographics:

SubjectID  Location  SmokerTF  SubjectAge
ID1        Missouri  TRUE              20
ID2        Iowa      FALSE             18
ID3        Missouri  FALSE             32
ID4        Idaho     TRUE              25
ID5        Maine     FALSE             25

What if I wanted the 4th row, 3rd column?

demographics[4,3]
## [1] TRUE

How about the location of subjects 1 through 3?

demographics[1:3, 2]
## [1] "Missouri" "Iowa"     "Missouri"

If you want all of the rows or all of the columns, just leave that position blank. But don’t forget the comma!

# all the columns for row 2
demographics[2,]
##   SubjectID Location SmokerTF SubjectAge
## 2       ID2     Iowa    FALSE         18

# all the rows of column 4
demographics[,4]
## [1] 20 18 32 25 25
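Notice that the two results above come back in different shapes: a row is still a data.frame, while a single column is simplified to a plain vector. The sketch below shows this, plus two related base R options: drop = FALSE to keep a single column as a data.frame, and column names in place of column numbers. The character output for Location assumes R 4.0 or later.

```r
demographics <- data.frame(SubjectID = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                           Location = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
                           SmokerTF = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                           SubjectAge = c(20, 18, 32, 25, 25))

# A single row keeps its data.frame structure
class(demographics[2, ])
## [1] "data.frame"

# A single column is simplified to a vector
class(demographics[, 4])
## [1] "numeric"

# drop = FALSE keeps the single column as a one-column data.frame
class(demographics[, 4, drop = FALSE])
## [1] "data.frame"

# Column names work inside the brackets too
demographics[2, "Location"]
## [1] "Iowa"

# Rows 1 and 5, two columns picked by name
demographics[c(1, 5), c("SubjectID", "SubjectAge")]
##   SubjectID SubjectAge
## 1       ID1         20
## 5       ID5         25
```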

Earlier in this practice, I asked you to run the line head(scores). By default, head() shows the first 6 rows, so head(scores) is essentially a shortcut for scores[1:6,]! It’s a nice way to very quickly view a data.frame (especially really big data.frames!).
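To make the comparison concrete, here is a small sketch with a made-up 8-row data.frame (big and small are hypothetical names, not datasets from this practice set). It also shows the n argument of head(), and one place where head() and bracket indexing differ: when there are fewer than 6 rows.

```r
# Hypothetical 8-row data.frame, only for illustration
big <- data.frame(id = 1:8, value = (1:8) * 10)

# head() returns the first 6 rows by default
nrow(head(big))
## [1] 6

# Same content as indexing rows 1 through 6
all.equal(head(big), big[1:6, ])
## [1] TRUE

# Ask for a different number of rows with n
head(big, n = 3)
##   id value
## 1  1    10
## 2  2    20
## 3  3    30

# With only 5 rows, head() returns just those 5,
# while [1:6, ] adds a 6th row filled with NA
small <- big[1:5, ]
nrow(head(small))
## [1] 5
nrow(small[1:6, ])
## [1] 6
```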

What if you don’t know the column number?

Use the $ sign! This is really, really, REALLY helpful! Plus, autocomplete is magical.

demographics$Location
## [1] "Missouri" "Iowa"     "Missouri" "Idaho"    "Maine"
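Because $ hands back an ordinary vector, everything from the vector practice still applies: you can index it or pass it straight to a function. A minimal sketch, rebuilding demographics so it runs on its own:

```r
demographics <- data.frame(SubjectID = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                           Location = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
                           SmokerTF = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                           SubjectAge = c(20, 18, 32, 25, 25))

# Pull one column out as a vector
demographics$SubjectAge
## [1] 20 18 32 25 25

# Index into that vector just like any other vector
demographics$SubjectAge[4]
## [1] 25

# Or pass it to a function
mean(demographics$SubjectAge)
## [1] 24

# Double brackets with the column name return the same vector
identical(demographics$SubjectAge, demographics[["SubjectAge"]])
## [1] TRUE
```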

You try! Let’s practice using the empire dataset. (If you’ve forgotten what it looks like, scroll to the top of this page.) Follow the directions above each code chunk!

You’re chugging along and doing wonderfully!