Practice Set: Objects Revisited
Data.frames
A data.frame is basically a spreadsheet. It’s 2-dimensional (rows and columns). Remember our starwars dataset?
name | height | mass | sex | homeworld | species |
---|---|---|---|---|---|
Luke Skywalker | 172 | 77.0 | male | Tatooine | Human |
C-3PO | 167 | 75.0 | none | Tatooine | Droid |
R2-D2 | 96 | 32.0 | none | Naboo | Droid |
Darth Vader | 202 | 136.0 | male | Tatooine | Human |
Leia Organa | 150 | 49.0 | female | Alderaan | Human |
Obi-Wan Kenobi | 182 | 77.0 | male | Stewjon | Human |
Chewbacca | 228 | 112.0 | male | Kashyyyk | Wookiee |
Han Solo | 180 | 80.0 | male | Corellia | Human |
Yoda | 66 | 17.0 | male | NA | Yoda’s species |
Boba Fett | 183 | 78.2 | male | Kamino | Human |
Each row has 6 pieces of information and each column has 10 observations.
Let’s make our own data.frame!
For the most part, you will be importing your data from somewhere else. We’ll cover how to import data shortly in topic #7 of “The Basics”.
Making a data.frame has the following formatting (or syntax):
newDf <- data.frame(vector1,
                    vector2,
                    vector3)
☝️ Here we have:
- What we want to name our data.frame: newDf
- The actual words data.frame
- The vectors we want included: vector1, vector2, and vector3
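To create a new data.frame from multiple vectors, each vector should have the same number of items! If one is shorter, R will only accept it when its length divides evenly into the longest vector’s length, in which case R repeats (recycles) its values to fill the rows. Otherwise data.frame() stops with an error. It does not fill the gaps with NA for you.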
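A quick way to see how data.frame() handles vectors of different lengths is to try it. The sketch below uses made-up vectors (a, b, and d are hypothetical names, not part of this practice set), and tryCatch() is there only so the script keeps running after the error.

```r
# Hypothetical vectors for illustration only
a <- c(1, 2, 3, 4)         # length 4
b <- c("x", "y")           # length 2: divides evenly into 4
d <- c(TRUE, FALSE, TRUE)  # length 3: does not divide evenly into 4

# The length-2 vector is recycled to fill all 4 rows: "x", "y", "x", "y"
recycled <- data.frame(a, b)
print(recycled)

# Lengths 4 and 3 cannot be reconciled, so data.frame() stops with an error.
# tryCatch() captures the error message instead of halting the script.
msg <- tryCatch(
  data.frame(a, d),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If you really do want a missing value in a column, put NA in the vector yourself so that every vector ends up the same length.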
Remember these variables?
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5") # character
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine") # character/factor
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE) # logical
age <- c(20, 18, 32, 25, 25) # numeric
Let’s turn them into a data.frame.
demographics <- data.frame(ids,
                           state,
                           smoker,
                           age)
Now that it’s a data.frame called demographics, it looks like this:
ids | state | smoker | age |
---|---|---|---|
ID1 | Missouri | TRUE | 20 |
ID2 | Iowa | FALSE | 18 |
ID3 | Missouri | FALSE | 32 |
ID4 | Idaho | TRUE | 25 |
ID5 | Maine | FALSE | 25 |
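If you want to double-check the shape of a new data.frame without printing the whole thing, a few base R functions can help. This is a minimal sketch that rebuilds demographics from the vectors above so it runs on its own.

```r
# Rebuild the vectors and the data.frame so this chunk is self-contained
ids    <- c("ID1", "ID2", "ID3", "ID4", "ID5")
state  <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
age    <- c(20, 18, 32, 25, 25)

demographics <- data.frame(ids, state, smoker, age)

dim(demographics)    # number of rows, then number of columns: 5 4
nrow(demographics)   # 5
ncol(demographics)   # 4
names(demographics)  # column names default to the vector names
str(demographics)    # compact summary: one line per column with its class
```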
If you’re creating a new data.frame from vectors, you can easily adjust the column names.
demographics <- data.frame(SubjectID = ids,
                           Location = state,
                           SmokerTF = smoker,
                           SubjectAge = age)
SubjectID | Location | SmokerTF | SubjectAge |
---|---|---|---|
ID1 | Missouri | TRUE | 20 |
ID2 | Iowa | FALSE | 18 |
ID3 | Missouri | FALSE | 32 |
ID4 | Idaho | TRUE | 25 |
ID5 | Maine | FALSE | 25 |
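Column names can also be changed after a data.frame already exists. The sketch below is one way to do it with base R’s names(), assuming you start from the version of demographics that still has the original vector names.

```r
# Start from the data.frame with the default (vector) column names
demographics <- data.frame(
  ids    = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  state  = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
  smoker = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  age    = c(20, 18, 32, 25, 25)
)

# Assign a character vector with one new name per column, in column order
names(demographics) <- c("SubjectID", "Location", "SmokerTF", "SubjectAge")

names(demographics)
```

The replacement vector needs exactly one name per column, so count your columns first.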
You try! Make a data.frame from the following SAT/ACT variables we’ve been working with. Call the entire data.frame scores. Include the following variables and make sure they have the following as their column names:
- Age
- SAT_Verbal
- SAT_Quant
- SAT_Total
- SAT_Scaled
- ACT
- Avg_Scores
Indexing Data.frames
In our last practice session, we went over indexing a 1-dimensional vector. For data.frames, the process is very similar, but now we have to account for 2 dimensions!
Let’s take a look back at demographics:
SubjectID | Location | SmokerTF | SubjectAge |
---|---|---|---|
ID1 | Missouri | TRUE | 20 |
ID2 | Iowa | FALSE | 18 |
ID3 | Missouri | FALSE | 32 |
ID4 | Idaho | TRUE | 25 |
ID5 | Maine | FALSE | 25 |
What if I wanted the 4th row, 3rd column?
demographics[4,3]
## [1] TRUE
How about the location of subjects 1 through 3?
demographics[1:3, 2]
## [1] "Missouri" "Iowa" "Missouri"
If you want all of either rows or columns, just leave it blank. But don’t forget the comma!
# all the columns for row 2
demographics[2,]
## SubjectID Location SmokerTF SubjectAge
## 2 ID2 Iowa FALSE 18
# all the rows of column 4
demographics[,4]
## [1] 20 18 32 25 25
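The same [row, column] pattern also accepts several positions at once, or a column name in place of a column number. A minimal sketch, rebuilding demographics so the chunk runs on its own (picked and one_cell are hypothetical names used only here):

```r
demographics <- data.frame(
  SubjectID  = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  Location   = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
  SmokerTF   = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  SubjectAge = c(20, 18, 32, 25, 25)
)

# Rows 1 and 4, columns 1 and 3: c() combines several positions
picked <- demographics[c(1, 4), c(1, 3)]
print(picked)

# A column name in quotes works in place of the column number
one_cell <- demographics[2, "Location"]
print(one_cell)
```

Asking for more than one column gives you back a smaller data.frame, while asking for a single column gives you back a plain vector.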
Earlier in this practice, I asked you to run the line head(scores). Using head() is just a shortcut for scores[1:6,]! head(scores) is a nice way to very quickly view a data.frame (especially really big data.frames!).
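To see that head() and indexing really return the same rows, here is a small sketch. It uses a made-up 10-row data.frame called big (a hypothetical stand-in, since scores is something you build yourself in the exercise above).

```r
# Hypothetical 10-row data.frame for illustration only
big <- data.frame(id = 1:10, squared = (1:10)^2)

first_six <- head(big)    # default: the first 6 rows
by_index  <- big[1:6, ]   # the same rows, asked for by position

# all.equal() returns TRUE when the two results match
all.equal(first_six, by_index)

# The n argument changes how many rows head() returns
first_three <- head(big, n = 3)
print(first_three)
```

One difference to keep in mind: on a data.frame with fewer than 6 rows, head() simply returns every row, while indexing with 1:6 adds rows of NA for the positions that don’t exist.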
What if you don’t know the column number?
Use the $ sign! This is really, really, REALLY helpful! Plus, autocomplete is magical.
demographics$Location
## [1] "Missouri" "Iowa" "Missouri" "Idaho" "Maine"
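Because $ hands you back an ordinary vector, anything you already know how to do with a vector works on it too. A minimal sketch, rebuilding demographics so the chunk runs on its own (ages is a hypothetical name used only here):

```r
demographics <- data.frame(
  SubjectID  = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  Location   = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
  SmokerTF   = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  SubjectAge = c(20, 18, 32, 25, 25)
)

# Pull out one column as a vector, then use it like any other vector
ages <- demographics$SubjectAge
mean(ages)                 # 24

# Vector indexing works right after the $
demographics$Location[3]   # the third subject's location

# $ and [ , column number] return the same column
identical(demographics$SubjectAge, demographics[, 4])   # TRUE
```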
You try! Let’s practice using the empire dataset. (If you’ve forgotten what it looks like, scroll to the top of this page.) Follow the directions above each code chunk!