Practice Set: Objects Revisited

Data.frames
Indexing Data.frames

Data.frames

A data.frame is basically a spreadsheet. It’s 2-dimensional (rows & columns). Remember our starwars dataset?

name	height	mass	sex	homeworld	species
Luke Skywalker	172	77.0	male	Tatooine	Human
C-3PO	167	75.0	none	Tatooine	Droid
R2-D2	96	32.0	none	Naboo	Droid
Darth Vader	202	136.0	male	Tatooine	Human
Leia Organa	150	49.0	female	Alderaan	Human
Obi-Wan Kenobi	182	77.0	male	Stewjon	Human
Chewbacca	228	112.0	male	Kashyyyk	Wookiee
Han Solo	180	80.0	male	Corellia	Human
Yoda	66	17.0	male	NA	Yoda’s species
Boba Fett	183	78.2	male	Kamino	Human

Each row has 6 pieces of information and each column has 10 observations.

Let’s make our own data.frame!

For the most part, you will be importing your data from somewhere else. We’ll cover how to import data shortly in topic #7 of “The Basics”.

Making a data.frame has the following formatting (or syntax):

newDf <- data.frame(vector1,
                    vector2,
                    vector3)

☝️ Here we have:

newDf: the name we want to give our data.frame
data.frame: the function that builds it, typed exactly like this
vector1, vector2, vector3: the vectors we want to include as columns

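To create a new data.frame from multiple vectors, each vector should have the same number of items! If the lengths differ, R will not fill in the gaps with NA for you. It either repeats (recycles) the shorter vector when its length divides evenly into the longest one, or it stops with an error. If a value really is missing, put NA in the vector yourself.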
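It can help to test this directly. The sketch below uses made-up vectors (three_items, two_items, four_items, and two_more are invented here, not part of the practice set) to show what data.frame() does with unequal lengths. Based on how base R behaves as far as I know, lengths that don't divide evenly give an error, and lengths that do are recycled.

```r
# Hypothetical vectors, invented only for this illustration
three_items <- c(1, 2, 3)
two_items   <- c("a", "b")
four_items  <- c(10, 20, 30, 40)
two_more    <- c(TRUE, FALSE)

# Lengths 3 and 2 do not divide evenly, so data.frame() stops with an error.
# tryCatch() stores the error message instead of halting the script.
mismatch <- tryCatch(
  data.frame(three_items, two_items),
  error = function(e) conditionMessage(e)
)
print(mismatch)

# Lengths 4 and 2 divide evenly, so the shorter vector is repeated (recycled).
recycled <- data.frame(four_items, two_more)
print(recycled)
```

Recycling is easy to miss because R does it silently, so checking that your vectors have the same length before building the data.frame is usually the safer habit.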

Remember these variables?

ids <- c("ID1", "ID2", "ID3", "ID4", "ID5") # character
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine") # character/factor
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE) # logical
age <- c(20, 18, 32, 25, 25) # numeric

Let’s turn them into a data.frame.

demographics <- data.frame(ids,
                           state,
                           smoker,
                           age)

Now that it’s a data.frame called demographics, it looks like this:

ids	state	smoker	age
ID1	Missouri	TRUE	20
ID2	Iowa	FALSE	18
ID3	Missouri	FALSE	32
ID4	Idaho	TRUE	25
ID5	Maine	FALSE	25
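If you’d like to double check the shape of a new data.frame without printing the whole thing, a few base R functions can help. This is just a quick sketch that rebuilds demographics from the vectors above. Using dim(), nrow(), ncol(), names(), and str() here is my suggestion, not a required step of the practice.

```r
# Rebuild the vectors and the data.frame from above
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5")
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
age <- c(20, 18, 32, 25, 25)

demographics <- data.frame(ids, state, smoker, age)

dim(demographics)    # number of rows, then number of columns
nrow(demographics)   # just the rows
ncol(demographics)   # just the columns
names(demographics)  # column names (taken from the vector names)
str(demographics)    # the class of each column plus its first few values
```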

If you’re creating a new data.frame from vectors, you can easily adjust the column names.

demographics <- data.frame(SubjectID = ids,
                           Location = state,
                           SmokerTF = smoker,
                           SubjectAge = age)

SubjectID	Location	SmokerTF	SubjectAge
ID1	Missouri	TRUE	20
ID2	Iowa	FALSE	18
ID3	Missouri	FALSE	32
ID4	Idaho	TRUE	25
ID5	Maine	FALSE	25
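If the data.frame already exists, the column names can also be changed afterwards. Here is a minimal sketch, assuming we first built demographics with the plain vector names. It uses base R’s names() function, which this practice set hasn’t introduced, so treat it as an optional extra.

```r
# Rebuild the data.frame with the original vector names
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5")
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
age <- c(20, 18, 32, 25, 25)
demographics <- data.frame(ids, state, smoker, age)

# Look at the current column names
names(demographics)

# Replace all four names at once (order matters: one name per column)
names(demographics) <- c("SubjectID", "Location", "SmokerTF", "SubjectAge")
names(demographics)
```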

You try! Make a data.frame from the following SAT/ACT variables we’ve been working with. Call the entire data.frame scores. Include the following variables and make sure they have the following as their column names:

Age
SAT_Verbal
SAT_Quant
SAT_Total
SAT_Scaled
ACT
Avg_Scores

Indexing Data.frames

In our last practice session, we went over indexing a 1-dimensional vector. For data.frames, the process is very similar, but now we have to account for 2 dimensions!

Let’s take a look back at demographics:

SubjectID	Location	SmokerTF	SubjectAge
ID1	Missouri	TRUE	20
ID2	Iowa	FALSE	18
ID3	Missouri	FALSE	32
ID4	Idaho	TRUE	25
ID5	Maine	FALSE	25

What if I wanted the 4th row, 3rd column?

demographics[4,3]

## [1] TRUE

How about the location of subjects 1 through 3?

demographics[1:3, 2]

## [1] "Missouri" "Iowa"     "Missouri"

If you want all of either rows or columns, just leave it blank. But don’t forget the comma!

# all the columns for row 2
demographics[2,]

##   SubjectID Location SmokerTF SubjectAge
## 2       ID2     Iowa    FALSE         18

# all the rows of column 4
demographics[,4]

## [1] 20 18 32 25 25
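You may have noticed that the two results above look different: the row came back as a small data.frame, while the column came back as a plain vector. The sketch below checks this with class(), and also shows one way to grab several columns at once with c(). The names one_row, one_col, and two_cols are made up for this example.

```r
# Rebuild demographics so this chunk runs on its own
demographics <- data.frame(SubjectID = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                           Location = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
                           SmokerTF = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                           SubjectAge = c(20, 18, 32, 25, 25))

one_row <- demographics[2, ]   # all the columns for row 2
one_col <- demographics[, 4]   # all the rows of column 4

class(one_row)   # a row keeps its columns, so it is still a data.frame
class(one_col)   # a single column is simplified to a vector

# Columns 1 and 4 together, for every row
two_cols <- demographics[, c(1, 4)]
two_cols
```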

Earlier in this practice, I asked you to run the line head(scores). By default, head() shows the first 6 rows, so it’s basically a shortcut for scores[1:6,]! head(scores) is a nice way to very quickly view a data.frame (especially really big data.frames!).
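Here is a small sketch comparing head() with plain row indexing. It uses demographics (which only has 5 rows) instead of scores, so you can also see the one place where the two differ: head() stops at the last real row, while asking for rows 1:6 by index pads the result with a row of NAs.

```r
# Rebuild demographics so this chunk runs on its own
demographics <- data.frame(SubjectID = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                           Location = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
                           SmokerTF = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                           SubjectAge = c(20, 18, 32, 25, 25))

# head() shows up to 6 rows by default; there are only 5, so we get 5
first_rows <- head(demographics)
nrow(first_rows)

# Indexing rows 1:6 asks for a 6th row that doesn't exist, so it comes back as NA
indexed_rows <- demographics[1:6, ]
nrow(indexed_rows)

# The n argument changes how many rows head() returns
head(demographics, n = 3)
```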

What if you don’t know the column number?

Use the $ sign! This is really, really, REALLY helpful! Plus, autocomplete is magical.

demographics$Location

## [1] "Missouri" "Iowa"     "Missouri" "Idaho"    "Maine"
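To connect the $ sign back to the bracket indexing from before, this sketch pulls the same column three ways and checks that the results match. Putting the column name in quotes inside the brackets is an extra option I’m adding here. The names by_dollar, by_name, and by_number are invented for the example.

```r
# Rebuild demographics so this chunk runs on its own
demographics <- data.frame(SubjectID = c("ID1", "ID2", "ID3", "ID4", "ID5"),
                           Location = c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"),
                           SmokerTF = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                           SubjectAge = c(20, 18, 32, 25, 25))

by_dollar <- demographics$Location       # by name, with $
by_name   <- demographics[, "Location"]  # by name, inside the brackets
by_number <- demographics[, 2]           # by column number

# All three give the same vector
identical(by_dollar, by_name)
identical(by_dollar, by_number)

# Since $ gives back a vector, you can index it like any other vector:
# the age of the 3rd subject
demographics$SubjectAge[3]
```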

You try! Let’s practice using the empire dataset. (If you’ve forgotten what it looks like, scroll to the top of this page.) Follow the directions above each code chunk!
