Practice Set: Indexing

Vectors Recap

Before diving into indexing, let’s make sure we’re 💯 good with vectors. In the previous practice set, you created the following 4 vectors:

ids <- c("ID1", "ID2", "ID3", "ID4", "ID5") # character
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine") # character/factor
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE) # logical
age <- c(20, 18, 32, 25, 25) # numeric

You can apply an operation to every element of a vector at once:

# add 10 years to everyone's age
age <- age + 10
age
## [1] 30 28 42 35 35

If you have 2 vectors, these can be combined in a number of ways!

Let’s add 2 to the first person’s age, 4 to the second person’s age, and so on.

numbers <- c(2, 4, 6, 8, 10)

age + numbers
## [1] 32 32 48 43 45

You can also append one vector on to the next. Notice that to do this, we use c(). We’re concatenating or combining the two vectors.

newVector <- c(age, numbers)
newVector
##  [1] 30 28 42 35 35  2  4  6  8 10

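If you have 2 vectors that are not the same length, R will recycle the shorter vector: it reuses the shorter vector’s values from the beginning until every element of the longer vector has a partner.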

shortVector <- c(1000, 2000)

newVector + shortVector
##  [1] 1030 2028 1042 2035 1035 2002 1004 2006 1008 2010

If the length of the longer vector is not a multiple of the length of the shorter vector, the process still works, but you’ll also get a warning message:

age + shortVector
## Warning in age + shortVector: longer object length is not a multiple of shorter
## object length
## [1] 1030 2028 1042 2035 1035
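
One way to picture recycling is that R repeats the shorter vector until it is as long as the longer one. The sketch below does that repetition by hand with rep() and compares the result to what R does on its own. The names longVec, shortVec, and stretched are made up for this illustration, and this is only a way to think about recycling, not a description of how R implements it internally.

```r
# made-up vectors for illustration only
longVec  <- c(30, 28, 42, 35, 35, 2)
shortVec <- c(1000, 2000)

# repeat shortVec until it is as long as longVec
stretched <- rep(shortVec, length.out = length(longVec))
stretched
## [1] 1000 2000 1000 2000 1000 2000

# letting R recycle gives the same answer as adding the stretched copy
recycled <- longVec + shortVec
manual   <- longVec + stretched
identical(recycled, manual)
## [1] TRUE
```

Because 6 is a multiple of 2, this example runs without the warning shown above.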

You try! The following exercises use a modified version of the sat.act dataset (broken up into vectors). The SAT & ACT are both standardized college admissions tests in the United States. The ACT technically has 4 parts, but only the overall score is reported here. The SAT is broken up into verbal and quantitative categories:

  • age in years
  • act contains scores from the ACT tests; range 1-36, norm ~20
  • satV contains the Verbal scores from the SAT; range 200-800, avg ~500
  • satQ contains the quantitative scores from the SAT; range 200-800, avg ~500

Follow the directions above each code chunk!

Indexing

Nice work! Let’s move on to practicing indexing, specifically with 1-dimensional vectors.

Think of the index as an address, and your entire vector as a street. Where on the street is the particular object you’re looking for? We want to use that object’s address to find it. To do so, we use the following format:

VECTOR[index]

For example, if I wanted to get the 4th item in the state vector, it would look something like this:

state[4]

## [1] "Idaho"

To check out more than 1 index:

  • If the indices you want are in a row:
    • Use the colon : to get items # through #
    • Ex: 3:5 reads as “items 3 through 5”
  • If they are NOT in a row:
    • Make a mini vector with c()
    • Ex: c(1, 4, 18) reads as “items 1, 4, and 18”
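
Here is a small sketch of both approaches, using the state vector from the recap. The names inARow and notInARow are made up for this example.

```r
# same state vector as in the recap
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")

# indices in a row: items 3 through 5
inARow <- state[3:5]
inARow      # "Missouri", "Idaho", "Maine"

# indices NOT in a row: items 1 and 4
notInARow <- state[c(1, 4)]
notInARow   # "Missouri", "Idaho"
```

Notice that c(1, 4) goes inside the square brackets. The mini vector of indices takes the place of the single index.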

You try! Practice indexing on the same sat.act variables from above, including the variables you created! Follow the directions above each code chunk.

A Demonstration

Let’s say we wanted to find the act scores of everyone who was over the age of 35. The line of code below finds out who is over 35 years old, and stores the result as its own vector called older. Let’s take a look:

# find people over 35 yrs old
older <- which(age > 35)
older
##  [1] 10 14 20 21 24 28 35 37 38 43 45 61 62 74 80 81 85 87 90 98

Look at the numbers that printed out. They are not the actual ages! We know this because we asked for people over 35 years old, and there are lots of numbers under 35 here. Our vector older now contains the indices of people over the age of 35!

The first item in older

older[1]
## [1] 10

…corresponds to an index in the age vector of someone over the age of 35. Let’s look at the first 15 people in the age vector.

age[1:15]
##  [1] 19 23 20 27 33 26 30 19 23 40 23 34 32 41 20

Notice that the first person over 35 is the 10th person. They are 40 years old. That is why the first number in older is “10”. It’s not the actual age of 40…it’s the “address” of the 40 year old within the age vector; it’s the index!

We can use this to our advantage! We wanted to find the act scores of everyone over 35 yrs old. Now that we have our older vector, we don’t need to manually type out the indices!

actOldScores <- act[older]
actOldScores
##  [1] 35 35 32 28 30 21 33 28 24 30 36 34 32 32 21 26 28 24 30 33

To verify, we said that the 1st person over the age of 35 was at index #10. Let’s look at the act score for index #10:

act[10]
## [1] 35

That number (which happens to be 35!) matches the very first number in the actOldScores vector. Our code worked! actOldScores represents all the act scores of people over the age of 35.
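
If it helps to see the steps separately, here is a small sketch on made-up data (demoAge and demoAct are NOT the real sat.act values, and all the names are invented for this example). The comparison age > 35 produces a vector of TRUE and FALSE values, and which() turns the TRUE positions into indices. As a side note, R also accepts the TRUE/FALSE vector directly inside the square brackets.

```r
# small made-up vectors (not the real sat.act data)
demoAge <- c(19, 40, 23, 41, 36)
demoAct <- c(24, 35, 30, 28, 31)

# step 1: TRUE wherever the age is over 35
isOlder <- demoAge > 35
isOlder
## [1] FALSE  TRUE FALSE  TRUE  TRUE

# step 2: which() converts the TRUEs into indices
olderIdx <- which(isOlder)
olderIdx
## [1] 2 4 5

# step 3: use the indices to pull out the matching act scores
byIndex <- demoAct[olderIdx]
byIndex
## [1] 35 28 31

# indexing with the TRUE/FALSE vector gives the same scores
byLogical <- demoAct[isOlder]
identical(byIndex, byLogical)
## [1] TRUE
```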

FYI: R is kind of unusual in that it starts indexing at 1, so the first element of a vector has an index of 1. Many other programming languages, like Python 🐍, C, and Java, begin indexing at 0, so there the first element has an index of 0. R is not the only language that begins at 1 (MATLAB and Julia do too), but if you plan on also using other programming languages, this is something to be aware of.
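
A quick sketch of what 1-based indexing looks like in practice. The vector fruits and the other names are made up for this example.

```r
# made-up vector for illustration
fruits <- c("apple", "banana", "cherry")

# the first element lives at index 1
firstFruit <- fruits[1]
firstFruit   # "apple"

# the last element lives at index length(fruits), which is 3 here
lastFruit <- fruits[length(fruits)]
lastFruit    # "cherry"

# index 0 is not an error in R, but it returns an empty vector
zeroFruit <- fruits[0]
length(zeroFruit)
## [1] 0
```

That last result is worth remembering: if you accidentally use 0 as an index, R will not stop you, so an unexpectedly empty result can be a clue that an index is off by one.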

Keep up the great work!

Sometimes, it feels like you’re just kind of guessing until you get it right. Coding is kind of like a sport. It takes some time and practice to become good. But by completing this practice set and continuing with this bootcamp, you’re on the right track!

If you feel lost at any point, please reach out!