Practice Set: Indexing
Vectors Recap
Before diving into indexing, let’s make sure we’re 💯 good with vectors. In the previous practice set, you created the following 4 vectors:
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5") # character
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine") # character/factor
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE) # logical
age <- c(20, 18, 32, 25, 25) # numeric
You can apply an operation to every element of a vector at once:
# add 10 years to everyone's age
age <- age + 10
age
## [1] 30 28 42 35 35
If you have 2 vectors, these can be combined in a number of ways!
Let’s try to add the number 2 to the first person’s age, 4 to the second person’s age, etc.
numbers <- c(2, 4, 6, 8, 10)
age + numbers
## [1] 32 32 48 43 45
You can also append one vector on to the next. Notice that to do this, we use c(). We’re concatenating or combining the two vectors.
newVector <- c(age, numbers)
newVector
## [1] 30 28 42 35 35 2 4 6 8 10
If you have 2 vectors, but they are not the same length, R will recycle the shorter vector.
shortVector <- c(1000, 2000)
newVector + shortVector
## [1] 1030 2028 1042 2035 1035 2002 1004 2006 1008 2010
If the length of the longer vector is not a multiple of the length of the shorter vector, the process still works, but you’ll also get a warning message:
age + shortVector
## Warning in age + shortVector: longer object length is not a multiple of shorter
## object length
## [1] 1030 2028 1042 2035 1035
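To see what recycling is doing behind the scenes, here is a small sketch that spells out the repetition with rep(). The names recycled and manual are made up for this illustration, and age and shortVector are redefined so the chunk runs on its own.

```r
# redefine the vectors so this chunk is self-contained
age <- c(30, 28, 42, 35, 35)
shortVector <- c(1000, 2000)

# repeat shortVector until it is as long as age
recycled <- rep(shortVector, length.out = length(age))
recycled
## [1] 1000 2000 1000 2000 1000

# adding the stretched-out vector matches age + shortVector,
# but with no warning, because the lengths are now equal
manual <- age + recycled
manual
## [1] 1030 2028 1042 2035 1035
```

The last 2000 never gets used: the shorter vector is cut off partway through its second repeat, which is what the warning is telling you.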
You try! The following exercises use a modified version of the sat.act dataset (broken up into vectors). The SAT & ACT are both standardized college admissions tests in the United States. The ACT technically has 4 parts, but only the overall score is reported here. The SAT is broken up into verbal and quantitative categories:
- age in years
- act contains scores from the ACT tests; range 1-36, norm ~20
- satV contains the Verbal scores from the SAT; range 200-800, avg ~500
- satQ contains the quantitative scores from the SAT; range 200-800, avg ~500
Follow the directions above each code chunk!
Indexing
Nice work! Let’s move on to practicing indexing, specifically with 1-dimensional vectors.
Think of the index as an address, and your entire vector as a street. Where on the street is the particular object you’re looking for? We want to use that object’s address to find it. To do so, we use the following format:
VECTOR[index]
For example, if I wanted to get the 4th item in the state vector, it would look something like this:
state[4]
## [1] "Idaho"
To check out more than 1 index:
- If the indices you want are in a row:
  - Use the colon : to get items # through #
  - Ex: 3:5 reads as “items 3 through 5”
- If they are NOT in a row:
  - Make a mini vector with c()
  - Ex: c(1, 4, 18) reads as “items 1, 4, and 18”
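Here is a quick sketch of both patterns, using the state vector from the recap (redefined here so the chunk runs on its own):

```r
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")

# indices in a row: items 2 through 4
state[2:4]
## [1] "Iowa"     "Missouri" "Idaho"

# indices NOT in a row: items 1 and 5
state[c(1, 5)]
## [1] "Missouri" "Maine"
```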
You try! Practice indexing on the same sat.act variables from above, including the variables you created! Follow the directions above each code chunk.
A Demonstration
Let’s say we wanted to find the act scores of everyone who is over the age of 35. The line of code below finds out who is over 35 years old and stores the result as its own vector called older. Let’s take a look:
# find people over 35 yrs old
older <- which(age > 35)
older
## [1] 10 14 20 21 24 28 35 37 38 43 45 61 62 74 80 81 85 87 90 98
Look at the numbers that printed out. They are not the actual ages! We know this because we asked for people over 35 years old, and there are lots of numbers under 35 here. Our vector older now contains the indices of people over the age of 35!
The first item in older…
older[1]
## [1] 10
…corresponds to an index in the age vector of someone over the age of 35. Let’s look at the first 15 people in the age vector.
age[1:15]
## [1] 19 23 20 27 33 26 30 19 23 40 23 34 32 41 20
Notice that the first person over 35 is the 10th person. They are 40 years old. That is why the first number in older is “10”. It’s not the actual age of 40…it’s the “address” of the 40 year old within the age vector; it’s the index!
We can use this to our advantage! We wanted to find the act scores of everyone over 35 yrs old. Now that we have our older vector, we don’t need to manually type out the indices!
actOldScores <- act[older]
actOldScores
## [1] 35 35 32 28 30 21 33 28 24 30 36 34 32 32 21 26 28 24 30 33
To verify, we said that the 1st person over the age of 35 was at index #10. Let’s look at the act score for index #10:
act[10]
## [1] 35
That number (which happens to be 35!) matches the very first number in the actOldScores vector. Our code worked! actOldScores represents all the act scores of people over the age of 35.
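If you’d like to watch the same idea play out on a vector small enough to check by eye, here is a sketch using the 5-person ids and age vectors from the recap. The cutoff of 30 and the name over30 are made up for this illustration; they are not part of the exercises.

```r
ids <- c("ID1", "ID2", "ID3", "ID4", "ID5")
age <- c(30, 28, 42, 35, 35)

# step 1: the comparison gives one TRUE/FALSE per person
age > 30
## [1] FALSE FALSE  TRUE  TRUE  TRUE

# step 2: which() turns the TRUEs into their positions (indices)
over30 <- which(age > 30)
over30
## [1] 3 4 5

# step 3: use those indices to pull the matching items from another vector
ids[over30]
## [1] "ID3" "ID4" "ID5"
```

Just like with older, over30 holds addresses (3, 4, 5), not ages, and those addresses work on any vector that lines up person-for-person with age.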
FYI: R is kind of weird in that it starts indexing at 1. You should know that most other programming languages begin indexing at 0, so in those languages the first element of a vector has an index of 0. R and MATLAB are among the few languages that begin at 1 (Julia and Fortran are others). If you plan on also using other programming languages, like Python 🐍, then this is something to be aware of.
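A tiny sketch of what this means in practice, using the state vector from the recap (the names first and nothing are made up for this illustration):

```r
state <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")

# in R, the first element lives at index 1
first <- state[1]
first
## [1] "Missouri"

# index 0 doesn't point at any element, so R quietly returns an empty vector
nothing <- state[0]
length(nothing)
## [1] 0
```

Notice that R doesn’t throw an error for index 0, so an accidental 0 (a habit from another language, say) can slip by without any message.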
Keep up the great work!
Sometimes, it feels like you’re just kind of guessing until you get it right. Coding is kind of like a sport. It takes some time and practice to become good. But by completing this practice set and continuing with this bootcamp, you’re on the right track!
If you feel lost at any point, please reach out!