Practice Set: Functions

Logical Operators

Logical operators are crucial in any programming language, not just R. Used well, they let you carry out complex computations with ease. The more comfortable you are using them, the better.

For the time being, let’s start simple. Check out the R&R slides for more complicated logical operator statements.

What are the operators?

  • == equality
  • != inequality (! is read as not)
  • > greater than
  • >= greater than or equal to
  • < less than
  • <= less than or equal to

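Each of these operators tests whether a statement is TRUE or FALSE. (Strictly speaking, R’s documentation calls these relational operators and reserves the name “logical operators” for & (and), | (or), and ! (not), which combine TRUE/FALSE values.)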
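As a quick illustration, here is each comparison operator applied to a plain numeric vector. The name ages is made up for this sketch: it is a stand-alone copy of the SubjectAge values from the demographics table, so the block runs on its own.

```r
# Hypothetical stand-alone vector: the SubjectAge values from the demographics table
ages <- c(20, 18, 32, 25, 25)

ages == 25   # equality:                 FALSE FALSE FALSE  TRUE  TRUE
ages != 25   # inequality:                TRUE  TRUE  TRUE FALSE FALSE
ages > 21    # greater than:             FALSE FALSE  TRUE  TRUE  TRUE
ages >= 25   # greater than or equal to: FALSE FALSE  TRUE  TRUE  TRUE
ages < 20    # less than:                FALSE  TRUE FALSE FALSE FALSE
ages <= 20   # less than or equal to:     TRUE  TRUE FALSE FALSE FALSE
```

Each comparison returns one TRUE or FALSE per element of ages.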

Let’s revisit our demographics data.frame:

SubjectID  Location  SmokerTF  SubjectAge
ID1        Missouri  TRUE      20
ID2        Iowa      FALSE     18
ID3        Missouri  FALSE     32
ID4        Idaho     TRUE      25
ID5        Maine     FALSE     25

TRUE or FALSE: Is each subject from the state of Missouri? To answer this with code, we could do:

demographics$Location == "Missouri"
## [1]  TRUE FALSE  TRUE FALSE FALSE

The result is a vector of TRUE and FALSE values, one for each item in demographics$Location. The 1st and 3rd elements are TRUE, so those people are from “Missouri”. The others are not.
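A TRUE/FALSE vector like this one can be summarized with a few base R functions. In this sketch, location is a hypothetical stand-alone copy of the Location column so the block runs on its own: any() answers “is anyone from Missouri?”, which() gives the positions of the TRUE elements, and sum() counts them, because R treats TRUE as 1 and FALSE as 0.

```r
# Hypothetical stand-alone copy of demographics$Location
location <- c("Missouri", "Iowa", "Missouri", "Idaho", "Maine")

isMissouri <- location == "Missouri"  # TRUE FALSE TRUE FALSE FALSE
any(isMissouri)    # TRUE: at least one subject is from Missouri
which(isMissouri)  # 1 3: positions of the TRUE elements
sum(isMissouri)    # 2: TRUE counts as 1, FALSE as 0
```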

Spelling, capitalization, and quotation marks count!

These things (and missing or extra parentheses) account for most of the errors you’ll run into! 🙀 If you’re comparing against a number, you do NOT need quotation marks. If you’re comparing against a character string, you do. For example:

# no quotes around the character string
demographics$Location == Missouri
## Error in eval(expr, envir, enclos): object 'Missouri' not found
# capitalization
demographics$Location == "missouri"
## [1] FALSE FALSE FALSE FALSE FALSE

☝️ Note that this doesn’t throw an error; instead, every element comes back FALSE. But we know that’s not true…

# spelling
demographics$Location == "Missour-uh"
## [1] FALSE FALSE FALSE FALSE FALSE

Same problem as with capitalization: no error, just a vector of FALSE values.

Be careful with = vs ==!

  • == is a logical operator
  • = is for assignment. At the top level it does the same thing as <- (inside a function call, = names an argument instead)

Look at the example below.

Logical operator usage…

# Is each element in the SubjectID column equal to the character
# string "ID4"?
demographics$SubjectID == "ID4"
## [1] FALSE FALSE FALSE  TRUE FALSE

Assignment usage…

# Make each element in the SubjectID column equal to the character
# string "ID4"
demographics$SubjectID = "ID4"

# now view the column
demographics$SubjectID
## [1] "ID4" "ID4" "ID4" "ID4" "ID4"

Now that we’ve gone through this demo, let’s change the SubjectID column back to what it’s supposed to look like.

demographics$SubjectID <- ids

# view the column to double check
demographics$SubjectID
## [1] "ID1" "ID2" "ID3" "ID4" "ID5"
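The same = versus == distinction applies to a single value, not just a whole column. In this minimal sketch, x is just an illustrative variable name:

```r
x = 5    # assignment: x now holds 5 (same as x <- 5)
x == 5   # comparison: is x equal to 5? TRUE
x == 6   # comparison: is x equal to 6? FALSE
x        # x is still 5; == never changes anything
```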

Quick Quiz

Functions

Functions are the heart and soul of R, and they are especially powerful. They are the verbs of a programming language because they act on objects. Each function has:

  1. A name that is unique
  2. Arguments: What is the verb acting on? How should the function behave?
  3. Output: the result of the function (can be anything!), which can be stored in another object

You’ve already seen some functions in action! You just didn’t know it yet!

c() is the combine/concatenate function:

  • Its name is c
  • Its input is whatever you’re trying to combine
  • Its output is a vector containing all of the combined elements

data.frame() is the function for making a data.frame:

  • Its name is data.frame
  • Its input is a series of vectors that all have the same length
  • Its output is a nicely formatted data.frame

class() is the function for finding out the class of an object:

  • Its name is class
  • Its input is the object you are examining
  • Its output is a character string naming the object’s class
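Here are those three functions working together in one short sketch. The object names scores and scoreTable are made up for illustration:

```r
# c(): combine three numbers into one vector
scores <- c(6, 4, 10)

# data.frame(): combine equal-length vectors into a table
scoreTable <- data.frame(id = c("A", "B", "C"), score = scores)

# class(): ask what kind of object each one is
class(scores)      # "numeric"
class(scoreTable)  # "data.frame"
```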

Most functions take more arguments than the ones we’ve seen above, and those arguments have names.

Use the argument names!

Autocomplete will help, so it’s not so tedious. See the lecture slides for the example with the round() function. When you have multiple arguments, separate them with commas: functionName(argument1 = something, argument2 = somethingElse).
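Since the full round() walkthrough lives in the slides, here is a comparable sketch. round() takes the number to round (x) and the number of decimal places (digits); seq() builds a sequence from its from, to, and by arguments. Naming the arguments makes each call self-explanatory, and it also means the order you write them in no longer matters.

```r
# Round 3.14159 to 2 decimal places; digits is a named argument
round(3.14159, digits = 2)          # 3.14

# Odd numbers from 1 to 10: start at 1, stop by 10, step by 2
seq(from = 1, to = 10, by = 2)      # 1 3 5 7 9

# Same call with the named arguments in a different order
seq(by = 2, to = 10, from = 1)      # 1 3 5 7 9
```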

You try! Follow the directions for each chunk below to create different sequences of numbers.

Help Pages

One of the biggest mistakes I see new coders make is that they pull up the help pages, but don’t actually read them. TAKE THE TIME TO READ THEM! There is a lot of really, really helpful information in there. And they are all organized the same way. See the lecture slides for a breakdown of the Help pages.

You can always look at help by typing ?functionName. For example, ?cbind will bring up the help documentation for the function cbind().

You try! Follow the directions below each block to ultimately create a plot of standardized variables from our empire dataset. Note that this exercise is intentionally a bit harder than previous exercises – that’s because you’re getting better!

FYI: In this exercise, we made a very basic scatter plot. But it’s not very pretty. We will talk a lot about making really pretty plots. You can check out the very basics in the Stats & Plot section, and then we’ll go into a lot more detail when we get to the Data Visualization section.

Congratulations! You’re making great progress and are well on your way to being a bad@$% programmer!

Examples of a few functions

In the corresponding lecture, I listed a couple of functions that you might use quite frequently. Below are examples of how to use these functions. Note that you do not need to go through all of them at this time. However, I would encourage you to come back and take a look at some point.

length(), nrow(), ncol()

  • length() gives the number of elements in a vector, which is the same as the index position of the last element
  • nrow() and ncol() tell you how many rows and columns there are in a data.frame, respectively
length(demographics$SubjectID)
## [1] 5
nrow(demographics)
## [1] 5
ncol(demographics)
## [1] 4

Note that these don’t behave the way you might expect on every kind of object:

# length() on a data.frame gives the number of columns, because a
# data.frame is stored as a list of columns. Lots of people get this
# confused and think it should return the number of rows!
length(demographics)
## [1] 4
# nrow and ncol of a 1-d vector don't make sense...
# there is no such thing as columns or rows in a vector
nrow(demographics$SubjectID)
## NULL
ncol(demographics$SubjectID)
## NULL

factor() and as.factor()

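Both of these convert a character vector into a factor. Factors are how R stores categorical variables: each value belongs to one of a fixed set of categories (the levels), and there is inherent meaning or grouping behind them. Under the hood, a factor is stored as integer codes with a label for each level, so it is not just a special kind of character vector.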

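(Note: For a simple conversion like this one, factor() and as.factor() give the same result, so you can think of them as interchangeable here. The main difference is that factor() has extra arguments for controlling the categories, while as.factor() takes only the object to convert.)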

# Our 'Location' variable consists of states. We might want
# to think of this as a categorical variable, where each state
# is a meaningfully different category from the next state.

# Don't forget to re-assign the variable so that it stores your new
# factor!
demographics$Location <- factor(demographics$Location)
demographics$Location
## [1] Missouri Iowa     Missouri Idaho    Maine   
## Levels: Idaho Iowa Maine Missouri
# To double check, use the `class()` function
class(demographics$Location)
## [1] "factor"

The LEVELS of a factor are all the different categories. Here, we have 4 states. By default, R puts the levels in alphabetical order (hint: you can change this by modifying one of the arguments).
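To peek at the levels and at how a factor is stored, base R provides levels(), nlevels(), and as.integer(). This sketch uses location, a hypothetical stand-alone copy of the Location values, so it runs on its own:

```r
# Hypothetical stand-alone copy of the Location values as a factor
location <- factor(c("Missouri", "Iowa", "Missouri", "Idaho", "Maine"))

levels(location)      # "Idaho" "Iowa" "Maine" "Missouri" (alphabetical)
nlevels(location)     # 4 categories
as.integer(location)  # 4 2 4 1 3: the integer code stored for each element
```

Each code is just the position of that element’s state in the levels, which is why Missouri (the 4th level) is stored as 4.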

table() for quick counts

Especially good for factors & logicals!

# how many subjects are from each state?
table(demographics$Location)
## 
##    Idaho     Iowa    Maine Missouri 
##        1        1        1        2
# states x smoker table
table(demographics$Location, demographics$SmokerTF)
##           
##            FALSE TRUE
##   Idaho        0    1
##   Iowa         1    0
##   Maine        1    0
##   Missouri     1    1
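table() also pairs nicely with the comparison operators from the start of this practice set: compare first, then count the TRUEs and FALSEs. A sketch using ages, a hypothetical stand-alone copy of the SubjectAge values:

```r
ages <- c(20, 18, 32, 25, 25)  # copy of SubjectAge

# How many subjects are 25 or older?
ageCounts <- table(ages >= 25)
ageCounts   # FALSE: 2, TRUE: 3
```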

cbind() and rbind() for adding columns and rows, respectively

Use these when you have a vector that you want to add to a data.frame. Make sure the vector has the right number of items, which depends on what you’re adding: one item per row when adding a column, and one item per column when adding a row!

Let’s add a column consisting of survey scores, on a scale of 1 to 10. Then we’ll add a row.

# first, let's add a column for surveys. there are currently
# 5 rows/observations, so we'll want 5 elements in this vector
survey <- c(6, 4, 10, 2, 8)

# now use cbind() to add it to your data.frame
# the 1st argument is the data.frame
# the 2nd argument is the thing you want to add to it
# (you're combining BOTH things)

# note that we re-assign the entire data.frame
demographics <- cbind(demographics, survey)

# print it out to view
demographics
##   SubjectID Location SmokerTF SubjectAge survey
## 1       ID1 Missouri     TRUE         20      6
## 2       ID2     Iowa    FALSE         18      4
## 3       ID3 Missouri    FALSE         32     10
## 4       ID4    Idaho     TRUE         25      2
## 5       ID5    Maine    FALSE         25      8
############

# now let's add a row. there are 5 columns, so we need to add an
# element that will match up with each column
newParticipant <- c("ID6", "Iowa", FALSE, 28, 5)

# now use rbind() to bind by rows
# the 1st argument is the data.frame
# the 2nd argument is the row you want to add to it. This makes it the LAST row
# (if you switched the order, it would become the FIRST row)
demographics <- rbind(demographics, newParticipant)

# print it out to view
demographics
##   SubjectID Location SmokerTF SubjectAge survey
## 1       ID1 Missouri     TRUE         20      6
## 2       ID2     Iowa    FALSE         18      4
## 3       ID3 Missouri    FALSE         32     10
## 4       ID4    Idaho     TRUE         25      2
## 5       ID5    Maine    FALSE         25      8
## 6       ID6     Iowa    FALSE         28      5

There is 1 slight issue with rbind(), and it actually starts with c(). A vector can hold only one type of data, so c() converts every element to the most flexible type that can represent all of them (logical, then numeric, then character). Since “ID6” and “Iowa” can only be characters, newParticipant became a character vector. This means that when we added it to the demographics data.frame, the logical and numeric columns were turned into characters too!
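You can watch this coercion happen with c() and class() directly. A minimal sketch:

```r
class(c(TRUE, FALSE))       # "logical"
class(c(TRUE, 28))          # "numeric": TRUE becomes 1
class(c("ID6", FALSE, 28))  # "character": everything becomes text
c("ID6", FALSE, 28)         # "ID6" "FALSE" "28"
```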

class(demographics$SubjectID)
## [1] "character"
class(demographics$Location) # still a factor: rbind() keeps factor columns as factors
## [1] "factor"
class(demographics$SmokerTF)
## [1] "character"
class(demographics$SubjectAge)
## [1] "character"
class(demographics$survey)
## [1] "character"

However, the last 3 columns should NOT be characters. I change them back below. See if you can figure out what each function does.

demographics$SmokerTF <- as.logical(demographics$SmokerTF)
class(demographics$SmokerTF)
## [1] "logical"
demographics$SubjectAge <- as.numeric(demographics$SubjectAge)
class(demographics$SubjectAge)
## [1] "numeric"
demographics$survey <- as.numeric(demographics$survey)
class(demographics$survey)
## [1] "numeric"
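One way to avoid the conversion step altogether (a sketch, not the approach used above) is to build the new participant as a one-row data.frame instead of a vector, so each column keeps its own class; rbind() then matches the columns by name. The objects demo and newRow are small hypothetical stand-ins for demographics and newParticipant:

```r
# Hypothetical two-row stand-in for the demographics data.frame
demo <- data.frame(SubjectID = c("ID1", "ID2"),
                   Location = c("Missouri", "Iowa"),
                   SmokerTF = c(TRUE, FALSE),
                   SubjectAge = c(20, 18),
                   survey = c(6, 4))

# The new participant as a one-row data.frame: each value keeps its own type
newRow <- data.frame(SubjectID = "ID6", Location = "Iowa",
                     SmokerTF = FALSE, SubjectAge = 28, survey = 5)

demo <- rbind(demo, newRow)

# The logical and numeric columns survive the rbind()
class(demo$SmokerTF)    # "logical"
class(demo$SubjectAge)  # "numeric"
```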

(Ok for real this time…The End)