Lots of practice sets; letting it all come together
Some random tidbits to make your life easier
For this portion, we will use the midus.csv dataset.
You should do the exercises in this tutorial locally (on your own computer).
Project Description: "The purpose of the study was to investigate the role of behavioral, psychological, and social factors in understanding age-related differences in physical and mental health."
Variables available in this data file:
Open a new .R script file
Load the following packages using code in your new .R file: psych, dplyr, ggplot2
Get your working directory
If you haven't already, download midus.csv from above. Drag and drop the file into a directory that you want to continue using for this class.
Import midus.csv into R. You can keep the name midus.
Make sure that the import code ends up in your SCRIPT (.R) file.
Save your new .R script file in the same working directory as midus.csv. You can name the script whatever you want.
Check the classes of your variables. Below is a list of each variable and the class they should be. Change the class of any that don't match what you see.
ID should be a factor
sex should be a factor
age should be an integer
BMI should be numeric
physical_health_self & mental_health_self should be integers
self_esteem, life_satisfaction, and hostility should all be numeric
heart_self and heart_father should be factors
Save your script again. What happens to the text on the tab in RStudio?
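For the class check above, here is a minimal sketch of one way to inspect and change classes. The data.frame `toy` and its values are made up for illustration; they are not the real midus data.

```r
# Hypothetical toy data standing in for midus
toy <- data.frame(ID = c(101, 102, 103),
                  sex = c("Male", "Female", "Female"),
                  age = c(34, 51, 47),
                  stringsAsFactors = FALSE)

# Check the class of every column at once
sapply(toy, class)

# Change classes by overwriting each column with a converted copy
toy$ID  <- factor(toy$ID)
toy$sex <- factor(toy$sex)
toy$age <- as.integer(toy$age)

# Check again to confirm the changes
sapply(toy, class)
```

The same pattern (overwrite the column with `factor()`, `as.integer()`, or `as.numeric()`) should carry over to the midus variables.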
Sometimes, your Global Environment (top right) can fill up with stuff that you don't need anymore. You can clean this!
A) Delete EVERYTHING using the broom icon (next to the Import Dataset tab)
B) Switch to GRID view, check boxes of individual objects you DON'T want, then press the broom icon. Be sure to switch back to LIST view when finished
Now...
Clear your global environment, select your entire script file, and RUN it!
...what happens? What does this mean?
Indexing a vector (1 dimension):
midus$age[1]
[1] gives us the 1st item in the age vector that is part of the midus data.frame.
Indexing a data.frame (2 dimensions):
midus[1, 2:3]
[1, 2:3] gives us row 1, columns 2 through 3, of the midus data.frame.
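A few more indexing patterns, sketched on a small made-up data.frame (`toy` and its values are hypothetical, not the real midus data):

```r
# Hypothetical toy data standing in for midus
toy <- data.frame(ID = c("a", "b", "c"),
                  sex = c("Male", "Female", "Female"),
                  age = c(34L, 51L, 47L))

toy$age[1]     # 1st item of the age vector
toy$age[2:3]   # 2nd and 3rd items of the age vector
toy[1, 2:3]    # row 1, columns 2 through 3
toy[, 3]       # all rows of column 3 (leaving a dimension blank means "all")
toy[2, ]       # all columns of row 2
```

Inside the square brackets, rows always come before the comma and columns after it.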
Returns a value of TRUE or FALSE
==    equality
!=    inequality
>     greater than
>=    greater than or equal to
<     less than
<=    less than or equal to
A = 3
- "A is an object that stores the number 3."
A == 3
- "Is A equal to 3?"
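To make the difference between storing and asking concrete, here is a short sketch. The values in `ages` are made up for illustration.

```r
A <- 3     # stores the number 3 in A (same result as A = 3)
A == 3     # asks "Is A equal to 3?"  TRUE
A != 3     # asks "Is A not equal to 3?"  FALSE
A >= 5     # FALSE

# Comparisons work on a whole vector, one element at a time
ages <- c(34, 51, 47)   # hypothetical values
ages > 40               # FALSE TRUE TRUE
sum(ages > 40)          # counts the TRUEs: 2
```

Because TRUE counts as 1 and FALSE as 0, `sum()` on a logical vector is a quick way to count how many data points meet a condition.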
You can combine logical operators to simultaneously see if multiple conditions have been met.
&    and
|    or
!    not
Is this true? AND Is this true?
(10 < 100 & 24 == 23 + 1)
## [1] TRUE
(5 > 4 & 5 > 10)
## [1] FALSE
Is this true? OR Is this true?
(5 > 4 | 5 > 10)
## [1] TRUE
# Could also be something like:
# (A == B | C != D)
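The AND and OR examples can be extended to stored objects and to the not operator. This is a small sketch; the values of A, B, C, and D are made up for illustration.

```r
# Hypothetical values
A <- 3
B <- 3
C <- 5
D <- 5

(A == B | C != D)   # TRUE: the first condition is met, and OR needs only one
(A == B & C != D)   # FALSE: the second condition is not met, and AND needs both
!(A == B)           # FALSE: "not" flips TRUE to FALSE
!(C != D)           # TRUE: "not" flips FALSE to TRUE
```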
Sometimes, you will want to perform functions on only some of your data points.
You can subset your data to identify subjects in a certain subgroup.
Use the subset() function:
subset(x = empire, subset = (mass > 100))
## # A tibble: 2 x 6
##   name        height  mass sex   homeworld species
##   <chr>        <int> <dbl> <chr> <chr>     <fct>
## 1 Darth Vader    202   136 male  Tatooine  Human
## 2 Chewbacca      228   112 male  Kashyyyk  Wookiee
Notes:
Many functions have a data = or x = argument, where you name your data.frame. Because you've named it within the function's arguments, you do not need to use the dollar sign $ to access the vector.
The ( ) around mass > 100 are not necessary, but as logical expressions become more complicated, the extra parentheses can be helpful for keeping them straight.
subset() is the name of the function we're using, but one of its arguments is also called subset =
Check the R documentation!
?subset
When in doubt, you can always search the "Help" tab or the internet
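As a sketch of how the notes above play out, here is subset() with one condition, with two combined conditions, and with its select = argument. The data.frame `toy` and its values are hypothetical, not the real midus data.

```r
# Hypothetical toy data standing in for midus
toy <- data.frame(ID = c("a", "b", "c", "d"),
                  sex = c("Male", "Female", "Female", "Male"),
                  age = c(34, 51, 47, 62))

# One condition; no $ needed because the data.frame is named in x =
older <- subset(x = toy, subset = (age > 40))

# Two conditions combined with &
olderWomen <- subset(x = toy, subset = (age > 40 & sex == "Female"))

# The select = argument chooses which columns to keep
olderAges <- subset(x = toy, subset = (age > 40), select = c(ID, age))
```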
Often, you can wrap functions within functions. This is called nesting.
Example:
round(x = sqrt(86), digits = 1)
## [1] 9.3
How does it do this? Inner to outer!
sqrt(86) --> 9.2736185 --> round(x = 9.2736185, digits = 1) --> 9.3
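The inner-to-outer order can be checked by doing the same calculation in two separate steps. This is a small sketch; the values in `scores` are made up for illustration.

```r
scores <- c(2.5, 3.75, 4.1, 3.2)   # hypothetical values

# Nested: mean() runs first, then round() works on its result
nested <- round(x = mean(scores), digits = 1)

# The same calculation, one step at a time
m <- mean(scores)
stepwise <- round(x = m, digits = 1)

nested == stepwise   # TRUE: both approaches give the same answer
```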
# Add a column (by position, with $, or by name)
midus[,9] <- rep(x = "Wave2", times = 40)
midus$wave <- rep(x = "Wave2", times = 40)
midus["wave"] <- rep(x = "Wave2", times = 40)
# Remove a column (by position or by name)
midus <- midus[,-9]
midus <- subset(midus, select = -wave)
# Remove rows (by position or by condition)
midus <- midus[-c(1:10),]
midus <- subset(x = midus, subset = ID != "10175")
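Here is a sketch of the same add and remove patterns on a small made-up data.frame, so that each step can be checked by eye. `toy` and its values are hypothetical; using nrow() instead of a hard-coded number of rows is a choice made for this sketch.

```r
# Hypothetical toy data standing in for midus
toy <- data.frame(ID = c("a", "b", "c", "d"),
                  age = c(34, 51, 47, 62))

# Add a column; nrow() supplies the number of rows
toy$wave <- rep(x = "Wave2", times = nrow(toy))

# Remove that column by name
toy <- subset(toy, select = -wave)

# Remove the first 2 rows by position
toyTrimmed <- toy[-c(1:2), ]

# Remove one participant by ID
toyNoB <- subset(x = toy, subset = ID != "b")
```

Removing by name or by condition tends to be safer than removing by position, since positions shift whenever columns or rows are added or dropped.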
Use nested functions to create a subset of the midus data.frame which contains only participants who have self_esteem scores above the mean.
Hint: you will need midus and mean(). Where can you look to find the arguments of the mean function?
There are MANY ways to deal with missing data
Decisions regarding missing data are best made by you and your colleagues/advisors since it will depend on your research question
For now, we're going to use listwise deletion aka using complete cases
To overwrite the midus data.frame so that it only includes participants who have a data point for every variable, we can use the following line of code:
midus <- na.omit(midus)
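Before overwriting anything, it can help to see how many rows listwise deletion would drop. This is a minimal sketch on made-up data (`toy` is hypothetical, not the real midus data):

```r
# Hypothetical toy data with some missing values
toy <- data.frame(ID = c("a", "b", "c"),
                  age = c(34, NA, 47),
                  BMI = c(22.1, 27.3, NA))

complete.cases(toy)    # TRUE FALSE FALSE: only row 1 has no missing values
colSums(is.na(toy))    # number of missing values in each column

toyComplete <- na.omit(toy)
nrow(toy)              # 3 rows before
nrow(toyComplete)      # 1 row after
```

Saving the result under a new name, as done here, keeps the original data.frame available for comparison.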
Sometimes it's useful to better understand the dimensions of your data.frame. This often comes up in error messages! For more details on this, see the "Examples of a few functions" section at the end of the 5: Functions practice set.
dim(empire) # for both dimensions
## [1] 10 6
nrow(empire) # for rows
## [1] 10
ncol(empire) # for columns
## [1] 6
So you've manipulated and cleaned your dataset, and now you want to save it...
write.csv(x = data.frame, file = "newFile.csv", row.names = FALSE)
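A sketch of writing a data.frame out and reading it back in to confirm that it saved as expected. The data.frame `toy` and the file name `toyClean.csv` are made up for illustration, and the file goes to a temporary folder here so that nothing in your working directory is overwritten.

```r
# Hypothetical toy data standing in for a cleaned dataset
toy <- data.frame(ID = c("a", "b"),
                  age = c(34L, 51L))

# Build a path to a temporary folder (hypothetical file name)
outFile <- file.path(tempdir(), "toyClean.csv")

# row.names = FALSE keeps R from adding an extra column of row numbers
write.csv(x = toy, file = outFile, row.names = FALSE)

# Read it back in to check
toyAgain <- read.csv(outFile)
colnames(toyAgain)
```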
Exploring data that is stored within a list object (e.g., regression, ANOVA, t-test results) is really hard.
In these cases, I suggest pulling out the pieces of information that you actually care about, making it your own data.frame, and then writing out that data.frame. There are other ways, though (you'll need to Google it)!
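One possible version of that approach, sketched with a t-test on `sleep`, a small dataset that ships with R. The names `resultTable` and `ttestResults.csv` are made up for illustration, and which pieces to keep is up to you.

```r
# Run a t-test; the result is stored as a list
result <- t.test(extra ~ group, data = sleep)

# See which pieces are stored in the list
names(result)

# Pull out only the pieces of interest and make them a data.frame
resultTable <- data.frame(t  = unname(result$statistic),
                          df = unname(result$parameter),
                          p  = result$p.value)

# Write that data.frame out (to a temporary folder in this sketch)
outFile <- file.path(tempdir(), "ttestResults.csv")
write.csv(x = resultTable, file = outFile, row.names = FALSE)
```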
recode() function from the dplyr package
gsub()
colnames(midus) <- gsub(pattern = "_", replacement = ".", x = colnames(midus))
colnames(midus) <- gsub(pattern = "_", replacement = "", x = colnames(midus))
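A sketch of what gsub() does to a vector of names, shown on made-up vectors rather than on midus itself. One assumption worth flagging: going the other direction (periods back to underscores) needs fixed = TRUE, because a period in pattern = otherwise matches any character.

```r
# Hypothetical column names
oldNames <- c("self_esteem", "life_satisfaction")

# Replace every underscore with a period
dotNames <- gsub(pattern = "_", replacement = ".", x = oldNames)
dotNames

# Going back: fixed = TRUE treats "." as a literal period
backAgain <- gsub(pattern = ".", replacement = "_", x = dotNames, fixed = TRUE)
backAgain

# Without fixed = TRUE, every character is replaced
gsub(pattern = ".", replacement = "_", x = "a.b")
```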
meanAgeControls
MeanAgeControls
mean.age.controls
mean.age.controls.csv
Get into good coding habits now, so you don't hate yourself later!
Pro tip:
What does this mean?
Your R script should look something like this:
library(psych)
library(dplyr)
library(ggplot2)
getwd()
# set your working directory to the folder you need...if not there already
# setwd("folder/with/midus.csv")
midus <- read.csv("folder/midus.csv")
# change 4 character variables into factors
# see the practice for 5: Functions for more on how to do this!
midus$ID <- factor(midus$ID)
midus$sex <- factor(midus$sex)
midus$heart_self <- factor(midus$heart_self)
midus$heart_father <- factor(midus$heart_father)
Your code should look something like this:
midusMini <- subset(x = midus, subset = self_esteem > mean(self_esteem, na.rm = TRUE))
To look up the arguments of the mean() function, use ?mean. Many functions have an argument called na.rm, or something similar, that asks whether you want the function to remove missing values. Here, we're saying, "yes, please remove missing values before calculating the mean."
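A quick sketch of what na.rm changes, using a made-up vector:

```r
scores <- c(3.5, NA, 4.5)    # hypothetical values with one missing

mean(scores)                 # NA: one missing value makes the whole mean missing
mean(scores, na.rm = TRUE)   # 4: the NA is dropped before averaging
```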