Assignment 3

Due by 11:30am on Monday, September 30, 2024

Testing Things

The first part is going to be a test run to see if a particular type of document works with your computer. It should, but we’re going to test it out here just in case. You do NOT need to submit the test file.

The following section requires the internet. Please make sure you have a stable connection before completing the exercise. PLEASE tell me as soon as possible if any errors come up. This will be a large part of what we will be doing next week, so if it’s not working, we need to get it worked out BEFORE you continue on to the Reproducibility section.

PLEASE READ ALL OF THE STEPS BELOW BEFORE ACTUALLY TRYING IT. Follow these steps:

Step 1

In your console, set your working directory to a folder of your choice. It doesn’t really matter which one, as long as you can easily find it. If you are using an R Project, you can skip this step.
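If you haven't set a working directory from the console before, here is a minimal sketch. The folder is an assumption: tempdir() is only a stand-in so the example runs anywhere, and you would swap in a folder you can actually find again.

```r
# Hypothetical folder: tempdir() stands in for a folder of your choosing,
# e.g. something like "~/Documents/assignment3" on your own computer
target_dir <- tempdir()

# Point R at that folder
setwd(target_dir)

# Confirm where R is now looking for (and saving) files
getwd()
```

Running getwd() afterwards is a quick way to double check that the change took effect.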

Step 2

Next, open a new R Notebook. You can do this via File > New File > R Notebook, or by clicking the white paper icon with the green plus sign in the top left corner (the same button you use to open a new R script) and choosing R Notebook. It should look something like this:

Step 3

Next, find the button at the top that says Preview and click the arrow next to it. You should get a drop down menu. Select Knit to HTML. It should look like this:

It will prompt you to save the file. That’s OK! Give it a name like testNotebook or whatever you’d like.

Step 4

Finally, you should see something that pops up, and it should look like this:

As long as the final thing shown above popped up, you’re good! It’s OK if you see a lot of scary stuff in the console afterwards. But if the thing above did not pop up, please take a screenshot of the scary console text! It will be easier for me to help you if I can see the message. Scary text that is actually OK might look like:

Actual Assignment

You MUST use either the nhanes_small_assignment3.csv or superheroData_assignment3.csv datasets! These have been modified specifically for this assignment. You can find both on Canvas.

Script Prep

Open a new script file and complete this assignment (the same way you’ve been doing for previous assignments).

Here’s your to-do list:

At the top of your script, load the tidyverse package and read in your dataset.
Make a comment that confirms that the test run above worked.
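As a rough sketch of what the top of a script can look like: the file here is a made-up stand-in written to a temporary location so the example runs on its own. In your script you would give read_csv() the name of the assignment dataset you chose instead. The names toy_path and toy_data are hypothetical.

```r
library(tidyverse)

# Hypothetical stand-in file so this example runs anywhere;
# in your own script, use the path to your assignment dataset instead
toy_path <- tempfile(fileext = ".csv")
write_csv(tibble(id = 1:3, score = c(5, 7, 9)), toy_path)

# Read the csv into a data.frame (a tibble)
toy_data <- read_csv(toy_path, show_col_types = FALSE)

# Example of a confirmation comment:
# The R Notebook test run knit to HTML without any problems.
```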

Useful Plot

Let’s make a bar plot with error bars!

Here’s your to-do list:

Make a smaller data.frame that contains all the information you would need to make a bar graph (including information for error bars of +/- 1 standard deviation) for at least two of your categorical variables.
Then, use the plotting code below and replace the missing values with your data so that you can actually make the bar graph!

Here’s a HINT:

The hardest part of this is trying to imagine what it is that you need!
You need a continuous variable that you can collapse into a value per level of each of your two categorical variables. Then you need to calculate everything needed for the error bars.
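To make the hint more concrete, here is a minimal sketch on made-up data. Everything in it (the toy data.frame and the column names group, sex, score, mean_score, sd_score, lower, upper) is an assumption for illustration, not part of either assignment dataset. It shows one possible way to collapse a continuous variable to one value per combination of two categorical variables and then work out the error bar edges.

```r
library(tidyverse)

# Hypothetical toy data: two categorical variables and one continuous variable
toy <- tibble(
  group = rep(c("a", "b"), each = 4),
  sex   = rep(c("f", "m"), times = 4),
  score = c(10, 12, 14, 16, 20, 22, 24, 26)
)

toy_summary <- toy %>%
  # one row per combination of the two categorical variables
  group_by(group, sex) %>%
  summarise(
    mean_score = mean(score),  # height of the bar
    sd_score   = sd(score),    # spread used for the error bar
    .groups    = "drop"
  ) %>%
  # error bar edges: mean minus/plus 1 standard deviation
  mutate(
    lower = mean_score - sd_score,
    upper = mean_score + sd_score
  )

toy_summary
```

The result has one row per bar, which is the shape the plotting code expects.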

Here’s the plotting code to copy. Replace each XXX placeholder with the matching piece of your data:

XXX-1: correct data.frame
XXX-2: one of the two categorical variables
XXX-3: the summarized continuous variable
XXX-4: the second of the categorical variables
XXX-5: the value of the lower edge of the error bar
XXX-6: the value of the upper edge of the error bar

ggplot(data = XXX-1, aes(x = XXX-2, y = XXX-3, fill = XXX-4)) +
  geom_col(position = position_dodge(width=.9)) +
  geom_errorbar(aes(ymin = XXX-5, ymax = XXX-6),
                position = position_dodge(width=.9),
                width = .2)
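For reference, this is roughly what the template looks like once the placeholders are filled in. The summary table below is typed in by hand and entirely made up (toy_summary and its columns are hypothetical), so treat it as a sketch of the shape you are aiming for rather than an answer.

```r
library(tidyverse)

# Hypothetical summary table, typed by hand for illustration:
# one row per bar, with the bar height and both error bar edges
toy_summary <- tibble(
  group      = c("a", "a", "b", "b"),
  sex        = c("f", "m", "f", "m"),
  mean_score = c(12, 14, 22, 24),
  lower      = c(10, 12, 20, 22),
  upper      = c(14, 16, 24, 26)
)

# Same structure as the template, with the XXX placeholders replaced
p <- ggplot(data = toy_summary, aes(x = group, y = mean_score, fill = sex)) +
  geom_col(position = position_dodge(width = .9)) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(width = .9),
                width = .2)

p
```

Using the same dodge width for the bars and the error bars is what keeps each error bar centered on its own bar.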

Data Wrangling Part 1

It’s very common to receive data in the wrong format. Either it comes to you as “long” but you need it to be “wide” or vice versa.

Both datasets now include information about time point. We’re going to use this information to convert to the other format!

Here’s your to-do list:

Determine if your data are currently in wide or long format. Make a note of this in your script.
Use the time information to go from wide to long or long to wide. Use pivot_wider or pivot_longer.
Rather than printing out the ENTIRE data.frame, print out the first 15 rows and the relevant columns using indexing.
Make a comment that describes what aspects of your data.frame became wide or long; that is, describe what it is you just performed.
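If the two pivot functions are hard to picture, here is a small sketch on made-up data. The data.frame and its columns (id, time, score) are assumptions for illustration only. It goes from long to wide, back to long again, and then prints a few rows and columns with indexing.

```r
library(tidyverse)

# Hypothetical long-format data: one row per person per time point
toy_long <- tibble(
  id    = rep(1:3, each = 2),
  time  = rep(c("t1", "t2"), times = 3),
  score = c(5, 6, 7, 8, 9, 10)
)

# Long to wide: each time point becomes its own column
toy_wide <- toy_long %>%
  pivot_wider(names_from = time, values_from = score)

# Wide to long: the time point columns are stacked back into two columns
toy_back <- toy_wide %>%
  pivot_longer(cols = c(t1, t2), names_to = "time", values_to = "score")

# Indexing: print only some rows and the relevant columns
toy_back[1:4, c("id", "time", "score")]
```

In the wide version there is one row per id, while in the long version there is one row per id per time point.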

Data Wrangling Part 2

Let’s do more practice data wrangling. Everyone’s data is different, so this might depend on what you’re working with.

Here’s your to-do list:

Make a tidyverse code chunk that uses at least 3 tidyverse functions. Do whatever you think makes the most sense.
All 3 of the functions should be within the SAME code chunk. That is, you should not have 3 separate chunks of code. It should be a cohesive code chunk (put differently, a paragraph of code).
The only rule for this is that you should NOT re-use functions from the other sections as part of your 3 functions. That is, if you want to use a function from one of the other parts of this assignment, then you need 3 additional functions. The goal is to get you practice with new functions.

Here’s a list of some of the common tidyverse functions that you might want to use (there are lots more, too!):

filter
separate
unite
mutate
select
arrange
rename
recode
any of the join functions (see cheatsheet for more)
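As an illustration of what "a paragraph of code" means, here is a sketch that chains three functions in one pipe. The data and column names (name, height_cm, team, height_m) are made up, and the functions are picked only to show the shape of a cohesive chunk. Your own choice of functions still has to follow the rule above about not re-using functions from the other sections.

```r
library(tidyverse)

# Hypothetical toy data
toy <- tibble(
  name      = c("ana", "ben", "cy", "dee"),
  height_cm = c(160, 175, 182, 168),
  team      = c("x", "y", "x", "y")
)

# One cohesive chunk: each step feeds into the next through the pipe
toy_clean <- toy %>%
  filter(height_cm > 165) %>%           # keep only the taller people
  mutate(height_m = height_cm / 100) %>% # add height in meters
  arrange(desc(height_m))               # sort from tallest to shortest

toy_clean
```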

Fantastic work!

Don’t forget to submit your code and data file to Canvas.