Hopefully you're feeling more comfortable with some of the terminology used in programming.
You already know everything you need to do stats & plots!
All statistics are computed with functions.
If you know the type of analysis you want to run, find the corresponding function and go for it!
All plots are made with functions.
The one difference is that a particular package, ggplot2, is a lot better at plotting than base R.
Very basic statistics
Introduction to plotting with ggplot2
PRACTICE PRACTICE PRACTICE
Variables available in this data file:
Please load in midus, and make sure:
sex, heart_self, and heart_father are factor() variables (rather than characters)
you use the na.omit() function to remove all NA values
Check to make sure you have the ggplot2 package installed.
Check to make sure the ggplot2 package is loaded.
If "no" to either, how can you solve this?
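The preparation steps above can be sketched on a tiny made-up data frame. The column names only mimic midus and every value is invented for illustration, so this shows how factor(), na.omit(), and the package check behave rather than what the real data file contains.

```r
# Hypothetical toy data that mimics two midus columns (values are made up)
toy <- data.frame(
  sex = c("Male", "Female", NA, "Female"),
  heart_self = c("No", "Yes", "No", NA),
  BMI = c(27.1, 24.3, 30.2, 22.8),
  stringsAsFactors = FALSE
)

# Convert the character columns to factors
toy$sex <- factor(toy$sex)
toy$heart_self <- factor(toy$heart_self)

# Drop every row that contains at least one NA
toy <- na.omit(toy)

# TRUE if ggplot2 is installed; if FALSE, run install.packages("ggplot2")
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
# Once installed, load it for the current session with library(ggplot2)
```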
ggplot2
ggplot2 has the following structure:
ggplot(things that impact the entire plot) + geom_something(things that impact just the something)
geom_ typically means shape. What shapes do you want to use to represent your data in the plot?
geom_histogram -- histogram
geom_density -- distributions
geom_violin -- distributions
geom_point -- scatter plot
geom_col -- bar plot
ggplot2
The functions ggplot() and geom_something() can take different aesthetics as an argument, using aes().
Aesthetics are how you control what your plot looks like; how can you make it pretty? Examples:
x and y (what goes on the x- and y-axes?)
color (should you color the plot by some variable?)
fill (very similar to color; should you fill the plot in somehow? Used for bar graphs and boxplots)
shape (do you want groups to have different shaped points?)
size (how big should plotted data be?)
Note: the person who made this package is from New Zealand, so both the British and American spellings work (e.g., colour and color)! However, tab-complete may auto-fill the British spellings.
ggplot2
aes() contains information that comes directly from the data.
Anything that does not come from the data goes outside the aes() argument.
ggplot(data = midus, aes(x = age, y = BMI)) + geom_point()
A fixed setting that applies to every point, such as a single color, goes outside aes():
ggplot(data = midus, aes(x = age, y = BMI)) + geom_point(color = "cornflowerblue")
A color that depends on a variable in the data goes inside aes():
ggplot(data = midus, aes(x = age, y = BMI)) + geom_point(aes(color = sex))
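A runnable sketch of the mapping-versus-setting distinction, using R's built-in mtcars data as a stand-in for midus (an assumption, since the midus file is not bundled with R). ggplot_build() is used only to peek at the colors ggplot2 would draw; printing either plot object draws the plot.

```r
library(ggplot2)

# Mapping: colour depends on a column of the data, so it goes inside aes()
p_mapped <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Setting: one fixed colour and size for every point, so they go outside aes()
p_set <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "cornflowerblue", size = 3)

# ggplot_build() computes the data that would actually be drawn
mapped_colours <- unique(ggplot_build(p_mapped)$data[[1]]$colour)
set_colours <- unique(ggplot_build(p_set)$data[[1]]$colour)

# American and British spellings end up as the same aesthetic name
names(aes(color = cyl))
```

With three cylinder groups in mtcars, the mapped plot gets three colors, while the set plot uses "cornflowerblue" everywhere.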
Make a scatter plot of self_esteem (x-axis) against life_satisfaction (y-axis).
Make the points of the scatter plot a different shape based on the sex variable (for example, males might be circles and females might be squares).
Make the color of the points different based on sex.
Set the size of all points equal to 3.
We're going to practice plotting with ggplot2 while learning some really basic statistical tests.
Is there a difference in hostility between men and women in the midus sample?
Statistic: t-test
Plot: boxplot
You can read the ~ (tilde) as "by" or "predicted by".
hostility ~ sex means "hostility by sex" (or "hostility predicted by sex").
t.test(hostility ~ sex, data = midus)
## 
##  Welch Two Sample t-test
## 
## data:  hostility by sex
## t = -6.097, df = 3519.4, p-value = 1.198e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4491034 -0.2305455
## sample estimates:
## mean in group Female   mean in group Male
##             5.638040             5.977864
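To see that ~ really does mean "by", the sketch below runs the same Welch test two ways on the built-in mtcars data, with mpg by transmission type (am) standing in for hostility by sex. The var.equal = TRUE line is an extra for comparison: it gives the classic Student's t-test instead of the Welch default.

```r
# Formula interface: "mpg by am" (am: 0 = automatic, 1 = manual)
res_formula <- t.test(mpg ~ am, data = mtcars)

# The same test written as two separate vectors of values
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
res_xy <- t.test(auto, manual)

# Welch (unequal variances) is the default; var.equal = TRUE gives Student's t-test
res_student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Pull single numbers out of the stored result
res_formula$statistic
res_formula$p.value
```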
ggplot(data = midus, aes(x = sex, y = hostility)) + geom_boxplot()
Does self_esteem correlate with life_satisfaction?
Statistic: Correlation
Plot: scatter plot
cor() gives a straight correlation; no frills
cor.test() gives probabilities, but only for one pair of variables at a time
corr.test() is part of the psych package and reports sample sizes along with probabilities
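A quick comparison of cor() and cor.test() on the built-in mtcars data, assumed here as a stand-in for midus. The psych package's corr.test() is left out so the sketch runs with base R only.

```r
# cor(): the correlation coefficient only, with no p-value
r_single <- cor(mtcars$wt, mtcars$mpg)

# Given several columns at once, cor() returns a full correlation matrix
r_matrix <- cor(mtcars[, c("mpg", "wt", "hp")])
round(r_matrix, 2)

# cor.test(): one pair of variables, but adds a test statistic, p-value, and CI
ct <- cor.test(mtcars$wt, mtcars$mpg)
ct$p.value
```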
# Stored as its own object. Play with it in your Global Environment!
esteemVsLifeSat <- cor.test(x = midus$self_esteem, y = midus$life_satisfaction)
esteemVsLifeSat
## 
##  Pearson's product-moment correlation
## 
## data:  midus$self_esteem and midus$life_satisfaction
## t = 34.292, df = 3738, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4644257 0.5131989
## sample estimates:
##       cor
## 0.4891947
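Because the result is stored as an object, its individual pieces can be pulled out with $. A short sketch on mtcars, with hp and mpg standing in for the midus variables:

```r
# Store the test so its pieces can be pulled out later
hpVsMpg <- cor.test(x = mtcars$hp, y = mtcars$mpg)

names(hpVsMpg)     # every element stored in the result
hpVsMpg$estimate   # the correlation r
hpVsMpg$p.value    # the p-value
hpVsMpg$conf.int   # 95% confidence interval (lower, upper)
hpVsMpg$parameter  # degrees of freedom (n - 2)
```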
ggplot(data = midus, aes(x = self_esteem, y = life_satisfaction)) + geom_point()
Say you wanted to dichotomize your self_esteem variable into those with high self-esteem (above the mean) and those with low self-esteem (at or below the mean).
You want to see if sex and your newly dichotomized self_esteem variable predict BMI.
Statistic: 2x2 ANOVA
Plot: bar plot
As a general rule, don't do this: splitting a continuous variable into two groups throws away information.
BUT, it does make for a nice teaching example 😃
# create the groups; store as a new variable
midus$self_esteem_di <- ifelse(test = midus$self_esteem > mean(midus$self_esteem), yes = "high", no = "low")
# make sure the new variable is treated as a factor
midus$self_esteem_di <- factor(midus$self_esteem_di)
# for us to view it
head(midus[, c(1, 2, 3, 12)])
##       ID    sex age self_esteem_di
## 1  10001   Male  61           high
## 2  10002   Male  69            low
## 6  10011 Female  52           high
## 8  10015 Female  53            low
## 10 10018   Male  49           high
## 11 10019   Male  51           high
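One detail worth checking after dichotomizing: factor() orders levels alphabetically, so "high" becomes the first level. The level order controls how groups appear on plot axes and legends, and which group lm() uses as its reference. A sketch using car weight (wt) in the built-in mtcars data as a stand-in for self_esteem:

```r
# Dichotomize at the mean, as on the slide
wt_di <- ifelse(mtcars$wt > mean(mtcars$wt), "high", "low")

# By default factor() sorts levels alphabetically: "high" comes before "low"
levels(factor(wt_di))

# Supplying levels explicitly sets the order used in model output and plots
wt_di <- factor(wt_di, levels = c("low", "high"))

# Check how many cases landed in each group
table(wt_di)
```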
Does sex and your newly dichotomized self_esteem variable predict BMI? (no interaction)
anova1 <- aov(BMI ~ sex + self_esteem_di, data = midus)
summary(anova1)
##                  Df Sum Sq Mean Sq F value   Pr(>F)
## sex               1    541   541.0   16.39 5.25e-05 ***
## self_esteem_di    1    878   877.5   26.59 2.65e-07 ***
## Residuals      3737 123328    33.0
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Does sex and your newly dichotomized self_esteem variable predict BMI? (WITH interaction)
anova2 <- aov(BMI ~ sex * self_esteem_di, data = midus)
summary(anova2)
##                      Df Sum Sq Mean Sq F value   Pr(>F)
## sex                   1    541   541.0  16.411 5.20e-05 ***
## self_esteem_di        1    878   877.5  26.618 2.61e-07 ***
## sex:self_esteem_di    1    162   162.2   4.921   0.0266 *
## Residuals          3736 123166    33.0
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
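The * in BMI ~ sex * self_esteem_di is shorthand: it expands to both main effects plus their interaction (sex + self_esteem_di + sex:self_esteem_di). A sketch confirming this with the built-in ToothGrowth data (a 2 x 3 design, supplement by dose), which stands in for midus here:

```r
# `a * b` expands to `a + b + a:b`
attr(terms(len ~ supp * dose), "term.labels")

# ToothGrowth: tooth length by supplement type and dose
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a 3-level factor

a_short <- aov(len ~ supp * dose, data = tg)
a_long <- aov(len ~ supp + dose + supp:dose, data = tg)
summary(a_short)
```

Both models are identical; the shorthand simply saves typing.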
Does sex and your newly dichotomized self_esteem variable predict BMI?
Bar plots suck. The height of each bar should reflect that group's mean, so we first need to calculate the means and store them in a data.frame.
# subset each sex x self-esteem group, then take the mean BMI of that group
femaleHigh <- subset(midus, sex == "Female" & self_esteem_di == "high")
femaleHighMean <- mean(femaleHigh$BMI)
femaleLow <- subset(midus, sex == "Female" & self_esteem_di == "low")
femaleLowMean <- mean(femaleLow$BMI)
maleHigh <- subset(midus, sex == "Male" & self_esteem_di == "high")
maleHighMean <- mean(maleHigh$BMI)
maleLow <- subset(midus, sex == "Male" & self_esteem_di == "low")
maleLowMean <- mean(maleLow$BMI)
# collect the four group means into one data.frame for plotting
meansData <- data.frame(sex = c("Female", "Female", "Male", "Male"),
                        self_esteem_di = c("high", "low", "high", "low"),
                        meanBMI = c(femaleHighMean, femaleLowMean, maleHighMean, maleLowMean))
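As a base-R alternative to the four subset() calls (not the approach these slides use, and separate from the tidyverse method promised later), aggregate() can compute every group mean in one call. Sketched on mtcars, with am (transmission) and vs (engine shape) standing in for sex and self_esteem_di:

```r
# Slide-style approach: subset one group (automatic, straight engine), then take its mean
autoStraightMean <- mean(subset(mtcars, am == 0 & vs == 1)$mpg)

# aggregate(): mean mpg for every am x vs combination in one call
meansData <- aggregate(mpg ~ am + vs, data = mtcars, FUN = mean)
meansData
```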
Then we can plot, using our NEW data.frame (Note: we will cover a MUCH easier way of doing this when we talk about tidyverse in the next section).
ggplot(data = meansData, aes(x = sex, y = meanBMI)) + geom_col(aes(fill = self_esteem_di), position = "dodge")
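Another option that skips the separate data.frame entirely is ggplot2's stat_summary(), which computes the group means while plotting. This is a swapped-in technique rather than the method in these slides; the sketch uses mtcars as a stand-in for midus, and ggplot_build() is only there to check the bar heights.

```r
library(ggplot2)

# stat_summary() takes the mean of y for each x/fill group while plotting,
# so no separate data.frame of means is needed
p <- ggplot(data = mtcars, aes(x = factor(am), y = mpg, fill = factor(vs))) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")

# Heights of the bars that would be drawn
bar_heights <- ggplot_build(p)$data[[1]]$y
```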
Is life_satisfaction predicted by self_esteem?
Are self_esteem and hostility both independent predictors of life_satisfaction?
Is there an interaction between self_esteem and hostility predicting life_satisfaction?
Statistic: Simple & Multiple Regression
Plot: scatter plot with regression lines at the mean, +1SD, and -1SD of hostility
Is life_satisfaction predicted by self_esteem?
lm(life_satisfaction ~ self_esteem, data = midus)
Are self_esteem and hostility both independent predictors of life_satisfaction?
lm(life_satisfaction ~ self_esteem + hostility, data = midus)
Is there an interaction between self_esteem and hostility predicting life_satisfaction?
lm(life_satisfaction ~ self_esteem * hostility, data = midus)
The ggeffects package has a function called ggpredict() that makes it very easy to visualize interactions of continuous variables.
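An interaction plot with lines at the mean, +1SD, and -1SD of the moderator can also be approximated by hand with predict(). This manual approach is a substitute for ggpredict(), not a reproduction of its internals; the sketch uses mtcars, with hp as the moderator standing in for hostility and wt standing in for self_esteem.

```r
# Continuous x continuous interaction: mpg predicted by wt, hp, and wt:hp
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Moderator values: -1 SD, mean, and +1 SD of hp
hp_levels <- mean(mtcars$hp) + c(-1, 0, 1) * sd(mtcars$hp)

# Grid of wt values crossed with the three hp values
grid <- expand.grid(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 20),
                    hp = hp_levels)
grid$predicted <- predict(fit, newdata = grid)

# Slope of wt at each hp value: b_wt + b_wt:hp * hp
b <- coef(fit)
simple_slopes <- b["wt"] + b["wt:hp"] * hp_levels
```

Plotting grid with ggplot(grid, aes(x = wt, y = predicted, color = factor(hp))) + geom_line() would draw one line per hp value.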
Assign your lm() object to your global environment. You can get coefficients, predicted values, etc.
If you want the relationship between X1 and Y, after controlling for X2, you can make a scatter plot with the model's fitted values.
If you want to view the output table of a regression, use summary() (just like we did with ANOVA).
If you want to be able to extract the R², F-statistic, etc., assign the summary(model) object to your global environment.
Check out the broom package to format your regression outputs into a nice data.frame.
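The tips above can be sketched on the built-in mtcars data (a stand-in for midus). Everything here is base R, so broom is not needed to run it.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)              # intercept and slopes
head(fitted(fit))      # model-predicted mpg for each car
head(residuals(fit))   # observed minus predicted

# Store the summary to pull out model-level statistics
fit_summary <- summary(fit)
fit_summary$r.squared       # R-squared
fit_summary$adj.r.squared   # adjusted R-squared
fit_summary$fstatistic      # F value with numerator and denominator df
coef(fit_summary)           # estimates, standard errors, t values, p-values
```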
The only way to get better is to PRACTICE! Some helpful resources:
The swirl package helps you learn R from inside RStudio! Strongly recommended!
tidyverse-heavy; go through our tidyverse portion first, then check the book out
Google, Google, Google!
You made it through our R Basic Training!
Up next:
Learn to clean and prepare your data more effectively with tidyverse. This is a HUGE part of the R ecosystem, so please don't skip this! It will make your life a lot easier!
How to generate reports (PDF, Word, or HTML files) that integrate your thoughts and your code. This is the core of reproducibility and will allow you to share code with your advisors, collaborators, and journals in a much prettier and easier manner.
Here we covered the basics of plotting with ggplot2, but learn just how flexible it can be for data visualization. Make your plots incredible!