
Comparing Means



Recap

Normal distributions all have well-characterized properties

  • AUC = 1
  • ~68% fall within 1 σ, ~95% within 2 σ, and ~99.7% within 3 σ

The standard normal distribution is a particular type of normal distribution

  • distribution of z-scores
  • μ=0
  • σ=1

Using the standard normal & these cool properties, we can make probability statements

  • What is the probability of getting a z-score or more extreme?

Sampling Distributions are distributions of statistics

  • They also happen to mostly be normally distributed...let's use this!

Example

University X has been around for 150 years, and so has 150 years' worth of ratings of male applicants. You pay an undergrad to dig through all the old university files and calculate the average rating of male applicants (5.3 out of 10) and also the standard deviation of those ratings (3.3).

You then collect the ratings of 9 female applicants in 2018 and calculate their average rating (2.9) and also the standard deviation of their ratings (3.1).

How do you generate the sampling distribution around the null?



Population mean (males) = 5.3; population SD (males) = 3.3; construct the sampling distribution around the null.

What must we assume in order to use the SEM?

Random sampling

The mean of the sampling distribution = the mean of the null hypothesis

The standard deviation of the sampling distribution:

SEM = σ / √N


Now we're trying to decide: given this null hypothesis, with a distribution that looks like so, is it likely that our females come from this same distribution? The females had a mean of 2.9, so our purple line moves over. The standard deviation of our sampling distribution here is the standard error of the mean. To calculate that, we take the population standard deviation, sigma, and divide by the square root of N. We had 9 females, so the square root of 9. This turns out to be 1.1. Even though we're only talking about this tail at the moment, I've plotted the equivalent tail on the other side.

Calculate SEM on board.

σ=3.3

N=9
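A quick sanity check of that arithmetic in R (just a throwaway sketch; the numbers are the ones from the example):

sigma <- 3.3      # population SD of the male ratings
N     <- 9        # number of female applicants
sigma / sqrt(N)   # the SEM -- works out to 1.1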


Example Cont...

All well and good.

But rarely will you have access to all the data in your population, so you won't be able to calculate the population standard deviation. What ever will you do?

SEM = σ̂ / √N = s / √N

So long as your estimate of the standard deviation is already corrected for bias (you've divided by N - 1), then you can swap in your sample SD.
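Handily, R's sd() already uses the N - 1 denominator, so the value it returns is the bias-corrected sample SD. A tiny sketch with a made-up vector:

x <- c(2, 4, 4, 4, 5, 5, 7, 9)                 # made-up sample
sqrt(sum((x - mean(x))^2) / (length(x) - 1))   # bias-corrected SD "by hand"
sd(x)                                          # same value -- sd() divides by N - 1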


If you didn't know the population (males') standard deviation, you would use the sample of females to estimate the population standard deviation.

SEM = σ̂ / √N


Calculate SEM on board.

σ̂=3.1

N=9

We have a normal distribution for which we know the mean (M), the standard deviation (SEM), and a score of interest (X̄).

We can use this information to calculate a Z-score; in the context of comparing one mean to a sampling distribution of means, we call this a Z-statistic.

Z = (X̄ - M) / SEM = (2.9 - 5.3) / 1.03 = -2.18


The z here isn't a z-score, because we're calculating a standardized version of a sample mean, not a standardized version of a single observation (which is what a z-score usually refers to).

Let's just recap. Our null hypothesis was that the mean of females equaled the mean of males in their application ratings; the alternative is that these means are different. That is, the null is that males and females come from the same population distribution, and the alternative is that the females do not come from the same population distribution as the males. The population mean of the males, 5.3, was used to derive the null distribution. Then we used our sample estimate of the standard deviation from the females as a way of approximating the male population standard deviation (we're pretending we didn't know that). We used this to get a standard error of the mean, and we put it all together to get this z-statistic. The z tells us the number of standard errors that separate the observed sample mean from the population mean predicted by our null hypothesis.

Z = (X̄ - M) / SEM = (2.9 - 5.3) / 1.03 = -2.18

And here's where we use the properties of the Standard Normal Distribution to calculate probabilities, specifically the probability of getting a score this far away from μ or more extreme:

pnorm(-2.18) + pnorm(2.18, lower.tail = F)
## [1] 0.02925746
pnorm(-2.18)*2
## [1] 0.02925746

The probability that the average female applicant's score would be at least 2.4 units (the difference we actually observed) away from the average male score is 0.029.

This whole process is called the z-test. It's almost never used IRL, but it's a useful tool for understanding what's happening. We use it when we want to know whether our sample mean is the same as or different from a population mean.


NHST Steps

  1. Define H0 and H1.

  2. Choose your α level.

  3. Collect data.

  4. Define your sampling distribution using your null hypothesis and either the knowns about the population or estimates of the population from your sample.

  5. Calculate the probability of your data or more extreme under the null. (To get the probability, you'll need to calculate some kind of standardized score, like a z-statistic.)

  6. Compare your probability (p-value) to your α level and decide whether your data are "statistically significant" (reject the null) or not (fail to reject the null).
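Here's a minimal sketch of those steps in R, using the z-test example from earlier (the variable names are mine; the numbers are the male population values and the female sample mean from the example):

alpha <- .05                  # step 2: alpha level
mu0   <- 5.3                  # null-hypothesis mean (the male population mean)
sigma <- 3.3                  # known population SD, so this stays a z-test
xbar  <- 2.9                  # observed mean of the 9 female applicants
n     <- 9

sem <- sigma / sqrt(n)        # step 4: SD of the sampling distribution
z   <- (xbar - mu0) / sem     # step 5: standardized score (z-statistic)
p   <- 2 * pnorm(-abs(z))     # probability of data this extreme under the null
p < alpha                     # step 6: "statistically significant" or not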


Who Cares?

Nearly all statistical tests follow this format. The main thing that differs is which sampling distribution to use (normal, t, F, binomial, Poisson, etc.).


t-tests

We don't really use z-tests much, but we do use t-tests!

One Sample, Independent Samples (2 kinds), and Paired Samples

  • I want you to know...

    • What each of these is testing
    • When we use each one
    • How to interpret the findings
    • How to perform them in R
  • I don't care about...

    • Knowing the formula
    • Being able to calculate by hand

The t distribution

The normal distribution assumes we know the population mean and standard deviation. But we don’t usually. We only know the sample mean and standard deviation, and those have some uncertainty about them.

That uncertainty shrinks with large samples, so the normal is "close enough." In small samples, the t distribution is the better choice.
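You can see both points with a quick tail-probability comparison (the df values here are arbitrary, just for illustration):

pnorm(-2)          # standard normal: P(Z <= -2)
pt(-2, df = 5)     # t with 5 df: fatter tails, so a bigger tail probability
pt(-2, df = 100)   # t with 100 df: nearly identical to the normal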


t distribution

  • The primary difference between the normal distribution and the t distribution is the fatter tails

    • At smaller N, it becomes harder to reject the null hypothesis in favor of the alternative
    • The penalty we have to pay for our ignorance about the population
  • When we want to do a t-test, we should use the t sampling distribution, not the normal (unless we have a large N, in which case they'll give you the same answer)

  • There are different types, so let's work through them


One sample t-test

The question: "Is my sample mean equal to a population mean?" The vast majority of the time, we're asking if it's different from 0.

You've basically already done this! It is the exact same procedure as what we just went through with the z-test. The only differences are:

  • We don't know the population standard deviation, so to calculate the SEM, we use our best estimate of sigma (σ̂), which is our sample standard deviation that has been corrected for bias (s) using that N - 1 denominator
  • We use the t sampling distribution. If doing it the long way like before, to get the p-values, use the pt() function instead of the pnorm() function. Or just use the data...
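If you're curious, here's a sketch of that "long way" for Sepal.Length against μ = 0, so you can compare it with the t.test() output on the next slides:

x      <- iris$Sepal.Length
n      <- length(x)
se     <- sd(x) / sqrt(n)           # sd() already uses the N - 1 correction
t_stat <- (mean(x) - 0) / se        # t-statistic against mu = 0
2 * pt(-abs(t_stat), df = n - 1)    # two-tailed p-value from the t distribution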

One sample t-test

kable(head(iris))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

One sample t-test

# way 1 -- not as recommended unless mu is a number other than 0
t.test(x = iris$Sepal.Length, mu = 0)
##
## One Sample t-test
##
## data: iris$Sepal.Length
## t = 86.425, df = 149, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 5.709732 5.976934
## sample estimates:
## mean of x
## 5.843333
# way 2 -- recommended if mu = 0
t.test(Sepal.Length ~ 1, data = iris)
##
## One Sample t-test
##
## data: Sepal.Length
## t = 86.425, df = 149, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 5.709732 5.976934
## sample estimates:
## mean of x
## 5.843333


Accessing Your Results

OK, you just ran a t-test, and you want to keep that output to use later, so you store it as an object.

oneSample <- t.test(Sepal.Length ~ 1, data = iris)

If you look in your Environment, you'll notice this is stored as a List object. Lists can be annoying. You can click the blue arrow to see the different items contained in your list. To actually access them, we're going to use our old favorite, indexing.

For instance, to get the p-value, we need to access the 3rd thing in the list

oneSample[3]
## $p.value
## [1] 3.331256e-129

See how the name $p.value prints out. This makes it hard to actually do math with! It's a "named number". We want to get rid of that name, by going in a little deeper.

oneSample[[3]]
## [1] 3.331256e-129
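Side note: because the elements are named, you can also pull them out by name, which is usually easier to read than counting positions:

oneSample$p.value          # same as oneSample[[3]]
oneSample[["p.value"]]     # also drops the name, like [[3]]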

Accessing Your Results

The list thing can get obnoxious. We'll revisit it later in the semester. But for now, there's an easier way using the tidyverse. Specifically, we will use the broom package. Even though it's part of the tidyverse ecosystem, it does not load when you load the tidyverse, so you'll need to load it manually.

library(broom)
oneSample <- t.test(Sepal.Length ~ 1, data = iris)
tidyOneSample <- tidy(oneSample)
kable(tidyOneSample)
estimate  statistic  p.value  parameter  conf.low  conf.high  method             alternative
5.843333  86.42537   0        149        5.709733  5.976934   One Sample t-test  two.sided

Independent Samples t-test

The question: "Are two means different from one another?" Another way of thinking about this is "is the difference between the means equal to 0?" This is almost always asked in the context of a dichotomous variable.

Two types:

  1. Welch's t-test, the default in R. Does not assume the variances are equal
  2. Student's t-test. Assumes the variances are equal
irisSmall <- iris %>%
  filter(Species != "setosa")
tidyWelch <- tidy(t.test(Sepal.Length ~ Species, data = irisSmall))
tidyStudent <- tidy(t.test(Sepal.Length ~ Species, data = irisSmall, var.equal = T))

Independent Samples t-test

kable(tidyWelch)
estimate  estimate1  estimate2  statistic  p.value  parameter  conf.low    conf.high   method                   alternative
-0.652    5.936      6.588      -5.629165  2e-07    94.02549   -0.8819731  -0.4220269  Welch Two Sample t-test  two.sided
kable(tidyStudent)
estimate  estimate1  estimate2  statistic  p.value  parameter  conf.low    conf.high   method             alternative
-0.652    5.936      6.588      -5.629165  2e-07    98         -0.8818516  -0.4221484  Two Sample t-test  two.sided

Paired t-test

In the independent samples t-test, we assume that our data are truly independent. What if they're not? Examples: romantic partners, change from year 1 to year 2, etc. AKA "repeated measures"

Let's say we have something like happiness in year 1 and happiness in year 2. You can't treat these as two independent groups, because the two sets of scores are highly correlated. What we can do is ask "is the difference score equal to 0?" That is, "happiness year 1 - happiness year 2; is that equal to 0?" This is basically a one-sample t-test, but on difference scores.
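To make that concrete, here's a sketch using the two remaining iris species as if they were paired observations (they aren't really; it's just to show the equivalence):

vers <- irisSmall$Sepal.Length[irisSmall$Species == "versicolor"]
virg <- irisSmall$Sepal.Length[irisSmall$Species == "virginica"]
t.test(vers - virg, mu = 0)         # one-sample t-test on the difference scores
t.test(vers, virg, paired = TRUE)   # paired t-test: identical t, df, and p-value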


Paired t-test

Let's pretend that the species versicolor and virginica are actually related (they both start with v, right? lol)

pairedV1 <- tidy(t.test(Sepal.Length ~ Species, data = irisSmall, paired = T))
kable(pairedV1)
estimate  statistic  p.value  parameter  conf.low   conf.high  method         alternative
-0.652    -5.275345  3e-06    49         -0.900371  -0.403629  Paired t-test  two.sided


More Means

What if you want to compare more than 2 means? Now you're in ANOVA territory

Oneway ANOVA

You still have a single independent variable, but instead of it being dichotomous, it's trichotomous (or more)

Other ANOVAs

You have more than 1 independent variable, but they are all still factors (not continuous).



ANOVA

To keep things simple, let's say we have a Oneway ANOVA with the original iris dataset that has 3 species. The null hypothesis is:

μ_setosa = μ_versicolor = μ_virginica

But the alternative hypothesis is that "at least one of these means is different from the others." That could be:

μ_setosa ≠ μ_versicolor = μ_virginica

μ_setosa = μ_versicolor ≠ μ_virginica

...or any other combination.


ANOVA

Instead of using the t or normal distributions, we use the F distribution for ANOVA. The F is a ratio of variances. We take the variance between groups and compare it to the variance within groups (and error).

The idea is that if the group means are very different from one another, there is a lot of between-group variance, so the numerator is large relative to the denominator.

If the means aren't that different, then there's not going to be a lot of variance in the numerator. Instead, the variance in the denominator will take over. This would yield a non-significant ANOVA.

Fun things: Your F statistic cannot be negative. There is no such thing as a negative variance, and we're looking at a ratio of variances. 0 is the smallest it gets.
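If you want to see the ratio-of-variances idea in action, here's a sketch that builds the F "by hand" for Sepal.Length across the three species (dplyr just for the bookkeeping; compare it to the aov() output on the next slide):

library(dplyr)
k <- n_distinct(iris$Species)           # number of groups
N <- nrow(iris)
grand_mean <- mean(iris$Sepal.Length)
grp <- iris %>%
  group_by(Species) %>%
  summarise(m = mean(Sepal.Length), n = n())
ss_between <- sum(grp$n * (grp$m - grand_mean)^2)    # variance due to group mean differences
ss_within  <- sum((iris$Sepal.Length - grp$m[match(iris$Species, grp$Species)])^2)   # leftover within-group variance
(ss_between / (k - 1)) / (ss_within / (N - k))       # the F ratio -- about 119, matching aov()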


ANOVA

It's the same code from lecture #9. You'll still want to nest aov() inside of summary(). To store for later, we can still use tidy() from the broom package. glance() from the broom package can also help with getting our R2 (variance explained).

summary(aov(Sepal.Length ~ Species, data = iris))
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 63.21 31.606 119.3 <2e-16 ***
## Residuals 147 38.96 0.265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#to tidy it, keep it out of the summary
onewayEx <- tidy(aov(Sepal.Length ~ Species, data = iris))
onewayEx
## # A tibble: 2 x 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Species 2 63.2 31.6 119. 1.67e-31
## 2 Residuals 147 39.0 0.265 NA NA

Downsides

t-tests and ANOVAs are both just special cases of regression. In our regression lecture, we'll talk about this.

Regression is a much more flexible framework for statistical analysis. Generally speaking, t-tests are fine staying as t-tests, but if you ever want to run an ANOVA (especially with 2+ predictors), I strongly suggest using regression instead of ANOVA. It's the same thing, but you'll get more for your money with regression, and you can start to include predictors that are continuous and categorical -- no need to choose!
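As a small preview of that equivalence: running the same one-way ANOVA through lm() (i.e., as a regression) gives back the identical F and p-value.

fit <- lm(Sepal.Length ~ Species, data = iris)
anova(fit)       # same table as summary(aov(Sepal.Length ~ Species, data = iris))
summary(fit)     # regression view: each coefficient compares a species to setosa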


Next time...

  • Confidence Intervals
  • p-values
  • Power
  • Problems with NHST
