Comparing Means

Recap

Normal distributions all have well-characterized properties

Total area under the curve (AUC) = 1
~68% of values fall within 1 $σ$ of the mean, ~95% within 2 $σ$, and ~99.7% within 3 $σ$
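These numbers can be checked directly in R. The sketch below uses only base R. It runs integrate() on the standard normal density over the whole real line to get the total area, and it takes differences of pnorm() to get the area within 1, 2, and 3 SDs of the mean.

```r
# Area under the standard normal curve from -Inf to Inf (should be 1)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
total_area

# Proportion of the distribution within k standard deviations of the mean
within_k_sd <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within_k_sd, 3)
## [1] 0.683 0.954 0.997
```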

The standard normal distribution is a particular type of normal distribution

distribution of $z$-scores
$μ = 0$
$σ = 1$
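Here is a sketch of what "distribution of $z$-scores" means in practice. If you standardize a variable by subtracting its mean and dividing by its SD, the result has a mean of 0 and an SD of 1. The sketch uses the built-in iris data, which comes back later in these slides. Picking Sepal.Length is arbitrary.

```r
x <- iris$Sepal.Length

# z-score: how many SDs each observation sits above (+) or below (-) the mean
z <- (x - mean(x)) / sd(x)

mean(z)  # 0, up to floating-point error
sd(z)    # 1

# scale() does the same standardization (it returns a matrix, so flatten it)
all.equal(as.vector(scale(x)), z)
```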

Using the standard normal & these cool properties, we can make probability statements

What is the probability of getting a given $z$-score, or one more extreme?

Sampling distributions are distributions of statistics

They also happen to mostly be (approximately) normally distributed... let's use this!
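A small simulation makes this concrete. The sketch assumes a normal population with the male-applicant values from the upcoming example ($μ = 5.3$, $σ = 3.3$). It draws many samples of $N = 9$ and keeps each sample mean. The spread of those means should come out close to $σ/\sqrt{N} = 1.1$. The seed and the 10,000 replications are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, for reproducibility

mu    <- 5.3  # assumed population mean
sigma <- 3.3  # assumed population SD
n     <- 9    # sample size

# Draw 10,000 samples of size n and keep each sample's mean
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(sample_means)  # close to mu
sd(sample_means)    # close to sigma / sqrt(n) = 1.1
```

Running hist(sample_means) on the result shows the bell shape.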


Example

University X has been around for 150 years, and so has 150 years' worth of ratings of male applicants. You pay an undergrad to dig through all the old university files and calculate the average rating of male applicants (5.3 out of 10) and the standard deviation of those ratings (3.3).

You then collect the ratings of 9 female applicants in 2018 and calculate their average rating (2.9) and the standard deviation of their ratings (3.1).

How do you generate the sampling distribution around the null?


The mean of the sampling distribution = the mean of the null hypothesis

The standard deviation of the sampling distribution:

$SEM = \frac{σ}{\sqrt{N}}$

What must we assume in order to use the SEM?

Random sampling

Pop mean (males) = 5.3, population SD (males) = 3.3; construct the sampling distribution around the null.

Now we're trying to decide: given this null hypothesis, with a sampling distribution that looks like this, is it likely that our females come from the same distribution? The females had a mean of 2.9, so our purple line moves over. The standard deviation of our sampling distribution is the standard error of the mean. To calculate it, we take the population standard deviation, sigma, and divide by the square root of N. We had 9 females, so that's the square root of 9. This turns out to be 1.1. Even though we're only talking about this tail at the moment, I've plotted the equivalent tail on the other side.

Calculate SEM on board.

$σ = 3.3$

$N = 9$
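Here is the same calculation in R, using the example's numbers. It is a trivial sketch, but it's the same line you'd reuse later with an estimated SD.

```r
sigma <- 3.3  # population SD of male ratings (from the example)
n     <- 9    # number of female applicants

sem <- sigma / sqrt(n)
sem
## [1] 1.1
```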

Example Cont...

All well and good.

But rarely will you have access to all the data in your population, so you won't be able to calculate the population standard deviation. Whatever will you do?

$SEM = \frac{\hat{σ}}{\sqrt{N}} = \frac{s}{\sqrt{N}}$

As long as your estimate of the standard deviation is already corrected for bias (you've divided by $N - 1$), you can swap in your sample SD.
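R's sd() already uses the $N - 1$ denominator, so it gives you $s$ directly. The sketch below checks that claim against the formula written out by hand, then computes the estimated SEM. The nine ratings are made-up numbers for illustration only; they are not the example's data.

```r
ratings <- c(0, 1, 1, 2, 3, 4, 5, 6, 4)  # hypothetical ratings, N = 9
n <- length(ratings)

# Bias-corrected SD written out by hand: divide by N - 1, not N
s_by_hand <- sqrt(sum((ratings - mean(ratings))^2) / (n - 1))

all.equal(sd(ratings), s_by_hand)  # TRUE: sd() uses N - 1

sem_hat <- sd(ratings) / sqrt(n)   # estimated SEM
sem_hat
```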

If you didn't know the population (males') standard deviation, you would use the sample of females to estimate it.

$SEM = \frac{\hat{σ}}{\sqrt{N}}$

Calculate SEM on board.

$\hat{σ} = 3.1$

$N = 9$

We have a normal distribution for which we know the mean ($M$), the standard deviation ($SEM$), and a score of interest ($\bar{X}$).

We can use this information to calculate a Z-score; in the context of comparing one mean to a sampling distribution of means, we call this a Z-statistic.

$Z = \frac{\bar{X} - M}{SEM} = \frac{2.9 - 5.3}{3.1/\sqrt{9}} = \frac{-2.4}{1.033} ≈ -2.32$

(The $Z$ here isn't a $z$-score, because we're calculating a standardized version of a sample mean, not a standardized version of a single observation, which is what a $z$-score usually refers to.)

Let's just recap. Our null hypothesis was that the mean rating of female applicants equaled the mean rating of male applicants. The alternative is that these means are different. That is, the null is that males and females come from the same population distribution, and the alternative is that the females do not come from the same population distribution as the males. The population mean of the males, 5.3, was used to derive the null distribution. Then we used our sample estimate of the standard deviation from the females as a way of approximating the male population standard deviation (we're pretending we didn't know it). We used this to get a standard error of the mean, and we put it all together to get this z-statistic. The z tells us the number of standard errors that separate the observed sample mean from the population mean predicted by our null hypothesis.

And here's where we use the properties of the standard normal distribution to calculate probabilities, specifically the probability of getting a score this far away from $μ$ or more extreme:

z <- (2.9 - 5.3) / (3.1 / sqrt(9))         # about -2.32
pnorm(z) + pnorm(-z, lower.tail = FALSE)   # lower tail + upper tail
pnorm(-abs(z)) * 2                         # same thing, using symmetry

If the null hypothesis is true, the probability that the average rating of 9 female applicants would be at least 2.32 standard errors away from the average male rating (in either direction) is about 0.02.

This whole process is called the $z$-test. It's almost never used IRL, but it's a useful tool for understanding what's happening. We use it when we want to know whether our sample mean is the same as or different from a known population mean.


NHST Steps

1. Define $H_{0}$ and $H_{1}$.
2. Choose your $α$ level.
3. Collect data.
4. Define your sampling distribution using your null hypothesis and either the knowns about the population or estimates of the population from your sample.
5. Calculate the probability of your data, or data more extreme, under the null. (To get the probability, you'll need to calculate some kind of standardized score, like a z-statistic.)
6. Compare your probability (p-value) to your $α$ level and decide whether your data are "statistically significant" (reject the null) or not (fail to reject the null).
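Here are steps 5 and 6 for the female-applicant example, sketched in R. The sketch assumes $α = .05$ and a two-tailed test, and it uses the example's numbers (null mean 5.3, $\hat{σ} = 3.1$, $N = 9$, observed mean 2.9). Comparing $p$ to $α$ is equivalent to comparing $|z|$ to the critical value qnorm(1 - alpha / 2). Strictly speaking, an estimated SD calls for the $t$ distribution, which comes up shortly.

```r
alpha <- 0.05                            # chosen before collecting data
z     <- (2.9 - 5.3) / (3.1 / sqrt(9))   # standardized sample mean
p     <- 2 * pnorm(-abs(z))              # two-tailed p-value

# Decision rule, two equivalent ways
reject_by_p    <- p < alpha
z_crit         <- qnorm(1 - alpha / 2)   # about 1.96
reject_by_crit <- abs(z) > z_crit

c(z = z, p = p, z_crit = z_crit)
c(reject_by_p = reject_by_p, reject_by_crit = reject_by_crit)  # both TRUE here
```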


Who Cares?

Nearly all statistical tests follow this format. What differs is which sampling distribution you use (normal, $t$, $F$, binomial, Poisson, etc.).

$t$-tests

We don't really use $z$-tests much, but we do use $t$-tests!

One Sample, Independent Samples (2 kinds), and Paired Samples

I want you to know...
- What each of these is testing
- When we use each one
- How to interpret the findings
- How to perform them in R
I don't care about...
- Knowing the formula
- Being able to calculate by hand


The $t$ distribution

The normal distribution (as used in the $z$-test) assumes we know the population standard deviation. But we usually don't. We only know the sample standard deviation, and that estimate comes with some uncertainty.

That uncertainty shrinks as samples get larger, so with a large N the normal is "close enough." In small samples, the $t$ distribution is better.

$t$ distribution

The primary difference between the normal distribution and the $t$ distribution is that the $t$ has fatter tails
- At smaller N, it becomes harder to reject the null hypothesis in favor of the alternative
- The penalty we have to pay for our ignorance about the population
When we want to do a $t$-test, we should use the $t$ sampling distribution, not the normal (unless we have a large N, in which case they'll give nearly the same answers)
There are different types, so let's work through them
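The fatter tails show up as larger critical values. The sketch below compares two-tailed critical values at $α = .05$ from qt() and qnorm(). The degrees of freedom are arbitrary picks. The $t$ cutoffs are always larger than the normal's 1.96, and they shrink toward it as df grows.

```r
# Two-tailed critical values at alpha = .05 for a few (arbitrary) df
df     <- c(2, 8, 30, 100, 1000)
t_crit <- qt(0.975, df = df)   # t cutoffs, one per df
z_crit <- qnorm(0.975)         # normal cutoff, about 1.96

round(t_crit, 3)
round(z_crit, 3)
```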


One sample $t$-test

The question: "Is my sample mean equal to a population mean?" The vast majority of the time, we're asking if it's different from 0.

You've basically already done this! It is the exact same procedure as what we just went through with the $z$-test. The only differences are:

We don't know the population standard deviation, so to calculate the SEM, we use our best estimate of sigma ($\hat{σ}$), which is our sample standard deviation corrected for bias ($s$) using that $N - 1$ denominator.
We use the $t$ sampling distribution, with $N - 1$ degrees of freedom. If doing it the long way like before, use the pt() function instead of the pnorm() function to get the p-value. Or just use the data...
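Here is "the long way" for the female-applicant example, treating 3.1 as the bias-corrected sample SD. It is the same arithmetic as the $z$-test, but the p-value comes from pt() with $N - 1 = 8$ degrees of freedom. For comparison, the sketch also computes what the normal would have said.

```r
xbar <- 2.9   # sample mean (female applicants)
mu0  <- 5.3   # mean under the null (male applicants)
s    <- 3.1   # bias-corrected sample SD
n    <- 9

sem    <- s / sqrt(n)
t_stat <- (xbar - mu0) / sem
df     <- n - 1

p_t <- 2 * pt(-abs(t_stat), df = df)  # t distribution, df = N - 1
p_z <- 2 * pnorm(-abs(t_stat))        # normal distribution, for comparison

c(t = t_stat, df = df, p_t = p_t, p_z = p_z)
```

With only 9 observations, the fatter tails of the $t$ make the p-value noticeably larger than the normal-based one.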


One sample $t$-test

library(knitr)  # for kable()
kable(head(iris))

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa

One sample $t$-test

# way 1 -- recommended only when mu is something other than 0
t.test(x = iris$Sepal.Length, mu = 0)

## 
##     One Sample t-test
## 
## data:  iris$Sepal.Length
## t = 86.425, df = 149, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  5.709732 5.976934
## sample estimates:
## mean of x 
##  5.843333
# way 2 -- recommended if mu = 0
t.test(Sepal.Length ~ 1, data = iris)

## 
##     One Sample t-test
## 
## data:  Sepal.Length
## t = 86.425, df = 149, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  5.709732 5.976934
## sample estimates:
## mean of x 
##  5.843333
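To back up the claim that this is "the exact same procedure," the sketch below computes the one-sample $t$ by hand as (mean - mu) / (s / sqrt(N)) with mu = 0. It then checks the result against the statistic that t.test() returns.

```r
x <- iris$Sepal.Length

# One-sample t statistic by hand, testing mu = 0
t_by_hand <- (mean(x) - 0) / (sd(x) / sqrt(length(x)))

res <- t.test(x, mu = 0)
c(by_hand = t_by_hand, t.test = unname(res$statistic))  # both about 86.43
```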

Accessing Your Results

OK, you just ran a $t$-test, and you want to keep the output for later, so you store it as an object:

oneSample <- t.test(Sepal.Length ~ 1, data = iris)

If you look in your Environment, you'll notice this is stored as a list object. Lists can be annoying. You can click the blue arrow to see the different items contained in your list. To actually access them, we're going to use our old favorite: indexing.

For instance, to get the p-value, we need to access the 3rd thing in the list:

oneSample[3]

## $p.value
## [1] 3.331256e-129

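See how the name $p.value prints out? That's because single brackets on a list return another, smaller list (here, a one-element list named p.value), not the number itself. This makes it hard to actually do math with! We want to get rid of that wrapper by going in a little deeper: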

oneSample[[3]]

## [1] 3.331256e-129
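Indexing by position works, but it's easy to lose track of which position holds what. The sketch below pulls the same pieces out by name instead. The names used (p.value, statistic) are the component names of the list that t.test() returns.

```r
oneSample <- t.test(Sepal.Length ~ 1, data = iris)

names(oneSample)             # what's inside the list

oneSample$p.value            # same as oneSample[[3]], but by name
oneSample[["p.value"]]       # also the same
unname(oneSample$statistic)  # statistic is a named number ("t"); unname() drops the name
```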

Accessing Your Results

The list thing can get obnoxious. We'll revisit it later in the semester. But for now, there's an easier way using the tidyverse. Specifically, we will use the broom package. Even though it's part of the tidyverse ecosystem, it isn't loaded by library(tidyverse), so you'll need to load it yourself.

library(broom)
oneSample <- t.test(Sepal.Length ~ 1, data = iris)
tidyOneSample <- tidy(oneSample)
kable(tidyOneSample)

estimate	statistic	p.value	parameter	conf.low	conf.high	method	alternative
5.843333	86.42537	0	149	5.709733	5.976934	One Sample t-test	two.sided

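(The p.value of 0 in this table is just kable() rounding for display; the actual p-value is 3.331256e-129.)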

Independent Samples $t$-test

The question: "Are two means different from one another?" Another way of thinking about this is "is the difference between the means equal to 0?" This is almost always asked in the context of a dichotomous variable.

Two types:

Welch's $t$-test, the default in R. Does not assume the variances are equal
Student's $t$-test. Assumes the variances are equal

library(tidyverse)  # for %>% and filter()
irisSmall <- iris %>% 
  filter(Species != "setosa")
tidyWelch <- tidy(t.test(Sepal.Length ~ Species, data = irisSmall))
tidyStudent <- tidy(t.test(Sepal.Length ~ Species, data = irisSmall, var.equal = TRUE))

Independent Samples $t$-test

kable(tidyWelch)

estimate	estimate1	estimate2	statistic	p.value	parameter	conf.low	conf.high	method	alternative
-0.652	5.936	6.588	-5.629165	2e-07	94.02549	-0.8819731	-0.4220269	Welch Two Sample t-test	two.sided

kable(tidyStudent)

estimate	estimate1	estimate2	statistic	p.value	parameter	conf.low	conf.high	method	alternative
-0.652	5.936	6.588	-5.629165	2e-07	98	-0.8818516	-0.4221484	Two Sample t-test	two.sided
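The two tables differ only in the degrees of freedom (parameter) and the confidence interval. That's expected here. When the two groups are the same size (50 flowers each), the Welch and Student $t$ statistics are identical, so only the df (and with it the p-value and CI) change. The sketch below puts them side by side using base R only, so it runs without broom.

```r
# Keep two species; droplevels() removes the now-empty "setosa" level
irisSmall <- droplevels(subset(iris, Species != "setosa"))

welch   <- t.test(Sepal.Length ~ Species, data = irisSmall)                    # R's default
student <- t.test(Sepal.Length ~ Species, data = irisSmall, var.equal = TRUE)  # pooled variance

comparison <- data.frame(
  method = trimws(c(welch$method, student$method)),
  t      = unname(c(welch$statistic, student$statistic)),
  df     = unname(c(welch$parameter, student$parameter)),
  p      = c(welch$p.value, student$p.value)
)
comparison
```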


Paired $t$-test

In the independent samples $t$-test, we assume that our data are truly independent. What if they're not? Examples: romantic partners, change from year 1 to year 2, etc. AKA "repeated measures."

Let's say we have something like happiness in year 1 and happiness in year 2, measured on the same people. You can't just compare the two means as if they came from independent groups, because each person's two scores are highly correlated. What we can do is ask "is the difference score equal to 0?" That is, "happiness year 1 - happiness year 2; is that equal to 0?" This is basically a one-sample $t$-test, but on difference scores.

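Paired $t$-test

Let's pretend that the species versicolor and virginica are actually related (they both start with v, right? lol). Each versicolor flower gets paired with the virginica flower in the same position in the data.

# pass the two vectors directly; recent versions of R don't accept paired = TRUE in the formula interface
versicolor <- irisSmall$Sepal.Length[irisSmall$Species == "versicolor"]
virginica <- irisSmall$Sepal.Length[irisSmall$Species == "virginica"]
pairedV1 <- tidy(t.test(versicolor, virginica, paired = TRUE))
kable(pairedV1)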

estimate	statistic	p.value	parameter	conf.low	conf.high	method	alternative
-0.652	-5.275345	3e-06	49	-0.900371	-0.403629	Paired t-test	two.sided
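To see that the paired test really is a one-sample test on difference scores, the sketch below runs both and compares the $t$ statistics. It uses the same pretend pairing by position as above and needs only base R.

```r
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
virginica  <- iris$Sepal.Length[iris$Species == "virginica"]

paired   <- t.test(versicolor, virginica, paired = TRUE)
diffTest <- t.test(versicolor - virginica, mu = 0)  # one-sample t on the differences

c(paired = unname(paired$statistic), one_sample = unname(diffTest$statistic))
```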


More Means

What if you want to compare more than 2 means? Now you're in ANOVA territory

One-way ANOVA

You still have a single independent variable, but instead of being dichotomous, it's trichotomous (or more)

Other ANOVAs

You have more than 1 independent variable, but they are all still factors (not continuous).

ANOVA

To keep things simple, let's say we have a one-way ANOVA with the original iris dataset, which has 3 species. The null hypothesis is:

$μ_{\text{setosa}} = μ_{\text{versicolor}} = μ_{\text{virginica}}$

But the alternative hypothesis is that "at least one of these means is different from the others." That could be:

$μ_{\text{setosa}} \neq μ_{\text{versicolor}} = μ_{\text{virginica}}$

$μ_{\text{setosa}} = μ_{\text{versicolor}} \neq μ_{\text{virginica}}$

...or any of these combinations.

ANOVA

Instead of using the $t$ or normal distributions, we use the $F$ distribution for ANOVA. The $F$ is a ratio of variances: we take the variance between groups and compare it to the variance within groups (error).

The idea is that if the group means are very different from each other, there's a lot of between-group variance, so the numerator is large relative to the denominator.

If the means aren't that different, there won't be much variance in the numerator, and the variance in the denominator will take over. This would yield a non-significant ANOVA.

Fun fact: your $F$ statistic cannot be negative. There is no such thing as a negative variance, and we're looking at a ratio of variances, so 0 is the smallest it gets.
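To make the "ratio of variances" idea concrete, the sketch below computes $F$ by hand for the iris example. It divides the between-group mean square by the within-group mean square and gets the p-value from the $F$ distribution with pf(). The variable names are just for illustration.

```r
y <- iris$Sepal.Length
g <- iris$Species

k <- nlevels(g)   # number of groups (3)
N <- length(y)    # total sample size (150)

groupMean <- ave(y, g)   # each observation's own group mean
grandMean <- mean(y)

ssBetween <- sum((groupMean - grandMean)^2)  # how far group means sit from the grand mean
ssWithin  <- sum((y - groupMean)^2)          # spread inside each group (error)

msBetween <- ssBetween / (k - 1)  # numerator variance
msWithin  <- ssWithin / (N - k)   # denominator variance

fStat <- msBetween / msWithin
pVal  <- pf(fStat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(F = fStat, p = pVal)
```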


ANOVA

It's the same code from lecture #9. You'll still want to nest aov() inside of summary(). To store for later, we can still use tidy() from the broom package. glance() from the broom package can also help with getting our $R^{2}$ (variance explained).

summary(aov(Sepal.Length ~ Species, data = iris))

##              Df Sum Sq Mean Sq F value Pr(>F)    
## Species       2  63.21  31.606   119.3 <2e-16 ***
## Residuals   147  38.96   0.265                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# to tidy it, pass the aov() object itself (not its summary) to tidy()
onewayEx <- tidy(aov(Sepal.Length ~ Species, data = iris))
onewayEx

## # A tibble: 2 x 6
##   term         df sumsq meansq statistic   p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>     <dbl>
## 1 Species       2  63.2 31.6        119.  1.67e-31
## 2 Residuals   147  39.0  0.265       NA  NA
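For a one-way ANOVA, the $R^{2}$ (variance explained, also called eta squared) is the between-groups sum of squares divided by the total sum of squares. The sketch below pulls it out of the tidy table and checks it against the $R^{2}$ from the equivalent regression fit with lm(). It assumes broom is installed.

```r
library(broom)  # tidy()

onewayEx <- tidy(aov(Sepal.Length ~ Species, data = iris))

# SS for Species / (SS for Species + SS residual)
rSquared <- onewayEx$sumsq[1] / sum(onewayEx$sumsq)
rSquared  # about 0.62

# Same number from the regression framing
summary(lm(Sepal.Length ~ Species, data = iris))$r.squared
```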


Downsides

$t$-tests and ANOVAs are both just special cases of regression. We'll talk about this in our regression lecture.

Regression is a much more flexible framework for statistical analysis. Generally speaking, $t$-tests are fine staying as $t$-tests, but if you ever want to run an ANOVA (especially with 2+ predictors), I strongly suggest using regression instead. It's the same model, but you'll get more for your money with regression, and you can include both continuous and categorical predictors. No need to choose!
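A quick check of the "special case of regression" claim: Student's $t$-test on two species gives the same $t$ (up to sign) and the same p-value as the species coefficient in lm(). The sign flips because t.test() reports versicolor minus virginica, while lm() dummy-codes virginica relative to versicolor. This is a sketch using base R only.

```r
# Two species only; droplevels() removes the now-empty "setosa" level
irisSmall <- droplevels(subset(iris, Species != "setosa"))

student <- t.test(Sepal.Length ~ Species, data = irisSmall, var.equal = TRUE)
fit     <- lm(Sepal.Length ~ Species, data = irisSmall)

coefTable <- coef(summary(fit))
# Row 2 is the dummy-coded species effect (virginica minus versicolor)
c(t_test = unname(student$statistic), regression = coefTable[2, "t value"])
c(t_test = student$p.value, regression = coefTable[2, "Pr(>|t|)"])
```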


Next time...

- Confidence Intervals
- p-values
- Power
- Problems with NHST