Practice Set: Comparing Means & Power
Comparing Means Lecture
Comparing Means Lecture Review Quiz
Let’s Practice!
The average student consumes 2 cups of coffee per day during the semester, but you believe this number spikes during midterms. You survey 50 students and find an average of 2.5 cups; the population standard deviation is known to be 1 cup.
You calculate the z-statistic: z = (2.5 − 2) / (1 / √50) ≈ 3.54.
Calculate the probability of getting a sample mean this extreme, or more extreme, given that the null is true.
# { have them tell me what to code }
# correct code: two-sided p-value, the area in both tails beyond |z|
2 * pnorm(-abs(3.54))
## [1] 0.000400127
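The z-statistic itself can be recomputed from the numbers in the problem (null mean 2, sample mean 2.5, population sd 1, n = 50); a quick sketch using only base R:

```r
# z = (sample mean - null mean) / (sigma / sqrt(n))
z <- (2.5 - 2) / (1 / sqrt(50))
z
# comes out positive (about 3.54), since the sample mean is above the null mean

# two-sided p-value: probability of a result at least this extreme
2 * pnorm(-abs(z))
```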
The t distribution has fatter tails because it pays a price for estimating the population standard deviation from the sample rather than knowing it.
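A quick way to see those fatter tails, using only base R quantile functions:

```r
# 97.5th percentile cutoffs: the t cutoff is larger than the normal one
# for small samples and shrinks toward it as the degrees of freedom grow
qnorm(0.975)        # normal: about 1.96
qt(0.975, df = 5)   # t with 5 df: noticeably larger cutoff
qt(0.975, df = 100) # t with 100 df: close to the normal cutoff
```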
We recently measured the average rainfall in 70 U.S. cities (“precip”). The average rainfall in the U.S. is 30 inches. We want to determine if our 70 cities are representative of the entire population.
Let’s run a test!
The name of the data and variable is “precip” and the mean of entire population is 30.
data(precip)
# correct code (the data argument belongs to the formula interface,
# so it is not needed here; just pass the vector):
oneSample <- t.test(precip, mu = 30)
oneSample
##
## One Sample t-test
##
## data: precip
## t = 2.9823, df = 69, p-value = 0.003952
## alternative hypothesis: true mean is not equal to 30
## 95 percent confidence interval:
## 31.61748 38.15395
## sample estimates:
## mean of x
## 34.88571
Is the sample mean significantly different than the population mean? What is the p-value?
# correct code:
library(broom)
library(knitr)
tidyOneSample <- tidy(oneSample)
kable(tidyOneSample)
| estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|
| 34.88571 | 2.982262 | 0.0039518 | 69 | 31.61748 | 38.15395 | One Sample t-test | two.sided |
# could use:
oneSample$p.value
## [1] 0.003951783
# but this just gives the bare value (positional indexing such as
# oneSample[[3]] also works, but is fragile), so the broom output is clearer
We want to look at the effects of Vitamin C on tooth growth in guinea pigs. The data (“ToothGrowth”) compares orange juice vs. vitamin C supplements (“supp”). Tooth growth is measured by length (“len”).
Let’s see what we find!
# correct code:
data(ToothGrowth)
tidy(t.test(len ~ supp, data = ToothGrowth))
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.7 20.7 17.0 1.92 0.0606 55.3 -0.171 7.57
## # ℹ 2 more variables: method <chr>, alternative <chr>
What is another way you could guess that the p-value might be non-significant based on the tidy output?
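One clue: in the tidy output, the 95% confidence interval for the difference runs from −0.171 to 7.57, so it contains 0, which for a two-sided test at the default confidence level happens exactly when p > .05. A quick check in base R:

```r
data(ToothGrowth)
res <- t.test(len ~ supp, data = ToothGrowth)

# the 95% CI for the difference straddles 0 exactly when p > .05
res$conf.int
res$conf.int[1] < 0 & res$conf.int[2] > 0
```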
A new sleep treatment was tested in rats. They were split up into control and treatment groups (“group”) and their sleep (“extra”) was measured before treatment and after treatment. Each rat was assigned an ID number (“ID”).
Let’s see what we find:
data(sleep)
# correct code:
tidy(t.test(sleep$extra[sleep$group == 1],
            sleep$extra[sleep$group == 2], paired = TRUE))
# note: t.test(extra ~ group, data = sleep, paired = TRUE) used to work,
# but recent versions of R removed the paired argument from the
# formula interface
What distribution is this?
How could you tell?
Let’s run one together!
Researchers are interested in learning ways to help plants grow larger. They are trying out two different fertilizers and are including a control group. We want to see if the weights of the plants differ in the 3 groups.
data frame name: PlantGrowth
grouping variable: group
weight variable: weight
data(PlantGrowth)
# correct code:
onewayANOVA <- tidy(aov(weight ~ group, data = PlantGrowth))
onewayANOVA
## # A tibble: 2 × 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 group 2 3.77 1.88 4.85 0.0159
## 2 Residuals 27 10.5 0.389 NA NA
# or
print(onewayANOVA)
## # A tibble: 2 × 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 group 2 3.77 1.88 4.85 0.0159
## 2 Residuals 27 10.5 0.389 NA NA
Looking at the F statistic alone, what would you guess about the conclusion?
How confident can you be in this conclusion?
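One way to answer from the F statistic alone, using only base R: compare it with the .05 critical value of the F distribution on the same degrees of freedom as the table above (2 and 27):

```r
# critical F value for alpha = .05 with df1 = 2 and df2 = 27
qf(0.95, df1 = 2, df2 = 27)
# the observed statistic of 4.85 exceeds this cutoff (about 3.35),
# which matches the p-value of roughly .016 falling below .05
```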
Changing gears to power…
Let’s review:
Now back to coding!
Ignoring the values, what is wrong with the following code for calculating the SEM used in confidence intervals?
mean <- 17.2
sd <- 3.4
n <- 100
sem <- (sd/mean)
sem
## [1] 0.1976744
# correct change: the SEM is sd / sqrt(n), not sd / mean
sem <- sd / sqrt(n)
sem
## [1] 0.34
We want a confidence interval of 95% (for alpha = .05). Is the following code correct?
ci_lb_z = mean - sem + qnorm(p = .95)
ci_ub_z = mean + sem + qnorm(p = .95)
print(c(ci_lb_z, ci_ub_z))
## [1] 18.02504 19.66467
# correct code: the critical value should multiply the SEM (sd / sqrt(n)),
# not be added to it, and with the population sd unknown the t quantile
# is the safer choice
ci_lb <- mean - qt(p = .975, df = n - 1) * sd / sqrt(n)
ci_ub <- mean + qt(p = .975, df = n - 1) * sd / sqrt(n)
print(c(ci_lb, ci_ub))
## [1] 16.52537 17.87463
Let’s practice calculating the probability of a Type II error:
The z-score of your distribution is -1.84.
zScore <- -1.84
# correct code:
beta <- pnorm(zScore)
beta
## [1] 0.03288412
What about calculating power (probability of detecting an effect that is actually there)?
# correct code:
power <- 1 - beta
power
## [1] 0.9671159
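Beyond hand calculations, base R's power.t.test() automates this kind of power analysis; the numbers below are made up for illustration and are not part of the exercise above:

```r
# power of a two-sample t-test: n per group, true difference delta,
# common sd, and significance level
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power
```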