Practice Set: Comparing Means & Power
Comparing Means Lecture
Comparing Means Lecture Review Quiz
Let’s Practice!
The average student consumes 2 cups of coffee per day during the semester, but you believe this number spikes during midterms. You survey 50 students and find an average of 2.5 cups; the population standard deviation is known to be 1 cup.
You calculate the z-statistic: z = (2.5 − 2) / (1 / √50) ≈ 3.54.
Calculate the probability of getting a sample mean this extreme, or more extreme, given that the null is true.
# { have them tell me what to code }
# correct code: two-sided p-value, the area in both tails beyond |z|
2 * pnorm(-abs(3.54))
## [1] 0.000400127
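The z-statistic itself can be recomputed from the numbers in the problem (null mean 2, sample mean 2.5, population sd 1, n = 50); a quick sketch using only base R:

```r
# z = (sample mean - null mean) / (sigma / sqrt(n))
z <- (2.5 - 2) / (1 / sqrt(50))
z
# comes out positive (about 3.54), since the sample mean is above the null mean

# two-sided p-value: probability of a result at least this extreme
2 * pnorm(-abs(z))
```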
The t distribution has fatter tails because it pays a price for estimating the population standard deviation from the sample rather than knowing it.
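A quick way to see those fatter tails, using only base R quantile functions:

```r
# 97.5th percentile cutoffs: the t cutoff is larger than the normal one
# for small samples and shrinks toward it as the degrees of freedom grow
qnorm(0.975)        # normal: about 1.96
qt(0.975, df = 5)   # t with 5 df: noticeably larger cutoff
qt(0.975, df = 100) # t with 100 df: close to the normal cutoff
```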
We recently measured the average rainfall in 70 U.S. cities (“precip”). The average rainfall in the U.S. is 30 inches. We want to determine if our 70 cities are representative of the entire population.
Let’s run a test!
The name of the data and variable is “precip” and the mean of entire population is 30.
data(precip)
# correct code (the data argument belongs to the formula interface,
# so it is not needed here; just pass the vector):
oneSample <- t.test(precip, mu = 30)
oneSample
##
## One Sample t-test
##
## data: precip
## t = 2.9823, df = 69, p-value = 0.003952
## alternative hypothesis: true mean is not equal to 30
## 95 percent confidence interval:
## 31.61748 38.15395
## sample estimates:
## mean of x
## 34.88571
Is the sample mean significantly different than the population mean? What is the p-value?
# correct code:
library(broom)
library(knitr)
tidyOneSample <- tidy(oneSample)
kable(tidyOneSample)
| estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|
| 34.88571 | 2.982262 | 0.0039518 | 69 | 31.61748 | 38.15395 | One Sample t-test | two.sided |
# could use:
oneSample$p.value
## [1] 0.003951783
# but this just gives the bare value (positional indexing such as
# oneSample[[3]] also works, but is fragile), so the broom output is clearer
We want to look at the effects of Vitamin C on tooth growth in guinea pigs. The data (“ToothGrowth”) compares orange juice vs. vitamin C supplements (“supp”). Tooth growth is measured by length (“len”).
Let’s see what we find!
# correct code:
data(ToothGrowth)
tidy(t.test(len ~ supp, data = ToothGrowth))
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.7 20.7 17.0 1.92 0.0606 55.3 -0.171 7.57
## # ℹ 2 more variables: method <chr>, alternative <chr>
What is another way you could guess that the p-value might be non-significant based on the tidy output?
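One clue: in the tidy output, the 95% confidence interval for the difference runs from −0.171 to 7.57, so it contains 0, which for a two-sided test at the default confidence level happens exactly when p > .05. A quick check in base R:

```r
data(ToothGrowth)
res <- t.test(len ~ supp, data = ToothGrowth)

# the 95% CI for the difference straddles 0 exactly when p > .05
res$conf.int
res$conf.int[1] < 0 & res$conf.int[2] > 0
```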
A new sleep treatment was tested in rats. They were split up into control and treatment groups (“group”) and their sleep (“extra”) was measured before treatment and after treatment. Each rat was assigned an ID number (“ID”).
Let’s see what we find:
data(sleep)
# correct code:
tidy(t.test(sleep$extra[sleep$group == 1],
            sleep$extra[sleep$group == 2], paired = TRUE))
# note: t.test(extra ~ group, data = sleep, paired = TRUE) used to work,
# but recent versions of R removed the paired argument from the
# formula interface
What distribution is this?
How could you tell?
Let’s run one together!
Researchers are interested in learning ways to help plants grow larger. They are trying out two different fertilizers and are including a control group. We want to see if the weights of the plants differ in the 3 groups.
data frame name: PlantGrowth
grouping variable: group
weight variable: weight
data(PlantGrowth)
# correct code:
onewayANOVA <- tidy(aov(weight ~ group, data = PlantGrowth))
onewayANOVA
## # A tibble: 2 × 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 group 2 3.77 1.88 4.85 0.0159
## 2 Residuals 27 10.5 0.389 NA NA
# or
print(onewayANOVA)
## # A tibble: 2 × 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 group 2 3.77 1.88 4.85 0.0159
## 2 Residuals 27 10.5 0.389 NA NA
Looking at the F statistic alone, what would you guess about the conclusion?
How confident can you be in this conclusion?
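One way to answer from the F statistic alone, using only base R: compare it with the .05 critical value of the F distribution on the same degrees of freedom as the table above (2 and 27):

```r
# critical F value for alpha = .05 with df1 = 2 and df2 = 27
qf(0.95, df1 = 2, df2 = 27)
# the observed statistic of 4.85 exceeds this cutoff (about 3.35),
# which matches the p-value of roughly .016 falling below .05
```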
Changing gears to power…
Let’s review:
Now back to coding!
Ignoring the values, what is wrong with the following code for calculating the SEM used in confidence intervals?
mean <- 17.2
sd <- 3.4
n <- 100
sem <- (sd/mean)
sem
## [1] 0.1976744
# correct change: the SEM is sd / sqrt(n), not sd / mean
sem <- sd / sqrt(n)
sem
## [1] 0.34
We want a confidence interval of 95% (for alpha = .05). Is the following code correct?
ci_lb_z = mean - sem + qnorm(p = .95)
ci_ub_z = mean + sem + qnorm(p = .95)
print(c(ci_lb_z, ci_ub_z))
## [1] 18.02504 19.66467
# correct code: the critical value should multiply the SEM (sd / sqrt(n)),
# not be added to it, and with the population sd unknown the t quantile
# is the safer choice
ci_lb <- mean - qt(p = .975, df = n - 1) * sd / sqrt(n)
ci_ub <- mean + qt(p = .975, df = n - 1) * sd / sqrt(n)
print(c(ci_lb, ci_ub))
## [1] 16.52537 17.87463
Let’s practice calculating the probability of a Type II error:
The z-score of your distribution is -1.84.
zScore <- -1.84
# correct code:
beta <- pnorm(zScore)
beta
## [1] 0.03288412
What about calculating power (probability of detecting an effect that is actually there)?
# correct code:
power <- 1 - beta
power
## [1] 0.9671159
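Beyond hand calculations, base R's power.t.test() automates this kind of power analysis; the numbers below are made up for illustration and are not part of the exercise above:

```r
# power of a two-sample t-test: n per group, true difference delta,
# common sd, and significance level
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power
```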