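Population:
Sums of squares: $SS = \Sigma(X_i - \mu_x)^2$
Variance: $\sigma^2 = \frac{\Sigma(X_i - \mu_x)^2}{N} = \frac{SS}{N}$
Standard deviation: $\sigma = \sqrt{\frac{\Sigma(X_i - \mu_x)^2}{N}} = \sqrt{\frac{SS}{N}} = \sqrt{\sigma^2}$
Sample:
Sums of squares: $SS = \Sigma(X_i - \bar{X})^2$
Variance: $s^2 = \frac{\Sigma(X_i - \bar{X})^2}{N-1} = \frac{SS}{N-1}$
Standard deviation: $s = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{SS}{N-1}} = \sqrt{s^2}$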
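As a quick check on these formulas, the sketch below computes the sum of squares, variance, and standard deviation by hand for a small made-up vector (the values in `x` are illustrative, not from the lecture) and compares them with R's built-in `var()` and `sd()`.

```r
# Hypothetical scores, made up for illustration
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

ss <- sum((x - mean(x))^2)   # sum of squared deviations from the mean

var_pop  <- ss / n           # population formula: divide by N
var_samp <- ss / (n - 1)     # sample formula: divide by N - 1

sd_pop  <- sqrt(var_pop)
sd_samp <- sqrt(var_samp)

# R's var() and sd() use the N - 1 (sample) versions
all.equal(var_samp, var(x))
all.equal(sd_samp, sd(x))
```

Note that there is no base R function for the population versions; dividing `ss` by `n` directly, as above, is one simple way to get them.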
"Sum of the cross-products"
Population: $SP_{XY} = \Sigma(X_i - \mu_X)(Y_i - \mu_Y)$
Sample: $SP_{XY} = \Sigma(X_i - \bar{X})(Y_i - \bar{Y})$
What does a large, positive SP indicate? A positive relationship: the paired deviations tend to have the same sign
What does a large, negative SP indicate? A negative relationship: the paired deviations tend to have different signs
What does SP close to 0 indicate? No (linear) relationship
Sort of like the variance of two variables
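Population: $\sigma_{XY} = \frac{\Sigma(X_i - \mu_X)(Y_i - \mu_Y)}{N}$
Sample: $s_{XY} = cov_{XY} = \frac{\Sigma(X_i - \bar{X})(Y_i - \bar{Y})}{N-1}$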
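To make the link between the sum of cross-products and the covariance concrete, here is a minimal sketch with two short made-up vectors (`x` and `y` are illustrative values only). It computes SP by hand, divides by N - 1, and checks the result against `cov()`.

```r
# Hypothetical paired scores, made up for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Sum of cross-products: paired deviations multiplied, then summed
sp <- sum((x - mean(x)) * (y - mean(y)))

# Sample covariance: SP divided by N - 1
cov_samp <- sp / (length(x) - 1)
all.equal(cov_samp, cov(x, y))

# Reversing the direction of one variable flips the sign of SP
sp_reversed <- sum((x - mean(x)) * (-y - mean(-y)))
```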
$$K_{XX} = \begin{bmatrix} \sigma^2_X & cov_{XY} & cov_{XZ} \\ cov_{YX} & \sigma^2_Y & cov_{YZ} \\ cov_{ZX} & cov_{ZY} & \sigma^2_Z \end{bmatrix}$$
Note that $cov_{XY}$ is the same as $cov_{YX}$, so the matrix is symmetric
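A small sketch of a variance-covariance matrix in R, using simulated data (the data frame `d` and its three columns are assumptions for illustration). Passing a data frame to `cov()` returns the full matrix, with variances on the diagonal and covariances off the diagonal.

```r
set.seed(1)
# Simulated data with three variables, for illustration only
d <- data.frame(X = rnorm(50), Y = rnorm(50), Z = rnorm(50))

K <- cov(d)   # 3 x 3 variance-covariance matrix

# Diagonal entries are the variances of each variable
all.equal(unname(diag(K)), unname(sapply(d, var)))

# Off-diagonal entries are covariances, and the matrix is symmetric
all.equal(K["X", "Y"], cov(d$X, d$Y))
isSymmetric(K)
```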
Measure of association
How much two variables are linearly related
-1 to 1
Sign indicates direction of relationship
Invariant to changes in mean or scaling
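The invariance property can be checked directly. The sketch below uses simulated data (all values are made up) and shows that adding a constant or multiplying by a positive constant leaves the correlation unchanged, while multiplying by a negative constant only flips its sign.

```r
set.seed(2)
# Simulated data, for illustration only
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

r_original <- cor(x, y)
r_shifted  <- cor(x + 100, y)         # change the mean of x
r_rescaled <- cor(x * 2.54, y / 10)   # change the units of both variables
r_flipped  <- cor(-x, y)              # reverse-score x

all.equal(r_original, r_shifted)
all.equal(r_original, r_rescaled)
all.equal(r_flipped, -r_original)
```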
Pearson product moment correlation
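Population: $\rho_{XY} = \frac{\Sigma z_X z_Y}{N} = \frac{SP}{\sqrt{SS_X}\sqrt{SS_Y}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Sample: $r_{XY} = \frac{\Sigma z_X z_Y}{n-1} = \frac{SP}{\sqrt{SS_X}\sqrt{SS_Y}} = \frac{s_{XY}}{s_X s_Y}$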
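The three expressions for the sample correlation can be verified against each other. This is a minimal sketch with made-up values for `x` and `y`; each version should agree with `cor()`.

```r
# Hypothetical paired scores, made up for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
n <- length(x)

# 1. Sum of products of z-scores, divided by n - 1
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
r_z <- sum(zx * zy) / (n - 1)

# 2. SP over the square roots of the two sums of squares
sp  <- sum((x - mean(x)) * (y - mean(y)))
ssx <- sum((x - mean(x))^2)
ssy <- sum((y - mean(y))^2)
r_sp <- sp / (sqrt(ssx) * sqrt(ssy))

# 3. Covariance over the product of standard deviations
r_cov <- cov(x, y) / (sd(x) * sd(y))

# All three agree with cor(); for these values r = 0.8
c(r_z, r_sp, r_cov, cor(x, y))
```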
Why is it called the Pearson Product Moment correlation?
Pearson = Karl Pearson
Product = multiply
Moment = variance is the second moment of a distribution
Ways to think about a correlation:
How two vectors of numbers co-relate (i.e., parent & child heights)
Product of z-scores (mathematically, it is)
The average squared distance between 2 vectors in the same space
Recall that z-scores allow us to compare across units of measure; the products of standardized scores are themselves standardized.
The correlation coefficient is a standardized effect size which can be used to communicate the strength of a relationship.
Correlations can be compared across studies, measures, constructs, time.
Example: the correlation between age and height among children is r=.70.
Building blocks of regression!
Cohen (1988): .1 (small), .3 (medium), .5 (large)
Meyer & Hemphill said .3 is average
It's not enough to calculate a correlation between two variables. You should always look at a figure of the data to make sure the number accurately describes the relationship. Correlations can be easily fooled by qualities of your data, like:
Skewed distributions
Outliers
Restriction of range
Nonlinearity
Multiple Groups
Reliability
library(ggExtra)  # provides ggMarginal()
p = data %>% ggplot(aes(x=x, y=y)) + geom_point()
ggMarginal(p, type = "density")
data %>% ggplot(aes(x=x, y=y)) + geom_point()
data %>% ggplot(aes(x=x, y=y)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "red") + geom_smooth(data = data[-51,], method = "lm", se = FALSE)
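The `data` object behind these plots is not shown in the slides, so the sketch below simulates its own data to illustrate the same point: a single extreme case can create a sizeable correlation between two otherwise unrelated variables. The sample size and the outlier's coordinates are assumptions chosen for illustration.

```r
set.seed(3)
# 50 simulated cases where x and y are unrelated
x <- rnorm(50)
y <- rnorm(50)
r_without <- cor(x, y)

# Add one hypothetical case that is extreme on both variables
x_out <- c(x, 10)
y_out <- c(y, 10)
r_with <- cor(x_out, y_out)

# Compare the correlation without and with the outlier
round(c(without = r_without, with = r_with), 2)
```

Dropping the suspect row and refitting, as the plot above does with `data[-51,]`, is one way to see how much a single case is driving the result.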
data %>% ggplot(aes(x=x, y=y)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "red")
What if I told you that scores on X could range from 97 to 103?
data %>% ggplot(aes(x=x, y=y)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "red") + geom_point(data = real_data) + geom_smooth(method = "lm", se = FALSE, data = real_data, color = "blue")
Can you think of an example where this might occur in psychology? My idea: many psychology studies only look at undergraduates (restricted age, restricted education), so age and education can't be used as predictors or covariates in those samples
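Restriction of range can be simulated directly. In this sketch, all numbers are assumptions for illustration: X is generated so that nearly all scores fall between roughly 97 and 103, the population correlation with Y is about .6, and the "restricted" sample keeps only cases with X between 99.5 and 100.5.

```r
set.seed(4)
n <- 1000

# Full range: x is centered at 100 with sd = 1 (hypothetical values)
x <- rnorm(n, mean = 100, sd = 1)
y <- 0.6 * (x - 100) + rnorm(n, sd = 0.8)   # population correlation is about .6

r_full <- cor(x, y)

# Restricted range: keep only cases near the middle of x
keep <- x >= 99.5 & x <= 100.5
r_restricted <- cor(x[keep], y[keep])

# The restricted correlation is noticeably weaker than the full-range one
round(c(full = r_full, restricted = r_restricted), 2)
```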
data %>% ggplot(aes(x=x, y=y)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "red")
Sometimes issues that affect correlations won't appear in your graph, but you still need to know how to look for them.
Multiple groups
Low reliability (take a psychometrics or research methods course)
data %>% ggplot(aes(x=x, y=y)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "red")
data %>% ggplot(aes(x=x, y=y, color = gender)) + geom_point() + geom_smooth(method = "lm", se = FALSE) + guides(color = "none")
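Here is a minimal simulation of the multiple-groups problem. The two groups, their means, and the sample sizes are all made up: within each group X and Y are unrelated, but because the groups differ in their means on both variables, the pooled correlation is strongly positive.

```r
set.seed(5)
n <- 200   # hypothetical number of cases per group

group <- rep(c("a", "b"), each = n)
# Group b scores higher on both x and y; within a group x and y are independent
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 3))
y <- c(rnorm(n, mean = 0), rnorm(n, mean = 3))

# Correlation ignoring group membership
r_pooled <- cor(x, y)

# Correlation computed separately within each group
r_within <- sapply(split(data.frame(x, y), group),
                   function(d) cor(d$x, d$y))

round(c(pooled = r_pooled, r_within), 2)
```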
Which would you rather have?
All measurement includes error
Error is random; it cannot correlate with anything
Because we don't measure our variables perfectly, we get lower correlations compared to the "true" correlations
Kind of analogous to power -- it's a ceiling
If we want valid measures, they need to be reliable
If you're going to measure something, do it well
Applies to ALL IVs and DVs, and all designs
Remember this when interpreting other research
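The ceiling that unreliability places on a correlation can be illustrated with a simulation. Under classical test theory, the expected observed correlation is the true correlation multiplied by the square root of the product of the two reliabilities. The true correlation of .6 and the reliabilities of .5 below are assumptions picked for illustration.

```r
set.seed(6)
n <- 10000

# True scores with a correlation of about .6 (both have variance 1)
true_x <- rnorm(n)
true_y <- 0.6 * true_x + rnorm(n, sd = 0.8)

# Observed scores = true score + random error.
# Error variance equals true-score variance, so reliability is about .5
obs_x <- true_x + rnorm(n, sd = 1)
obs_y <- true_y + rnorm(n, sd = 1)

r_true <- cor(true_x, true_y)
r_obs  <- cor(obs_x, obs_y)

# Attenuation formula: true r times sqrt(reliability_x * reliability_y)
r_expected <- 0.6 * sqrt(0.5 * 0.5)

round(c(true = r_true, observed = r_obs, expected = r_expected), 2)
```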
Correlations are both a descriptive and an inferential statistic. As a descriptive statistic, they're useful for understanding what's going on in a larger dataset.
Just as we use the summary() or describe() (psych) functions to examine our dataset before we run any inferential tests, we should also look at the correlation matrix.
library(psych)
data(bfi)   # note: in recent psych releases the bfi data may be in the psychTools package instead
head(bfi)
##       A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4
## 61617  2  4  3  4  4  2  3  3  4  4  3  3  3  4  4  3  4  2  2  3  3  6  3  4
## 61618  2  4  5  2  5  5  4  4  3  4  1  1  6  4  3  3  3  3  5  5  4  2  4  3
## 61620  5  4  5  4  4  4  5  4  2  5  2  4  4  4  5  4  5  4  2  3  4  2  5  5
## 61621  4  4  6  5  5  4  4  3  5  5  5  3  4  4  4  2  5  2  4  1  3  3  4  3
## 61622  2  3  3  4  5  4  4  5  3  2  2  2  5  4  5  2  3  4  4  3  3  3  4  3
## 61623  6  6  5  6  5  6  6  6  1  3  2  1  6  5  6  3  5  2  2  3  4  3  5  6
##       O5 gender education age
## 61617  3      1        NA  16
## 61618  3      2        NA  18
## 61620  2      2        NA  17
## 61621  5      2        NA  17
## 61622  3      1        NA  17
## 61623  1      2         3  21
cor(bfi)
## A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1## A1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## A2 NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## A3 NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## A4 NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## A5 NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## C1 NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## C2 NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA## C3 NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA## C4 NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA## C5 NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA## E1 NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA## E2 NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA## E3 NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA## E4 NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA## E5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA## N1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA## N2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA## N3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA## N4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA## N5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA## O1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1## O2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## O3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## O4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## O5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## gender NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## education NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## age NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA## O2 O3 O4 O5 gender education age## 
A1 NA NA NA NA NA NA NA## A2 NA NA NA NA NA NA NA## A3 NA NA NA NA NA NA NA## A4 NA NA NA NA NA NA NA## A5 NA NA NA NA NA NA NA## C1 NA NA NA NA NA NA NA## C2 NA NA NA NA NA NA NA## C3 NA NA NA NA NA NA NA## C4 NA NA NA NA NA NA NA## C5 NA NA NA NA NA NA NA## E1 NA NA NA NA NA NA NA## E2 NA NA NA NA NA NA NA## E3 NA NA NA NA NA NA NA## E4 NA NA NA NA NA NA NA## E5 NA NA NA NA NA NA NA## N1 NA NA NA NA NA NA NA## N2 NA NA NA NA NA NA NA## N3 NA NA NA NA NA NA NA## N4 NA NA NA NA NA NA NA## N5 NA NA NA NA NA NA NA## O1 NA NA NA NA NA NA NA## O2 1.00000000 NA NA NA 0.02694778 NA -0.04254386## O3 NA 1 NA NA NA NA NA## O4 NA NA 1 NA NA NA NA## O5 NA NA NA 1 NA NA NA## gender 0.02694778 NA NA NA 1.00000000 NA 0.04770347## education NA NA NA NA NA 1 NA## age -0.04254386 NA NA NA 0.04770347 NA 1.00000000
round(cor(bfi, use = "pairwise"),2)
## A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1## A1 1.00 -0.34 -0.27 -0.15 -0.18 0.03 0.02 -0.02 0.13 0.05 0.11## A2 -0.34 1.00 0.49 0.34 0.39 0.09 0.14 0.19 -0.15 -0.12 -0.21## A3 -0.27 0.49 1.00 0.36 0.50 0.10 0.14 0.13 -0.12 -0.16 -0.21## A4 -0.15 0.34 0.36 1.00 0.31 0.09 0.23 0.13 -0.15 -0.24 -0.11## A5 -0.18 0.39 0.50 0.31 1.00 0.12 0.11 0.13 -0.13 -0.17 -0.25## C1 0.03 0.09 0.10 0.09 0.12 1.00 0.43 0.31 -0.34 -0.25 -0.02## C2 0.02 0.14 0.14 0.23 0.11 0.43 1.00 0.36 -0.38 -0.30 0.02## C3 -0.02 0.19 0.13 0.13 0.13 0.31 0.36 1.00 -0.34 -0.34 0.00## C4 0.13 -0.15 -0.12 -0.15 -0.13 -0.34 -0.38 -0.34 1.00 0.48 0.09## C5 0.05 -0.12 -0.16 -0.24 -0.17 -0.25 -0.30 -0.34 0.48 1.00 0.06## E1 0.11 -0.21 -0.21 -0.11 -0.25 -0.02 0.02 0.00 0.09 0.06 1.00## E2 0.09 -0.23 -0.29 -0.19 -0.33 -0.09 -0.06 -0.08 0.20 0.26 0.47## E3 -0.05 0.25 0.39 0.19 0.42 0.12 0.15 0.09 -0.08 -0.16 -0.33## E4 -0.06 0.28 0.38 0.30 0.47 0.14 0.12 0.09 -0.11 -0.20 -0.42## E5 -0.02 0.29 0.25 0.16 0.27 0.25 0.25 0.21 -0.24 -0.23 -0.30## N1 0.17 -0.09 -0.08 -0.10 -0.20 -0.07 -0.02 -0.07 0.22 0.21 0.02## N2 0.14 -0.05 -0.09 -0.14 -0.19 -0.04 -0.01 -0.06 0.16 0.25 0.01## N3 0.10 -0.04 -0.04 -0.07 -0.14 -0.03 0.00 -0.07 0.21 0.24 0.05## N4 0.05 -0.09 -0.13 -0.17 -0.20 -0.10 -0.05 -0.11 0.26 0.34 0.23## N5 0.02 0.02 -0.04 -0.01 -0.08 -0.05 0.05 -0.01 0.20 0.17 0.05## O1 0.01 0.13 0.15 0.06 0.16 0.17 0.16 0.09 -0.09 -0.08 -0.10## O2 0.08 0.02 0.00 0.04 0.00 -0.11 -0.04 -0.03 0.21 0.14 0.04## O3 -0.06 0.16 0.22 0.07 0.24 0.19 0.19 0.06 -0.08 -0.08 -0.22## O4 -0.08 0.09 0.04 -0.04 0.02 0.11 0.06 0.02 0.05 0.14 0.08## O5 0.11 -0.09 -0.05 0.02 -0.05 -0.12 -0.05 -0.01 0.20 0.06 0.10## gender -0.16 0.18 0.14 0.13 0.10 0.01 0.07 0.05 -0.08 -0.09 -0.13## education -0.14 0.01 0.00 -0.02 0.01 0.03 0.00 0.05 -0.04 0.03 0.00## age -0.16 0.11 0.07 0.14 0.13 0.08 0.02 0.07 -0.15 -0.09 -0.03## E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2## A1 0.09 -0.05 -0.06 -0.02 0.17 0.14 0.10 0.05 0.02 0.01 0.08## A2 -0.23 0.25 0.28 0.29 -0.09 -0.05 -0.04 
-0.09 0.02 0.13 0.02## A3 -0.29 0.39 0.38 0.25 -0.08 -0.09 -0.04 -0.13 -0.04 0.15 0.00## A4 -0.19 0.19 0.30 0.16 -0.10 -0.14 -0.07 -0.17 -0.01 0.06 0.04## A5 -0.33 0.42 0.47 0.27 -0.20 -0.19 -0.14 -0.20 -0.08 0.16 0.00## C1 -0.09 0.12 0.14 0.25 -0.07 -0.04 -0.03 -0.10 -0.05 0.17 -0.11## C2 -0.06 0.15 0.12 0.25 -0.02 -0.01 0.00 -0.05 0.05 0.16 -0.04## C3 -0.08 0.09 0.09 0.21 -0.07 -0.06 -0.07 -0.11 -0.01 0.09 -0.03## C4 0.20 -0.08 -0.11 -0.24 0.22 0.16 0.21 0.26 0.20 -0.09 0.21## C5 0.26 -0.16 -0.20 -0.23 0.21 0.25 0.24 0.34 0.17 -0.08 0.14## E1 0.47 -0.33 -0.42 -0.30 0.02 0.01 0.05 0.23 0.05 -0.10 0.04## E2 1.00 -0.38 -0.51 -0.37 0.17 0.19 0.20 0.35 0.25 -0.16 0.08## E3 -0.38 1.00 0.42 0.38 -0.05 -0.07 -0.02 -0.15 -0.07 0.33 -0.07## E4 -0.51 0.42 1.00 0.32 -0.14 -0.14 -0.10 -0.29 -0.09 0.14 0.06## E5 -0.37 0.38 0.32 1.00 0.04 0.04 -0.06 -0.21 -0.13 0.30 -0.08## N1 0.17 -0.05 -0.14 0.04 1.00 0.71 0.56 0.40 0.38 -0.05 0.13## N2 0.19 -0.07 -0.14 0.04 0.71 1.00 0.55 0.39 0.35 -0.05 0.13## N3 0.20 -0.02 -0.10 -0.06 0.56 0.55 1.00 0.52 0.43 -0.03 0.11## N4 0.35 -0.15 -0.29 -0.21 0.40 0.39 0.52 1.00 0.40 -0.05 0.08## N5 0.25 -0.07 -0.09 -0.13 0.38 0.35 0.43 0.40 1.00 -0.12 0.20## O1 -0.16 0.33 0.14 0.30 -0.05 -0.05 -0.03 -0.05 -0.12 1.00 -0.21## O2 0.08 -0.07 0.06 -0.08 0.13 0.13 0.11 0.08 0.20 -0.21 1.00## O3 -0.23 0.39 0.21 0.29 -0.05 -0.03 -0.03 -0.06 -0.08 0.40 -0.26## O4 0.17 0.05 -0.10 0.00 0.08 0.13 0.18 0.21 0.11 0.18 -0.07## O5 0.08 -0.11 0.05 -0.11 0.11 0.04 0.06 0.04 0.14 -0.24 0.32## gender -0.05 0.05 0.08 0.07 0.04 0.10 0.12 0.00 0.21 -0.10 0.03## education -0.01 0.00 -0.04 0.06 -0.05 -0.05 -0.05 0.01 -0.05 0.03 -0.09## age -0.11 0.00 -0.01 0.11 -0.09 -0.10 -0.11 -0.03 -0.10 0.05 -0.04## O3 O4 O5 gender education age## A1 -0.06 -0.08 0.11 -0.16 -0.14 -0.16## A2 0.16 0.09 -0.09 0.18 0.01 0.11## A3 0.22 0.04 -0.05 0.14 0.00 0.07## A4 0.07 -0.04 0.02 0.13 -0.02 0.14## A5 0.24 0.02 -0.05 0.10 0.01 0.13## C1 0.19 0.11 -0.12 0.01 0.03 0.08## C2 0.19 0.06 -0.05 0.07 
0.00 0.02## C3 0.06 0.02 -0.01 0.05 0.05 0.07## C4 -0.08 0.05 0.20 -0.08 -0.04 -0.15## C5 -0.08 0.14 0.06 -0.09 0.03 -0.09## E1 -0.22 0.08 0.10 -0.13 0.00 -0.03## E2 -0.23 0.17 0.08 -0.05 -0.01 -0.11## E3 0.39 0.05 -0.11 0.05 0.00 0.00## E4 0.21 -0.10 0.05 0.08 -0.04 -0.01## E5 0.29 0.00 -0.11 0.07 0.06 0.11## N1 -0.05 0.08 0.11 0.04 -0.05 -0.09## N2 -0.03 0.13 0.04 0.10 -0.05 -0.10## N3 -0.03 0.18 0.06 0.12 -0.05 -0.11## N4 -0.06 0.21 0.04 0.00 0.01 -0.03## N5 -0.08 0.11 0.14 0.21 -0.05 -0.10## O1 0.40 0.18 -0.24 -0.10 0.03 0.05## O2 -0.26 -0.07 0.32 0.03 -0.09 -0.04## O3 1.00 0.19 -0.31 -0.04 0.09 0.04## O4 0.19 1.00 -0.18 0.00 0.05 0.01## O5 -0.31 -0.18 1.00 0.02 -0.06 -0.10## gender -0.04 0.00 0.02 1.00 0.01 0.05## education 0.09 0.05 -0.06 0.01 1.00 0.24## age 0.04 0.01 -0.10 0.05 0.24 1.00
round(cor(bfi, use = "complete"),2)
## A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1## A1 1.00 -0.34 -0.26 -0.14 -0.19 0.02 0.01 -0.01 0.10 0.02 0.12## A2 -0.34 1.00 0.48 0.34 0.38 0.09 0.13 0.19 -0.14 -0.11 -0.24## A3 -0.26 0.48 1.00 0.38 0.50 0.10 0.14 0.13 -0.12 -0.15 -0.22## A4 -0.14 0.34 0.38 1.00 0.32 0.08 0.22 0.13 -0.16 -0.24 -0.14## A5 -0.19 0.38 0.50 0.32 1.00 0.12 0.11 0.13 -0.12 -0.16 -0.25## C1 0.02 0.09 0.10 0.08 0.12 1.00 0.43 0.32 -0.35 -0.25 -0.03## C2 0.01 0.13 0.14 0.22 0.11 0.43 1.00 0.36 -0.38 -0.30 0.02## C3 -0.01 0.19 0.13 0.13 0.13 0.32 0.36 1.00 -0.35 -0.35 -0.02## C4 0.10 -0.14 -0.12 -0.16 -0.12 -0.35 -0.38 -0.35 1.00 0.48 0.10## C5 0.02 -0.11 -0.15 -0.24 -0.16 -0.25 -0.30 -0.35 0.48 1.00 0.07## E1 0.12 -0.24 -0.22 -0.14 -0.25 -0.03 0.02 -0.02 0.10 0.07 1.00## E2 0.08 -0.24 -0.29 -0.20 -0.33 -0.10 -0.07 -0.09 0.21 0.26 0.47## E3 -0.04 0.25 0.38 0.20 0.41 0.13 0.15 0.10 -0.09 -0.17 -0.33## E4 -0.07 0.30 0.39 0.33 0.48 0.14 0.12 0.10 -0.12 -0.21 -0.42## E5 -0.02 0.30 0.26 0.16 0.27 0.26 0.25 0.22 -0.23 -0.24 -0.31## N1 0.16 -0.08 -0.07 -0.09 -0.19 -0.06 -0.02 -0.08 0.21 0.21 0.01## N2 0.13 -0.04 -0.08 -0.15 -0.19 -0.03 0.00 -0.06 0.15 0.24 0.01## N3 0.09 -0.02 -0.03 -0.07 -0.13 -0.01 0.01 -0.07 0.20 0.23 0.05## N4 0.04 -0.09 -0.13 -0.16 -0.21 -0.09 -0.04 -0.13 0.28 0.35 0.23## N5 0.01 0.02 -0.04 0.00 -0.08 -0.05 0.05 -0.04 0.21 0.18 0.04## O1 0.00 0.11 0.14 0.04 0.15 0.18 0.16 0.09 -0.10 -0.09 -0.10## O2 0.07 0.03 0.03 0.05 0.00 -0.13 -0.05 -0.03 0.21 0.12 0.06## O3 -0.06 0.15 0.22 0.04 0.22 0.19 0.18 0.06 -0.07 -0.07 -0.21## O4 -0.09 0.05 0.02 -0.06 0.00 0.08 0.03 0.00 0.07 0.14 0.08## O5 0.11 -0.08 -0.04 0.04 -0.04 -0.13 -0.06 0.00 0.18 0.05 0.09## gender -0.17 0.21 0.16 0.13 0.11 0.00 0.06 0.04 -0.07 -0.09 -0.15## education -0.14 0.02 0.00 -0.02 0.02 0.04 0.01 0.06 -0.04 0.04 0.00## age -0.14 0.09 0.04 0.11 0.10 0.08 0.00 0.05 -0.12 -0.07 -0.03## E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2## A1 0.08 -0.04 -0.07 -0.02 0.16 0.13 0.09 0.04 0.01 0.00 0.07## A2 -0.24 0.25 0.30 0.30 -0.08 -0.04 -0.02 
-0.09 0.02 0.11 0.03## A3 -0.29 0.38 0.39 0.26 -0.07 -0.08 -0.03 -0.13 -0.04 0.14 0.03## A4 -0.20 0.20 0.33 0.16 -0.09 -0.15 -0.07 -0.16 0.00 0.04 0.05## A5 -0.33 0.41 0.48 0.27 -0.19 -0.19 -0.13 -0.21 -0.08 0.15 0.00## C1 -0.10 0.13 0.14 0.26 -0.06 -0.03 -0.01 -0.09 -0.05 0.18 -0.13## C2 -0.07 0.15 0.12 0.25 -0.02 0.00 0.01 -0.04 0.05 0.16 -0.05## C3 -0.09 0.10 0.10 0.22 -0.08 -0.06 -0.07 -0.13 -0.04 0.09 -0.03## C4 0.21 -0.09 -0.12 -0.23 0.21 0.15 0.20 0.28 0.21 -0.10 0.21## C5 0.26 -0.17 -0.21 -0.24 0.21 0.24 0.23 0.35 0.18 -0.09 0.12## E1 0.47 -0.33 -0.42 -0.31 0.01 0.01 0.05 0.23 0.04 -0.10 0.06## E2 1.00 -0.40 -0.52 -0.39 0.17 0.20 0.19 0.35 0.26 -0.16 0.08## E3 -0.40 1.00 0.43 0.40 -0.04 -0.06 -0.01 -0.15 -0.09 0.33 -0.07## E4 -0.52 0.43 1.00 0.33 -0.14 -0.15 -0.13 -0.31 -0.09 0.12 0.05## E5 -0.39 0.40 0.33 1.00 0.04 0.05 -0.06 -0.21 -0.14 0.29 -0.09## N1 0.17 -0.04 -0.14 0.04 1.00 0.71 0.57 0.41 0.38 -0.05 0.14## N2 0.20 -0.06 -0.15 0.05 0.71 1.00 0.55 0.39 0.35 -0.05 0.12## N3 0.19 -0.01 -0.13 -0.06 0.57 0.55 1.00 0.52 0.43 -0.05 0.11## N4 0.35 -0.15 -0.31 -0.21 0.41 0.39 0.52 1.00 0.40 -0.06 0.08## N5 0.26 -0.09 -0.09 -0.14 0.38 0.35 0.43 0.40 1.00 -0.15 0.20## O1 -0.16 0.33 0.12 0.29 -0.05 -0.05 -0.05 -0.06 -0.15 1.00 -0.23## O2 0.08 -0.07 0.05 -0.09 0.14 0.12 0.11 0.08 0.20 -0.23 1.00## O3 -0.24 0.41 0.21 0.30 -0.03 -0.02 -0.03 -0.06 -0.08 0.39 -0.29## O4 0.17 0.04 -0.10 -0.02 0.09 0.13 0.17 0.23 0.11 0.17 -0.08## O5 0.08 -0.13 0.04 -0.11 0.10 0.02 0.05 0.03 0.14 -0.25 0.33## gender -0.08 0.05 0.11 0.08 0.04 0.09 0.11 -0.02 0.21 -0.11 0.04## education -0.01 0.01 -0.03 0.06 -0.04 -0.04 -0.04 0.01 -0.05 0.03 -0.10## age -0.10 -0.02 -0.01 0.10 -0.07 -0.09 -0.11 -0.02 -0.10 0.05 -0.04## O3 O4 O5 gender education age## A1 -0.06 -0.09 0.11 -0.17 -0.14 -0.14## A2 0.15 0.05 -0.08 0.21 0.02 0.09## A3 0.22 0.02 -0.04 0.16 0.00 0.04## A4 0.04 -0.06 0.04 0.13 -0.02 0.11## A5 0.22 0.00 -0.04 0.11 0.02 0.10## C1 0.19 0.08 -0.13 0.00 0.04 0.08## C2 0.18 0.03 -0.06 
0.06 0.01 0.00## C3 0.06 0.00 0.00 0.04 0.06 0.05## C4 -0.07 0.07 0.18 -0.07 -0.04 -0.12## C5 -0.07 0.14 0.05 -0.09 0.04 -0.07## E1 -0.21 0.08 0.09 -0.15 0.00 -0.03## E2 -0.24 0.17 0.08 -0.08 -0.01 -0.10## E3 0.41 0.04 -0.13 0.05 0.01 -0.02## E4 0.21 -0.10 0.04 0.11 -0.03 -0.01## E5 0.30 -0.02 -0.11 0.08 0.06 0.10## N1 -0.03 0.09 0.10 0.04 -0.04 -0.07## N2 -0.02 0.13 0.02 0.09 -0.04 -0.09## N3 -0.03 0.17 0.05 0.11 -0.04 -0.11## N4 -0.06 0.23 0.03 -0.02 0.01 -0.02## N5 -0.08 0.11 0.14 0.21 -0.05 -0.10## O1 0.39 0.17 -0.25 -0.11 0.03 0.05## O2 -0.29 -0.08 0.33 0.04 -0.10 -0.04## O3 1.00 0.17 -0.32 -0.04 0.10 0.02## O4 0.17 1.00 -0.18 -0.04 0.06 0.00## O5 -0.32 -0.18 1.00 0.04 -0.06 -0.08## gender -0.04 -0.04 0.04 1.00 0.01 0.05## education 0.10 0.06 -0.06 0.01 1.00 0.25## age 0.02 0.00 -0.08 0.05 0.25 1.00
With pairwise deletion, different sets of cases contribute to different correlations. That maximizes the sample sizes, but can lead to problems if the data are missing for some systematic reason.
Listwise deletion (often referred to in R as using complete cases) doesn't have the same issue of biasing correlations, but does result in smaller samples and potentially limited generalizability.
A good practice is comparing the different matrices; if the correlation values are very different, this suggests that the missingness that affects pairwise deletion is systematic.
round(cor(bfi, use = "pairwise")- cor(bfi, use = "complete"),2)
## A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1## A1 0.00 0.00 0.00 0.00 0.00 0.01 0.00 -0.01 0.03 0.03 -0.01## A2 0.00 0.00 0.00 -0.01 0.01 0.00 0.01 0.01 -0.01 -0.01 0.03## A3 0.00 0.00 0.00 -0.02 0.00 0.00 0.00 0.00 0.00 -0.01 0.00## A4 0.00 -0.01 -0.02 0.00 -0.01 0.01 0.01 0.00 0.01 0.00 0.03## A5 0.00 0.01 0.00 -0.01 0.00 0.00 0.00 0.00 -0.01 -0.01 0.00## C1 0.01 0.00 0.00 0.01 0.00 0.00 0.00 -0.01 0.01 0.00 0.00## C2 0.00 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 -0.01## C3 -0.01 0.01 0.00 0.00 0.00 -0.01 0.00 0.00 0.02 0.01 0.02## C4 0.03 -0.01 0.00 0.01 -0.01 0.01 0.00 0.02 0.00 -0.01 -0.01## C5 0.03 -0.01 -0.01 0.00 -0.01 0.00 0.00 0.01 -0.01 0.00 0.00## E1 -0.01 0.03 0.00 0.03 0.00 0.00 -0.01 0.02 -0.01 0.00 0.00## E2 0.01 0.01 0.00 0.01 0.00 0.01 0.01 0.01 -0.01 0.00 0.00## E3 0.00 0.00 0.00 -0.01 0.00 -0.02 0.00 -0.02 0.01 0.01 0.01## E4 0.01 -0.02 -0.02 -0.03 -0.01 0.00 0.00 -0.01 0.01 0.01 0.00## E5 0.00 0.00 -0.01 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 0.00## N1 0.01 -0.01 -0.02 0.00 0.00 -0.01 0.00 0.01 0.01 0.01 0.01## N2 0.01 -0.01 0.00 0.00 0.00 -0.01 -0.01 0.00 0.01 0.01 0.01## N3 0.01 -0.02 -0.01 0.00 -0.01 -0.02 -0.01 0.01 0.01 0.01 0.00## N4 0.01 0.00 0.00 -0.01 0.01 -0.01 -0.01 0.02 -0.02 -0.01 0.00## N5 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.02 -0.02 -0.01 0.01## O1 0.01 0.02 0.00 0.02 0.02 -0.01 0.01 0.00 0.01 0.01 0.00## O2 0.01 -0.02 -0.03 -0.01 0.00 0.02 0.01 0.00 0.00 0.02 -0.01## O3 0.00 0.02 0.01 0.03 0.02 0.00 0.01 0.01 -0.01 -0.01 0.00## O4 0.01 0.03 0.01 0.02 0.01 0.03 0.03 0.02 -0.02 0.00 -0.01## O5 0.01 -0.01 -0.01 -0.01 -0.01 0.01 0.00 -0.01 0.01 0.01 0.01## gender 0.01 -0.03 -0.02 0.00 -0.01 0.01 0.01 0.01 -0.01 0.00 0.02## education 0.00 -0.01 -0.01 0.00 0.00 -0.01 -0.01 -0.01 0.00 -0.01 0.00## age -0.02 0.02 0.03 0.03 0.03 0.00 0.02 0.02 -0.03 -0.01 0.01## E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2## A1 0.01 0.00 0.01 0.00 0.01 0.01 0.01 0.01 0.01 0.01 0.01## A2 0.01 0.00 -0.02 0.00 -0.01 -0.01 -0.02 0.00 0.00 0.02 -0.02## A3 0.00 0.00 -0.02 -0.01 
-0.02 0.00 -0.01 0.00 0.00 0.00 -0.03## A4 0.01 -0.01 -0.03 0.00 0.00 0.00 0.00 -0.01 0.00 0.02 -0.01## A5 0.00 0.00 -0.01 0.00 0.00 0.00 -0.01 0.01 0.00 0.02 0.00## C1 0.01 -0.02 0.00 -0.01 -0.01 -0.01 -0.02 -0.01 0.00 -0.01 0.02## C2 0.01 0.00 0.00 0.00 0.00 -0.01 -0.01 -0.01 0.00 0.01 0.01## C3 0.01 -0.02 -0.01 0.00 0.01 0.00 0.01 0.02 0.02 0.00 0.00## C4 -0.01 0.01 0.01 0.00 0.01 0.01 0.01 -0.02 -0.02 0.01 0.00## C5 0.00 0.01 0.01 0.01 0.01 0.01 0.01 -0.01 -0.01 0.01 0.02## E1 0.00 0.01 0.00 0.00 0.01 0.01 0.00 0.00 0.01 0.00 -0.01## E2 0.00 0.02 0.01 0.02 0.00 0.00 0.01 -0.01 0.00 0.00 0.00## E3 0.02 0.00 -0.01 -0.02 -0.01 -0.01 -0.01 0.01 0.01 0.00 0.01## E4 0.01 -0.01 0.00 -0.02 0.01 0.01 0.03 0.02 0.00 0.01 0.01## E5 0.02 -0.02 -0.02 0.00 0.00 -0.01 0.00 0.00 0.01 0.00 0.00## N1 0.00 -0.01 0.01 0.00 0.00 0.00 -0.01 -0.01 -0.01 0.00 -0.01## N2 0.00 -0.01 0.01 -0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00## N3 0.01 -0.01 0.03 0.00 -0.01 0.00 0.00 0.00 0.00 0.01 0.00## N4 -0.01 0.01 0.02 0.00 -0.01 0.00 0.00 0.00 0.00 0.01 0.00## N5 0.00 0.01 0.00 0.01 -0.01 0.00 0.00 0.00 0.00 0.03 0.00## O1 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.01 0.03 0.00 0.02## O2 0.00 0.01 0.01 0.00 -0.01 0.00 0.00 0.00 0.00 0.02 0.00## O3 0.02 -0.02 0.00 0.00 -0.02 -0.01 0.00 0.00 0.01 0.00 0.03## O4 0.00 0.01 0.01 0.01 -0.01 0.00 0.01 -0.02 0.01 0.01 0.01## O5 0.00 0.02 0.01 0.00 0.01 0.02 0.01 0.01 -0.01 0.01 -0.01## gender 0.02 -0.01 -0.03 -0.01 0.01 0.00 0.01 0.02 0.00 0.01 -0.02## education 0.00 0.00 -0.01 0.00 0.00 -0.01 -0.01 0.00 -0.01 -0.01 0.01## age 0.00 0.02 0.00 0.02 -0.01 -0.01 0.00 -0.01 0.00 0.00 0.00## O3 O4 O5 gender education age## A1 0.00 0.01 0.01 0.01 0.00 -0.02## A2 0.02 0.03 -0.01 -0.03 -0.01 0.02## A3 0.01 0.01 -0.01 -0.02 -0.01 0.03## A4 0.03 0.02 -0.01 0.00 0.00 0.03## A5 0.02 0.01 -0.01 -0.01 0.00 0.03## C1 0.00 0.03 0.01 0.01 -0.01 0.00## C2 0.01 0.03 0.00 0.01 -0.01 0.02## C3 0.01 0.02 -0.01 0.01 -0.01 0.02## C4 -0.01 -0.02 0.01 -0.01 0.00 -0.03## C5 -0.01 0.00 
0.01 0.00 -0.01 -0.01## E1 0.00 -0.01 0.01 0.02 0.00 0.01## E2 0.02 0.00 0.00 0.02 0.00 0.00## E3 -0.02 0.01 0.02 -0.01 0.00 0.02## E4 0.00 0.01 0.01 -0.03 -0.01 0.00## E5 0.00 0.01 0.00 -0.01 0.00 0.02## N1 -0.02 -0.01 0.01 0.01 0.00 -0.01## N2 -0.01 0.00 0.02 0.00 -0.01 -0.01## N3 0.00 0.01 0.01 0.01 -0.01 0.00## N4 0.00 -0.02 0.01 0.02 0.00 -0.01## N5 0.01 0.01 -0.01 0.00 -0.01 0.00## O1 0.00 0.01 0.01 0.01 -0.01 0.00## O2 0.03 0.01 -0.01 -0.02 0.01 0.00## O3 0.00 0.02 0.01 0.01 0.00 0.01## O4 0.02 0.00 0.00 0.03 -0.01 0.01## O5 0.01 0.00 0.00 -0.01 0.00 -0.02## gender 0.01 0.03 -0.01 0.00 0.00 0.00## education 0.00 -0.01 0.00 0.00 0.00 -0.01## age 0.01 0.01 -0.02 0.00 -0.01 0.00
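To see why pairwise and listwise deletion can give different answers, it helps to look at how many cases each correlation is based on. The sketch below uses a tiny made-up data frame (`d`, with arbitrary values and missingness) rather than `bfi`, and counts the cases behind each pairwise correlation with base R.

```r
# Hypothetical data with scattered missing values
d <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, NA, 8),
  b = c(2, 1, 4, 3, 6, 5, 8, NA),
  c = c(NA, 3, 2, 5, 4, 7, 6, 8)
)

r_pair <- cor(d, use = "pairwise.complete.obs")
r_list <- cor(d, use = "complete.obs")

# 1 where a value is observed, 0 where it is missing
obs <- 1 - is.na(as.matrix(d))

# Entry [i, j] is the number of cases with both variables observed
n_pair <- crossprod(obs)

# Listwise deletion uses only rows with no missing values at all
n_list <- sum(complete.cases(d))

n_pair
n_list
round(r_pair - r_list, 2)
```

With pairwise deletion each off-diagonal correlation here rests on 6 cases, while listwise deletion uses the same 5 complete cases for every correlation.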
For a single correlation, best practice is to visualize the relationship using a scatterplot. A best fit line is advised, as it can help clarify the strength and direction of the relationship.
http://guessthecorrelation.com/
A single correlation can be informative; a correlation matrix is more than the sum of its parts.
Correlation matrices can be used to infer larger patterns of relationships. You may be one of the gifted who can look at a matrix of numbers and see those patterns immediately. Or you can use heat maps to visualize correlation matrices.
library(corrplot)
corrplot(cor(bfi, use = "pairwise"), method = "square")
Descriptive statistic; describing the strength of association/relationship
Inferential statistic + hypothesis testing:
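One common way to test the null hypothesis that the population correlation is zero is a t-test with n - 2 degrees of freedom, where t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below uses simulated data (sample size and effect are made up) to compute this by hand and compare it with `cor.test()`.

```r
set.seed(7)
# Simulated data, for illustration only
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)

r <- cor(x, y)
n <- length(x)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# cor.test() reports the same statistic and p-value
test <- cor.test(x, y)
all.equal(unname(test$statistic), t_stat)
all.equal(test$p.value, p_val)
```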
** Need to use a Fisher r to z' transformation
If we want to make calculations based on ρ≠0 then we will run into a skewed sampling distribution.
Skewed sampling distribution will rear its head when:
$H_0: \rho = \rho_0$, where $\rho_0 \neq 0$ (testing against a null value other than zero)
Calculating confidence intervals
Testing two correlations against one another
r to z':
$z' = \frac{1}{2}\ln\frac{1+r}{1-r}$
ln = natural log
No longer bounded by 1 & -1
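As a sketch of how the transformation is used for a confidence interval: the value r = .70 is the example from earlier in these slides, but the sample size of 50 is a made-up assumption. The standard error of z' is 1 / sqrt(n - 3); the interval is built on the z' scale and then converted back to r. Base R's `atanh()` and `tanh()` perform the same transformation and its inverse.

```r
r <- 0.70   # example correlation (age and height among children)
n <- 50     # hypothetical sample size

# Fisher r to z' by hand; identical to atanh(r)
z <- 0.5 * log((1 + r) / (1 - r))
all.equal(z, atanh(r))

# Standard error and 95% confidence interval on the z' scale
se   <- 1 / sqrt(n - 3)
ci_z <- z + c(-1, 1) * qnorm(0.975) * se

# Convert the limits back to the r scale
ci_r <- tanh(ci_z)
round(ci_r, 2)
```

Because the sampling distribution of r is skewed, the resulting interval is not symmetric around r: the lower arm is longer than the upper arm when r is positive.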
library(psych)
fisherz(r)     # r to z'
fisherz2r(z)   # z' back to r
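The same transformation supports a test of whether two correlations from independent samples differ. This is a minimal base R sketch under stated assumptions: the two correlations and sample sizes are hypothetical, and `atanh()` is used in place of `fisherz()` so the example runs without extra packages.

```r
# Hypothetical correlations from two independent samples
r1 <- 0.70; n1 <- 50
r2 <- 0.40; n2 <- 60

# Difference between the transformed correlations
z_diff_raw <- atanh(r1) - atanh(r2)

# Standard error of the difference between two independent z' values
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

# z statistic and two-tailed p-value
z_stat <- z_diff_raw / se_diff
p_val  <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

round(c(z = z_stat, p = p_val), 3)
```

This version applies only to independent samples; correlations that share a variable or come from the same participants need a different test.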
Use when...
Spearman correlation coefficient
Point-biserial correlation coefficient
Phi ( ϕ ) coefficient
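Each of these coefficients can be computed as a Pearson correlation on suitably coded data. The sketch below uses made-up values throughout: Spearman is Pearson on ranks, point-biserial is Pearson with one 0/1 variable, and phi is Pearson with two 0/1 variables.

```r
set.seed(8)
# Simulated skewed data, for illustration only
x <- rexp(30)
y <- x^2 + rnorm(30, sd = 0.5)

# Spearman: Pearson correlation of the ranks
r_spearman <- cor(x, y, method = "spearman")
all.equal(r_spearman, cor(rank(x), rank(y)))

# Point-biserial: Pearson with one dichotomous (0/1) variable
g <- rep(c(0, 1), each = 15)   # hypothetical group indicator
r_pb <- cor(g, y)

# Phi: Pearson with two dichotomous variables
a <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
b <- c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1)
phi <- cor(a, b)

# Same value from the 2 x 2 table formula
tab <- table(a, b)
phi_formula <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
all.equal(phi, phi_formula)
```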
Regression!