ANOVA (analysis of variance) is a statistical method for comparing group means. It tests whether the means of two or more groups differ by more than within-group variability alone would explain.
Key Points:
Purpose: To determine if there are any significant differences between the means of two or more groups.
Types: One-way ANOVA (for one independent variable), Two-way ANOVA (for two independent variables).
Assumptions: Normally distributed residuals within each group, homogeneity of variances (tested using Levene’s test), independence of observations.
R Packages: stats, car, afex
# Load required packages
options(repos = c(CRAN = "https://cloud.r-project.org"))
library(stats) # Base R package for ANOVA
library(car) # For ANOVA with Type III sums of squares
Loading required package: carData
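# One-way ANOVA
# Note: cyl is numeric in mtcars, so it enters as a single 1-Df linear term;
# use factor(cyl) to compare the 4-, 6-, and 8-cylinder groups as categories
model <- aov(mpg ~ cyl, data = mtcars)
summary(model)
            Df Sum Sq Mean Sq F value   Pr(>F)    
cyl          1  817.7   817.7   79.56 6.11e-10 ***
Residuals   30  308.3    10.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1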
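Because `cyl` is stored as a number in `mtcars`, the call above fits a single slope, which is why the table shows 1 Df for `cyl`. The following is a minimal sketch of the grouped comparison, assuming the three cylinder counts are meant as categories. It converts `cyl` to a factor and then checks the normality and equal-variance assumptions listed under Key Points. The names `cars`, `fit`, `abs_dev`, and `levene` are illustrative. The Levene-type test is written out in base R using median-centred absolute deviations; `car::leveneTest()` should report the same quantity with its default settings.

```r
# Treat cylinder count as a grouping factor (assumption: categories, not a trend)
cars <- transform(mtcars, cyl = factor(cyl))

# One-way ANOVA across the three cylinder groups
fit <- aov(mpg ~ cyl, data = cars)
summary(fit)

# Normality of residuals (Shapiro-Wilk)
shapiro.test(residuals(fit))

# Levene-type test for equal variances: one-way ANOVA on the absolute
# deviations of each observation from its group median
abs_dev <- abs(cars$mpg - ave(cars$mpg, cars$cyl, FUN = median))
levene <- anova(lm(abs_dev ~ cyl, data = cars))
levene
```

With `cyl` as a three-level factor, the effect has 2 Df instead of 1, so the F test compares group means and no longer tests a linear trend.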
# Two-way ANOVA (Factorial design)
# As above, cyl and vs are numeric here, so each term has 1 Df
model <- aov(mpg ~ cyl + vs + cyl:vs, data = mtcars)
summary(model)
            Df Sum Sq Mean Sq F value   Pr(>F)    
cyl          1  817.7   817.7  77.035 1.58e-09 ***
vs           1    2.4     2.4   0.224    0.640    
cyl:vs       1    8.7     8.7   0.823    0.372    
Residuals   28  297.2    10.6                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notes:
Factorial designs: Manipulate multiple independent variables (IVs) across different levels to observe their separate and combined (interaction) effects on a dependent variable (DV).
RM ANOVA: Used when measurements are taken from the same subjects over multiple time points or conditions.
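To make the factorial idea concrete, here is a small sketch that crosses two categorical IVs and inspects the cell means alongside the interaction test. It assumes `cyl` and `am` from `mtcars` are treated as factors; `am` is used because every `cyl` by `am` cell contains at least one car. The names `cars`, `cell_means`, and `fit` are illustrative.

```r
# Code both IVs as factors so each level gets its own mean
cars <- transform(
  mtcars,
  cyl = factor(cyl),
  am  = factor(am, labels = c("automatic", "manual"))
)

# Mean mpg in each cell of the 3 x 2 design
cell_means <- with(cars, tapply(mpg, list(cyl = cyl, am = am), mean))
cell_means

# Main effects plus interaction; cyl * am expands to cyl + am + cyl:am
fit <- aov(mpg ~ cyl * am, data = cars)
summary(fit)
```

The design is unbalanced, and `aov()` reports sequential sums of squares, so the main-effect rows can change if the order of `cyl` and `am` in the formula is swapped.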
2. ANCOVA (Analysis of Covariance)
ANCOVA extends ANOVA by including one or more continuous variables (covariates) in addition to the categorical independent variable(s). It adjusts group means based on these covariates.
Key Points:
Purpose: To compare group means while statistically controlling for the effects of one or more covariates.
Assumptions: Linearity between covariates and dependent variable, homogeneity of regression slopes.
R Packages: car, lmtest
model <- lm(mpg ~ wt + factor(am), data = mtcars)
summary(model)
Call:
lm(formula = mpg ~ wt + factor(am), data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.5295 -2.3619 -0.1317  1.4025  6.8782 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
wt          -5.35281    0.78824  -6.791 1.87e-07 ***
factor(am)1 -0.02362    1.54565  -0.015    0.988    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.098 on 29 degrees of freedom
Multiple R-squared:  0.7528, Adjusted R-squared:  0.7358
F-statistic: 44.17 on 2 and 29 DF,  p-value: 1.579e-09
Notes:
Covariate (CV): Adjusts group means for a pretest score or another exogenous variable that is not otherwise accounted for. It must be continuous and linearly related to the dependent variable (DV).
Reasons to use ANCOVA: Increases power by removing covariate-related variance from the error term, and adjusts for confounding variables.
Reasons not to use ANCOVA: Badly chosen covariates may obscure real differences. Also avoid ANCOVA when the covariate is related to group membership, because the adjustment then removes part of the group effect itself.
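Two of the points above can be checked directly before trusting an ANCOVA: homogeneity of regression slopes, and whether the covariate differs between groups. The sketch below reuses the `mpg ~ wt + am` model from this section. Comparing nested models with `anova()` is one common way to test the slopes, not the only one, and the object names are illustrative.

```r
cars <- transform(mtcars, am = factor(am))

# Homogeneity of regression slopes: does adding a wt-by-am interaction
# improve on the common-slope ANCOVA model?
ancova_fit <- lm(mpg ~ wt + am, data = cars)
slopes_fit <- lm(mpg ~ wt * am, data = cars)
slope_test <- anova(ancova_fit, slopes_fit)
slope_test

# Covariate vs. group membership: does mean weight differ between groups?
cov_by_group <- t.test(wt ~ am, data = cars)
cov_by_group
```

A small p-value in `slope_test` suggests the slopes differ between groups, in which case a single adjusted group difference is hard to interpret. A clear group difference in the covariate is the situation the note above warns about.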
3. MANOVA (Multivariate Analysis of Variance)
MANOVA is used when there are multiple dependent variables (DVs) and one or more independent variables (IVs). It tests the simultaneous effect of multiple IVs on multiple DVs.
Key Points:
Purpose: To test the joint effect of multiple independent variables on multiple dependent variables.
Assumptions: Multivariate normality, homogeneity of covariance matrices.
R Packages: stats (manova()), car, multcomp
data("iris")
model <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species, data = iris)
summary(model)
           Df Pillai approx F num Df den Df    Pr(>F)    
Species     2 1.1919   53.466      8    290 < 2.2e-16 ***
Residuals 147                                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notes:
Reasons to use MANOVA: Tests the effect of the IVs (and their interactions) on several DVs at once, which protects against the inflated Type I error of running many separate tests.
Reasons not to use MANOVA: The multivariate test does not answer every question; separate ANOVAs are still needed to find which DVs differ.
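As a sketch of the follow-up step, the same `manova()` fit can be summarized with a different multivariate statistic and then broken into one univariate ANOVA per DV with `summary.aov()`. The object names `fit` and `univariate` are illustrative.

```r
fit <- manova(
  cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
  data = iris
)

# Same multivariate test using Wilks' lambda instead of the default Pillai trace
summary(fit, test = "Wilks")

# One univariate ANOVA table per dependent variable
univariate <- summary.aov(fit)
univariate
```

These follow-up tests are not adjusted for multiplicity, so a correction such as `p.adjust()` may be appropriate.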
Roy-Bargmann Stepdown:
The Roy-Bargmann stepdown procedure orders the dependent variables (DVs) by priority and tests them in a sequence of ANOVAs and ANCOVAs to identify which DVs contribute to group differences.
The highest priority DV is tested in a univariate ANOVA.
Each subsequent DV is tested in an ANCOVA with the higher-priority DVs acting as covariates (CVs).
Sometimes a lower-priority DV shows an effect in its own ANOVA but not in the stepdown analysis. This means the DV explains some of the variance but shares it with the higher-priority DV: once that DV is adjusted for, it has no unique relationship with the treatment and is not needed.
# dv1 and dv2 are dependent variables of interest
# Univariate ANOVA for dv1
model1 <- aov(dv1 ~ group, data = mydata)
# ANCOVA for dv2 with dv1 as covariate
model2 <- lm(dv2 ~ dv1 + group, data = mydata)
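The template above uses placeholder names (`dv1`, `dv2`, `mydata`). A runnable sketch on `iris` follows, under the purely illustrative assumption that `Petal.Length` has the highest priority and `Sepal.Length` comes second. In practice the priority order should come from theory, not from the data.

```r
# Step 1: highest-priority DV in a univariate ANOVA
step1 <- aov(Petal.Length ~ Species, data = iris)
summary(step1)

# Step 2: next DV in an ANCOVA, with the higher-priority DV as covariate.
# anova() on an lm gives sequential tests, so Species is tested after
# adjusting for Petal.Length.
step2 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
step2_table <- anova(step2)
step2_table
```

The `Species` row of `step2_table` shows whether the groups still differ on the second DV once the first DV has been accounted for.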
Notes:
To find out which DV explains group differences in means, test them with separate ANOVAs.
To find out which groups caused an effect, use post-hoc comparisons.
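One common post-hoc option in base R is Tukey's honest significant difference, which compares every pair of groups while controlling the family-wise error rate. The sketch below applies it to a single DV from `iris`; the choice of `Sepal.Length` is only for illustration.

```r
# Univariate ANOVA for one DV
fit <- aov(Sepal.Length ~ Species, data = iris)

# Pairwise group comparisons with adjusted p-values and confidence intervals
tukey <- TukeyHSD(fit)
tukey
```

Each row of `tukey$Species` is one pair of species, with the estimated difference, its confidence interval, and an adjusted p-value.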
4. Repeated Measures ANOVA
RM ANOVA is used when measurements are taken from the same subjects over multiple time points or conditions.
Key Points:
When k = 2: Use paired t-test or ANCOVA
When k > 2: Use RM ANOVA or Profile Analysis
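For the k = 2 case, a paired t-test is enough. The sketch below uses the built-in `sleep` data, in which the same ten subjects were measured under two drugs; the rows are ordered by subject within each group, so the two vectors line up. The names `drug1`, `drug2`, and `paired` are illustrative.

```r
# Extra hours of sleep for the same 10 subjects under each drug
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Paired t-test on the within-subject differences
paired <- t.test(drug1, drug2, paired = TRUE)
paired
```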
model <- aov(score ~ time + group + time:group + Error(subject/time), data = mydata)
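The call above assumes a long-format data frame with one row per subject per time point. Since `mydata` is not defined in these notes, the sketch below simulates one so the model can be run. The sample size, effect sizes, and column values are all made up for illustration.

```r
set.seed(1)
n_subj <- 12

# Long format: one row per subject x time combination
mydata <- expand.grid(
  subject = factor(seq_len(n_subj)),
  time    = factor(c("t1", "t2", "t3"))
)

# Between-subject factor: first half control, second half treatment
mydata$group <- factor(
  ifelse(as.integer(mydata$subject) <= n_subj / 2, "control", "treatment")
)

# Simulated score: subject-specific baseline + linear time trend + noise
subject_effect <- rnorm(n_subj, sd = 2)
mydata$score <- 50 + subject_effect[as.integer(mydata$subject)] +
  1.5 * (as.integer(mydata$time) - 1) + rnorm(nrow(mydata))

# Mixed design: time within subjects, group between subjects
fit <- aov(score ~ time + group + time:group + Error(subject/time), data = mydata)
summary(fit)
```

The summary is split into error strata: `group` is tested against between-subject variability, while `time` and `time:group` are tested against within-subject variability. Both `subject` and `time` need to be factors for the strata to be built correctly.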
Sphericity Violation and Corrections
Sphericity violation occurs when the variances of the differences between all possible pairs of within-subject conditions are not equal.
A violation inflates the Type I error rate of the within-subject F tests.
Mauchly’s test checks the sphericity assumption; when it is violated, corrections such as Greenhouse-Geisser (especially for small n) adjust the degrees of freedom.
# Sphericity test and correction
# aov() neither tests nor corrects for sphericity and has no Greenhouse-Geisser
# argument, so this uses afex (listed under R Packages in the ANOVA section)
library(afex)
model <- aov_car(score ~ condition + Error(subject/condition), data = mydata)
# Check for sphericity violation: summary() reports Mauchly's test together with
# the Greenhouse-Geisser and Huynh-Feldt corrections
summary(model)
# Greenhouse-Geisser correction applied to the ANOVA table
anova(model, correction = "GG")
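Base R can also test and correct for sphericity without extra packages, by fitting a multivariate linear model to wide-format data (one row per subject, one column per condition). The sketch below follows the pattern in the `mauchly.test()` and `anova.mlm()` help pages. The data are simulated, with unequal noise across conditions chosen only to make the example interesting, and all object names are illustrative.

```r
set.seed(2)
n_subj <- 15

# Wide format: rows are subjects, columns are within-subject conditions
subject_effect <- rnorm(n_subj, sd = 2)
scores <- cbind(
  c1 = 50 + subject_effect + rnorm(n_subj, sd = 1),
  c2 = 51 + subject_effect + rnorm(n_subj, sd = 2),
  c3 = 53 + subject_effect + rnorm(n_subj, sd = 3)
)

# Multivariate fit with an intercept, and the null model without it
mlmfit  <- lm(scores ~ 1)
mlmfit0 <- update(mlmfit, ~ 0)

# Mauchly's test of sphericity for contrasts orthogonal to the intercept
mauchly <- mauchly.test(mlmfit, X = ~ 1)
mauchly

# Within-subject F test with Greenhouse-Geisser and Huynh-Feldt
# corrected p-values reported alongside the uncorrected one
corrected <- anova(mlmfit, mlmfit0, X = ~ 1, test = "Spherical")
corrected
```

The printed table includes the estimated epsilon values and the `G-G Pr` and `H-F Pr` columns, which are the p-values to read when Mauchly's test suggests a violation.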