Project Description

Considering my love for exercise and health, I found a dataset on physical activity and heart rate to use for a Bayesian analysis. I am interested in exploring the relationship between exercise and heart rate, which calls for a linear regression approach. To perform a Bayesian linear regression, I will use the brms package in R. This package provides an interface to fit complex Bayesian models using the probabilistic programming language Stan. Stan’s efficient Hamiltonian Monte Carlo algorithms can sample from the posterior distribution of the model parameters.

I could not have learned all of these steps on my own. An amazing resource to check out is bayesrulesbook.com; it will teach you so much about Bayesian philosophy and techniques. If you are curious about where I learned some of the steps below, check out Chapter 6!

1. Load Packages and Data

options(repos = c(CRAN = "https://cloud.r-project.org"))
library(brms)
Loading required package: Rcpp
Loading 'brms' package (version 2.21.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: 'brms'
The following object is masked from 'package:stats':

    ar
library(bayesplot)
This is bayesplot version 1.11.1
- Online documentation and vignettes at mc-stan.org/bayesplot
- bayesplot theme set to bayesplot::theme_default()
   * Does _not_ affect other ggplot2 plots
   * See ?bayesplot_theme_set for details on theme setting

Attaching package: 'bayesplot'
The following object is masked from 'package:brms':

    rhat
data <- read.csv("exercise_data.csv")
head(data)
  physical_activity heart_rate age   weight gender
1                55   66.32434  22 66.45458   Male
2                69   69.98812  28 69.26444 Female
3                46   82.85591  64 58.31349   Male
4                61   53.97753  37 63.65252 Female
5                73   74.57795  19 69.71158   Male
6                63   70.95191  79 76.70696 Female
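
Before fitting anything, it is worth confirming the column types and checking for missing values. Here is a minimal sketch; it uses a small stand-in data frame with the same columns so the snippet runs on its own, but with the real data you would call these functions on the `data` object loaded above:

```r
# Stand-in data frame mirroring the columns of exercise_data.csv
# (replace `df` with `data` when working with the real dataset)
df <- data.frame(
  physical_activity = c(55, 69, 46),
  heart_rate        = c(66.32, 69.99, 82.86),
  age               = c(22, 28, 64),
  weight            = c(66.45, 69.26, 58.31),
  gender            = c("Male", "Female", "Male")
)
str(df)            # confirm numeric predictors and a character gender column
colSums(is.na(df)) # count missing values in each column
```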

2. Create the Bayesian Linear Regression

# We can model heart rate as a function of physical activity, age, weight, and gender
model <- brm(
  heart_rate ~ physical_activity + age + weight + gender,
  data = data,
  family = gaussian(),
  prior = c(set_prior("normal(0, 10)", class = "b")) # Normal(0, 10) prior on every regression coefficient
)
Compiling Stan program...
Start sampling

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 6.3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.045 seconds (Warm-up)
Chain 1:                0.024 seconds (Sampling)
Chain 1:                0.069 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 1e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.042 seconds (Warm-up)
Chain 2:                0.027 seconds (Sampling)
Chain 2:                0.069 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 8e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.049 seconds (Warm-up)
Chain 3:                0.029 seconds (Sampling)
Chain 3:                0.078 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 7e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.047 seconds (Warm-up)
Chain 4:                0.025 seconds (Sampling)
Chain 4:                0.072 seconds (Total)
Chain 4: 
# Print a summary of the fitted model to assess the estimates and credible intervals
summary(model)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: heart_rate ~ physical_activity + age + weight + gender 
   Data: data (Number of observations: 145) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            75.58      9.77    56.75    94.34 1.00     6425     3275
physical_activity    -0.06      0.12    -0.29     0.17 1.00     5619     3328
age                   0.05      0.05    -0.04     0.14 1.00     5681     2843
weight                0.01      0.10    -0.18     0.20 1.00     6231     3241
genderMale           -0.83      1.72    -4.14     2.51 1.00     5424     3195

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    10.68      0.63     9.52    12.03 1.00     4866     2718

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Important Note: A 95% credible interval that includes 0 (as all of the coefficient intervals above do) indicates uncertainty about the direction of the effect; the data do not provide strong evidence for a nonzero effect. A replication of this study could reasonably reach different conclusions.
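
That note can be made concrete by working directly with the posterior draws. The sketch below simulates draws matching the reported physical_activity estimate (mean -0.06, SD 0.12) so it runs on its own; with the fitted model you would instead use the b_physical_activity column of as_draws_df(model):

```r
set.seed(1)
# Simulated posterior draws for the physical_activity slope
# (stand-in for as_draws_df(model)$b_physical_activity)
draws <- rnorm(4000, mean = -0.06, sd = 0.12)

ci <- quantile(draws, probs = c(0.025, 0.975)) # 95% credible interval
p_negative <- mean(draws < 0)                  # posterior Pr(slope < 0)
ci
p_negative
```

The interval spans 0, so rather than a yes/no verdict we can report the posterior probability that the slope is negative, which is a more informative summary of the same uncertainty.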

3. Data Visualization

  • Trace Plots: Display the sampled values of each parameter across the iterations of the Markov chain Monte Carlo sampler.

    • Interpretation: Look for a "fuzzy caterpillar" pattern in which the chains mix well and cover the parameter space evenly. This indicates good mixing and convergence.
# Trace Plots
trace_plot <- mcmc_trace(model, size = .1)
print(trace_plot)

  • Density Plots: Show the posterior distributions of the parameters.

    • Interpretation: These should be smooth and unimodal. Multiple peaks may indicate convergence problems or that the model has not properly identified the posterior distribution.
# Density Plots (Posterior distributions)
mcmc_dens(model) + 
  yaxis_text(TRUE)

  • Density Overlay Plot: Compares the density of the observed data with the densities of datasets simulated from the posterior predictive distribution.

    • Interpretation: The observed data (dark line) should ideally lie within the range of the simulated datasets (light blue lines). If the observed density falls outside this range, the model may not fit the data well.
# Posterior Predictive Checks
pp_check_plot <- pp_check(model, ndraws = 100)
print(pp_check_plot)