MIT 401 – BAYESIAN MODELLING
TUTORIAL 2: Bayesian Computing and R Programming
1 Overview
Bayesian modelling has gained significant traction in statistical analysis due to its flexibility
and robustness in handling uncertainty. In the context of R, Bayesian computation allows
researchers and practitioners to apply Bayesian methods effectively using a range of specialized
packages. This overview will cover two critical subtopics: an introduction to R and its Bayesian
packages, and setting up the Bayesian computing environment in R.
1.1 Introduction to R and Its Bayesian Packages
R is a powerful programming language and environment widely used for statistical computing
and data analysis. Its extensive ecosystem of packages makes it particularly well-suited for
implementing Bayesian methods. Some of the most notable packages for Bayesian modelling
in R include:
• rstan: This package provides an interface to Stan, a state-of-the-art platform for
statistical modelling. Stan is known for its efficient Markov Chain Monte Carlo
(MCMC) methods, enabling users to fit complex Bayesian models. With rstan, users
can specify models in a flexible syntax and leverage Stan’s sampling algorithms.
• brms: Built on top of rstan, the brms package allows users to fit Bayesian generalized
(non-)linear multivariate models using a formula syntax similar to that of the popular
lme4 package. This makes brms particularly approachable for users familiar with
traditional regression modelling in R, while still providing the benefits of Bayesian
inference.
• BayesFactor: This package simplifies the process of conducting Bayesian hypothesis
testing using Bayes factors, offering functions for model comparison and hypothesis
testing.
These packages, among others, provide a comprehensive toolkit for implementing Bayesian
analyses in R, catering to a wide range of applications from simple regression models to
complex hierarchical models.
1.2 Setting Up the Bayesian Computing Environment in R
To use Bayesian methods effectively in R, you first need a suitable computing
environment. This involves installing the necessary packages, configuring settings, and
understanding the computational requirements.
• Installation of Packages: Begin by installing the required packages from CRAN or
GitHub. For instance, installing rstan can be done with:
install.packages("rstan")
Similarly, brms can be installed along with BH, which supplies the Boost C++ header
files used when Stan models are compiled:
install.packages("brms")
install.packages("BH")
• Configuration: After installation, configure the environment so that models compile
and sample efficiently. rstan translates each model to C++ and compiles it, so R needs
a working C++ toolchain (for example, Rtools on Windows or the Xcode command line
tools on macOS). Running the chains in parallel across CPU cores also shortens
sampling time.
• Exploration of Examples: Familiarizing oneself with example models and data sets
provided within the packages is an excellent way to understand how to implement
Bayesian modelling effectively. Both rstan and brms include vignettes that guide users
through the process of fitting models and interpreting results.
By setting up the Bayesian computing environment thoughtfully, users can leverage the full
power of Bayesian methods in their analyses, enabling robust statistical inferences and
improved decision-making.
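The configuration step described above can be sketched as follows. This is a minimal example rather than a required setup: the cap of four cores is an assumption chosen to match the four chains used later in this tutorial, and n_cores is an illustrative name.

```r
# Use up to four cores (one per chain), but never more than the machine has
n_cores <- max(1L, min(4L, parallel::detectCores(), na.rm = TRUE))
options(mc.cores = n_cores)

if (requireNamespace("rstan", quietly = TRUE)) {
  # Save compiled models to disk so an unchanged model is not recompiled
  rstan::rstan_options(auto_write = TRUE)
}

getOption("mc.cores")
```

Both settings last only for the current R session, so they are usually placed at the top of an analysis script.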
1.3 Conclusion
Bayesian modelling in R is a powerful approach that allows for sophisticated analyses of
uncertainty and complex data structures. Through an introduction to key packages like rstan
and brms, along with guidance on setting up the Bayesian computing environment, researchers
can embark on their Bayesian journey with confidence. The subsequent sections will provide
detailed insights and practical examples, equipping users with the knowledge needed to apply
Bayesian techniques effectively in their work.
2 Introduction to R and Its Bayesian Packages
Bayesian statistics is a powerful framework for modelling uncertainty, and R provides a rich
ecosystem for implementing Bayesian methods. This section will explore the R programming
environment and introduce key packages for Bayesian analysis, particularly rstan and brms.
We will cover the installation of these packages, their basic functionalities, and provide
examples of how to use them for Bayesian modelling.
2.1 Overview of R for Bayesian Analysis
R is a widely used programming language for statistical computing and data analysis. Its
extensive collection of packages makes it particularly suitable for Bayesian inference, allowing
users to fit complex models and conduct advanced analyses.
2.1.1 Key Advantages of Using R for Bayesian Analysis:
• Flexibility: R allows users to specify a wide variety of models, from simple linear
regressions to complex hierarchical models.
• Community Support: The R community is vibrant, with numerous resources available
for learning and troubleshooting.
• Integration with Other Tools: R can easily interface with other software and
programming languages, enhancing its capabilities.
2.2 Overview of rstan
rstan is the R interface to Stan, a platform for statistical modelling that provides a flexible and
efficient way to perform Bayesian inference using Markov Chain Monte Carlo (MCMC)
methods.
2.2.1 Basic Features of rstan:
• Allows users to define custom models using Stan’s modelling language.
• Provides access to efficient sampling algorithms, such as the No-U-Turn Sampler
(NUTS).
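To make the idea of MCMC sampling concrete before using Stan, here is a sketch of a much simpler algorithm, random-walk Metropolis, written in base R for the mpg and wt regression used below. It is not NUTS and not what Stan runs internally. The flat priors, the centring of wt, the proposal step sizes, and all object names are assumptions made for this illustration.

```r
# Random-walk Metropolis for mpg ~ wt (illustration only; Stan uses NUTS instead)
set.seed(1)
x_c <- mtcars$wt - mean(mtcars$wt)   # centred predictor reduces posterior correlation
y   <- mtcars$mpg

# Log posterior with flat priors on the intercept, the slope and log(sigma)
log_post <- function(theta) {
  sum(dnorm(y, mean = theta[1] + theta[2] * x_c, sd = exp(theta[3]), log = TRUE))
}

n_iter <- 20000
draws <- matrix(NA_real_, nrow = n_iter, ncol = 3,
                dimnames = list(NULL, c("alpha_c", "beta", "log_sigma")))
theta <- c(mean(y), 0, log(sd(y)))   # starting values
lp    <- log_post(theta)
step  <- c(0.6, 0.6, 0.15)           # proposal standard deviations (hand-picked)

for (i in seq_len(n_iter)) {
  proposal <- theta + rnorm(3, sd = step)
  lp_prop  <- log_post(proposal)
  # Accept the proposal with probability min(1, posterior ratio)
  if (log(runif(1)) < lp_prop - lp) {
    theta <- proposal
    lp    <- lp_prop
  }
  draws[i, ] <- theta
}

# Discard the first half as warmup, then summarise the remaining draws
kept <- draws[-seq_len(n_iter / 2), ]
colMeans(kept)
```

Here alpha_c is the expected mpg at the average weight, not the intercept at wt = 0. NUTS explores the same kind of posterior far more efficiently by using gradient information, which is why Stan is preferred for real models.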
2.2.2 Example: Fitting a Simple Linear Regression with rstan
Let’s fit a simple linear regression model using rstan. We will analyze the mtcars dataset,
predicting miles per gallon (mpg) based on weight (wt).
2.2.3 Model Specification in Stan Language:
# Define the Stan model as a string
stan_model_code <- "
data {
  int<lower=0> N;       // number of observations
  vector[N] x;          // weight
  vector[N] y;          // miles per gallon
}
parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // residual standard deviation
}
model {
  y ~ normal(alpha + beta * x, sigma);  // likelihood
}
"
2.2.4 Prepare the Data:
# Prepare data for Stan
data_list <- list(N = nrow(mtcars),
                  x = mtcars$wt,
                  y = mtcars$mpg)
2.2.5 Fit the Model:
# Load rstan and fit the model using Stan
library(rstan)
stan_fit <- stan(model_code = stan_model_code,
                 data = data_list,
                 iter = 2000,   # Iterations per chain (the first half is warmup by default)
                 chains = 4)    # Number of chains
2.2.6 View the Results:
# Print the summary of the fitted model
print(stan_fit)
In this example:
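• We define a simple linear regression model in Stan's language, specifying the data,
parameters, and likelihood. Because the model block states no priors, Stan places
flat (improper uniform) priors on alpha, beta, and sigma over their declared ranges.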
• We prepare the mtcars dataset for the Stan model and fit the model using the stan()
function.
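As a quick sanity check on this fit, the posterior means of a model whose model block contains only a likelihood should land close to the ordinary least squares estimates. A minimal sketch in base R, using only the built-in mtcars data (ols_fit and ols_estimates are illustrative names):

```r
# Ordinary least squares fit of the same regression, for comparison
ols_fit <- lm(mpg ~ wt, data = mtcars)

# Collect the quantities that correspond to alpha, beta and sigma in the Stan model
ols_estimates <- c(alpha = unname(coef(ols_fit)["(Intercept)"]),
                   beta  = unname(coef(ols_fit)["wt"]),
                   sigma = sigma(ols_fit))   # residual standard error
print(round(ols_estimates, 3))
```

Large differences between these values and the means reported by print(stan_fit) would suggest a problem with the data list or with sampler convergence.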
2.3 Overview of brms
The brms package provides a user-friendly interface for Bayesian modelling using a formula
syntax similar to that of the lm() function in base R. It is built on top of rstan, allowing for a
seamless experience when fitting models.
2.3.1 Key Features of brms:
• Supports a wide range of response distributions (e.g., Gaussian, Bernoulli, Poisson).
• Allows for complex models, including multilevel and generalized additive models.
2.3.1.1 Example: Fitting a Linear Regression with brms
Continuing with the mtcars dataset, we will fit a linear regression model using brms.
2.3.1.2 Fit the Model:
# Load brms and fit a linear regression model
library(brms)
brms_fit <- brm(mpg ~ wt, data = mtcars,
                family = gaussian(),
                iter = 2000,
                chains = 4)
2.3.1.3 View the Summary:
# Print the summary of the fitted model
summary(brms_fit)
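The brm() call above relies on the default priors chosen by brms. The sketch below shows one way to inspect those defaults and to supply priors explicitly. The specific priors are an assumption for illustration, and default_priors and my_priors are illustrative names. The code is skipped if brms is not installed.

```r
if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)

  # List every parameter class that accepts a prior, with the brms defaults
  default_priors <- get_prior(mpg ~ wt, data = mtcars, family = gaussian())
  print(default_priors)

  # Illustrative priors (assumed here, not prescribed by brms)
  my_priors <- c(
    prior(normal(0, 10), class = "Intercept"),
    prior(normal(0, 10), class = "b"),
    prior(cauchy(0, 5), class = "sigma")
  )
  print(my_priors)

  # Pass them through the prior argument (this compiles and samples, so it is slow):
  # brms_fit <- brm(mpg ~ wt, data = mtcars, family = gaussian(), prior = my_priors)
}
```

Calling get_prior() does not compile or fit anything, so it is a cheap way to check which priors a model will use before running brm().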
2.4 Visualizing the Results:
We can visualize the fitted model along with the original data points:
# Load ggplot2 for visualization
library(ggplot2)
# Create a scatter plot with the regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +  # Points for the scatter plot
  # Add the fitted line using the posterior mean coefficients from the brms model
  geom_abline(intercept = fixef(brms_fit)["Intercept", "Estimate"],
              slope = fixef(brms_fit)["wt", "Estimate"],
              color = "red") +
  labs(title = "Bayesian Linear Regression: mpg vs. Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()  # Use a minimal theme
In this example:
• The brm() function is used to fit a linear model to the data.
• The summary() function provides detailed output, including estimates of the
coefficients and their uncertainties.
• We visualize the results using ggplot2, showing the relationship between wt and mpg.
2.5 Conclusion
In summary, R is a powerful tool for Bayesian analysis, equipped with packages like rstan
and brms that facilitate the implementation of complex models. rstan provides a direct
interface to Stan for advanced users who want to define custom models, while brms offers a
more accessible formula-based interface for fitting a wide range of Bayesian models.
By mastering these packages, researchers can effectively explore uncertainties in their data,
make informed inferences, and communicate results with clarity. Whether you are performing
simple linear regression or intricate hierarchical modelling, R’s Bayesian frameworks provide
the tools necessary to perform robust statistical analyses.
3 Stan's Modelling Language
Stan is a powerful probabilistic programming language specifically designed for Bayesian
statistical modelling and inference. It allows users to specify complex models using a concise
syntax, enabling efficient Markov Chain Monte Carlo (MCMC) sampling and optimization
techniques. This section will detail the components of Stan’s modelling language, provide
examples of how to write Stan models, and demonstrate their implementation in R using the
rstan package.
3.1 Structure of a Stan Model
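A Stan program is organised into named blocks. Stan defines seven in total (functions, data,
transformed data, parameters, transformed parameters, model, and generated quantities); this
tutorial uses the following four: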
• Data Block: This block defines the data that will be used in the model. It specifies the
types and constraints of the data.
• Parameters Block: This block defines the parameters that will be estimated when the
model is fitted.
• Model Block: This block contains the model specification, including the likelihood
function and prior distributions for the parameters.
• Generated Quantities Block (optional): This block is used for generating derived
quantities based on the model parameters after fitting.
Here’s a basic structure of a Stan model:
data {
  // Data declarations
}
parameters {
  // Parameter declarations
}
model {
  // Model specification
}
generated quantities {
  // Derived quantities
}
3.1.1 Example: Simple Linear Regression
Let’s illustrate Stan’s modelling language by writing a simple linear regression model. We will
use the mtcars dataset, where we will predict miles per gallon (mpg) based on the weight of
the cars (wt).
3.1.1.1 Step 1: Define the Model in Stan Language
data {
  int<lower=0> N;       // Number of observations
  vector[N] x;          // Predictor variable (weight)
  vector[N] y;          // Response variable (mpg)
}
parameters {
  real alpha;           // Intercept
  real beta;            // Slope
  real<lower=0> sigma;  // Standard deviation of the error
}
model {
  // Prior distributions
  alpha ~ normal(0, 10);  // Prior for alpha
  beta ~ normal(0, 10);   // Prior for beta
  sigma ~ cauchy(0, 5);   // Prior for sigma
  // Likelihood
  y ~ normal(alpha + beta * x, sigma);  // Model specification
}
In this model:
• The data block declares the number of observations (N), the predictor (x), and the
response (y).
• The parameters block defines the parameters we want to estimate: alpha, beta, and
sigma.
• The model block specifies prior distributions for the parameters and the likelihood
function for the response variable.
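To see what these priors imply before any data are used, one can simulate from them, which is often called a prior predictive check. The sketch below does this in base R. It assumes the priors from the model block above, and n_sims, mu, and y_sim are illustrative names.

```r
set.seed(42)
n_sims <- 1000
x <- mtcars$wt

# Draw parameters from the priors used in the model block
alpha <- rnorm(n_sims, mean = 0, sd = 10)
beta  <- rnorm(n_sims, mean = 0, sd = 10)
sigma <- abs(rcauchy(n_sims, location = 0, scale = 5))  # half-Cauchy, since sigma >= 0

# Simulate one data set of mpg values per prior draw (rows = draws, columns = cars)
mu    <- alpha + outer(beta, x)   # n_sims x length(x) matrix of means
y_sim <- matrix(rnorm(length(mu), mean = mu, sd = sigma), nrow = n_sims)

# Central 50% range of the simulated mpg values under the priors alone
quantile(y_sim, probs = c(0.25, 0.75))
```

If the simulated values cover a range far wider than any plausible fuel economy, the priors are only weakly informative, which may be acceptable for a data set of this size.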
3.1.1.2 Step 2: Preparing the Data in R
Now, we will prepare the data in R to pass to the Stan model.
# Load necessary library
library(rstan)
# Prepare data for Stan
data_list <- list(N = nrow(mtcars),
                  x = mtcars$wt,
                  y = mtcars$mpg)
3.1.1.3 Step 3: Fitting the Model in R
We can fit the model using the stan() function from the rstan package.
# Define Stan model as a string
stan_model_code <- "
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  y ~ normal(alpha + beta * x, sigma);
}
"
# Fit the model using Stan
stan_fit <- stan(model_code = stan_model_code,
                 data = data_list,
                 iter = 2000,
                 chains = 4)
3.2 Analyzing the Results
After fitting the model, we can extract and analyze the results.
# Print the summary of the fitted model
print(stan_fit)
# Extract posterior samples
posterior_samples <- extract(stan_fit)
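The print() function provides a summary of the posterior distributions for the parameters,
including means, standard deviations, and quantiles (from which credible intervals can be
read), together with the effective sample size (n_eff) and the Rhat convergence diagnostic.
The extract() function returns a named list with one element of posterior draws per parameter.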
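The list of draws returned by extract() can be summarised with ordinary R functions. The sketch below defines a small helper for this. The name summarise_posterior is hypothetical and not part of rstan, and the demo_samples list is simulated stand-in data with the same shape as the output of extract(), not output from a real fit.

```r
# Hypothetical helper: summarise each scalar parameter in a list of draws
summarise_posterior <- function(samples, probs = c(0.025, 0.975)) {
  rows <- lapply(samples, function(draws) {
    c(mean = mean(draws), sd = sd(draws), quantile(draws, probs = probs))
  })
  do.call(rbind, rows)
}

# Stand-in draws shaped like extract() output; simulated, NOT results from a fit
set.seed(7)
demo_samples <- list(alpha = rnorm(4000, mean = 37, sd = 2),
                     beta  = rnorm(4000, mean = -5, sd = 0.5),
                     sigma = abs(rnorm(4000, mean = 3, sd = 0.4)))
demo_summary <- summarise_posterior(demo_samples)
print(round(demo_summary, 2))

# With a real fit one would call:
# summarise_posterior(posterior_samples[c("alpha", "beta", "sigma")])
```

The 2.5% and 97.5% columns together give an equal-tailed 95% credible interval for each parameter.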
3.3 Visualization of Results
We can visualize the results using ggplot2. For example, we can plot the fitted regression line
along with the data points.
# Load ggplot2 for visualization
library(ggplot2)
# Create a scatter plot with the regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +  # Points for the scatter plot
  # Add the fitted line using the posterior means of alpha and beta
  geom_abline(intercept = mean(posterior_samples$alpha),
              slope = mean(posterior_samples$beta),
              color = "red") +
  labs(title = "Linear Regression: mpg vs. Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()  # Use a minimal theme
3.4 Generated Quantities Block
The generated quantities block can be used to compute derived quantities after fitting the
model. For instance, we might want to compute the predicted values based on the fitted
parameters.
generated quantities {
  vector[N] y_pred;  // Predicted values
  for (n in 1:N) {
    // Predicted values using the model
    y_pred[n] = normal_rng(alpha + beta * x[n], sigma);
  }
}
To include this in your R analysis, you need to modify the Stan model code and extract y_pred
after fitting the model.
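One way to carry out this modification is sketched below. It appends the generated quantities block to the regression model from Section 3.1.1, refits it, and extracts y_pred. The names stan_model_code_pred, fit_pred, pred_mean, and pred_interval are illustrative, and the seed is an arbitrary choice for reproducibility. The fitting step is skipped if rstan is not installed.

```r
# Regression model from Section 3.1.1 with a generated quantities block added
stan_model_code_pred <- "
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[N] y_pred;
  for (n in 1:N) {
    y_pred[n] = normal_rng(alpha + beta * x[n], sigma);
  }
}
"

data_list <- list(N = nrow(mtcars), x = mtcars$wt, y = mtcars$mpg)

if (requireNamespace("rstan", quietly = TRUE)) {
  library(rstan)
  fit_pred <- stan(model_code = stan_model_code_pred, data = data_list,
                   iter = 2000, chains = 4, seed = 123)

  # y_pred is a matrix with one row per posterior draw and one column per car
  y_pred <- rstan::extract(fit_pred, pars = "y_pred")$y_pred

  # Posterior predictive mean and 95% interval for each observation
  pred_mean     <- colMeans(y_pred)
  pred_interval <- apply(y_pred, 2, quantile, probs = c(0.025, 0.975))
}
```

Comparing pred_mean and pred_interval with the observed mtcars$mpg values gives a simple posterior predictive check of how well the model reproduces the data.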
3.5 Conclusion
Stan’s modelling language enables users to specify complex Bayesian models efficiently.
Understanding its structure—data, parameters, model, and generated quantities—is crucial for
implementing Bayesian inference effectively. Through examples like simple linear regression,
users can see how to translate statistical models into Stan’s syntax and leverage R’s powerful
capabilities for fitting and analyzing Bayesian models.
By mastering Stan's modelling language, you can tackle a wide range of problems in Bayesian
statistics, from simple regressions to more complex hierarchical models, ultimately enhancing
your data analysis skills and insights.