PyMC & Bambi - Bayesian Modeling in Python

Advanced guide to probabilistic programming using PyMC and Bambi for Bayesian statistical modeling in Python.


Introduction to Bayesian Modeling

Bayesian statistics provides a framework for updating beliefs based on evidence. Unlike frequentist approaches, Bayesian methods incorporate prior knowledge and quantify uncertainty through probability distributions.

What is PyMC?

PyMC is a Python library for probabilistic programming that allows you to build and fit Bayesian models using Markov chain Monte Carlo (MCMC) and other inference methods.

PyMC (version 5.x at the time of writing) is the current major release, succeeding PyMC3. It is built on PyTensor, which provides automatic differentiation, and ships efficient gradient-based sampling algorithms.

Core Concepts

Prior Distributions

The prior encodes what you believe about a parameter before seeing any data, expressed as a probability distribution.

Likelihood

The likelihood gives the probability of the observed data under a particular setting of the parameters.

Posterior Distribution

The posterior is the updated belief about the parameters after observing the data: by Bayes' theorem, it is proportional to the likelihood times the prior.
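
To make these three pieces concrete, here is a minimal Beta-Binomial sketch (the data values are made up), where the posterior also has a closed form for comparison:

import pymc as pm

# Hypothetical data: 7 successes out of 10 trials
n_trials, n_successes = 10, 7

with pm.Model() as coin_model:
    # Prior: uniform belief about the success probability
    theta = pm.Beta('theta', alpha=1, beta=1)

    # Likelihood: number of successes given theta
    pm.Binomial('obs', n=n_trials, p=theta, observed=n_successes)

    # Posterior: sampled with MCMC; the conjugate closed form
    # here is Beta(1 + 7, 1 + 3) = Beta(8, 4)
    trace = pm.sample(1000)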

Getting Started with PyMC

Installation

pip install pymc

Basic PyMC Model Structure

import pymc as pm
import numpy as np

# Example data
np.random.seed(42)
data = np.random.normal(0, 1, 100)

with pm.Model() as model:
    # Priors
    mu = pm.Normal('mu', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Likelihood
    y = pm.Normal('y', mu=mu, sigma=sigma, observed=data)

    # Sample from posterior
    trace = pm.sample(1000, return_inferencedata=True)

Common Probability Distributions

Continuous Distributions

# Note: random variables must be created inside a pm.Model() context.
with pm.Model():
    # Normal distribution
    x = pm.Normal('x', mu=0, sigma=1)

    # Beta distribution (for proportions)
    theta = pm.Beta('theta', alpha=1, beta=1)

    # Gamma distribution (for positive values)
    rate = pm.Gamma('rate', alpha=2, beta=1)

    # Exponential distribution
    waiting_time = pm.Exponential('waiting_time', lam=1)

Discrete Distributions

with pm.Model():
    # Binomial distribution
    successes = pm.Binomial('successes', n=10, p=0.5)

    # Poisson distribution
    count = pm.Poisson('count', mu=3.5)

    # Categorical distribution
    categories = pm.Categorical('categories', p=[0.3, 0.4, 0.3])

MCMC Sampling Methods

Hamiltonian Monte Carlo (HMC)

HMC uses gradient information to propose long, high-acceptance moves, making it far more efficient than random-walk methods on continuous parameter spaces.

No-U-Turn Sampler (NUTS)

NUTS is an adaptive extension of HMC that tunes the trajectory length automatically; it is PyMC's default sampler for continuous variables.

Metropolis-Hastings

A classic random-walk sampler. It needs no gradients, so it also works for discrete parameters, but it typically mixes far more slowly than NUTS.
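
PyMC assigns NUTS automatically to continuous models, but a specific step method can be requested explicitly. A minimal sketch, reusing the model from the Getting Started example:

with model:
    # Default: NUTS is assigned automatically to continuous parameters
    trace_nuts = pm.sample(1000)

    # Explicitly request Metropolis-Hastings instead
    trace_mh = pm.sample(1000, step=pm.Metropolis())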

Model Diagnostics

Convergence Checking

import arviz as az

# Trace plots
az.plot_trace(trace)

# Autocorrelation plots
az.plot_autocorr(trace)

# Effective sample size (ess_bulk / ess_tail columns)
az.summary(trace, round_to=2)

# R-hat statistic (should be close to 1.0)
az.summary(trace)['r_hat']

Posterior Analysis

import arviz as az

# Summary statistics
summary = az.summary(trace)

# Highest density intervals
hdi = az.hdi(trace, hdi_prob=0.95)

# Plot posterior distributions
az.plot_posterior(trace)

Common Bayesian Modeling Patterns

Linear Regression

# Synthetic data for illustration
x = np.random.normal(size=100)
y = 1.0 + 2.0 * x + np.random.normal(scale=0.5, size=100)

with pm.Model() as linear_model:
    # Priors
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=1)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Expected value of the outcome
    mu = alpha + beta * x

    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

    trace = pm.sample(1000)

Hierarchical Models

# Synthetic grouped data for illustration
n_groups = 5
group_idx = np.random.randint(0, n_groups, size=200)
data = np.random.normal(loc=group_idx, scale=1.0, size=200)

with pm.Model() as hierarchical_model:
    # Hyperparameters shared across groups
    mu_mu = pm.Normal('mu_mu', mu=0, sigma=10)
    sigma_mu = pm.HalfNormal('sigma_mu', sigma=5)

    # Group-level means drawn from the shared hyperprior
    mu_groups = pm.Normal('mu_groups',
                          mu=mu_mu,
                          sigma=sigma_mu,
                          shape=n_groups)

    # Likelihood: each observation uses its group's mean
    y = pm.Normal('y',
                  mu=mu_groups[group_idx],
                  sigma=1,
                  observed=data)
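
With few observations per group, the centered parameterization above can sample poorly (divergences). A common remedy is the non-centered parameterization, sketched here on the same synthetic data:

with pm.Model() as noncentered_model:
    mu_mu = pm.Normal('mu_mu', mu=0, sigma=10)
    sigma_mu = pm.HalfNormal('sigma_mu', sigma=5)

    # Sample standardized offsets, then shift and scale them,
    # instead of sampling the group means directly
    offset = pm.Normal('offset', mu=0, sigma=1, shape=n_groups)
    mu_groups = pm.Deterministic('mu_groups', mu_mu + offset * sigma_mu)

    y = pm.Normal('y', mu=mu_groups[group_idx], sigma=1, observed=data)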

Time Series Models (Autoregressive)

with pm.Model() as ar_model:
    # AR(1) coefficient and noise scale
    phi = pm.Normal('phi', mu=0, sigma=1)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # AR(1) likelihood: each value depends on the previous one,
    # written as a single vectorized distribution
    y_obs = pm.Normal('obs',
                      mu=phi * data[:-1],
                      sigma=sigma,
                      observed=data[1:])

For higher-order autoregressive structure, PyMC also provides a built-in pm.AR distribution.

Bambi - Bayesian Modeling for GLM

Bambi is a high-level interface built on top of PyMC for fitting Bayesian generalized linear (and mixed) models, using an R-style formula syntax similar to the formula API in statsmodels.

Basic Bambi Usage

import bambi as bmb
import pandas as pd
import arviz as az

# Load data
data = pd.read_csv('data.csv')

# Specify model using formula syntax
model = bmb.Model('y ~ x1 + x2', data=data)

# Fit with MCMC (returns an ArviZ InferenceData object)
results = model.fit(draws=1000)

# Inspect the priors and the sampled posterior
model.plot_priors()
az.plot_trace(results)

Model Families

Bambi supports a range of response families beyond the default gaussian, such as bernoulli for binary outcomes and poisson for counts, selected via the family argument.
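
A minimal sketch of a logistic regression, assuming the data has a binary column named clicked (a hypothetical name):

# Bernoulli family for a binary outcome (logistic regression)
logit_model = bmb.Model('clicked ~ x1 + x2', data=data, family='bernoulli')
logit_results = logit_model.fit(draws=1000)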

Advanced Topics

Variational Inference

with model:
    # Approximate the posterior with ADVI (automatic differentiation
    # variational inference), then draw samples from the approximation
    approx = pm.fit(method='advi', n=10000)
    trace = approx.sample(1000)

Model Comparison

# Leave-one-out cross-validation (requires pointwise log-likelihood
# values; sample with idata_kwargs={'log_likelihood': True})
loo1 = az.loo(trace1)
loo2 = az.loo(trace2)

# Compare models on expected predictive accuracy
az.compare({'model1': trace1, 'model2': trace2})

Predictive Modeling

# Posterior predictive samples (added to the trace in place)
with model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

# Plot predictions vs observed
az.plot_ppc(trace)

Applications

A/B Testing

A Bayesian A/B test yields a full posterior over the difference between variants, so results can be reported as the probability that one variant beats the other (see the sketch after this list).

Customer Lifetime Value

Hierarchical models can pool information across customer segments and report future value with explicit uncertainty.

Risk Assessment

Posterior distributions over loss parameters translate directly into probabilistic risk statements rather than single point estimates.
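
As an illustration of the A/B testing pattern, a minimal sketch with hypothetical conversion counts (all numbers are made up):

# Observed conversions out of visitors for variants A and B
conv_a, n_a = 120, 1000
conv_b, n_b = 145, 1000

with pm.Model() as ab_model:
    # Priors on each variant's conversion rate
    rate_a = pm.Beta('rate_a', alpha=1, beta=1)
    rate_b = pm.Beta('rate_b', alpha=1, beta=1)

    # Likelihoods
    pm.Binomial('obs_a', n=n_a, p=rate_a, observed=conv_a)
    pm.Binomial('obs_b', n=n_b, p=rate_b, observed=conv_b)

    # Track the uplift so P(B > A) can be read off the posterior
    pm.Deterministic('uplift', rate_b - rate_a)

    ab_trace = pm.sample(1000)

# Probability that variant B outperforms variant A
prob_b_better = (ab_trace.posterior['uplift'] > 0).mean().item()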

Best Practices

Prior Specification

Prefer weakly informative priors over flat ones, and inspect them with prior predictive sampling before fitting (see the sketch after this list).

Model Building

Start with the simplest model that can answer the question, check it, and add structure incrementally.

Computation

Run multiple chains, check R-hat and effective sample size, and increase target_accept if the sampler reports divergences.
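
As an illustration of prior predictive checking, a minimal sketch reusing the basic model from the Getting Started example:

with model:
    # Simulate datasets from the priors alone, before conditioning on data
    prior_ppc = pm.sample_prior_predictive()

# Do the simulated datasets look plausible on the scale of the real data?
az.plot_ppc(prior_ppc, group='prior')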

Challenges and Limitations

Computational Complexity

MCMC is computationally expensive; large models or datasets can take a long time to fit, and scaling often means switching to variational inference or simplifying the model.

Prior Choice Dependency

With limited data, conclusions can be sensitive to the choice of priors, so refitting under alternative priors (a sensitivity analysis) is good practice.

Bayesian modeling with PyMC and Bambi provides a powerful framework for probabilistic inference and uncertainty quantification. The flexibility of probabilistic programming allows for modeling complex phenomena while maintaining mathematical rigor.

This guide covers advanced concepts in probabilistic programming. For beginners, start with basic Bayesian concepts before diving into implementation.

Updated: January 15, 2025
Author: Danial Pahlavan
Category: Python Libraries