Exercise 8: Bayesian regression
We will now analyze data from the therapeutic clinical trial ALBI-ANRS 070 which compared the efficacy and tolerance of 3 anti-retroviral strategies among HIV-1 positive patients naive of any anti-retroviral treatment 1.
Load the data from the
albianrs_data.csv
available here (ProTip: setstringsAsFactors = TRUE
inread.delim()
)albi <- read.csv("albianrs_data.csv", stringsAsFactors = TRUE) summary(albi)
The following variables are available:
PatientID
: Patient IDViralLoad
: Plasma viral loadCD4
: CD4 T-cell lymphocites rate (in cell/mm3)CD4t
: transformed CD4 T-cell lymphocites rate (in cell/mm3) (CD4t
=CD4
1/4)CD4sup500
: binary variable indicating wether CD4 cell counts is above 500Treatment
: Treatment group \(1\) (d4t + ddI), \(2\) (alternance) ou \(3\) (AZT + 3TC))Day
: Day of visit (since inclusion)Visit
: Visit number
Below is a quantitative summary of the characteristics of those variables.
Variable | N = 9501 |
---|---|
ViralLoad | 2.71 (2.04, 3.69) |
CD4 | 465 (377, 585) |
CD4t | 4.64 (4.41, 4.92) |
CD4sup500 | 392 (41%) |
Treatment | |
AZT+3TC | 315 (33%) |
d4t+ddI | 322 (34%) |
switch | 313 (33%) |
Day | 84 (29, 140) |
Visit | |
0 | 146 (15%) |
1 | 140 (15%) |
2 | 136 (14%) |
3 | 130 (14%) |
4 | 128 (13%) |
5 | 135 (14%) |
6 | 135 (14%) |
1 Median (IQR) or n (%) |
NB: for simplicity, NA
values have been omitted.
First, we want to explain the CD4 rate (as a transformed variable) from the viral load
Display
CD4t
in scatterplot as a function of theViralLoad
and color the points according toTreatment
.library(ggplot2) library(MetBrewer) ggplot(albi, aes(y = CD4t, x = ViralLoad, color = Treatment)) + geom_point(alpha = 0.7) + MetBrewer::scale_color_met_d("Java") + theme_bw()
Write down the Bayesian mathematical model corresponding to a linear regression of
CD4t
against theViralLoad
.Write the corresponding
BUGS
model and save it in an external.txt
file.Create the corresponding
jags
object in and generate a Monte Carlo sample of size 1000 for the 3 parameters of interestBefore interpreting the results, check the convergence. What do you notice ?
Using the
window
function, remove the 500 first iterations as burn-in to reach Markov Chain convergence to its stationary distribution (the posterior). Is it sufficient to solve convergence issues ?Check the effective sample size of the Monte Carlo sample with the
effectiveSize()
function. Reduuce auto-corrlation by increasing thethin
parameter to 10 incoda.samples
. Check the impact on the effective sample
Compare those results with a frequentist analysis.
Perform a Bayesian logistic regression to study the impact of the treatment on the binary outcome
CD4sup500
adjusted on the viral load. Once you have a first estimation, try adding an interaction between the treatment and the viral load.So far we have ignored the longitudinal nature of our measurements. Now, lets add a random intercept in our logistic regression to account for the intra-patient correlation across visits.
ggplot(albi, aes(y = CD4, x = Day, color = Treatment)) + geom_hline(aes(yintercept = 500), color = "grey35", linetype = 2) + geom_line(aes(group = PatientID), alpha = 0.3) + MetBrewer::scale_color_met_d("Java") + theme_bw()
Jean-Michel Molina et al. “The ALBI Trial: A Randomized Controlled Trial Comparing Stavudine Plus Didanosine with Zidovudine Plus Lamivudine and a Regimen Alternating Both Combinations in Previously Untreated Patients Infected with Human Immunodeficiency Virus,” The Journal of Infectious Diseases 180, no. 2 (1999): 351–358, doi:10.1086/314891.↩︎