Exercise 8: Bayesian regression

We will now analyze data from the therapeutic clinical trial ALBI-ANRS 070 which compared the efficacy and tolerance of 3 anti-retroviral strategies among HIV-1 positive patients naive of any anti-retroviral treatment ¹.

Load the data from the albianrs_data.csv available here (ProTip: set stringsAsFactors = TRUE in read.delim())
```
albi <- read.csv("albianrs_data.csv", stringsAsFactors = TRUE)
summary(albi)
```
The following variables are available:

PatientID : Patient ID
ViralLoad: Plasma viral load
CD4: CD4 T-cell lymphocites rate (in cell/mm³)
CD4t: transformed CD4 T-cell lymphocites rate (in cell/mm³) (CD4t = CD4^1/4)
CD4sup500: binary variable indicating wether CD4 cell counts is above 500
Treatment : Treatment group $1$ (d4t + ddI), $2$ (alternance) ou $3$ (AZT + 3TC))
Day: Day of visit (since inclusion)
Visit: Visit number

Below is a quantitative summary of the characteristics of those variables.

Table 1: **Summary of the Albi-ANRS-070 trial data**
Variable	N = 950¹
ViralLoad	2.71 (2.04, 3.69)
CD4	465 (377, 585)
CD4t	4.64 (4.41, 4.92)
CD4sup500	392 (41%)
Treatment
AZT+3TC	315 (33%)
d4t+ddI	322 (34%)
switch	313 (33%)
Day	84 (29, 140)
Visit
0	146 (15%)
1	140 (15%)
2	136 (14%)
3	130 (14%)
4	128 (13%)
5	135 (14%)
6	135 (14%)
¹ Median (IQR) or n (%)

NB: for simplicity, NA values have been omitted.

First, we want to explain the CD4 rate (as a transformed variable) from the viral load
1. Display CD4t in scatterplot as a function of the ViralLoad and color the points according to Treatment.
```
library(ggplot2)
library(MetBrewer)
ggplot(albi, aes(y = CD4t, x = ViralLoad, color = Treatment)) + geom_point(alpha = 0.7) +
    MetBrewer::scale_color_met_d("Java") + theme_bw()
```
2. Write down the Bayesian mathematical model corresponding to a linear regression of CD4t against the ViralLoad.
3. Write the corresponding BUGS model and save it in an external .txt file.
4. Create the corresponding jags object in and generate a Monte Carlo sample of size 1000 for the 3 parameters of interest
5. Before interpreting the results, check the convergence. What do you notice ?
6. Using the window function, remove the 500 first iterations as burn-in to reach Markov Chain convergence to its stationary distribution (the posterior). Is it sufficient to solve convergence issues ?
7. Check the effective sample size of the Monte Carlo sample with the effectiveSize() function. Reduuce auto-corrlation by increasing the thin parameter to 10 in coda.samples. Check the impact on the effective sample
Compare those results with a frequentist analysis.
Perform a Bayesian logistic regression to study the impact of the treatment on the binary outcome CD4sup500 adjusted on the viral load. Once you have a first estimation, try adding an interaction between the treatment and the viral load.

So far we have ignored the longitudinal nature of our measurements. Now, lets add a random intercept in our logistic regression to account for the intra-patient correlation across visits.

ggplot(albi, aes(y = CD4, x = Day, color = Treatment)) + geom_hline(aes(yintercept = 500),
    color = "grey35", linetype = 2) + geom_line(aes(group = PatientID), alpha = 0.3) +
    MetBrewer::scale_color_met_d("Java") + theme_bw()

Jean-Michel Molina et al. “The ALBI Trial: A Randomized Controlled Trial Comparing Stavudine Plus Didanosine with Zidovudine Plus Lamivudine and a Regimen Alternating Both Combinations in Previously Untreated Patients Infected with Human Immunodeficiency Virus,” The Journal of Infectious Diseases 180, no. 2 (1999): 351–358, doi:10.1086/314891.↩︎

Last updated on June 2, 2024

Edit this page