Exercise 8: Bayesian regression

We will now analyze data from the therapeutic clinical trial ALBI-ANRS 070 which compared the efficacy and tolerance of 3 anti-retroviral strategies among HIV-1 positive patients naive of any anti-retroviral treatment 1.

  1. Load the data from the albianrs_data.csv available here (ProTip: set stringsAsFactors = TRUE in read.delim())

    albi <- read.csv("albianrs_data.csv", stringsAsFactors = TRUE)
    summary(albi)

    The following variables are available:

  • PatientID : Patient ID
  • ViralLoad: Plasma viral load
  • CD4: CD4 T-cell lymphocites rate (in cell/mm3)
  • CD4t: transformed CD4 T-cell lymphocites rate (in cell/mm3) (CD4t = CD41/4)
  • CD4sup500: binary variable indicating wether CD4 cell counts is above 500
  • Treatment : Treatment group \(1\) (d4t + ddI), \(2\) (alternance) ou \(3\) (AZT + 3TC))
  • Day: Day of visit (since inclusion)
  • Visit: Visit number

Below is a quantitative summary of the characteristics of those variables.

Table 1: Summary of the Albi-ANRS-070 trial data
Variable N = 9501
ViralLoad 2.71 (2.04, 3.69)
CD4 465 (377, 585)
CD4t 4.64 (4.41, 4.92)
CD4sup500 392 (41%)
Treatment
    AZT+3TC 315 (33%)
    d4t+ddI 322 (34%)
    switch 313 (33%)
Day 84 (29, 140)
Visit
    0 146 (15%)
    1 140 (15%)
    2 136 (14%)
    3 130 (14%)
    4 128 (13%)
    5 135 (14%)
    6 135 (14%)
1 Median (IQR) or n (%)

NB: for simplicity, NA values have been omitted.

  1. First, we want to explain the CD4 rate (as a transformed variable) from the viral load

    1. Display CD4t in scatterplot as a function of the ViralLoad and color the points according to Treatment.

      library(ggplot2)
      library(MetBrewer)
      ggplot(albi, aes(y = CD4t, x = ViralLoad, color = Treatment)) + geom_point(alpha = 0.7) +
          MetBrewer::scale_color_met_d("Java") + theme_bw()
    2. Write down the Bayesian mathematical model corresponding to a linear regression of CD4t against the ViralLoad.

    3. Write the corresponding BUGS model and save it in an external .txt file.

    4. Create the corresponding jags object in and generate a Monte Carlo sample of size 1000 for the 3 parameters of interest

    5. Before interpreting the results, check the convergence. What do you notice ?

    6. Using the window function, remove the 500 first iterations as burn-in to reach Markov Chain convergence to its stationary distribution (the posterior). Is it sufficient to solve convergence issues ?

    7. Check the effective sample size of the Monte Carlo sample with the effectiveSize() function. Reduuce auto-corrlation by increasing the thin parameter to 10 in coda.samples. Check the impact on the effective sample

  2. Compare those results with a frequentist analysis.

  3. Perform a Bayesian logistic regression to study the impact of the treatment on the binary outcome CD4sup500 adjusted on the viral load. Once you have a first estimation, try adding an interaction between the treatment and the viral load.

  4. So far we have ignored the longitudinal nature of our measurements. Now, lets add a random intercept in our logistic regression to account for the intra-patient correlation across visits.

    ggplot(albi, aes(y = CD4, x = Day, color = Treatment)) + geom_hline(aes(yintercept = 500),
        color = "grey35", linetype = 2) + geom_line(aes(group = PatientID), alpha = 0.3) +
        MetBrewer::scale_color_met_d("Java") + theme_bw()