Mediation analysis with duration data

This notebook demonstrates mediation analysis when the mediator and outcome are duration variables, modeled using proportional hazards regression. These examples are based on simulated data.

[1]:

import numpy as np
import pandas as pd

import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation

Create a seeded random number generator so that the notebook is reproducible.

[2]:

rs = np.random.default_rng(3424)
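As a minimal illustration of why seeding gives reproducibility, the sketch below builds two generators from the same seed and checks that they produce identical draws, while a different seed does not. The variable names and the draw size of 5 are chosen only for this example.

```python
import numpy as np

seed = 3424

# Two independent generators built from the same seed yield the same stream.
a = np.random.default_rng(seed).normal(size=5)
b = np.random.default_rng(seed).normal(size=5)
same_seed_matches = np.array_equal(a, b)

# A generator built from a different seed yields a different stream.
c = np.random.default_rng(seed + 1).normal(size=5)
different_seed_matches = np.array_equal(a, c)

print(same_seed_matches, different_seed_matches)
```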

Specify a sample size.

[3]:

n = 1000

Generate an exposure variable.

[4]:

exp = rs.normal(size=n)

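Generate a mediator variable. The latent event time `mtime0` is exponential with mean `np.exp(exp)`, and it is right-censored by an exponential censoring time with twice that mean. The status `mstatus` is 1 when the event is observed and 0 when it is censored.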

[5]:

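def gen_mediator():
    # Mean of the latent event time depends on the exposure.
    mn = np.exp(exp)
    # Inverse-transform sampling: -mn * log(U) is exponential with mean mn.
    mtime0 = -mn * np.log(rs.uniform(size=n))
    # Censoring times are exponential with twice the mean of the event times.
    ctime = -2 * mn * np.log(rs.uniform(size=n))
    # Status is 1 if the event is observed, 0 if it is censored.
    mstatus = (ctime >= mtime0).astype(int)
    # Observed time is the smaller of the event and censoring times.
    mtime = np.where(mtime0 <= ctime, mtime0, ctime)
    return mtime0, mtime, mstatus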
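The censoring scheme in `gen_mediator` can be examined in isolation. The sketch below uses the same construction with a single, hypothetical mean (`scale = 1.5`) and its own generator and sample size, all chosen only for illustration. Because the censoring rate is half the event rate, roughly two thirds of the events should be observed.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for this illustration
m = 200_000                     # large sample so the summaries are stable
scale = 1.5                     # hypothetical mean of the latent event time

# Inverse-transform sampling: -scale * log(U) is exponential with mean `scale`.
event = -scale * np.log(rng.uniform(size=m))
# Censoring times are exponential with twice the mean of the event times.
censor = -2 * scale * np.log(rng.uniform(size=m))

status = (censor >= event).astype(int)  # 1 = event observed, 0 = censored
observed = np.minimum(event, censor)    # observed (possibly censored) time

event_mean = event.mean()
observed_fraction = status.mean()
print(event_mean, observed_fraction)
```

The expected observed fraction follows from the two competing exponential rates: (1/scale) / (1/scale + 1/(2*scale)) = 2/3.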

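Generate an outcome variable. The argument `otype` selects the population structure: `"full"` makes the outcome depend only on the mediator, `"no"` makes it depend only on the exposure, and any other value (such as `"partial"`) makes it depend on both.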

[6]:

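def gen_outcome(otype, mtime0):
    # The linear predictor uses the latent (uncensored) mediator time.
    if otype == "full":
        # Outcome depends only on the mediator.
        lp = 0.5 * mtime0
    elif otype == "no":
        # Outcome depends only on the exposure.
        lp = exp
    else:
        # Outcome depends on both the exposure and the mediator.
        lp = exp + mtime0
    # A larger linear predictor gives a shorter mean time (a higher hazard).
    mn = np.exp(-lp)
    ytime0 = -mn * np.log(rs.uniform(size=n))
    # Censoring times are exponential with twice the mean of the event times.
    ctime = -2 * mn * np.log(rs.uniform(size=n))
    # Status is 1 if the event is observed, 0 if it is censored.
    ystatus = (ctime >= ytime0).astype(int)
    ytime = np.where(ytime0 <= ctime, ytime0, ctime)
    return ytime, ystatus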
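Since the outcome times are exponential with mean `np.exp(-lp)`, their hazard is `np.exp(lp)`, so a one-unit increase in the linear predictor corresponds to a log hazard ratio of 1. The sketch below checks this with NumPy alone, using the events-per-time-at-risk estimate of a constant hazard rather than a fitted proportional hazards model. The helper names `simulate` and `rate` are hypothetical and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only for this illustration
m = 100_000


def simulate(lp):
    """Censored exponential times with hazard exp(lp), as in gen_outcome."""
    mean = np.exp(-lp)
    event = -mean * np.log(rng.uniform(size=m))
    censor = -2 * mean * np.log(rng.uniform(size=m))
    status = censor >= event
    time = np.minimum(event, censor)
    return time, status


def rate(time, status):
    """Constant-hazard estimate under right censoring: events / time at risk."""
    return status.sum() / time.sum()


t0, s0 = simulate(0.0)
t1, s1 = simulate(1.0)

# Log hazard ratio between linear predictors 1 and 0; should be close to 1.
log_hazard_ratio = np.log(rate(t1, s1) / rate(t0, s0))
print(log_hazard_ratio)
```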

Build a dataframe containing all the relevant variables.

[7]:

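def build_df(ytime, ystatus, mtime0, mtime, mstatus):
    # mtime0 (the latent mediator time) is accepted but not stored;
    # only the censored mediator time mtime enters the analysis.
    df = pd.DataFrame(
        {
            "ytime": ytime,
            "ystatus": ystatus,
            "mtime": mtime,
            "mstatus": mstatus,
            "exp": exp,
        }
    )
    return df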

Run the full simulation and analysis under a given population mediation structure, selected by `otype`.

[8]:

def run(otype):
    # Simulate the data.
    mtime0, mtime, mstatus = gen_mediator()
    ytime, ystatus = gen_outcome(otype, mtime0)
    df = build_df(ytime, ystatus, mtime0, mtime, mstatus)

    # Proportional hazards models for the outcome and for the mediator.
    outcome_model = sm.PHReg.from_formula(
        "ytime ~ exp + mtime", status="ystatus", data=df
    )
    mediator_model = sm.PHReg.from_formula("mtime ~ exp", status="mstatus", data=df)

    # "exp" is the exposure and "mtime" is the mediator.
    med = Mediation(
        outcome_model,
        mediator_model,
        "exp",
        "mtime",
        outcome_predict_kwargs={"pred_only": True},
    )
    # A small number of replications keeps the example fast.
    med_result = med.fit(n_rep=20, rng=rs)
    print(med_result.summary())

Run the example with full mediation, where the exposure affects the outcome only through the mediator.

[9]:

run("full")

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)            0.783217        0.669836        0.901194      0.0
ACME (treated)            0.783217        0.669836        0.901194      0.0
ADE (control)             0.025815       -0.070741        0.111009      0.5
ADE (treated)             0.025815       -0.070741        0.111009      0.5
Total effect              0.809033        0.701157        0.924868      0.0
Prop. mediated (control)  0.963520        0.870958        1.087129      0.0
Prop. mediated (treated)  0.963520        0.870958        1.087129      0.0
ACME (average)            0.783217        0.669836        0.901194      0.0
ADE (average)             0.025815       -0.070741        0.111009      0.5
Prop. mediated (average)  0.963520        0.870958        1.087129      0.0

Run the example with partial mediation, where the exposure affects the outcome both directly and through the mediator.

[10]:

run("partial")

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)            1.202634        1.041637        1.492226      0.0
ACME (treated)            1.202634        1.041637        1.492226      0.0
ADE (control)             0.960218        0.858490        1.086519      0.0
ADE (treated)             0.960218        0.858490        1.086519      0.0
Total effect              2.162852        2.015172        2.425144      0.0
Prop. mediated (control)  0.555877        0.497044        0.628792      0.0
Prop. mediated (treated)  0.555877        0.497044        0.628792      0.0
ACME (average)            1.202634        1.041637        1.492226      0.0
ADE (average)             0.960218        0.858490        1.086519      0.0
Prop. mediated (average)  0.555877        0.497044        0.628792      0.0

Run the example with no mediation, where the exposure affects the outcome only directly.

[11]:

run("no")

                          Estimate  Lower CI bound  Upper CI bound  P-value
ACME (control)            0.028193       -0.056908        0.094842      0.5
ACME (treated)            0.028193       -0.056908        0.094842      0.5
ADE (control)             0.937539        0.872684        1.036476      0.0
ADE (treated)             0.937539        0.872684        1.036476      0.0
Total effect              0.965732        0.874654        1.100320      0.0
Prop. mediated (control)  0.046258       -0.064971        0.097052      0.5
Prop. mediated (treated)  0.046258       -0.064971        0.097052      0.5
ACME (average)            0.028193       -0.056908        0.094842      0.5
ADE (average)             0.937539        0.872684        1.036476      0.0
Prop. mediated (average)  0.046258       -0.064971        0.097052      0.5
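
In each of the three summary tables above, the reported total effect appears to equal the sum of the ACME and ADE estimates, up to rounding in the last printed digit. The short check below does that arithmetic on the values copied from the tables. The dictionary layout and key names are chosen only for this illustration.

```python
# Estimates copied from the three summary tables above.
tables = {
    "full": {"acme": 0.783217, "ade": 0.025815, "total": 0.809033},
    "partial": {"acme": 1.202634, "ade": 0.960218, "total": 2.162852},
    "no": {"acme": 0.028193, "ade": 0.937539, "total": 0.965732},
}

# Absolute gap between ACME + ADE and the reported total effect.
gaps = {}
for name, t in tables.items():
    effect_sum = t["acme"] + t["ade"]
    gaps[name] = abs(effect_sum - t["total"])
    print(f"{name:8s} ACME + ADE = {effect_sum:.6f}  total = {t['total']:.6f}")
```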