This vignette introduces mediator and provides an overview for mediation analysis under the counterfactual framework.


Data

In this vignette, we will use randomly generated data - example250.rda. The data is an extended version of the data provided in the SAS macro %mediation, with the same columns but 250 rows.

head(mediation_example)
#>   x m_01          m y          c
#> 1 0    1 0.00965707 0  0.5529378
#> 2 0    1 0.30360478 0 -0.7385979
#> 3 0    1 0.51954681 0  1.1788648
#> 4 0    0 0.18600953 1  2.8974657
#> 5 0    0 0.12708356 0 -1.2719255
#> 6 0    0 0.12181193 0  1.9804641

dim(mediation_example)
#> [1] 250   5

Introduction

In this vignette, we will :

  1. review terminology associated with mediation analysis
  2. look at the basic usage of mediator
  3. compare mediator to it’s SAS macro counterpart - %mediation
  4. provide the equations used to calculate various estimates

Terminology – currently taken from Valeri and VanderWeele

Controlled Direct Effect (CDE) expresses how much the outcome would change on average if the mediator were controlled at level m uniformly in the population but the treatment were changed from level a=0 to level a=1.

Natural Direct Effect (NDE) expresses how much the outcome would change if the exposure were set at level a=1 versus level a=0 but for each individual the mediatior were kept at the level it woud have taken in the absense of the exposure.

Natural Indirect Effect (NIE) expresses how much the outcome would change on average if the exposure were controlled at level a=1, but the mediator were changed from the level if would take if a=0 to the level it would take if a=1.

Total Effect how much the outcome would change overall for a change in the exposure from level a=0 to level a=1.

Proportion Mediated the extent to which the total effect of the exposure on the outcome operates through the mediator.

Usage

mediator returns a data frame containing the point estimates and CIs for CDE, NDE, NIE and TE as well as the point estimates for the PM. The function assumes appropriate modeling (for the outcome and mediator models) on the part of the user and allows (but does not require) interaction between the exposure and the mediator. A basic example of function is shown below:

mediator(data = mediation_example,
         out.model = lm(y ~ x  + m + c + x*m, data = mediation_example),
         med.model = lm(m ~ x + c, data = mediation_example),
         treat = "x",
         a = 1, a_star = 0,
         m = 0,
         boot_rep = 0)

Setting boot_rep = 0, tells the function to use the Delta method for calculating confidence intervals. a and a_star represent the level of exposure and its comparison level, which for binary exposures are often 1 and 0, but can be set at any pair of numeric values. In the above example, a, a_star, m and boot_rep are all set to their default values.

mediator vs %mediation

mediator is related to the SAS/SPSS macros developed by Valeri and VanderWeele, %mediation, however minor differences exist between the programs which sometimes results in variation in estimates and confidence intervals.

When the macros calculate the CI using the delta method, they use the hard coded values 1.96 and -1.96 while mediator uses c(-1,1)*qnorm(.975). Languages also differ with respect to rounding.

The %mediation takes roughly 7.05 seconds to run, calculating the CI using the delta method, and roughly 276.55 seconds using bootstrapping, while mediator takes 0.01 seconds for the delta method and 0.27 seconds for bootstrapping (using the mediation_example.rda dataset with 250 observations and 100 replicates).

When using bootstrapping, %mediation bootstraps both the effect estimates and CI, while the R function mediator only uses the bootstrap for obtaining the CI. Users should use the bootstrap function with caution, especially when using small datasets. We recommend using a minimum 10000 replicates. Using more replicates will take more time, however it will provide more precise estimates. This is especially important when using small datasets, as using a small number of replicated could make the CI wonky.

Below are examples of results from mediator (left) and %mediation (right) for comparison.

Continuous outcome and mediator

mediator(data = mediation_example,
         out.model = lm(c ~ x + m + x*m, data = mediation_example),
         med.model = lm(m ~ x, data = mediation_example),
         treat = "x")
%mediation(data = mediation_example, yvar = c, avar = x, mvar = m, cvar = , a0 = 0, a1 = 1, m = 0, yreg = linear, mreg = linear, interaction = true, casecontrol =, output =, c =, boot =, cens=);
run;
Effect Estimate Lower 95% CI Upper 95% CI
CDE -0.176 -0.517 0.165
NDE 0.014 -0.252 0.281
NIE -0.014 -0.151 0.123
Total Effect 0.001 -0.258 0.26
Proportion Mediated -21.708 NA NA
Effect Estimate Lower 95% CI Upper 95% CI
CDE -0.176 -0.517 0.165
NDE 0.014 -0.252 0.281
NIE -0.014 -0.113 0.085
Total Effect 0.001 -0.258 0.26
Proportion Mediated -21.7 NA NA

Binary outcome and mediator

mediator(data = mediation_example,
         out.model = glm(y ~ x + m_01 + c + x*m_01, 
                         data = mediation_example, family = "binomial"),
         med.model = glm(m_01 ~ x + c, data = mediation_example, family = "binomial"),
         treat = "x")
%mediation(data = mediation_example, yvar = y, avar = x, mvar = m_01, cvar = c, a0 = 0, a1 = 1, m = 0, yreg = logistic, mreg = logistic, interaction = true, casecontrol=,output=,c=,boot=,cens=);
run;
Effect Estimate Lower 95% CI Upper 95% CI
CDE 0.279 0.058 1.342
NDE 0.517 0.225 1.186
NIE 1.049 0.905 1.216
Total Effect 0.542 0.235 1.253
Proportion Mediated -0.055 NA NA
Effect Estimate Lower 95% CI Upper 95% CI
CDE 0.279 0.058 1.342
NDE 0.517 0.225 1.186
NIE 1.049 0.905 1.216
Total Effect 0.542 0.235 1.253
Proportion Mediated -0.055 NA NA

Continous outcome and binary mediator

mediator(data = mediation_example,
         out.model = lm(c ~ x + m_01 + x*m_01, data = mediation_example),
         med.model = glm(m_01 ~ x, data = mediation_example, family = "binomial"),
         treat = "x")
%mediation(data = mediation_example, yvar = c, avar = x, mvar = m_01, cvar = , a0 = 0, a1 = 1, m = 0, yreg = linear, mreg = logistic, interaction = true, casecontrol=, output=, c=, boot=, cens=);
run;
Effect Estimate Lower 95% CI Upper 95% CI
CDE 0.018 -0.35 0.386
NDE 0 -0.283 0.284
NIE 0 -0.017 0.017
Total Effect 0.001 -0.257 0.258
Proportion Mediated 0.598 NA NA
Effect Estimate Lower 95% CI Upper 95% CI
CDE 0.018 -0.35 0.386
NDE 0 -0.257 0.258
NIE 0 -0.017 0.017
Total Effect 0.001 -0.257 0.258
Proportion Mediated 0.598 NA NA

Binary outcome and continuous mediator

mediator(data = mediation_example,
         out.model = glm(y ~ x + m + c + x*m, 
                         data = mediation_example, family = "binomial"),
         med.model = lm(m ~ x + c, data = mediation_example),
         treat = "x")
%mediation(data = mediation_example, yvar = y, avar = x, mvar = y, cvar = c, a0 = 0, a1 = 1, m = 0, yreg = logistic, mreg = linear, interaction = true, casecontrol=, output=, c=, boot=, cens=);
run;
Effect Estimate Lower 95% CI Upper 95% CI
CDE 1.005 0.314 3.222
NDE 0.62 0.224 1.716
NIE 0.803 0.506 1.274
Total Effect 0.498 0.103 2.403
Proportion Mediated 0.244 NA NA
Effect Estimate Lower 95% CI Upper 95% CI
CDE 1.005 0.314 3.221
NDE 0.62 0.224 1.715
NIE 0.803 0.506 1.274
Total Effect 0.498 0.207 1.2
Proportion Mediated 0.244 NA NA

Equations

Continuous outcome and mediator

Models for Y (outcome) and M (mediator), correctly specified

\[ E[Y|a,m,c] = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c \]

\[ E[M|a,c] = \beta_{0} + \beta_{1}a + \beta'_{2}c \]

Average controlled direct effect, natural direct effect and natural indirect effect

\[ E[Y_{am} - Y_{a*m} | c] = (\theta_{1} + \theta_{3}m)(a-a^{*}) \]

\[ E[Y_{aM_{a^{*}}} - Y_{a^{*}M_{a^{*}}} | c] = \{\theta_{1} + \theta_{3}(\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}(a-a^{*})\]

\[ E[Y_{aM_{a}} - Y_{aM_{a^{*}}} | c] = (\theta_{2}\beta_{1} + \theta_{3}\beta_{1}a)(a-a^{*})\]

Binary outcome and continuous mediator

Models for Y (outcome) and M (mediator), correctly specified

\[ logit\{P(Y=1|a,m,c)\} = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c\]

\[ E[M|a,c] = \beta_{0} + \beta_{1}a + \beta'_{2}c \]

Controlled direct effect, natural direct effect and natural indiect effect on the odds ratio scale

\[ OR^{CDE}(m) = exp[(\theta_{1} + \theta_{3}m)(a-a^{*})] \]

\[ OR^{NDE} = exp\{(\theta_{1} + \theta_{3}\beta_{0} + \theta_{3}\beta_{1}a^{*} + \theta_{3}\beta'_{2}C + \theta_{3}\theta_{2}\sigma^{2})(a-a^{2})\}\]

\[ OR^{NIE} = exp\{(\theta_{2}\beta_{1} + \theta_{3}\beta_{1}a)(a-a^{*})\}\]

Continuous outcome and binary mediator

Models for Y (outcome) and M (mediator), correctly specified

\[ E[Y|a,m,c] = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c \]

\[ logit\{P(M = 1|a,c)\} = \beta_{0} + \beta_{1}a + \beta'_{2}c\]

Average controlled direct effect, natural direct effect and natural indirect effect

\[ E[Y_{am} - Y_{a*m} | c] = (\theta_{1} + \theta_{3}m)(a-a^{*}) \]

\[ E[Y_{aM_{a^{*}}} - Y_{a^{*}M_{a^{*}}} | c] = \{\theta_{1}(a-a^{*})\} + \{\theta_{3}(a-a^{*})\} \frac{exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}{1+exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}\]

\[ E[Y_{aM_{a}} - Y_{aM_{a^{*}}}|c] = (\theta_{2} + \theta_{3}a)\{\frac{exp[\beta_{0} + \beta_{1}a + \beta'_{2}c]}{1+exp[\beta_{0} + \beta_{1}a + \beta'_{2}c]} - \frac{exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}{1+exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}\}\]

Binary outcome and mediator

Models for Y (outcome) and M (mediator), correctly specified

\[ logit\{P(Y=1|a,m,c)\} = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c\]

\[ logit\{P(M = 1|a,c)\} = \beta_{0} + \beta_{1}a + \beta'_{2}c\]

Controlled direct effect, natural direct effect and natural indiect effect on the odds ratio scale

\[ OR^{CDE}(m) = exp[(\theta_{1} + \theta_{3}m)(a-a^{*})]\]

\[ OR^{NDE} = \frac{exp(\theta_{1}a)\{1+exp(\theta_{2} + \theta_{3}a + \beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}}{exp(\theta_{1}a^{*})\{1+exp(\theta_{2} + \theta_{3}a^{*} + \beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}}\]

\[ OR^{NIE} = \frac{\{1 + exp(\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}\{1 + exp(\theta_{2} + \theta_{3}a + \beta_{0} + \beta_{1}a + \beta'_{2}c)\}}{\{1 + exp(\beta_{0} + \beta_{1}a + \beta'_{2}c)\}\{1 + exp(\theta_{2} + \theta_{3}a + \beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}}\]