This vignette introduces mediator
and provides an overview for mediation analysis under the counterfactual framework.
In this vignette, we will use randomly generated data - example250.rda
. The data is an extended version of the data provided in the SAS macro %mediation
, with the same columns but 250 rows.
In this vignette, we will :
mediator
mediator
to it’s SAS macro counterpart - %mediation
Controlled Direct Effect (CDE) expresses how much the outcome would change on average if the mediator were controlled at level m uniformly in the population but the treatment were changed from level a=0 to level a=1.
Natural Direct Effect (NDE) expresses how much the outcome would change if the exposure were set at level a=1 versus level a=0 but for each individual the mediatior were kept at the level it woud have taken in the absense of the exposure.
Natural Indirect Effect (NIE) expresses how much the outcome would change on average if the exposure were controlled at level a=1, but the mediator were changed from the level if would take if a=0 to the level it would take if a=1.
Total Effect how much the outcome would change overall for a change in the exposure from level a=0 to level a=1.
Proportion Mediated the extent to which the total effect of the exposure on the outcome operates through the mediator.
mediator
returns a data frame containing the point estimates and CIs for CDE, NDE, NIE and TE as well as the point estimates for the PM. The function assumes appropriate modeling (for the outcome and mediator models) on the part of the user and allows (but does not require) interaction between the exposure and the mediator. A basic example of function is shown below:
mediator(data = mediation_example,
out.model = lm(y ~ x + m + c + x*m, data = mediation_example),
med.model = lm(m ~ x + c, data = mediation_example),
treat = "x",
a = 1, a_star = 0,
m = 0,
boot_rep = 0)
Setting boot_rep = 0
, tells the function to use the Delta method for calculating confidence intervals. a
and a_star
represent the level of exposure and its comparison level, which for binary exposures are often 1 and 0, but can be set at any pair of numeric values. In the above example, a
, a_star
, m
and boot_rep
are all set to their default values.
mediator
is related to the SAS/SPSS macros developed by Valeri and VanderWeele, %mediation
, however minor differences exist between the programs which sometimes results in variation in estimates and confidence intervals.
When the macros calculate the CI using the delta method, they use the hard coded values 1.96 and -1.96 while mediator
uses c(-1,1)*qnorm(.975)
. Languages also differ with respect to rounding.
The %mediation
takes roughly 7.05 seconds to run, calculating the CI using the delta method, and roughly 276.55 seconds using bootstrapping, while mediator
takes 0.01 seconds for the delta method and 0.27 seconds for bootstrapping (using the mediation_example.rda
dataset with 250 observations and 100 replicates).
When using bootstrapping, %mediation
bootstraps both the effect estimates and CI, while the R function mediator
only uses the bootstrap for obtaining the CI. Users should use the bootstrap function with caution, especially when using small datasets. We recommend using a minimum 10000 replicates. Using more replicates will take more time, however it will provide more precise estimates. This is especially important when using small datasets, as using a small number of replicated could make the CI wonky.
Below are examples of results from mediator
(left) and %mediation
(right) for comparison.
mediator(data = mediation_example,
out.model = lm(c ~ x + m + x*m, data = mediation_example),
med.model = lm(m ~ x, data = mediation_example),
treat = "x")
%mediation(data = mediation_example, yvar = c, avar = x, mvar = m, cvar = , a0 = 0, a1 = 1, m = 0, yreg = linear, mreg = linear, interaction = true, casecontrol =, output =, c =, boot =, cens=);
run;
|
|
mediator(data = mediation_example,
out.model = glm(y ~ x + m_01 + c + x*m_01,
data = mediation_example, family = "binomial"),
med.model = glm(m_01 ~ x + c, data = mediation_example, family = "binomial"),
treat = "x")
%mediation(data = mediation_example, yvar = y, avar = x, mvar = m_01, cvar = c, a0 = 0, a1 = 1, m = 0, yreg = logistic, mreg = logistic, interaction = true, casecontrol=,output=,c=,boot=,cens=);
run;
|
|
mediator(data = mediation_example,
out.model = lm(c ~ x + m_01 + x*m_01, data = mediation_example),
med.model = glm(m_01 ~ x, data = mediation_example, family = "binomial"),
treat = "x")
%mediation(data = mediation_example, yvar = c, avar = x, mvar = m_01, cvar = , a0 = 0, a1 = 1, m = 0, yreg = linear, mreg = logistic, interaction = true, casecontrol=, output=, c=, boot=, cens=);
run;
|
|
mediator(data = mediation_example,
out.model = glm(y ~ x + m + c + x*m,
data = mediation_example, family = "binomial"),
med.model = lm(m ~ x + c, data = mediation_example),
treat = "x")
%mediation(data = mediation_example, yvar = y, avar = x, mvar = y, cvar = c, a0 = 0, a1 = 1, m = 0, yreg = logistic, mreg = linear, interaction = true, casecontrol=, output=, c=, boot=, cens=);
run;
|
|
Models for Y (outcome) and M (mediator), correctly specified
\[ E[Y|a,m,c] = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c \]
\[ E[M|a,c] = \beta_{0} + \beta_{1}a + \beta'_{2}c \]
Average controlled direct effect, natural direct effect and natural indirect effect
\[ E[Y_{am} - Y_{a*m} | c] = (\theta_{1} + \theta_{3}m)(a-a^{*}) \]
\[ E[Y_{aM_{a^{*}}} - Y_{a^{*}M_{a^{*}}} | c] = \{\theta_{1} + \theta_{3}(\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}(a-a^{*})\]
\[ E[Y_{aM_{a}} - Y_{aM_{a^{*}}} | c] = (\theta_{2}\beta_{1} + \theta_{3}\beta_{1}a)(a-a^{*})\]
Models for Y (outcome) and M (mediator), correctly specified
\[ logit\{P(Y=1|a,m,c)\} = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c\]
\[ E[M|a,c] = \beta_{0} + \beta_{1}a + \beta'_{2}c \]
Controlled direct effect, natural direct effect and natural indiect effect on the odds ratio scale
\[ OR^{CDE}(m) = exp[(\theta_{1} + \theta_{3}m)(a-a^{*})] \]
\[ OR^{NDE} = exp\{(\theta_{1} + \theta_{3}\beta_{0} + \theta_{3}\beta_{1}a^{*} + \theta_{3}\beta'_{2}C + \theta_{3}\theta_{2}\sigma^{2})(a-a^{2})\}\]
\[ OR^{NIE} = exp\{(\theta_{2}\beta_{1} + \theta_{3}\beta_{1}a)(a-a^{*})\}\]
Models for Y (outcome) and M (mediator), correctly specified
\[ E[Y|a,m,c] = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c \]
\[ logit\{P(M = 1|a,c)\} = \beta_{0} + \beta_{1}a + \beta'_{2}c\]
Average controlled direct effect, natural direct effect and natural indirect effect
\[ E[Y_{am} - Y_{a*m} | c] = (\theta_{1} + \theta_{3}m)(a-a^{*}) \]
\[ E[Y_{aM_{a^{*}}} - Y_{a^{*}M_{a^{*}}} | c] = \{\theta_{1}(a-a^{*})\} + \{\theta_{3}(a-a^{*})\} \frac{exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}{1+exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}\]
\[ E[Y_{aM_{a}} - Y_{aM_{a^{*}}}|c] = (\theta_{2} + \theta_{3}a)\{\frac{exp[\beta_{0} + \beta_{1}a + \beta'_{2}c]}{1+exp[\beta_{0} + \beta_{1}a + \beta'_{2}c]} - \frac{exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}{1+exp[\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c]}\}\]
Models for Y (outcome) and M (mediator), correctly specified
\[ logit\{P(Y=1|a,m,c)\} = \theta_{0} + \theta_{1}a + \theta_{2}m + \theta_{3}am + \theta'_{4}c\]
\[ logit\{P(M = 1|a,c)\} = \beta_{0} + \beta_{1}a + \beta'_{2}c\]
Controlled direct effect, natural direct effect and natural indiect effect on the odds ratio scale
\[ OR^{CDE}(m) = exp[(\theta_{1} + \theta_{3}m)(a-a^{*})]\]
\[ OR^{NDE} = \frac{exp(\theta_{1}a)\{1+exp(\theta_{2} + \theta_{3}a + \beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}}{exp(\theta_{1}a^{*})\{1+exp(\theta_{2} + \theta_{3}a^{*} + \beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}}\]
\[ OR^{NIE} = \frac{\{1 + exp(\beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}\{1 + exp(\theta_{2} + \theta_{3}a + \beta_{0} + \beta_{1}a + \beta'_{2}c)\}}{\{1 + exp(\beta_{0} + \beta_{1}a + \beta'_{2}c)\}\{1 + exp(\theta_{2} + \theta_{3}a + \beta_{0} + \beta_{1}a^{*} + \beta'_{2}c)\}}\]