Combines parameter estimates and standard errors from models fitted on m multiply imputed datasets using Rubin's rules (Rubin, 1987). Degrees of freedom are adjusted using the Barnard-Rubin (1999) small-sample correction.
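For a single coefficient, the standard forms of these rules (Rubin, 1987; Barnard and Rubin, 1999) are written out below. The notation is ours: Q_l and U_l are the estimate and squared standard error from the l-th fit, and nu_com is the complete-data degrees of freedom (dfcom). Q-bar, U-bar, B, T and nu correspond to the returned estimate, ubar, b, t and df. When nu_com is infinite the correction drops out and nu equals nu_old.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{l=1}^{m}\hat{Q}_l, \qquad
\bar{U} = \frac{1}{m}\sum_{l=1}^{m}U_l, \qquad
B = \frac{1}{m-1}\sum_{l=1}^{m}\bigl(\hat{Q}_l-\bar{Q}\bigr)^2, \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
\lambda = \frac{(1+1/m)\,B}{T}, \\
\nu_{\mathrm{old}} &= \frac{m-1}{\lambda^2}, \qquad
\nu_{\mathrm{obs}} = \frac{\nu_{\mathrm{com}}+1}{\nu_{\mathrm{com}}+3}\,\nu_{\mathrm{com}}\,(1-\lambda), \qquad
\nu = \frac{\nu_{\mathrm{old}}\,\nu_{\mathrm{obs}}}{\nu_{\mathrm{old}}+\nu_{\mathrm{obs}}}.
\end{aligned}
```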
This function works with any fitted model that supports coef() and vcov() methods (e.g., lm, glm, survival::coxph).
Results have been validated against mice::pool from the mice package for lm and glm (logistic and Poisson) fits, weighted regression, models with interactions, and varying numbers of imputations.
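As a sketch of what the coef()/vcov() requirement buys a pooling routine, the per-imputation ingredients for Rubin's rules can be collected from any such model as below. The simulated datasets are hypothetical stand-ins for imputed data, and the names datasets, est and vars are illustrative; this is not the miceFast implementation.

```r
# Hypothetical stand-ins for m = 3 imputed datasets (simulated, not air_miss).
set.seed(1)
datasets <- lapply(1:3, function(i) {
  x <- rnorm(50)
  data.frame(x = x, y = 1 + 2 * x + rnorm(50))
})
fits <- lapply(datasets, function(d) lm(y ~ x, data = d))

# Every fit must report the same number of coefficients.
n_coef <- vapply(fits, function(f) length(coef(f)), integer(1))
stopifnot(length(unique(n_coef)) == 1L)

# m x p matrix of point estimates, one row per imputation.
est <- do.call(rbind, lapply(fits, coef))
# m x p matrix of sampling variances: the diagonal of each vcov() matrix.
vars <- do.call(rbind, lapply(fits, function(f) diag(vcov(f))))
```

Because only coef() and vcov() are called, the same two lines work unchanged for glm or other model classes that provide those methods.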
pool(fits, dfcom = NULL)

fits: a list of m >= 2 fitted model objects. Each model must support coef() and vcov() methods, and all models must have the same number of coefficients.
dfcom: a positive integer or Inf giving the complete-data degrees of freedom. If NULL (the default), it is extracted from the fitted models via df.residual(). Set it to Inf to skip the Barnard-Rubin small-sample correction.
A data.frame with one row per parameter and columns:
Coefficient name.
Number of imputations.
Pooled estimate (average across m models).
Pooled standard error (sqrt(t)).
t-statistic (estimate / std.error).
Two-sided p-value from a t-distribution with df degrees of freedom.
Degrees of freedom (Barnard-Rubin adjusted).
Relative increase in variance due to nonresponse: (1 + 1/m) * b / ubar.
Proportion of total variance attributable to missingness: (1 + 1/m) * b / t.
Fraction of missing information.
Within-imputation variance (average of the m variance estimates).
Between-imputation variance (sample variance of the m point estimates, with denominator m - 1).
Total variance: ubar + (1 + 1/m) * b.
Complete-data degrees of freedom used.
Lower bound of the 95% confidence interval.
Upper bound of the 95% confidence interval.
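The columns above can be reproduced by hand. The sketch below assumes est and vars are m x p matrices of point estimates and squared standard errors (one row per imputation, e.g. built from coef() and vcov() of each fit). The helper rubin_pool and its column names riv, lambda, conf.low and conf.high are hypothetical labels, and the 95% interval is assumed to use t quantiles on the pooled df.

```r
# Hypothetical helper: Rubin's rules with the Barnard-Rubin df, one column per parameter.
rubin_pool <- function(est, vars, dfcom = Inf) {
  m <- nrow(est)
  qbar <- colMeans(est)                # pooled estimate
  ubar <- colMeans(vars)               # within-imputation variance
  b <- apply(est, 2, var)              # between-imputation variance (denominator m - 1)
  t <- ubar + (1 + 1 / m) * b          # total variance
  riv <- (1 + 1 / m) * b / ubar        # relative increase in variance
  lambda <- (1 + 1 / m) * b / t        # proportion of variance due to missingness
  df_old <- (m - 1) / lambda^2         # Rubin (1987) degrees of freedom
  if (is.infinite(dfcom)) {
    df <- df_old                       # no small-sample correction
  } else {
    df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
    # Barnard-Rubin (1999); fall back to df_obs when b = 0 makes df_old infinite
    df <- ifelse(is.infinite(df_old), df_obs, df_old * df_obs / (df_old + df_obs))
  }
  se <- sqrt(t)
  statistic <- qbar / se
  crit <- qt(0.975, df)                # t quantile for a 95% interval
  data.frame(
    estimate = qbar, std.error = se, statistic = statistic, df = df,
    p.value = 2 * pt(abs(statistic), df, lower.tail = FALSE),
    riv = riv, lambda = lambda, ubar = ubar, b = b, t = t,
    conf.low = qbar - crit * se, conf.high = qbar + crit * se
  )
}
```

With a finite dfcom the adjusted df is always smaller than both the classical (m - 1) / lambda^2 and dfcom itself, which is the point of the small-sample correction.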
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
Barnard, J. and Rubin, D.B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86(4), 948-955.
library(miceFast)
set.seed(123)
data(air_miss)
# Step 1: Generate m = 5 completed datasets using fill_NA with a stochastic model
completed <- lapply(1:5, function(i) {
  dat <- air_miss
  dat$Ozone <- fill_NA(
    x = dat,
    model = "lm_bayes",
    posit_y = "Ozone",
    posit_x = c("Solar.R", "Wind", "Temp")
  )
  dat
})
# Step 2: Fit a model on each completed dataset
fits <- lapply(completed, function(d) {
  lm(Ozone ~ Solar.R + Wind + Temp, data = d)
})
# Step 3: Pool using Rubin's rules
pool(fits)
#> Pooled results from 5 imputed datasets
#> Rubin's rules with Barnard-Rubin df adjustment
#>
#> term estimate std.error statistic df p.value
#> (Intercept) -42.28051 22.45982 -1.882 79.65 6.342e-02
#> Solar.R 0.06555 0.02846 2.303 16.30 3.475e-02
#> Wind -3.97599 0.62976 -6.314 82.98 1.279e-08
#> Temp 1.43015 0.26194 5.460 49.73 1.514e-06