Pool results from models fitted on multiply imputed datasets

Combines parameter estimates and standard errors from models fitted on m multiply imputed datasets using Rubin's rules (Rubin, 1987). Degrees of freedom are adjusted using the Barnard-Rubin (1999) small-sample correction.
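For a single coefficient, let \hat{Q}_j be its estimate and U_j its squared standard error from the j-th fit. Rubin's rules and the Barnard-Rubin degrees of freedom are usually written as below. B, \bar{U}, T, \lambda and r correspond to the b, ubar, t, lambda and riv columns described under Value, and \nu_{com} is dfcom. As \nu_{com} grows without bound, \nu_{obs} does too, and \nu reduces to Rubin's original \nu_{old}. This is consistent with dfcom = Inf skipping the small-sample correction.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, \qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m}U_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j-\bar{Q}\bigr)^2,\\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
\lambda = \frac{(1+1/m)\,B}{T}, \qquad
r = \frac{(1+1/m)\,B}{\bar{U}},\\
\nu_{\mathrm{old}} &= \frac{m-1}{\lambda^{2}}, \qquad
\nu_{\mathrm{obs}} = \frac{\nu_{\mathrm{com}}+1}{\nu_{\mathrm{com}}+3}\,\nu_{\mathrm{com}}\,(1-\lambda), \qquad
\nu = \frac{\nu_{\mathrm{old}}\,\nu_{\mathrm{obs}}}{\nu_{\mathrm{old}}+\nu_{\mathrm{obs}}}.
\end{aligned}
```

The reported statistic is then \bar{Q}/\sqrt{T}, referred to a t-distribution with \nu degrees of freedom.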

This function works with any fitted model that supports coef() and vcov() methods (e.g., lm, glm, survival::coxph).
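A quick way to check whether a model class can be pooled is to call the generics this page names on a single fit. The base-R sketch below does that for an lm and a binomial glm on the built-in mtcars data. It does not call pool() and only illustrates what coef(), vcov() and df.residual() return. How pool() uses them internally is not shown here.

```r
# Two model classes fitted to a built-in dataset
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(vs ~ mpg, data = mtcars, family = binomial)

for (fit in list(fit_lm, fit_glm)) {
  est <- coef(fit)            # named vector of point estimates
  v   <- diag(vcov(fit))      # squared standard errors, same order as est
  stopifnot(identical(names(est), names(v)))
  print(data.frame(term = names(est), estimate = est, variance = v,
                   row.names = NULL))
  # df.residual() is what the dfcom = NULL default relies on
  cat("df.residual():", df.residual(fit), "\n\n")
}
```

If a model class lacks a df.residual() method, passing dfcom explicitly avoids depending on it.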

Results are validated against mice::pool() for lm, glm (logistic and Poisson), weighted regression, models with interaction terms, and varying numbers of imputations.

pool(fits, dfcom = NULL)

Arguments

fits: a list of m >= 2 fitted model objects. Each model must support coef() and vcov() methods. All models must have the same number of coefficients.
dfcom: a positive integer or Inf. The complete-data degrees of freedom. If NULL (default), it is extracted from the fitted models via df.residual(). Set to Inf to skip the Barnard-Rubin small-sample correction.
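To see what dfcom changes, here is a hypothetical helper, br_df(), written in base R from the Barnard and Rubin (1999) formula. It is not a miceFast function. It returns Rubin's large-sample df when dfcom is Inf and a smaller df when dfcom is finite. mice::pool() additionally bounds lambda away from zero. That detail is omitted here, and whether miceFast does the same is not assumed.

```r
# Hypothetical helper (not part of miceFast): Barnard-Rubin (1999) df for one
# coefficient, given m, between-imputation variance b and total variance total
br_df <- function(m, b, total, dfcom = Inf) {
  lambda <- (1 + 1 / m) * b / total   # share of variance due to missingness
  df_old <- (m - 1) / lambda^2        # Rubin's (1987) large-sample df
  if (is.infinite(dfcom)) return(df_old)
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  df_old * df_obs / (df_old + df_obs) # combined, never above df_obs
}

# Illustrative inputs only (not taken from any real fit)
m <- 5; b <- 0.2; ubar <- 1
total <- ubar + (1 + 1 / m) * b

br_df(m, b, total, dfcom = Inf)  # no small-sample correction
br_df(m, b, total, dfcom = 30)   # correction with 30 complete-data df
```

With a finite dfcom the result stays below the complete-data df, which is the point of the small-sample correction.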

Value

A data.frame with one row per parameter and columns:

term: Coefficient name.
m: Number of imputations.
estimate: Pooled estimate (average across m models).
std.error: Pooled standard error (sqrt(t)).
statistic: t-statistic (estimate / std.error).
p.value: Two-sided p-value from a t-distribution with df degrees of freedom.
df: Degrees of freedom (Barnard-Rubin adjusted; Rubin's large-sample df when dfcom = Inf).
riv: Relative increase in variance due to nonresponse: (1 + 1/m) * b / ubar.
lambda: Proportion of total variance attributable to missingness: (1 + 1/m) * b / t.
fmi: Fraction of missing information.
ubar: Within-imputation variance (average of the m variance estimates).
b: Between-imputation variance (sample variance of the m point estimates, denominator m - 1).
t: Total variance: ubar + (1 + 1/m) * b.
dfcom: Complete-data degrees of freedom used.
conf.low: Lower bound of the 95% confidence interval.
conf.high: Upper bound of the 95% confidence interval.
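The columns above can be reproduced by hand for a small case. The sketch below is plain base R and does not call pool(). The perturbed copies of mtcars are only a stand-in for real imputations. The fmi line uses the formula from mice::pool(), which is assumed (not confirmed here) to match this function's fmi column.

```r
set.seed(1)
m <- 5
# Stand-in for m completed datasets: perturb one column of mtcars
# (illustrative only; this is NOT a real imputation model)
fits <- lapply(seq_len(m), function(i) {
  d <- mtcars
  d$hp <- d$hp + rnorm(nrow(d), sd = 10)
  lm(mpg ~ wt + hp, data = d)
})

Q <- sapply(fits, coef)                       # p x m matrix of estimates
U <- sapply(fits, function(f) diag(vcov(f)))  # p x m matrix of variances

estimate <- rowMeans(Q)
ubar     <- rowMeans(U)                       # within-imputation variance
b        <- apply(Q, 1, var)                  # between-imputation variance
total    <- ubar + (1 + 1 / m) * b
riv      <- (1 + 1 / m) * b / ubar
lambda   <- (1 + 1 / m) * b / total

# Barnard-Rubin df with dfcom taken from df.residual()
dfcom  <- df.residual(fits[[1]])
df_old <- (m - 1) / lambda^2
df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
dfree  <- df_old * df_obs / (df_old + df_obs)
fmi    <- (riv + 2 / (dfree + 3)) / (1 + riv) # convention used by mice::pool()

std.error <- sqrt(total)
statistic <- estimate / std.error
pooled <- data.frame(
  term      = rownames(Q),
  estimate  = estimate,
  std.error = std.error,
  statistic = statistic,
  df        = dfree,
  p.value   = 2 * pt(abs(statistic), dfree, lower.tail = FALSE),
  riv       = riv,
  lambda    = lambda,
  fmi       = fmi,
  conf.low  = estimate - qt(0.975, dfree) * std.error,
  conf.high = estimate + qt(0.975, dfree) * std.error,
  row.names = NULL
)
pooled
```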

References

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.

Barnard, J. and Rubin, D.B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86(4), 948-955.

Examples

library(miceFast)
set.seed(123)
data(air_miss)

# Step 1: Generate m = 5 completed datasets using fill_NA with a stochastic model
completed <- lapply(1:5, function(i) {
  dat <- air_miss
  dat$Ozone <- fill_NA(
    x = dat,
    model = "lm_bayes",
    posit_y = "Ozone",
    posit_x = c("Solar.R", "Wind", "Temp")
  )
  dat
})

# Step 2: Fit a model on each completed dataset
fits <- lapply(completed, function(d) {
  lm(Ozone ~ Solar.R + Wind + Temp, data = d)
})

# Step 3: Pool using Rubin's rules
pool(fits)
#> Pooled results from 5 imputed datasets
#> Rubin's rules with Barnard-Rubin df adjustment
#> 
#>         term  estimate std.error statistic    df   p.value
#>  (Intercept) -42.28051  22.45982    -1.882 79.65 6.342e-02
#>      Solar.R   0.06555   0.02846     2.303 16.30 3.475e-02
#>         Wind  -3.97599   0.62976    -6.314 82.98 1.279e-08
#>         Temp   1.43015   0.26194     5.460 49.73 1.514e-06
