{missRanger} uses the {ranger} package to do fast missing value imputation by chained random forests. As such, it serves as an alternative implementation of the beautiful 'MissForest' algorithm; see the vignette for details.
The main function missRanger() offers the option to combine random forest
imputation with predictive mean matching. Firstly, this avoids generating
values not present in the original data (like a value 0.3334 in a 0-1 coded
variable). Secondly, this step tends to raise the variance in the resulting
conditional distributions to a realistic level, which is crucial for applying
multiple imputation frameworks.
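To make the idea of predictive mean matching more concrete, here is a minimal base R sketch. It is not the code used inside missRanger(): the helper `pmm_sketch()` and all of the toy numbers are made up for illustration. For each missing case, it looks up the `k` observed cases whose predictions are closest to the prediction for the missing case, and then draws one of their actually observed values.

```r
# Hypothetical helper for illustration only; not part of the missRanger API.
# pred_missing:  model predictions for the cases with a missing response
# pred_observed: model predictions for the cases with an observed response
# y_observed:    the observed responses (the donor pool)
pmm_sketch <- function(pred_missing, pred_observed, y_observed, k = 3) {
  vapply(pred_missing, function(p) {
    # Indices of the k observed cases with the closest predictions
    nearest <- order(abs(pred_observed - p))[seq_len(k)]
    # Draw one donor at random and return its observed value
    y_observed[nearest[sample.int(k, 1)]]
  }, FUN.VALUE = y_observed[1])
}

set.seed(1)
y_observed    <- c(0, 0, 1, 1, 1, 0)              # 0-1 coded variable
pred_observed <- c(0.10, 0.25, 0.80, 0.90, 0.70, 0.30)
pred_missing  <- c(0.3334, 0.85)                  # raw predictions, not 0 or 1

imputed <- pmm_sketch(pred_missing, pred_observed, y_observed, k = 3)
imputed  # every imputed value is taken from y_observed
```

Because the imputed values are sampled from observed donors, a raw prediction such as 0.3334 can never end up in the data, and the random draw among the `k` donors is what adds variance back to the imputations.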
# From CRAN
install.packages("missRanger")
# Development version
devtools::install_github("mayer79/missRanger")
We first generate a data set with about 10% missing values in each
column. Then those gaps are filled by missRanger(). In the end, the
resulting data frame is displayed.
library(missRanger)
# Generate data with missing values in all columns
irisWithNA <- generateNA(iris, seed = 347)
# Impute missing values
irisImputed <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100)
# Check results
head(irisImputed)
head(irisWithNA)
head(iris)
# Replace random forest by extremely randomized trees
irisImputed_et <- missRanger(
  irisWithNA,
  pmm.k = 3,
  splitrule = "extratrees",
  num.trees = 100
)
# Using the pipe...
iris |>
  generateNA() |>
  missRanger(pmm.k = 5, verbose = 0) |>
  head()
Check out the vignettes for more info.