@datagrok/impute v1.0.4
Impute
Impute is a package for the Datagrok platfrom. It provides the user with a selection of tools designed to impute an incomplete dataset in an orderly predictive fashion, so as to retain the rest of the feature values associated with the missing instances.
Usage guide
This package is structured as a two-step wizard. Firstly, the user selects which columns they wish to impute and which columns they wish to be the source of inference. Upon completing the preliminary visual analysis (assisted by matrix, cluster and correlation plots) and discarding the undesired columns the user is presented with a choice of algorithms. These are split into three groups as follows:
- For numeric data: applicable to both discrete and continuous variables (but not factors);
- For categorical data: applicable to all categorical variables, both string and numeric;
- For mixed data: applicable to any combination of variable data types;
There is no need to preprocess or factorize the variables in advance. All the transformations (along with normalization and centering, where applicable) are performed internally. If there is a requiremenet for user input it will be explicitly stated so under the 'algorithm parameters' tab. Some algorithms might still perform their function if the wrong data type is selected, however the quality of the resulting inference will be sub-standard.
Methods summary
Some of these methods work better than others, depending on the missingness pattern, feature data types and the hidden deletion mechanisms, etc. A clear understanding of the dataset and all of its aforementioned properties is required to achieve reliable results. Please carefully review the presented selection of models and pay close attention to the associated hyper-parameters and their effects on data (note: some adjustable elements are set to sensible defaults and hidden from the user).
Algorithm | Numeric data | Categorical data | Mixed data |
---|---|---|---|
Hmisc::aregImpute:pmm | ✓ | ✓ | ✓ |
Hmisc::aregImpute:regression | ✓ | ||
Hmisc::aregImpute:pmm | ✓ | ||
VIM::kNN | ✓ | ✓ | ✓ |
mice::mice | ✓ | ✓ | ✓ |
missMDA::PCA | ✓ | ||
missMDA::FAMD | ✓ | ✓ | ✓ |
missMDA::MCA | ✓ | ||
pcaMethods::nipals | ✓ | ||
pcaMethods::ppca | ✓ | ||
pcaMethods::bpca | ✓ | ||
pcaMethods::nlpca | ✓ | ||
missForest::missForest | ✓ | ✓ | ✓ |