rmcfs: An R Package for Monte Carlo Feature Selection and Interdependency Discovery
- Resource Type
- Authors
- Jacek Koronacki; Michał Dramiński
- Source
Journal of Statistical Software, Vol 85, Iss 1, Pp 1-28 (2018)
- Subject
- 0301 basic medicine
Statistics and Probability
Clustering high-dimensional data
Computer science
high-dimensional problems
Monte Carlo method
Feature selection
Q325
Machine Learning
Bioinformatics
Biology
Genetics
Ranking (information retrieval)
Set (abstract data type)
03 medical and health sciences
Component (UML)
ID graph
lcsh:Statistics
lcsh:HA1-4737
Categorical variable
030104 developmental biology
Graph (abstract data type)
MCFS-ID
Java
R
Data mining
Statistics, Probability and Uncertainty
Software
- Language
- ISSN
- 1548-7660
We describe the R package rmcfs, which implements an algorithm for ranking features from high-dimensional data according to their importance for a given supervised classification task. The ranking is performed before the classification task itself is addressed. The package is a new and extended version of the MCFS (Monte Carlo feature selection) algorithm, an early version of which was published in 2005. It provides an easy-to-use R interface, a set of tools for reviewing results, and the new ID (interdependency discovery) component. The algorithm can be applied to continuous and/or categorical features (e.g., gene expression and phenotypic data) to produce an objective ranking of features with a statistically well-defined cutoff between informative and non-informative ones. In addition, the package provides a directed ID graph that presents interdependencies between informative features.
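The general idea of a Monte Carlo feature ranking with a permutation-based cutoff can be illustrated with a small, self-contained Python sketch. This is not the rmcfs implementation and does not use its API. It assumes a binary class, replaces the classifiers used by MCFS with a one-feature threshold rule, and takes the cutoff to be simply the largest importance observed after shuffling the class labels. All function names and parameters (`stump_accuracy`, `mc_importance`, `permutation_cutoff`, `make_toy_data`, `n_subsets`, `subset_size`, `n_splits`) are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def stump_accuracy(X, y, train, test, j):
    """Fit a one-feature threshold rule on `train`; return its accuracy on `test`."""
    pos = [X[i][j] for i in train if y[i] == 1]
    neg = [X[i][j] for i in train if y[i] == 0]
    if not pos or not neg:  # degenerate split: treat as chance level
        return 0.5
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    hits = 0
    for i in test:
        # Predict class 1 on the side of the threshold where class 1 lies.
        predicted = 1 if (X[i][j] > threshold) == (mean_pos > mean_neg) else 0
        hits += predicted == y[i]
    return hits / len(test)


def mc_importance(X, y, rng, n_subsets=30, subset_size=4, n_splits=5):
    """Average above-chance accuracy per feature over random subsets and splits."""
    n, p = len(X), len(X[0])
    total, count = defaultdict(float), defaultdict(int)
    for _ in range(n_subsets):
        features = rng.sample(range(p), subset_size)  # random feature subset
        for _ in range(n_splits):
            idx = list(range(n))
            rng.shuffle(idx)  # random train/test split (about 2/3 vs 1/3)
            cut = (2 * n) // 3
            train, test = idx[:cut], idx[cut:]
            for j in features:
                gain = max(stump_accuracy(X, y, train, test, j) - 0.5, 0.0)
                total[j] += gain
                count[j] += 1
    return [total[j] / count[j] if count[j] else 0.0 for j in range(p)]


def permutation_cutoff(X, y, rng, n_permutations=5, **kwargs):
    """Crude cutoff: largest importance seen when the labels carry no signal."""
    worst = 0.0
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)
        worst = max(worst, max(mc_importance(X, shuffled, rng, **kwargs)))
    return worst


def make_toy_data(rng, n=120, p=10):
    """Synthetic data: only feature 0 depends on the class label."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    for i in range(n):
        X[i][0] += 2.0 * y[i]
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_data(rng)
    importance = mc_importance(X, y, rng)
    cutoff = permutation_cutoff(X, y, rng)
    ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
    print("ranking:", ranking)
    print("above cutoff:", [j for j in ranking if importance[j] > cutoff])
```

On the synthetic data, the single feature constructed to depend on the class should rank first and exceed the cutoff, while the noise features should mostly fall below it. The sketch scores each feature on its own, so it cannot reveal interdependencies between features and has no counterpart to the ID graph.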