Evaluating reliability of tree-patterns in extreme-K categorical samples problems.
- Resource Type
- Article
- Authors
- Chou, Elizabeth; Hsieh, Yin-Chen; Enriquez, Sabrina; Hsieh, Fushing
- Source
- Journal of Statistical Computation & Simulation. Dec 2021, Vol. 91 Issue 18, p3828-3849. 22p.
- Subject
- *BINARY sequences
*HIERARCHICAL clustering (Cluster analysis)
*MULTINOMIAL distribution
*FAULT trees (Reliability engineering)
*DATA analysis
*HISTOGRAMS
*BINARY codes
- Language
- ISSN
- 0094-9655
Exploratory Data Analysis (EDA) approaches are adopted to address the difficult extreme-K categorical sample problem. Because the observed data are categorical, all comparisons among populations are performed by comparing their distributions in the form of histograms with symbolic bins. A distance measure is designed to evaluate the discrepancy between two symbol-based histograms, which enables the use of Hierarchical Clustering (HC) algorithms. The resulting binary HC-tree then serves as the basis for our EDA task of discovering tree-patterns of interest. Since each population-leaf's location within a binary HC-tree's geometry is expressed through a binary code sequence, a shared binary code segment characterizes the tree-pattern common to all members that share it. We then generate a large ensemble of mimicries of the observed dataset based on multinomial distributions and construct a corresponding ensemble of binary HC-trees. For each tree-pattern identified from the observed dataset, we evaluate its reliability and uncertainty through two histograms. [ABSTRACT FROM AUTHOR]
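
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: total variation distance stands in for the paper's unspecified histogram distance, average linkage is used for the clustering, a single agreement fraction replaces the two reliability and uncertainty histograms, and all names (tv_distance, hc_tree, leaf_codes, shares_segment, mimic, reliability) and the toy counts are hypothetical.

```python
import os.path
import random
from collections import Counter
from itertools import combinations

def tv_distance(p, q):
    """Total variation distance between two count histograms (symbol -> count).
    Assumption: a stand-in for the distance measure designed in the paper."""
    n_p, n_q = sum(p.values()), sum(q.values())
    return 0.5 * sum(abs(p.get(s, 0) / n_p - q.get(s, 0) / n_q)
                     for s in set(p) | set(q))

def hc_tree(hists):
    """Average-linkage agglomerative clustering; returns a nested-tuple binary tree."""
    # Each cluster is (tree, member names); a tree is a name or a (left, right) pair.
    clusters = [(name, [name]) for name in sorted(hists)]

    def linkage(a, b):
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(tv_distance(hists[x], hists[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

def leaf_codes(tree, prefix=""):
    """Binary code of each leaf: '0' for a left branch, '1' for a right branch."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    left, right = tree
    return {**leaf_codes(left, prefix + "0"), **leaf_codes(right, prefix + "1")}

def shares_segment(tree, group):
    """True if the leaves in `group` are exactly those sharing one code prefix,
    i.e. they are all the leaves under a single node of the tree."""
    codes = leaf_codes(tree)
    prefix = os.path.commonprefix([codes[g] for g in group])
    return all((name in group) == code.startswith(prefix)
               for name, code in codes.items())

def mimic(hist, rng):
    """One multinomial mimicry: redraw the same number of observations
    from the observed category proportions."""
    symbols = sorted(hist)
    draws = rng.choices(symbols, weights=[hist[s] for s in symbols],
                        k=sum(hist.values()))
    return Counter(draws)

def reliability(hists, group, n_mimics=200, seed=0):
    """Fraction of mimicked HC-trees in which `group` reappears as a tree-pattern."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_mimics):
        mimicked = {name: mimic(h, rng) for name, h in hists.items()}
        hits += shares_segment(hc_tree(mimicked), group)
    return hits / n_mimics

# Toy data (hypothetical): four populations observed over three symbols.
observed = {
    "A": {"x": 80, "y": 15, "z": 5},
    "B": {"x": 75, "y": 20, "z": 5},
    "C": {"x": 10, "y": 20, "z": 70},
    "D": {"x": 5, "y": 30, "z": 65},
}
tree = hc_tree(observed)
print(leaf_codes(tree))
print(reliability(observed, {"A", "B"}, n_mimics=100))
```

In this sketch a tree-pattern is a set of leaves whose codes share a prefix that no other leaf has, so checking a pattern in a mimicked tree reduces to a string-prefix test. With well-separated toy populations the agreement fraction for {A, B} should be close to 1; patterns that depend on small differences between histograms would be expected to score lower.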