Derivation of minimum best sample size from microarray data sets: A Monte Carlo approach
- Resource Type: Conference
- Authors: Bi, Chengpeng; Becker, Mara; Leeder, Steve
- Source: 2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1-6, Apr. 2011
- Subjects: Bioengineering; Communication, Networking and Broadcast Technologies; Components, Circuits, Devices and Systems; Computing and Processing
- Keywords: Accuracy; Support vector machines; Monte Carlo methods; Training; Testing; Logistics; Mathematical model
- Abstract
NCBI has been accumulating a large repository of microarray data sets, the Gene Expression Omnibus (GEO). GEO is a great resource for pursuing various biological and pathological questions. The question we ask here is: given a set of gene signatures and a classifier, what is the best minimum sample size in a clinical microarray study that can effectively distinguish different types of patient responses to a therapeutic drug? This question is difficult to answer because the sample size of most microarray experiments stored in GEO is very limited. This paper presents a Monte Carlo approach to simulating the best minimum microarray sample size based on the available data sets. A Support Vector Machine (SVM) is used as the classifier to compute prediction accuracy at different sample sizes. A logistic function is then fitted to the relationship between sample size and accuracy, from which a theoretical minimum sample size can be derived.
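The pipeline the abstract describes (Monte Carlo resampling at several sample sizes, SVM accuracy estimation, logistic curve fitting) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual method: the synthetic two-class expression data, the effect size, the sample-size grid, and the "95% of the fitted plateau" criterion for the minimum sample size are all assumptions made for the sketch. It uses scikit-learn's `SVC` and SciPy's `curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def simulate_accuracy(n, n_genes=20, effect=0.5, n_trials=30):
    """Monte Carlo estimate of SVM accuracy at total sample size n.

    Each trial draws n/2 'responders' and n/2 'non-responders' whose
    n_genes signature genes differ in mean by `effect` (an assumed,
    illustrative effect size), then scores a linear SVM by 2-fold CV.
    """
    accs = []
    half = n // 2
    for _ in range(n_trials):
        X0 = rng.normal(0.0, 1.0, size=(half, n_genes))      # class 0
        X1 = rng.normal(effect, 1.0, size=(half, n_genes))   # class 1
        X = np.vstack([X0, X1])
        y = np.array([0] * half + [1] * half)
        accs.append(cross_val_score(SVC(kernel="linear"), X, y, cv=2).mean())
    return float(np.mean(accs))

# accuracy at a grid of simulated sample sizes
sizes = np.array([8, 12, 16, 24, 32, 48, 64])
acc = np.array([simulate_accuracy(int(n)) for n in sizes])

def logistic(n, a, b, c):
    """Logistic accuracy-vs-sample-size model: plateau c, midpoint b, slope a."""
    return c / (1.0 + np.exp(-a * (n - b)))

params, _ = curve_fit(logistic, sizes, acc, p0=[0.1, 20.0, 1.0], maxfev=10000)
a, b, c = params

# derive a minimum sample size: smallest n whose fitted accuracy
# reaches 95% of the plateau c (an assumed criterion)
target = 0.95 * c
n_min = b - np.log(c / target - 1.0) / a

print(f"fitted plateau accuracy: {c:.3f}")
print(f"estimated minimum sample size: {n_min:.1f}")
```

Averaging over many Monte Carlo trials at each sample size smooths the accuracy estimates enough for the three-parameter logistic fit to be stable; in practice the trials would resample from the available GEO data rather than from a synthetic distribution.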