Reconstructing latent periods in genome sequences with insertions and deletions
- Resource Type
- Conference
- Authors
- Arora, Raman; Dewey, Colin; Sethares, William A.
- Source
- 2009 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS 2009), pp. 1-4, May 2009
- Subject
- Computing and Processing
Signal Processing and Analysis
Genomics
Bioinformatics
Sequences
Hidden Markov models
Maximum likelihood estimation
DNA
Iterative algorithms
Fourier transforms
Random variables
Biomedical engineering
- ISSN
- 2150-3001
2150-301X
Tandem and latent repeats in genome sequences provide insight into the genome's various structural and functional roles. Such regions are modeled as cyclostationary processes, generated by a collection of information sources visited in a cyclic manner. Maximum likelihood (ML) estimates of the cyclostationary profiles and of the statistical period of such subsequences are easy to compute. However, in the presence of insertions and deletions, the ML estimators lose much of their ability to identify the period accurately. This paper extends the cyclic model to a profile hidden Markov model (PHMM) that accounts for insertions and deletions. An iterative algorithm is developed to learn the parameters of the PHMM, and the Viterbi algorithm is employed to find the most likely path through the state space. This reconstructs likely insertions and deletions in the sequence and yields better estimates of the statistical period and cyclostationary profiles than the ML approach. Experimental results are provided for simulated sequences as well as for the chromosome 1 sequence of the human genome.
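Under the cyclic model, the ML estimates have a closed form: for a candidate period T, the distribution of source k is estimated by the base frequencies at positions congruent to k modulo T. The Python sketch below illustrates this under stated assumptions: a four-letter alphabet ACGT, an ungapped sequence, and a BIC-style penalty for choosing among periods. The penalty is needed because the maximized likelihood never decreases when T is replaced by a multiple of T. The penalty and the function names (`cyclic_profile`, `log_likelihood`, `estimate_period`) are hypothetical illustrations, not the authors' procedure.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def cyclic_profile(seq, period):
    """ML estimate of the `period` position-specific base distributions."""
    counts = [Counter() for _ in range(period)]
    for i, base in enumerate(seq):
        counts[i % period][base] += 1  # source i mod T emitted this base
    profile = []
    for c in counts:
        total = sum(c.values())
        profile.append({a: (c[a] / total if total else 0.0) for a in ALPHABET})
    return profile, counts


def log_likelihood(seq, period):
    """Maximized log-likelihood of seq under a cyclic model with this period."""
    _, counts = cyclic_profile(seq, period)
    ll = 0.0
    for c in counts:
        total = sum(c.values())
        for n in c.values():  # Counter stores only observed (n > 0) bases
            ll += n * math.log(n / total)
    return ll


def estimate_period(seq, max_period):
    """Period maximizing a BIC-penalized log-likelihood (penalty is an assumption)."""
    n = len(seq)
    best, best_score = None, -math.inf
    for period in range(1, max_period + 1):
        k = period * (len(ALPHABET) - 1)  # free parameters of the profile
        score = log_likelihood(seq, period) - 0.5 * k * math.log(n)
        if score > best_score:
            best, best_score = period, score
    return best


clean = "ACG" * 100
with_insertion = "ACG" * 50 + "T" + "ACG" * 50
print(estimate_period(clean, 8))
print(log_likelihood(clean, 3), log_likelihood(with_insertion, 3))
```

A single inserted base shifts the phase of every later position, so each of the T columns mixes bases from two phases and the period-3 likelihood drops. This is the failure mode that motivates the PHMM extension.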
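The Viterbi decoding step can be illustrated on a deliberately simplified cyclic profile HMM. Match states ("M", k) emit from the k-th profile column. Insert states ("I", k) emit every base with a uniform background probability. A deletion of one column is modeled as a direct jump M_k to M_{k+2}, which avoids silent delete states. This topology, the transition probabilities (`p_ins`, `p_del`, `p_ext`), and the name `viterbi_cyclic` are assumptions for illustration. The paper's actual PHMM and its iterative parameter learning are not reproduced here; the profile is taken as given.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else -math.inf


def viterbi_cyclic(seq, profile, p_ins=0.05, p_del=0.05, p_ext=0.3, bg=0.25):
    """Most likely state path through a simplified cyclic profile HMM.

    ("M", k) emits from profile[k]; ("I", k) emits every base with probability bg.
    A deletion of one column is a direct jump M_k -> M_{k+2} (no silent states).
    """
    T = len(profile)
    states = [("M", k) for k in range(T)] + [("I", k) for k in range(T)]

    def trans(s, t):
        (ks, i), (kt, j) = s, t
        p = 0.0
        if ks == "M":
            if kt == "M" and j == (i + 1) % T:
                p += 1 - p_ins - p_del  # advance to the next column
            if kt == "M" and j == (i + 2) % T:
                p += p_del  # skip (delete) one column
            if kt == "I" and j == i:
                p += p_ins  # open an insertion
        else:
            if kt == "I" and j == i:
                p += p_ext  # extend the insertion
            if kt == "M" and j == (i + 1) % T:
                p += 1 - p_ext  # close it and resume the cycle
        return _log(p)

    def emit(s, base):
        kind, k = s
        return _log(profile[k].get(base, 0.0)) if kind == "M" else _log(bg)

    # Start uniformly in one of the match states.
    score = {s: (_log(1 / T) if s[0] == "M" else -math.inf) + emit(s, seq[0])
             for s in states}
    back = []
    for base in seq[1:]:
        new, ptr = {}, {}
        for t in states:
            # Best predecessor of state t (max evaluated immediately per t).
            prev = max(states, key=lambda s: score[s] + trans(s, t))
            new[t] = score[prev] + trans(prev, t) + emit(t, base)
            ptr[t] = prev
        score = new
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical period-3 profile favouring A, C, G at phases 0, 1, 2.
PROFILE = [{b: (0.91 if b == top else 0.03) for b in "ACGT"} for top in "ACG"]
path = viterbi_cyclic("ACGACGTACGACG", PROFILE)
print(path)
```

With these settings, the decoded path assigns the stray T at index 6 to an insert state and resumes the cycle at ("M", 0), so the phase of the downstream repeats is preserved. The cost is O(N S^2) for S = 2T states, which is adequate for a sketch but not tuned for chromosome-scale input.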