Statistical methods for DNA sequences: detection of recombination and distance estimation
- Resource Type
- Electronic Thesis or Dissertation
- Authors
- McGuire, Grainne
- Subject
- 572.85
- Language
- English
Two problems in phylogenetics are considered here: the detection of evidence of recombination in multiple alignments of DNA sequences, and the improved estimation of confidence intervals for genetic distance estimators.

Recombination between distinct species can produce mosaic sequences, which often invalidate a simple tree-like model of the relationships between species. A graphical method based on pairwise distances and least squares is proposed as an initial scan of data sets for evidence of recombination prior to a phylogenetic analysis. A Bayesian model of recombination for data sets with a small number of species is also described; it allows hidden Markov model (HMM) theory to be used for computations such as the calculation of the maximum a posteriori (MAP) estimate.

Accurate confidence intervals for genetic distance estimators are important when comparing the relative rates of nucleotide substitution in different regions of DNA, or when estimating the time since the most recent common ancestor. Two approximations to the sampling distributions of distance estimators are proposed. The first is a transformation of a normal density; it applies only to one-parameter models of nucleotide substitution, and it yields very accurate approximate confidence intervals over a large range of distances. The second is the saddlepoint approximation, which has a wider range of applicability (it can be applied to some two- and three-parameter models) and also performs well over a range of distances.
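As a rough illustration of why transforming a normal density can give better intervals for one-parameter models, the sketch below assumes the Jukes-Cantor (JC69) model, whose distance is d = -(3/4) ln(1 - 4p/3) for a proportion p of differing sites among n aligned sites. It treats the observed proportion p̂ as approximately normal and maps the endpoints of a normal interval for p through the monotone function d(p), which gives an asymmetric interval for d. A symmetric delta-method interval is included for comparison. This is a hypothetical simplification for illustration only, not the construction used in the thesis, and the function names are invented here.

```python
import math
from statistics import NormalDist

SATURATION = 0.75  # JC69 distance diverges as p approaches 3/4


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) distance from the proportion p of differing sites."""
    if not 0.0 <= p < SATURATION:
        raise ValueError("JC69 distance is undefined for p outside [0, 3/4)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def transformed_normal_ci(k: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Hypothetical sketch: normal interval for p, pushed through d(p).

    Assumes k differences out of n sites, with p_hat = k/n approximately normal.
    """
    p_hat = k / n
    if p_hat >= SATURATION:
        raise ValueError("observed proportion is at or beyond saturation")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    p_lo = max(0.0, p_hat - z * se)
    p_hi = p_hat + z * se
    # d(p) is increasing, so interval endpoints map directly to endpoints for d.
    d_hi = jc69_distance(p_hi) if p_hi < SATURATION else math.inf
    return jc69_distance(p_lo), d_hi


def delta_method_ci(k: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Symmetric interval d_hat +/- z * se(d_hat), using the delta method."""
    p_hat = k / n
    d_hat = jc69_distance(p_hat)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Var(d_hat) ~= p(1 - p) / (n (1 - 4p/3)^2), from the derivative of d(p).
    var_d = p_hat * (1.0 - p_hat) / (n * (1.0 - 4.0 * p_hat / 3.0) ** 2)
    half_width = z * math.sqrt(var_d)
    return d_hat - half_width, d_hat + half_width


if __name__ == "__main__":
    k, n = 60, 200  # illustrative counts, not data from the thesis
    print("d_hat:", jc69_distance(k / n))
    print("transformed normal:", transformed_normal_ci(k, n))
    print("delta method:", delta_method_ci(k, n))
```

With 60 differences in 200 sites, the transformed interval is approximately (0.284, 0.497) around d̂ ≈ 0.383, extending further above the estimate than below it, in line with the right skew of the distance estimator; the delta-method interval is symmetric by construction. Near saturation the transformed upper limit becomes infinite, which signals that the data cannot bound the distance from above.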