Genome sequencing can offer critical insight into pathogen spread in viral outbreaks, but existing transmission inference methods use simplistic evolutionary models and incorporate only a portion of the available genetic data. Here, we develop a robust evolutionary model for transmission reconstruction that tracks the genetic composition of within-host viral populations over time and the lineages transmitted between hosts. We confirm that our model reliably describes within-host variant frequencies in a dataset of 134,682 SARS-CoV-2 deep-sequenced genomes from Massachusetts, USA. We then demonstrate that our reconstruction approach infers transmissions more accurately than two leading methods on synthetic data, as well as in a controlled outbreak of bovine respiratory syncytial virus and an epidemiologically investigated SARS-CoV-2 outbreak in South Africa. Finally, we apply our transmission reconstruction tool to 5,692 outbreaks among the 134,682 Massachusetts genomes. Our methods and results demonstrate the utility of within-host variation for transmission inference in SARS-CoV-2 and other pathogens, and provide an adaptable mathematical framework for tracking within-host evolution.
Competing Interests: P.C.S. is a co-founder and shareholder of Sherlock Biosciences and Delve Bio and is a non-executive board member and shareholder of Danaher Corporation. P.C.S. is an inventor on patents related to diagnostics and Bluetooth-based contact tracing tools and technologies filed with the USPTO and other intellectual property bodies. A patent application has been filed on inventions described in this manuscript. All other authors declare no competing interests.