Objectives: Develop a natural language processing (NLP) system to identify patients with peripheral artery disease (PAD).

Background: Despite its high prevalence and clinical impact, research on PAD remains limited because billing codes identify the condition with poor accuracy. Ankle-brachial index (ABI) and toe-brachial index (TBI) values can be used to identify patients with PAD with high accuracy using electronic health record (EHR) data.

Methods: A random sample of 800 ABI test reports from 94 Veterans Affairs (VA) facilities during 2015-2017 was selected and annotated by clinical experts. We trained the NLP system using random forest models, optimized it through sequential iterations of 10-fold cross-validation and error analysis on 600 of the reports, and evaluated its final performance on a separate set of 200 reports. We also assessed the accuracy of NLP-extracted ABI and TBI values for identifying patients with PAD in a separate cohort undergoing ABI testing.

Results: The NLP system had an overall precision (positive predictive value) of 0.85, recall (sensitivity) of 0.93, and F1-measure (harmonic mean of precision and recall) of 0.89 for correctly identifying ABI/TBI values and laterality. Among 261 patients with ABI testing (49% PAD), the NLP system achieved a positive predictive value of 92.3%, sensitivity of 83.1%, and specificity of 93.1% for identifying PAD when compared with structured chart review (Table). These findings were consistent across a range of sensitivity analyses (Table).

Conclusion: We successfully developed and validated an NLP system for identifying patients with PAD within the VA’s EHR. Our findings have broad implications for PAD research and quality improvement efforts.
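
The performance measures reported in the Results are related by standard definitions, which the following minimal Python sketch illustrates. It is not the authors' code: the function names `f1_score` and `classification_metrics` are hypothetical, and the only study figures used are the reported precision (0.85) and recall (0.93). The abstract does not report the underlying confusion-matrix counts, so none are assumed here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives against a reference standard (for example,
    chart review). These helper names are illustrative assumptions.
    """
    return {
        "ppv": _ratio(tp, tp + fp),          # positive predictive value (precision)
        "sensitivity": _ratio(tp, tp + fn),  # recall
        "specificity": _ratio(tn, tn + fp),
    }


# Consistency check using the precision and recall reported in the abstract.
print(round(f1_score(0.85, 0.93), 2))  # 0.89
```

The check confirms that the reported F1-measure of 0.89 follows from the reported precision and recall.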