Introduction
Asymptomatic left ventricular dysfunction (ALVD) carries an increased risk of overt heart failure and mortality, yet it is treatable, and early detection can mitigate disease progression. An artificial intelligence (AI)-enabled 12-lead electrocardiogram (ECG) model has shown promise for ALVD screening, but an unexpected drop in performance was observed on external validation. We therefore sought to train a de novo model for ALVD detection and to investigate its performance across multiple institutions and a broader set of patient strata.

Methods
ECGs obtained within 14 days of an echocardiogram were collected from 4 academic hospitals in the United States (BWH, MGH, and UCSF) and Japan (Keio). Four AI models were trained, one on ECGs from each of the 4 institutions, to detect patients with an ejection fraction < 40%. Each model was then evaluated on the held-out test dataset from its own institution and on all data from the 3 external institutions. Subgroup analyses stratified by patient characteristics and by common overt ECG abnormalities were performed.

Results
A total of 221,846 ECGs were identified from the 4 institutions. While the BWH-trained and Keio-trained models achieved similar accuracy on their internal test data (AUROC 0.913 and 0.914, respectively), external validity was worse for the Keio-trained model (AUROC 0.905-0.915 for the BWH-trained model versus 0.849-0.877 for the Keio-trained model). Although ECG abnormalities including atrial fibrillation, left bundle branch block, and paced rhythm reduced detection performance, the models performed robustly across patient characteristics and other ECG features.

Conclusion
Different training datasets produce models with different performance, highlighting the importance of external validation and extensive stratification analysis.
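
The cross-institution evaluation described in Methods (each model trained on one site, then scored internally and on the three external sites) amounts to filling a 4 x 4 AUROC grid. The sketch below illustrates that protocol on synthetic stand-in data; the site names come from the abstract, but the features, labels, and logistic-regression classifier are placeholders, not the authors' model or data.

```python
# Illustrative sketch only: cross-institution evaluation grid.
# Rows = training site, columns = test site; the diagonal is internal
# performance and off-diagonal entries are external validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
sites = ["BWH", "MGH", "UCSF", "Keio"]

# Synthetic per-site datasets: features stand in for ECG-derived inputs,
# labels stand in for EF < 40% (1) vs. preserved EF (0). A small per-site
# shift in the feature distribution mimics dataset (covariate) shift.
data = {}
for i, site in enumerate(sites):
    X = rng.normal(loc=0.1 * i, size=(500, 8))
    y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
    data[site] = (X, y)

auroc = np.zeros((len(sites), len(sites)))
for i, train_site in enumerate(sites):
    X_tr, y_tr = data[train_site]
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    for j, test_site in enumerate(sites):
        X_te, y_te = data[test_site]
        auroc[i, j] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(np.round(auroc, 3))
```

In the study itself, the "internal" cell for each row uses a held-out test split rather than the training data, and subgroup AUROCs (e.g., atrial fibrillation, paced rhythm) would be computed by restricting each test set before scoring.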