Objectives
The aims of this study are, first, to investigate the diagnostic and triage performance of symptom checkers; second, to assess their potential impact on healthcare utilisation; and third, to investigate variation in performance between systems.

Setting
Publicly available symptom checkers.

Participants
Publicly available symptom checkers were identified. A standardised set of 50 clinical vignettes was developed and systematically run through each system by a non-clinical researcher.

Primary and secondary outcome measures
System accuracy was assessed by measuring the percentage of times the correct diagnosis was (a) listed first, (b) listed within the top five diagnoses and (c) listed at all. The safety of the disposition advice was assessed by comparing it with national guidelines for each vignette.

Results
Twelve tools were identified and included. Mean diagnostic accuracy of the systems was poor: the correct diagnosis was listed first on 37.7% of occasions (range 22.2% to 72.0%) and appeared in the top five diagnoses on 51.0% (range 22.2% to 84.0%). On average, systems suggested additional resource utilisation above that recommended by national guidelines in 51.0% of cases (range 18.0% to 61.2%). Both diagnostic accuracy and the appropriateness of resource recommendations varied substantially between systems.

Conclusions
There is wide variation in performance between available symptom checkers, and overall performance is significantly below what would be accepted in any other medical field, though some do achieve a good level of accuracy and safety of disposition.
External validation and regulation are urgently required to ensure these public-facing tools are safe.

Strengths and limitations
- Data collection was undertaken by non-clinically trained staff to replicate patient behaviour, and random sampling was used to test inter-rater reliability.
- Clinical vignettes were agreed by a clinical team consisting of a GP, a pharmacist and a hospital emergency care consultant.
- Current UK guidelines were used to assess service utilisation. Where symptom checkers were developed outside the UK, their disposition advice may not align with these guidelines because of differing jurisdictions.
- This study examined only indirectly, and to a limited extent, the variety of terms and language patients might use when interacting with these systems.
- There was no assessment of how a clinician would diagnose and triage a patient presenting with the vignette symptoms.
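The accuracy and disposition outcome measures reduce to simple proportions over the vignettes run through each system. The sketch below is illustrative only and rests on assumptions the abstract does not state: each system's output is stored as a ranked list of diagnoses; dispositions are mapped onto a hypothetical ordinal scale (`DISPOSITION_LEVEL`), so that "additional resource utilisation" means a level above the guideline's; and the reported mean and range are taken as the unweighted mean, minimum and maximum of the per-system percentages. All names here are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical ordinal scale of dispositions, least to most resource-intensive.
# These levels are illustrative assumptions, not the study's own categories.
DISPOSITION_LEVEL = {"self_care": 0, "pharmacy": 1, "gp": 2, "urgent_care": 3, "emergency": 4}


@dataclass
class VignetteResult:
    correct_diagnosis: str
    ranked_diagnoses: list[str]  # system output, most likely diagnosis first
    system_disposition: str
    guideline_disposition: str


def percent(hits: int, total: int) -> float:
    """Percentage of hits, returning 0.0 when there are no vignettes."""
    return 100.0 * hits / total if total else 0.0


def score_system(results: list[VignetteResult]) -> dict[str, float]:
    """Per-system percentages for each outcome measure."""
    n = len(results)
    first = sum(1 for r in results if r.ranked_diagnoses[:1] == [r.correct_diagnosis])
    top5 = sum(1 for r in results if r.correct_diagnosis in r.ranked_diagnoses[:5])
    listed = sum(1 for r in results if r.correct_diagnosis in r.ranked_diagnoses)
    # Over-triage: the system advises a more resource-intensive disposition than the guideline.
    over = sum(
        1
        for r in results
        if DISPOSITION_LEVEL[r.system_disposition] > DISPOSITION_LEVEL[r.guideline_disposition]
    )
    return {
        "listed_first": percent(first, n),
        "top_five": percent(top5, n),
        "listed_at_all": percent(listed, n),
        "over_triage": percent(over, n),
    }


def summarise(per_system: dict[str, dict[str, float]], metric: str) -> tuple[float, float, float]:
    """Unweighted mean, minimum and maximum of one metric across systems."""
    values = [scores[metric] for scores in per_system.values()]
    return mean(values), min(values), max(values)
```

Averaging per-system percentages weights every tool equally regardless of how many vignettes it could process; pooling all vignettes instead would weight tools by volume, and the abstract does not say which approach was used.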