In this work, we propose a framework that enables the collection of large-scale, diverse sign language datasets that can be used to train automatic sign language recognition models. The first contribution of this work is SDTRACK, a generic method for signer tracking and diarisation in the wild. Our second contribution is to show how SDTRACK can be used to automatically annotate 90 hours of British Sign Language (BSL) content that features a wide range of signers and includes interviews, monologues and debates. Using SDTRACK, this data is annotated with 35K active signing tracks, together with their corresponding video-level signer identifiers and subtitles, as well as 40K automatically localised sign labels.