A system, method, and computer program product for recognizing entity names from a plurality of documents. Embodiments of the method comprise selecting, from the plurality of documents, a subset of documents whose titles share a common pattern. The method further comprises determining a name candidate for each document in the subset by applying the common pattern to the title of the document, and matching the name candidates against a collection of known entity names (the white list). Responsive to determining a match between the name candidates and the entity names in the white list, the method determines that the name candidates are valid entity names. In one embodiment, the name candidates are added to the white list after being determined to be valid entity names.
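
The steps described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function name `recognize_entity_names`, the representation of the common pattern as a regular expression with a named group `name`, the case-insensitive comparison, and the `min_match_ratio` threshold used to decide what counts as a "match" are all assumptions introduced here for illustration.

```python
import re


def recognize_entity_names(titles, pattern, white_list, min_match_ratio=0.5):
    """Return the name candidates judged valid, updating white_list in place.

    Hypothetical helper: `pattern` is assumed to be a regular expression
    containing a named group `name` that captures the candidate.
    """
    regex = re.compile(pattern)

    # Select the documents whose titles share the common pattern and
    # extract one name candidate from each selected title.
    candidates = []
    for title in titles:
        m = regex.fullmatch(title)
        if m:
            candidates.append(m.group("name").strip())

    if not candidates:
        return []

    # Match the candidates against the white list
    # (case-insensitive comparison is an assumption).
    known = {name.lower() for name in white_list}
    matched = sum(1 for c in candidates if c.lower() in known)

    # Assumed decision rule: treat the whole group as valid when a large
    # enough fraction of its candidates is already on the white list.
    if matched / len(candidates) < min_match_ratio:
        return []

    # One embodiment: add the validated candidates to the white list.
    for c in candidates:
        if c.lower() not in known:
            white_list.add(c)
            known.add(c.lower())
    return candidates


titles = [
    "Alice Smith - Official Site",
    "Bob Jones - Official Site",
    "Carol White - Official Site",
    "Weather forecast for Tuesday",
]
white_list = {"Alice Smith", "Bob Jones"}
valid = recognize_entity_names(titles, r"(?P<name>.+) - Official Site", white_list)
```

In this sketch, two of the three extracted candidates already appear on the white list, so all three are accepted and the previously unknown candidate is added to the white list. A title that does not fit the pattern is simply excluded from the selection.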