Objective: Executive deficits are frequent sequelae of neurological disorders, but their adequate neuropsychological assessment remains contested. Accordingly, findings on the psychometric properties of the Tower of London planning task are equivocal and are lacking entirely for adult clinical populations. Methods: We used a structurally balanced item set implemented in the Tower of London (Freiburg version, TOL-F) that accounts for major determinants of problem difficulty beyond the commonly used minimum number of moves to solution. Split-half reliability, internal consistency, and criterion validity of TOL-F accuracy were assessed in patients with stroke (N=60), Parkinson syndrome (N=51), or mild cognitive impairment (N=29) and in healthy adults (N=155). Results: Across samples, mean split-half and lower-bound indices of reliability of accuracy scores were adequate (r ≥ .7) or better. Compared with a subset of well-matched healthy controls, all three clinical samples showed deficits in planning accuracy. Conclusions: Given the consistently adequate reliability and good criterion validity of its accuracy scores, the TOL-F is suitable for testing planning ability in clinical samples and healthy adults. Using item sets that systematically account for several determinants of task difficulty can thus substantially enhance the contested reliability of executive tasks.