Agitation is a key behavioural and psychological symptom in people with dementia, and these behaviours can put the health and safety of the person with dementia and those around them at risk. Surveillance cameras installed in long-term care facilities offer an opportunity to monitor patients continuously and flag risky behaviours, including agitation. However, agitation events are rare and diverse, yielding very little training data; an anomaly detection approach is therefore better suited to this problem. In this paper, we train three baseline spatio-temporal convolutional autoencoders (on raw video, skeletons, and segmentation masks) on 21 hours of normal activities and test them on 9 hours of labelled normal and agitation data collected from a real patient in a dementia unit. Deploying anomaly detection-based classifiers in the real world is challenging because no validation set is available from which to derive an operating threshold that balances true positive and false positive rates. We present a new approach that creates a proxy validation set for unseen agitation events from the outliers within the normal activities, training two separate autoencoders on the normal and outlier activities. We then evaluate 11 empirical thresholding approaches (existing, adapted, and new) that use either the normal training data alone or the proxy validation set. Our results show consistently across raw video, skeleton, and segmentation mask inputs that incorporating a proxy validation set improves performance in terms of both geometric mean and Matthews correlation coefficient. This paper highlights the challenges of real-world deployment and the need to assess what rates of true and false positives are acceptable in a clinical care environment.
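To make the thresholding idea concrete, the following is a minimal sketch (not the paper's implementation) of how an operating threshold on autoencoder reconstruction error might be selected from a labelled proxy validation set by maximising the geometric mean of sensitivity and specificity. The function name, the toy error distributions, and the use of per-sample reconstruction error as the anomaly score are all illustrative assumptions.

```python
import numpy as np

def pick_threshold(val_errors, val_labels):
    """Illustrative threshold search: choose the reconstruction-error
    cut-off that maximises the geometric mean of sensitivity and
    specificity on a proxy validation set.
    Labels: 1 = proxy anomaly (outlier activity), 0 = normal."""
    best_t, best_gmean = None, -1.0
    for t in np.unique(val_errors):
        pred = (val_errors >= t).astype(int)  # high error -> flagged as anomaly
        tp = np.sum((pred == 1) & (val_labels == 1))
        fn = np.sum((pred == 0) & (val_labels == 1))
        tn = np.sum((pred == 0) & (val_labels == 0))
        fp = np.sum((pred == 1) & (val_labels == 0))
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        gmean = np.sqrt(sens * spec)
        if gmean > best_gmean:
            best_t, best_gmean = t, gmean
    return best_t, best_gmean

# Toy data: normal frames reconstruct well (low error), outliers poorly.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.normal(0.2, 0.05, 100),   # normal activities
                         rng.normal(0.8, 0.10, 20)])   # proxy anomalies
labels = np.concatenate([np.zeros(100, dtype=int), np.ones(20, dtype=int)])
threshold, gmean = pick_threshold(errors, labels)
```

The same search loop could instead maximise the Matthews correlation coefficient; the point is only that a labelled proxy set turns threshold selection into an ordinary model-selection step.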