Automatic recognition of human activities is important for the development of next-generation video-surveillance systems. In this paper we address the specific problem of automatically detecting violent interpersonal acts in monocular colour video streams. Unlike previous approaches, ours assumes only little knowledge about the acquisition setup and the content of the acquired scenes, so it is suitable for a wide range of practical cases. Reliability and general-purpose applicability are achieved by analysing low-level features (such as the spatio-temporal behaviour of coloured stains) and by measuring some warping and motion parameters. In this way it is not necessary to extract accurate target silhouettes, which is a critical task because of the occlusions and overcrowding that are typical of interpersonal contacts. A suitable index, called Maximum Warping Energy (MWE), is defined to describe the localized spatio-temporal complexity of colour conformations. Our experiments show that aggressive activities give significantly higher MWE values than safe actions such as walking, running, embracing or handshaking. It is therefore possible to distinguish violent acts from normal behaviours even in crowded environments with many people. Homography is used to improve robustness by verifying the actual proximity of the targets, so that false interactions caused by perspective-induced occlusions are discarded.
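
The homography-based proximity check mentioned above can be illustrated with a minimal sketch. It assumes that a 3x3 image-to-ground-plane homography is already known and that each target is represented by a single foot point in image coordinates; the matrix `H_EXAMPLE`, the function names and the distance threshold are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math

# Hypothetical image-to-ground homography (illustrative values only).
# The non-zero entry in the last row introduces a perspective effect.
H_EXAMPLE = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, -0.001, 1.0],
]


def to_ground_plane(H, point):
    """Map an image point (x, y) to ground-plane coordinates via homography H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity on the ground plane")
    # Divide by the homogeneous coordinate to obtain Cartesian coordinates.
    return (u / w, v / w)


def are_targets_near(H, foot_a, foot_b, max_distance):
    """Return True if two foot points are within max_distance on the ground plane."""
    ax, ay = to_ground_plane(H, foot_a)
    bx, by = to_ground_plane(H, foot_b)
    return math.hypot(ax - bx, ay - by) <= max_distance


# Two targets at the same image height: close on the ground plane.
print(are_targets_near(H_EXAMPLE, (100, 400), (120, 400), 1.0))
# Two targets with similar image columns but different depth: far apart,
# so an apparent contact between them would be discarded.
print(are_targets_near(H_EXAMPLE, (100, 400), (110, 200), 1.0))
```

In this sketch, two targets whose silhouettes overlap in the image are treated as interacting only if their ground-plane distance is below the threshold, which is one simple way to reject perspective-induced false interactions.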