Recent advances in Artificial Intelligence (AI) have paved the way for new generations of self-adaptive systems that embed learning behaviours. Often these systems rely on Machine Learning (ML) models and algorithms; others use symbolic reasoning or a combination of the two. A problem common to all these solutions is the difficulty of establishing clear conformance criteria that can be used to reliably assess whether an AI-based (and, in particular, ML-based) software system is behaving as intended, i.e., according to its specification. Research communities from different areas are investigating innovative verification and validation (V&V) approaches to assess evolving AI systems against their expected functionalities. This empirical study identifies, collects and categorises relevant research papers on testing and formal verification of AI-based software systems. In total, we considered 78 fully qualified primary studies retrieved from the Scopus digital library. For each study, we mapped its key aspects into a classification framework that supports comparison across a set of common dimensions.