We study the distribution of the Bayesian evidence over mock realizations of supernova and baryon acoustic oscillation data. Ratios of Bayesian evidences for different models (Bayes factors) are often used to perform model selection, and the significance of these Bayes factors is then interpreted using scales such as the Jeffreys or Kass \& Raftery scale. First, we demonstrate how to use the evidence itself to validate a model, that is, to assess how well the model fits the data regardless of how well other models perform. The basic idea is that if, for some real dataset, a model's evidence lies outside the distribution of evidences obtained when the fiducial model that generates the mock datasets is also used for the analysis, then the model in question is robustly ruled out. Second, we show how to assess the significance of a hypothetically computed Bayes factor. We find that the range of the distribution of Bayes factors can depend strongly on the models in question and on the number of data points in the dataset. The significance of Bayes factors therefore needs to be calculated for each unique dataset.
Comment: 9 pages, 7 figures
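
The validation test described in the abstract can be illustrated with a minimal sketch. This is not the supernova and baryon acoustic oscillation analysis itself: it assumes a toy linear model y = a x with known Gaussian noise and a Gaussian prior on a, chosen only because its evidence is analytic. All function names and parameter values (`log_evidence_linear`, `mock_evidence_distribution`, `tail_fraction`, `sigma`, `tau`, `a_fid`) are hypothetical and introduced for illustration.

```python
import numpy as np

def log_evidence_linear(x, y, sigma, tau):
    """Analytic log evidence for the toy model y = a*x + noise.

    Assumptions (illustrative only): noise ~ N(0, sigma^2) per point and
    prior a ~ N(0, tau^2), so marginally y ~ N(0, sigma^2 I + tau^2 x x^T).
    """
    n = len(x)
    cov = sigma**2 * np.eye(n) + tau**2 * np.outer(x, x)
    _, logdet = np.linalg.slogdet(cov)
    chi2 = y @ np.linalg.solve(cov, y)
    return -0.5 * (chi2 + logdet + n * np.log(2.0 * np.pi))

def mock_evidence_distribution(x, a_fid, sigma, tau, n_mocks, rng):
    """Log evidences of mocks drawn from the fiducial model and analysed with it."""
    logz = np.empty(n_mocks)
    for i in range(n_mocks):
        y_mock = a_fid * x + rng.normal(0.0, sigma, size=len(x))
        logz[i] = log_evidence_linear(x, y_mock, sigma, tau)
    return logz

def tail_fraction(logz_mocks, logz_obs):
    """Fraction of mocks whose log evidence is as low as, or lower than, the observed one."""
    return float(np.mean(logz_mocks <= logz_obs))

rng = np.random.default_rng(0)
x = np.linspace(0.1, 1.0, 30)
sigma, tau = 0.1, 2.0

# Reference distribution: the same fiducial model generates and analyses the mocks.
logz_mocks = mock_evidence_distribution(x, a_fid=1.0, sigma=sigma, tau=tau,
                                        n_mocks=500, rng=rng)

# Stand-in for a "real" dataset that the linear model cannot describe
# (an extra quadratic term has been added by hand).
y_obs = 1.0 * x + 1.5 * x**2 + rng.normal(0.0, sigma, size=len(x))
logz_obs = log_evidence_linear(x, y_obs, sigma, tau)

frac = tail_fraction(logz_mocks, logz_obs)
print(f"observed log-evidence: {logz_obs:.1f}")
print(f"mock log-evidence range: [{logz_mocks.min():.1f}, {logz_mocks.max():.1f}]")
print(f"fraction of mocks at or below observed: {frac:.3f}")
```

A small `frac` indicates that the observed evidence falls in the low tail of, or entirely outside, the mock distribution, which is the sense in which the abstract calls a model ruled out. The same loop could be extended to store the difference of two models' log evidences per mock, giving an empirical distribution of Bayes factors for a particular dataset size.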