Consensus contours are often used to reduce annotator error in segmentation data. Here, we investigate the use of multi-annotator labels, in simulated data, for training auto-segmentation models. We compare convolutional neural networks (CNNs) trained on all observer annotations with those trained on STAPLE and majority-voting consensus estimates. We generate annotation sets by simulating observer delineations with varying noise and bias. By altering the number of observers and their relative biases, we investigate the impact of bias on CNN performance. We find that models trained on STAPLE contours perform significantly worse when presented with biased annotations. CNNs trained on all annotations performed best and were able to implicitly account for biased annotations in the training set.
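The simulation and consensus steps above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the grid size, noise levels, and the per-pixel flip model are assumptions chosen for clarity, and only the simpler majority-voting baseline is shown (STAPLE requires an iterative EM estimate of observer reliabilities).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth binary mask: a filled square on a small grid.
gt = np.zeros((32, 32), dtype=bool)
gt[8:24, 8:24] = True

def simulate_observer(mask, flip_prob, rng):
    """Simulate a noisy observer by flipping each pixel with probability flip_prob."""
    noise = rng.random(mask.shape) < flip_prob
    return mask ^ noise

# Five simulated observers with varying (assumed) noise levels.
annotations = [simulate_observer(gt, p, rng) for p in (0.02, 0.05, 0.05, 0.10, 0.10)]

# Majority-voting consensus: a pixel is foreground if more than
# half of the observers mark it as foreground.
votes = np.sum(annotations, axis=0)
consensus = votes > len(annotations) / 2

# Fraction of pixels where the consensus matches the ground truth.
agreement = np.mean(consensus == gt)
```

A systematic observer bias (e.g. consistently over-contouring) could be simulated by dilating or shifting the mask before adding noise; majority voting cannot correct a bias shared by most observers, which is the failure mode the experiments probe.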