Modern combine harvesters can collect geo-located, real-time yield measurements while harvesting. These data can be used to train machine learning models that predict yield at the sub-field level from remote sensing input data. The performance of these models is, however, highly dependent on the quality of the yield data. It is therefore important to develop automatic cleaning techniques that correct for common errors in combine harvester yield maps. In this work, we compare different combinations of data cleaning techniques by evaluating their impact on yield-prediction model performance at the field and sub-field levels. Our findings indicate that basic cleaning techniques such as absolute thresholds are sufficient at the field level, whereas performance at the sub-field level improves when more sophisticated statistical cleaning methods are used.
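
To make the distinction between the two families of cleaning techniques concrete, the following is a minimal sketch and not the pipeline evaluated in this work. It assumes yield values in t/ha, uses hypothetical bounds of 0 and 15 t/ha for the absolute threshold, and uses a mean plus or minus k standard deviations rule as one possible stand-in for a statistical cleaning method. The function names, the bounds, and the value of k are illustrative assumptions.

```python
from statistics import mean, stdev


def absolute_threshold(yields, low=0.0, high=15.0):
    """Keep values inside fixed bounds (t/ha). The bounds are illustrative."""
    return [y for y in yields if low <= y <= high]


def std_filter(yields, k=2.0):
    """Keep values within k sample standard deviations of the mean.

    This is one simple statistical filter, chosen here only as an example.
    """
    m, s = mean(yields), stdev(yields)
    return [y for y in yields if abs(y - m) <= k * s]


# Hypothetical yield points from one field (t/ha), including obvious errors.
raw = [7.8, 8.1, 7.9, 8.3, 0.4, 8.0, 25.0, 7.7, -1.0, 8.2]

step1 = absolute_threshold(raw)  # drops the physically implausible 25.0 and -1.0
step2 = std_filter(step1)        # additionally drops the plausible but outlying 0.4
```

In this toy example the absolute threshold removes only values outside the fixed bounds, while the statistical filter also removes a value that lies within the bounds but far from the rest of the field.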