Subjective speech quality scores needed to train models for the automatic evaluation of telecommunication systems have generally been collected in demanding laboratory tests. Alternatively, crowdsourcing has emerged as a valid method for conducting user-centered studies with a large pool of users over the Internet. However, crowdsourcing participants often do not follow instructions and may execute the assigned task in noisy environments. The validity of data collected in a disturbed environment is questionable, especially in speech quality assessment and other audio-related tasks. This work investigates the applicability of different ensemble-based and non-linear models to correct the bias found in speech quality ratings given to a German speech dataset in noisy crowdsourcing environments. Such a model would help avoid discarding quality scores that could otherwise be used. The model was trained with data collected in a speech quality assessment study conducted in a simulated crowdsourcing environment in the laboratory. Two groups of listeners rated the quality of speech stimuli in the presence of environmental noise at different levels. The noise under test was street traffic, and the levels ranged from 36 dBA to 65.5 dBA. A fine-tuned gradient boosting regressor yielded the best results, with an R² score of 0.90 and an RMSE of 0.416.
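The following is a minimal sketch, not the authors' code, of the kind of bias-correction model the abstract describes: a gradient boosting regressor fine-tuned via grid search and evaluated with R² and RMSE. The feature set (the rating given under noise plus the ambient noise level in dBA) and the synthetic data are assumptions for illustration only.

```python
# Hypothetical sketch: predict the clean-condition rating from a rating
# collected under noise, using a fine-tuned GradientBoostingRegressor.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in data (the study's range was 36-65.5 dBA).
n = 500
noise_dba = rng.uniform(36.0, 65.5, n)
noisy_mos = rng.uniform(1.0, 5.0, n)
# Assumed bias pattern for the sketch: louder environments shift ratings.
clean_mos = np.clip(
    noisy_mos + 0.02 * (noise_dba - 36.0) + rng.normal(0, 0.2, n), 1.0, 5.0
)

X = np.column_stack([noisy_mos, noise_dba])
X_train, X_test, y_train, y_test = train_test_split(X, clean_mos, random_state=0)

# "Fine-tuning" as a small hyperparameter grid search with cross-validation.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={
        "n_estimators": [100, 300],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    },
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

# Report the two metrics used in the abstract.
pred = search.predict(X_test)
print(f"R2:   {r2_score(y_test, pred):.3f}")
print(f"RMSE: {mean_squared_error(y_test, pred) ** 0.5:.3f}")
```

With real crowdsourcing ratings in place of the synthetic arrays, the same pipeline would map noise-affected scores toward their laboratory-quality equivalents.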