When near-infrared spectral modeling is used to determine the protein content of wheat, instrument and background noise in the spectra reduce the prediction accuracy of the model. To limit the influence of such artifacts in the data on the predictions, this paper studies the effect of data preprocessing methods on modeling. The experiments show that, on this dataset, the KS-MC-PLSR algorithm gives the best results: the R² and RMSE on the test set are 0.9902 and 0.1685, respectively.
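As an illustration of what such a pipeline might look like, the following is a minimal sketch only. It assumes that KS denotes Kennard-Stone sample partitioning, MC denotes mean centering, and PLSR denotes partial least squares regression; the abstract does not spell these out. The data are synthetic stand-ins for spectra, and the function names (`kennard_stone`, `pls1_fit`), the 45/15 split, and the choice of 3 components are hypothetical choices made for the example, not the paper's settings.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two mutually farthest samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample;
        # the sample with the largest such distance is added next.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(min_d))))
    return np.array(selected)

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS on mean-centered data (the assumed MC step)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)          # weight vector
        t = Xc @ w                      # scores
        tt = t @ t
        p = Xc.T @ t / tt               # X loadings
        c = yc @ t / tt                 # y loading
        Xc = Xc - np.outer(t, p)        # deflate X
        yc = yc - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

def pls1_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean

def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Synthetic stand-in for spectra: 60 samples, 20 "wavelengths", 3 latent factors.
rng = np.random.default_rng(0)
T = rng.normal(size=(60, 3))
X = T @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(60, 20))
y = T @ np.array([1.0, -2.0, 0.5]) + 12.0 + 0.01 * rng.normal(size=60)

# KS split (hypothetical 45/15), then MC + PLSR fitted on the training set only.
train_idx = kennard_stone(X, 45)
test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
beta, x_mean, y_mean = pls1_fit(X[train_idx], y[train_idx], n_components=3)
y_pred = pls1_predict(X[test_idx], beta, x_mean, y_mean)

test_r2 = r2_score(y[test_idx], y_pred)
test_rmse = rmse(y[test_idx], y_pred)
print(f"test R2 = {test_r2:.4f}, test RMSE = {test_rmse:.4f}")
```

In this sketch the centering means are computed from the training samples only and reused for the test samples, so no information from the test set leaks into the model. The numbers it prints describe the synthetic data and are not comparable to the results reported above.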