Most industrial wastewater worldwide, including in China, is still discharged directly into aquatic environments without adequate treatment. Because data are scarce and few methods are available, the relationships between pollutants discharged in wastewater and those in surface water have not been fully revealed, and unsupervised machine learning techniques, such as clustering algorithms, have been neglected in related research fields. In this study, real-time monitoring data for chemical oxygen demand (COD), ammonia nitrogen (NH3-N), pH, and dissolved oxygen in the wastewater discharged from 2213 factories and in the surface water at 18 monitoring sections (sites) in 7 administrative regions of the Yangtze River Basin were collected from 2016 to 2017 and analyzed using the partitioning around medoids (PAM) and expectation-maximization (EM) clustering algorithms, the Welch t-test, the Wilcoxon test, and Spearman correlation. The results showed that, compared with the spatial cluster comprising unpolluted sites, the spatial cluster comprising heavily polluted sites, where more wastewater was discharged, had relatively high COD (>100 mg L−1) and NH3-N (>6 mg L−1) concentrations and relatively low pH (