Soybean [Glycine max (L.) Merr.] was domesticated from wild soybean (G. soja Sieb. and Zucc.) and has been further improved as a dual-use seed crop that provides highly valuable oil and protein for food, feed, and industrial applications. However, the genetic and molecular basis of seed oil and protein content remains poorly understood. By combining high-confidence bi-parental linkage mapping with high-resolution association analysis based on 631 whole-genome sequences, we mapped major soybean protein and oil QTLs on chromosome 15 to a sugar transporter gene (GmSWEET39). A two-nucleotide CC deletion that truncates the C-terminus of GmSWEET39 was strongly associated with high seed oil and low seed protein, suggesting a pleiotropic effect on protein and oil content. GmSWEET39 was predominantly expressed in the parenchyma and integument of the seed coat, and likely regulates oil and protein accumulation by affecting sugar delivery from the maternal seed coat to the filial embryo. We demonstrated that GmSWEET39 has a dual function in both oil and protein improvement and has undergone two different paths of artificial selection. The CC deletion (CC-) haplotype H1 was intensively selected during domestication and has been used extensively in soybean improvement worldwide; it is fixed in North American soybean cultivars. The protein-favored (CC+) haplotype H3 is still under selection, reflecting its continuing role in soybean protein improvement. Comprehensive knowledge of the molecular basis underlying this major QTL and of the GmSWEET39 haplotypes associated with soybean improvement will be valuable for designing new strategies to improve soybean seed quality using molecular breeding and biotechnological approaches.
Author summary: We map highly effective protein and oil QTLs to a sugar transporter gene (GmSWEET39) that is preferentially expressed in the seed coat, combining association analysis based on 631 whole-genome sequences with bi-parental linkage mapping. The results show that GmSWEET39 has pleiotropic associations with two of the most important soybean traits, seed protein and seed oil. A 2-bp (CC) deletion in GmSWEET39 is associated with higher seed oil and lower seed protein, and has been extensively selected and used worldwide, likely for higher oil. The intensive use or fixation of the CC deletion in soybean breeding has resulted in low protein in current soybean cultivars, a major problem facing the soybean industry. Knowledge of the genetic basis, together with the identification of two major haplotypes for high protein and high oil, should help address low protein content in current cultivars and meet the rapidly increasing demand for plant-based protein in the food industry. Our integrative, "big-data-driven" approach, which draws on the large amount of genome and transcriptome sequencing and phenotypic data available in the community, should provide an effective case study for post-genomic data-driven research. [ABSTRACT FROM AUTHOR]