Expanded understanding of the principles that govern the earliest stages of embryonic development will benefit mechanistic studies of cell lineage restriction and facilitate the use of stem cells for disease modeling and regenerative medicine. While a complex transcriptional hierarchy within the early stages of embryonic development has been well documented for many lineages, including cardiac [1], knowledge of the events occurring at the cell surface remains limited for most lineages. Because cell surface proteins define how cells sense and respond to their microenvironment, mapping them would provide critical insight into signaling events in early development, as well as a means for accessible, non-genetic cell identification and tracking through immunophenotyping [2]. While a cell surface lineage map has been established for hematopoietic stem cell subpopulations [3], similarly complete surface protein maps are not yet available for other stem cell progeny. A major hurdle to the development of surface marker maps is the technical challenge of discovering new, cell type-specific markers. Some progress has been made using transcriptomics, antibody screens, and proteomics [4–7], but gene expression patterns correlate poorly with protein abundance on the cell surface [8], and antibody screens are limited by the availability and specificity of monoclonal antibodies that recognize surface-exposed epitopes. Because little primary tissue is available for studying early developmental stages, discovery-based proteomic analysis is more practically achieved using embryo-derived stem cell culture models of comparable developmental stage. In this study, three cell lines and one in vitro differentiated cell type representing distinct cell fates and stages in mouse embryogenesis were assessed.
The earliest developmental stages included the pluripotent epiblast and the extraembryonic primitive endoderm, represented by embryonic stem cells (mESC) and extraembryonic endoderm (XEN) cells, respectively. Later developmental stages were represented by mESC-derived cardiomyocytes (mESC-CM) and mouse embryonic fibroblasts (MEFs). Using a selective approach that exploits the prediction that >90% of cell surface proteins are glycosylated [9], we have generated unique views of the surface N-glycoproteomes of these four cell types. Our approach, termed Cell Surface Capture (CSC-Technology), is an antibody-independent strategy that uses affinity enrichment of N-glycoproteins and mass spectrometry to achieve >85% specificity for cell surface proteins while simultaneously determining N-glycosite occupancy and membrane topology [10–12]. The benefits of CSC-Technology include its high specificity for surface-accessible proteins and its ability to directly verify extracellular domains by identifying the site of N-glycosylation, thereby avoiding reliance on database annotations and/or prediction algorithms to determine protein localization. We have previously published qualitative N-glycoproteomes of mESC, miPSC, C2C12 myoblasts, human fibroblasts, hiPSC, and hESC [8, 10, 12]. The current dataset extends qualitative descriptions of the cell surface N-glycoproteome to MEFs and mESC-CM and reports a quantitative comparison of mESC and XEN based on stable isotope labeling by amino acids in cell culture (SILAC). The quantitative comparison of XEN and mESC is expected to shed light on markers for identifying and isolating lineage-restricted progenitors from blastocysts, and the analyses of MEFs and mESC-CM are included to support the future development of markers for identifying and isolating cardiomyogenic progeny derived from pluripotent stem cells.
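Identifying "the site of N-glycosylation" rests on the canonical N-glycosylation sequon, Asn-X-Ser/Thr where X is any residue except Pro. In CSC-type workflows as generally described in the literature, formerly glycosylated peptides are released with PNGase F, which converts the modified Asn to Asp, so a glycosite appears as a deamidated Asn inside a sequon. The sketch below assumes that the search engine reports the unmodified peptide sequence plus 0-based deamidation positions; the function names and the example peptide are hypothetical, and only the canonical sequon is considered.

```python
import re
from typing import Iterable, List

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence: str) -> List[int]:
    """0-based positions of Asn residues that start an N-X-S/T (X != P) sequon.

    Only the given sequence is scanned; a sequon that runs past the end of a
    peptide would need the flanking protein sequence to be detected.
    """
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def candidate_glycosites(sequence: str, deamidated: Iterable[int]) -> List[int]:
    """Sequon Asn positions that were also reported as deamidated (N -> D).

    Deamidated Asn outside a sequon is not reported, since it more likely
    reflects spontaneous chemical deamidation than PNGase F release.
    """
    deamidated_set = set(deamidated)
    return [p for p in sequon_positions(sequence) if p in deamidated_set]


if __name__ == "__main__":
    # Hypothetical peptide: N at index 1 is followed by P (not a sequon),
    # N at index 5 starts N-G-T (a sequon).
    print(candidate_glycosites("ANPSLNGTK", [1, 5]))
```

Filtering deamidation calls through the sequon, rather than accepting every deamidated Asn, is what lets a detected peptide stand as direct evidence for an extracellular, glycosylated domain.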
Mouse ESC (R1 [13]), XEN (X10 [14]), and mouse embryonic fibroblasts (STO) were cultured as described [12, 15–17], except that mESC and XEN were cultured in SILAC media containing dialyzed serum [18]. Under these conditions, mESC stain positive for OCT4 and CD49f, and XEN stain positive for GATA4 (Figure 1A,B). For SILAC, mESC were cultured with heavy stable isotope versions of lysine (¹³C₆, ¹⁵N₂) and arginine (¹³C₆, ¹⁵N₄), whereas XEN were cultured with light (unlabeled) amino acids. For cardiac differentiation, mESCs (syNP4 subclone of R1 [19]) were differentiated as described [20]; puromycin-selected CMs were robustly positive for the reference cardiac markers ACTN1, TNNI3, TNNT2, and MLC2v by day 17 of differentiation (Figure 2C), the time point used for proteomic analysis. For each cell type, approximately 1 × 10⁸ cells per biological replicate (n ≥ 3) were taken through the CSC-Technology workflow as reported [10–12, 21], with details provided in the supplement. The exception was the mESC/XEN SILAC comparison, which used 1 × 10⁶ cells per replicate because of slower growth rates in SILAC media (i.e., media containing dialyzed FBS). For flow cytometry, cells were stained as described [12], with antibody details provided in the supplemental methods. Data were acquired on a BD LSRII flow cytometer (BD Biosciences) and analyzed using FCS Express V3 (De Novo Software); histograms represent an average of at least three biological replicates.
Figure 1. Surface N-glycoproteins on mESC and XEN
Figure 2. Surface N-glycoproteins on cardiomyocytes
A total of 165 cell surface N-glycoproteins were identified in the mESC:XEN SILAC comparison (Table S1). Six proteins were found exclusively in XEN (F3, PDGFRa, PVR, TEK, SLCO3A1, PTH1R), and 24 were found exclusively in mESC (ALCAM, ALPL, BST3, CD38, CNNM2, DSG2, FAT1, FN1, GLG1, GPC3, H2-K1, LNPEP, LPAR4, PVRL1, S1PR2, SLC46A1, SLC5A1, SLC6A6, SLCO4A1, ST14, THY1, TSPAN31, SFP11, LY75).
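Because mESC carried heavy lysine (¹³C₆, ¹⁵N₂) and arginine (¹³C₆, ¹⁵N₄) while XEN carried the light forms, every peptide from the two populations appears as a light/heavy pair separated by a predictable mass. The sketch below computes that spacing from standard isotope mass differences; it assumes complete label incorporation and ignores arginine-to-proline conversion, and the function names and peptide strings are hypothetical illustrations rather than part of the described workflow.

```python
# Exact mass differences between heavy and common isotopes, in daltons.
DELTA_13C = 1.00335483507  # 13C - 12C
DELTA_15N = 0.99703489445  # 15N - 14N

# Heavy labels named in the text: Lys (13C6, 15N2) and Arg (13C6, 15N4).
LYS8_SHIFT = 6 * DELTA_13C + 2 * DELTA_15N   # about 8.0142 Da
ARG10_SHIFT = 6 * DELTA_13C + 4 * DELTA_15N  # about 10.0083 Da


def heavy_mass_shift(peptide: str) -> float:
    """Neutral mass difference (Da) between heavy- and light-labeled forms of a peptide.

    Assumes every K and R is fully labeled; metabolic side effects such as
    arginine-to-proline conversion are not modeled.
    """
    seq = peptide.upper()
    return seq.count("K") * LYS8_SHIFT + seq.count("R") * ARG10_SHIFT


def heavy_mz_shift(peptide: str, charge: int) -> float:
    """Spacing on the m/z axis between light and heavy precursor peaks at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return heavy_mass_shift(peptide) / charge


if __name__ == "__main__":
    # A tryptic peptide ending in K, and one with a missed cleavage (one K, one R).
    for pep, z in [("LNGTEK", 2), ("AKLNGSR", 3)]:
        print(pep, round(heavy_mass_shift(pep), 4), round(heavy_mz_shift(pep, z), 4))
```

Labeling both lysine and arginine means that any fully tryptic peptide carries at least one label, so essentially every identified peptide can, in principle, contribute to a heavy/light ratio.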
The data for PDGFRa, ALPL, and THY1 are consistent with known expression patterns [4, 14] and serve as internal controls that support data reliability. Furthermore, SILAC ratios for CD24, CD31, CD71, CD90, and CD140a were consistent with flow cytometry results (Figure 1B,C). Compared with a report by Rugg-Gunn et al. [4] that used a spectral counting approach to compare membrane proteins in mESC vs XEN, the quantitative ratios obtained here are consistent with those previously reported (e.g., ALCAM, ALPL, BST2, GPC3, LAMA1, and PECAM1), but an additional 55 proteins were uniquely identified in this study, including CD24 (Figure 1C,D). A subset of 51 proteins and their SILAC ratios is summarized in Table 1A, including 25 unique to this study and 26 with ratios consistent with Rugg-Gunn et al. The comparatively small number of identifications in the mESC:XEN SILAC comparison is attributed to the lower cell input (1 × 10⁶ rather than approximately 1 × 10⁸ cells per replicate), a consequence of practical limitations in expanding the cells in SILAC media. A total of 560 cell surface N-glycoproteins were identified on mESC-CM and 453 on MEFs, together comprising more than 6,500 N-glycopeptides representing extracellular-exposed protein domains (Tables S2–S3). CSC-Technology revealed SIRPA on the surface of mESC-CM (Figure 2), whereas a previous report identified SIRPA on human ESC-CM but was unable to confirm its presence on mESC-CM using antibody and microarray approaches [6]. Seven additional proteins (CRB2, DSC2, ITGA8, MSLN, NRP1, PRNP, and PTPRB) were identified by CSC-Technology, and flow cytometry analysis is consistent with their presence on the cell surface. When compared with our CSC-Technology analyses of MEFs, mESC, miPSC, human fibroblasts, hESC, and hiPSC [8, 12], 79 proteins were unique to mESC-CM; of these, 14 were also identified previously on human ESC-CM [5] (Table 1B).
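The protein-level SILAC ratios and the "exclusive" calls discussed in the text can be thought of as a reduction of peptide-level heavy (mESC) and light (XEN) intensities. The exact rules used in this study are not described here, so the following is only a sketch under stated assumptions: a protein seen only in the heavy channel is called mESC-only, one seen only in the light channel XEN-only, and otherwise the median peptide log2(heavy/light) is reported. The 2-fold cutoff in `direction` is a hypothetical threshold, and all names are illustrative.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple

# Peptide-level observation: (heavy intensity, light intensity); None or 0 = not detected.
# Channel assignment follows the text: heavy = mESC, light = XEN.
Observation = Tuple[Optional[float], Optional[float]]


def summarize_protein(peptides: Sequence[Observation]) -> Tuple[str, Optional[float]]:
    """Collapse peptide observations into a protein-level call.

    Returns ("mESC-only", None), ("XEN-only", None), ("not quantified", None),
    or ("quantified", median log2(heavy/light)).
    """
    heavy_seen = light_seen = False
    log_ratios = []
    for heavy, light in peptides:
        hv = heavy or 0.0
        lt = light or 0.0
        heavy_seen = heavy_seen or hv > 0
        light_seen = light_seen or lt > 0
        if hv > 0 and lt > 0:
            # Only peptides observed in both channels yield a ratio.
            log_ratios.append(math.log2(hv / lt))
    if heavy_seen and not light_seen:
        return "mESC-only", None
    if light_seen and not heavy_seen:
        return "XEN-only", None
    if not log_ratios:
        # Both channels seen, but never on the same peptide.
        return "not quantified", None
    return "quantified", median(log_ratios)


def direction(log2_ratio: float, threshold: float = 1.0) -> str:
    """Label a log2(mESC/XEN) ratio using a hypothetical 2-fold (|log2| >= 1) cutoff."""
    if log2_ratio >= threshold:
        return "higher in mESC"
    if log2_ratio <= -threshold:
        return "higher in XEN"
    return "similar"


if __name__ == "__main__":
    call, ratio = summarize_protein([(400.0, 100.0), (200.0, 100.0)])
    print(call, ratio, direction(ratio))  # quantified 1.5 higher in mESC
```

Using the median of peptide ratios rather than the mean limits the influence of a single mis-quantified peptide, and a direction label of this kind is what would be compared against flow cytometry or spectral-counting trends when judging consistency.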
A benefit of CSC-Technology is its high specificity for authentic cell surface proteins [11, 12]. For all selected markers, SILAC ratios were consistent with quantitative comparisons by flow cytometry and with other published data where available. This partially validated dataset should therefore be a useful resource for future lineage tracing experiments and immunophenotyping assays. It should be noted, however, that lineage tracing and quantitation based on CSC-Technology data may be affected by alterations in N-glycosylation status, as only N-glycosylated peptides were detected in this study. These cell surface proteomes will continue to evolve as additional discovery strategies are applied, including those not limited to N-glycoproteins [4, 5, 21]. Moreover, numerous studies of the N-glycoproteomes of other cell types and tissues, including but not limited to pancreatic beta cells, the developing mouse brain, rat heart, microglia, and breast cancer [22–26], are also expected to prove valuable in the long term as efforts continue toward identifying cell surface markers that are specific to a lineage, cell type, or disease state. Because cell surface proteins are the gateway for receiving exogenous signals, knowledge of the timing and abundance of the proteins present at the cell surface during embryonic development should contribute to a greater understanding of the cellular mechanisms involved in lineage and cell fate specification. Of particular relevance to cardiac biology, the discovery of stage-specific markers within the cardiomyogenic lineage will facilitate single-cell analysis of the molecular events that regulate critical cell fate decisions during very early development, much as immunophenotyping has enabled the systematic study of molecular events regulating hematopoietic stem cell lineage specification.
Because monoclonal antibodies suitable for immunophenotyping are lacking for most of the cell surface proteins identified here, the extensive collection of extracellular epitopes defined in this study should facilitate the development of monoclonal antibodies that recognize the extracellular domains of surface proteins, enabling their use in live-cell sorting schemes.