Additional file 2: Figure S1. Mutation rates of the CRC patients. This plot shows the number of mutations in each CRC sample. Figure S2. Identifying the best-fitting distribution to discover significant genes. This plot shows our comparison of different distributions for fitting the number of mutations in the genes and identifying significantly mutated genes. Figure S3. Identifying the best-fitting distribution to discover significant gene-motifs. This plot shows our comparison of different distributions for fitting the number of mutations in the gene-motifs and identifying significantly mutated gene-motifs. Figure S4. An overview of the gene-motif concept. We first identified 382 significantly mutated coding genes in colorectal cancers (candidate genes). We then used Fisher's exact test to identify the motifs that were significantly mutated within the candidate genes. Figure S5. The 3131 selected features in the two most significant PCs before scaling. This plot shows the two principal components (PCs) and demonstrates the potential discrimination that can be obtained from our identified features. Figure S6. Illustration of patients in the first two PCs of the features. Distribution of CRC samples across the 3131 gene-motif features by PCA. Figure S7. Correlation between our identified signatures and Alexandrov's signatures in each CRC subtype separately. Figure S8. Mutational load of protein-coding genes in each subtype separately. Each bar chart shows the fraction of samples with a mutation in a gene. Figure S9. Mutational load of long non-coding RNA genes in each subtype separately. Each bar chart shows the fraction of samples with a mutation in a lncRNA. Figure S10. Mutation rates in coding and lncRNA genes in each subtype. Red indicates the average number of mutations in lncRNA genes and green indicates the average number of mutations in coding genes. Figure S11. Consequence type analysis. The figure shows the fraction of mutations in each consequence type for each subtype. Figure S12. Mutation rate in transcripts of the genes TTN, PCDHA2, BRAF, and APC. The figure shows the mutation rate in different transcripts of the genes TTN, PCDHA2, BRAF, and APC across the CRC subtypes identified in this study. Figure S13. Analysis of the age distribution of CRC samples in the identified subtypes. Figure S14. Evaluation plot for deciphering 3-mer mutational signatures in the CRC samples. We used the CANCERSIGN tool [57] to identify mutational signatures in the CRC samples. The evaluation plot indicates that deciphering 3-mer mutational signatures is optimal at seven signatures.