Multi-modal data, the combination of different data modalities, arise frequently in real-world applications and generate large volumes of data. Semantic communication, built on deep learning, delivers the meaning of information rather than the raw data, making it possible to significantly reduce the amount of data transmitted. By exploiting the correlation between multi-modal data, we propose a multi-modal semantic communication system, named MSC. In particular, a shared-and-private (S&P) semantic representation model is proposed to replace the traditional independent semantic representation for each modality, where shared semantics capture the common understanding conveyed by multiple modalities, and private semantics capture details specific to each modality. Moreover, our method employs pre-trained GANs to generate shared semantic information across modal spaces. To obtain private semantic information that can be highly compressed, we apply residual coding between the source data and the shared semantic information. In this way, semantic redundancy is reduced by fully exploiting the correlation between multi-modal data. Simulation results show that MSC, by exploiting semantic correlation, achieves superior performance compared to independent transmission of each modality, especially when trained at low signal-to-noise ratio (SNR), and reduces the transmitted data volume while maintaining comparable reconstruction quality.
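The residual-coding idea behind the private semantics can be illustrated with a minimal numerical sketch. This is a toy analogue, not the MSC implementation: a random linear map stands in for the pre-trained GAN generator that maps shared semantics back to the source space, and the private part is simply whatever the shared reconstruction cannot explain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the generator that maps shared semantics back
# to the source-signal space (in MSC this role is played by a pre-trained GAN).
W = rng.normal(size=(8, 32))  # shared-semantic dim 8 -> signal dim 32

def generate_from_shared(shared):
    """Produce a coarse estimate of the source from shared semantics."""
    return shared @ W

# Toy source: a coarse part explained by shared semantics plus small details.
shared = rng.normal(size=8)
x = generate_from_shared(shared) + 0.05 * rng.normal(size=32)

# Residual coding: the private semantics are what shared semantics miss.
coarse = generate_from_shared(shared)
private_residual = x - coarse

# The residual carries far less energy than the source, so it is highly
# compressible compared to transmitting x directly.
print(np.linalg.norm(private_residual) < np.linalg.norm(x))

# Receiver side: shared semantics plus the residual recover the source.
x_hat = generate_from_shared(shared) + private_residual
print(np.allclose(x_hat, x))
```

The design choice this illustrates: only the low-energy residual and the compact shared representation need to be transmitted, rather than the full source signal for every modality.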