Real-time driver emotion recognition and timely risk warning can effectively reduce the incidence of traffic accidents. However, existing emotion recognition methods obtain emotion features from human physiological signals and are unsuitable for complex scenarios in the Internet of Vehicles (IoV). Moreover, existing methods in the IoV cannot fully exploit the resources of edge devices to mine drivers' personalities, resulting in limited accuracy. To address these problems, we propose a novel Edge-Cloud Collaborative Multimodal Emotion Recognition framework (ECMER). The driver's facial expression and audio data are processed at the edge for preliminary computation, including coarse-grained facial expression recognition and extraction of the driver's personality features, which are then uploaded to the cloud for cross-fusion. Specifically, a personality-coupled driver emotion recognition method is proposed, introducing the Big Five Model from a psychological perspective. The facial expression features in images and the audio features in videos are employed to compute the driver's personality features, which are further fused with the multimodal features. Subsequently, a hierarchical multi-granularity driver emotion recognition method is designed, in which real-time coarse-granularity driver emotion recognition is performed on edge devices to reduce data transmission pressure and cloud computing load. Empirical results on real-world datasets demonstrate that ECMER improves the performance of driver emotion recognition.