Traditionally, most parallel graph algorithms have been implemented based on the BSP paradigm and using MPI two-sided communication as the underlying communication method. However, for certain graph algorithms characterized by irregular communication patterns and complex operations on remote processes, the BSP model cannot effectively exploit the algorithm's inherent asynchrony. In contrast, the model based on active messages is better suited for implementing flexible, scalable, and highly concurrent graph algorithms. In this study, we optimize graph matching algorithm that are widely applied in fields such as sparse linear algebra, community detection, and graph partitioning using the active message paradigm. Additionally, we have developed the portable active message framework RMA-AM, built on MPI RMA operations. What sets our framework apart from other active message frameworks is that it is based on a process-based asynchronous progress model, addressing the multithread overhead issue typical of traditional thread-based approaches. The evaluation results on Tianhe-2 demonstrate that our proposed optimization achieves a speedup of up to 5.1x compared to traditional BSP-style implementation based on MPI two-sided communication, and a 3x speedup compared to traditional BSP-style implementation based on MPI one-sided communication.