3D mantle convection simulations are important for understanding Earth's dynamics, assessing geological hazards, and exploring and exploiting Earth's natural resources. However, these simulations are computation- and memory-intensive, and they require efficient use of parallel hardware. This paper shares our experience in optimizing CitcomCU, an open-source parallel finite element code for simulating 3D mantle convection, including thermochemical convection, on multicore CPUs. We design a new data storage layout for CitcomCU, propose an optimized symmetric Gauss-Seidel (SYMGS) algorithm, and combine SYMGS with sparse matrix-vector multiplication (SpMV) through kernel fusion to reduce computation and memory-access overhead. Additionally, we employ the block multicoloring (BMC) method to parallelize the optimized SYMGS algorithm. Evaluation on three platforms, two ARMv8 systems and one x86 system, demonstrates significant performance improvements: our approach delivers speedups of 2.97-6.99x for SYMGS and 2.48-5.66x for CitcomCU across the three platforms.
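For readers unfamiliar with the kernel being optimized, a textbook SYMGS sweep can be sketched as follows. This is only a minimal serial illustration on a matrix in compressed sparse row (CSR) form, not the optimized, fused, or block-multicolored variant developed in this paper. The function name `symgs_sweep` and the array names `row_ptr`, `col_idx`, and `vals` are assumptions made for illustration and do not come from CitcomCU.

```python
def symgs_sweep(row_ptr, col_idx, vals, b, x):
    """One symmetric Gauss-Seidel sweep on a CSR matrix, updating x in place.

    Textbook serial version (hypothetical names, not CitcomCU code):
    a forward pass over rows 0..n-1 followed by a backward pass over
    rows n-1..0. Each row update uses the most recent values of x.
    Assumes every row stores a nonzero diagonal entry.
    """
    n = len(b)

    def relax(i):
        diag = 0.0
        acc = b[i]
        # Walk the stored nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]            # remember the diagonal entry
            else:
                acc -= vals[k] * x[j]     # subtract off-diagonal contribution
        x[i] = acc / diag

    for i in range(n):                    # forward sweep
        relax(i)
    for i in range(n - 1, -1, -1):        # backward sweep
        relax(i)
    return x
```

Each row update depends on values written earlier in the same sweep, which is why a serial sweep does not parallelize directly. Reordering schemes such as the block multicoloring mentioned above exist to group rows that can be updated concurrently.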