Training deep learning (DL) models consumes a huge amount of time and energy in cloud servers and edge devices, requiring energy-efficient processors [1–5] to meet the rapidly growing demand for AI. Training processors either use a high-precision floating-point (FP) format to provide robust training results, or a low-precision format that increases efficiency at the cost of accuracy. Mixed precision training (MPT) is a promising way to achieve both high accuracy and high efficiency. Manual mixed precision [5] usually relies on a coarse-grained (per-layer) precision mapping, which limits training accuracy. Automatic precision search [6] provides accurate, fine-grained precision mapping, but its high search latency slows down the overall training process.