In recent years, always-on chips have become prevalent in keyword spotting and face recognition applications [1]. These chips must consume minimal power so that they can operate for long periods under battery-limited conditions. To function effectively, they must also perform a variety of signal processing tasks, such as MFCC feature extraction and image filtering [2]. Traditional designs commonly process these signals with application-specific integrated circuits (ASICs) or digital signal processors (DSPs), as illustrated in Fig. 1. ASICs prioritize low power consumption over versatility, which limits their practicality in real-world scenarios [3]. DSP-based solutions can be further divided into general-purpose scalar DSPs and parallel SIMD DSPs. Scalar DSPs spend a substantial share of their energy on instruction control and are therefore relatively inefficient [4], [5]. Parallel SIMD DSPs provide superior computing performance, but their power consumption, at the hundred-milliwatt level, fundamentally violates the power requirement of an always-on system [6]. Consequently, no existing hardware solution simultaneously achieves versatility, low power consumption, and high energy efficiency.