Always-on, voice-activated tinyML systems, such as those implementing keyword spotting (KWS), demand low power consumption and a small footprint. In some cases, sub-V energy-harvesting sources restrict the available supply voltage to below 0.5V [1]. Most KWS designs focus on optimizing the audio feature extraction (FEx) unit, which dominates the overall power and area. Analog FEx using multi-channel Gm-C bandpass filters (BPFs) and analog rectifiers [2], [3] can be as much as 10× more power efficient than digital FEx for a comparable silicon area [4]. However, analog FEx circuits have not demonstrated KWS with more than four keywords. They also suffer from a large footprint, difficult technology migration, and limited dynamic range (DR) at low supply voltage, whereas speech signals inherently have a high DR. These limitations ultimately lead to the use of time-domain (TD) [5], [6] or partial TD [7] alternatives. In [5], a 0.5V solar-powered TD-FEx employs a voltage-to-time converter (VTC) followed by ring-oscillator (RO)-based BPFs to achieve 86% classification accuracy on 10 keywords, but it consumes an order of magnitude more power (9.3μW) than the existing state of the art [2], [3]. In [7], Gm-C BPFs followed by VTCs enable time-domain rectification at 0.4V, but the analog part requires a supply of at least 0.6V. A solution operating at 0.4V is presented in [6]. It uses injection-locked oscillator (ILO)-based BPFs to process the signal directly in the phase domain, but KWS was not demonstrated because of the limited filter selectivity. To achieve high speech recognition accuracy, a quality factor (Q) of at least 4.05 is required for 16 log-spaced channels (125Hz-5kHz with -3dB crossover), which is not achieved in [2]–[6] (Q