Designing an embedded 2-port (dual-port) SRAM that achieves both maximum operating frequency (fMAX) and high memory-cell density is one of the major challenges for high-performance computing (HPC) applications such as massively parallel processing, imaging, video graphics, and deep-learning AI processors. In addition to high-speed operation, area scaling is also important, because embedded SRAM capacity tends to increase with each technology node and occupies a large portion of recent SoCs. Time-sliced double-pumped 2-port SRAMs, which use a typical single-port (SP) 6T bit cell, have been proposed [1–4] to enable consecutive read and write operations. Although both ports must be synchronized to a single clock and there is a certain cycle-time penalty compared to 8T 2-port SRAMs, the double-pumped approach offers a compact area and no read access time degradation. From a performance and memory-density point of view, a double-pumped 6T SRAM with a folded-bitline (BL) multi-bank architecture, shown in Fig. 15.3.1, is a better solution for HPC systems than other types of SRAM architecture [3–6].
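The double-pumped idea can be illustrated with a minimal behavioral sketch. It is not the circuit described in this work. It assumes that each external clock cycle is split into two internal time slices on one single-port array, with the read slice first and the write slice second. The class name `DoublePumpedSram`, the method `cycle`, and the read-before-write ordering are all hypothetical choices made for illustration.

```python
class DoublePumpedSram:
    """Behavioral sketch: one single-port array accessed twice per clock cycle.

    Assumption: the read slice precedes the write slice within a cycle.
    """

    def __init__(self, depth):
        # The single-port 6T array is modeled as a plain list of words.
        self.cells = [0] * depth

    def cycle(self, read_addr=None, write_addr=None, write_data=None):
        """Model one external clock cycle; return the read data (or None)."""
        read_data = None

        # Time slice 1: the read port uses the single physical port.
        if read_addr is not None:
            read_data = self.cells[read_addr]

        # Time slice 2: the write port uses the same physical port.
        if write_addr is not None:
            self.cells[write_addr] = write_data

        return read_data


# Usage: both ports are served in the same cycle, synchronized to one clock.
sram = DoublePumpedSram(depth=8)
sram.cycle(write_addr=3, write_data=0xAB)
old = sram.cycle(read_addr=3, write_addr=3, write_data=0xCD)  # read sees 0xAB
new = sram.cycle(read_addr=3)                                 # read sees 0xCD
```

Under the assumed ordering, a read and a write to the same address in one cycle return the previously stored word. Because the two accesses are serialized on one port, the external cycle must cover both slices, which corresponds to the cycle-time penalty noted above.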