Recently, SRAM-based digital compute-in-memory (D-CIM) [1] has demonstrated excellent energy and area efficiency with full-precision 4b/8b integer multiply-accumulate operations. It also offers better programmability, hardware reuse, and scalability, and it can effectively leverage technology scaling for better power, performance, and area (PPA). Nonetheless, several new challenges remain, including large peak currents caused by highly parallel operation, long adder-tree delays, and the need for scalable architectures that support various neural network topologies. In this paper, we detail our proposed solutions to these challenges and present measurement results for an SRAM-based 64x64 CIM fabricated in a 12nm CMOS process.