The natural bijection between a circuit design and its graph representation should allow graph optimization algorithms to be deployed efficiently on many-core systems. In practice, however, this deployment suffers from overhead that grows exponentially, and from a heavy memory footprint, as signals propagate. To address this challenge, we systematically study simulations of designs with millions of gates and observe that, although the processing complexity can grow exponentially from the signal inputs, the skewness of the computational graph persists. Building on this observation, we present ZhouBi, a fast and scalable gate-level simulation framework that fully exploits the parallelism of many-core systems. ZhouBi makes three contributions: (I) a graph representation that colors gate-level netlists and identifies skewed partitions based on graph skewness; (II) a set of heuristics that choose between opportunistic and conservative algorithms to accelerate simulation; and (III) a system facility that supports selective mapping of simulation onto many-core hardware, trading off the risk of concurrent-simulation failure against performance gain. We have prototyped ZhouBi and evaluated it against practical baselines. ZhouBi can achieve a 27.6× performance gain over Veriwell, the state-of-the-practice simulator, without compromising correctness. Our framework supports large graphs, enabling scale-out gate-level simulation for chip design.
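The abstract does not specify how netlists are colored or how skewness is measured. As a rough illustration only, the following sketch assumes one common technique, levelization: each gate is assigned a "color" equal to its topological level, so gates that share a color do not depend on each other and could be evaluated concurrently. Skew is measured here with a hypothetical metric (widest level relative to the mean level width). The toy netlist, the helpers `levelize`, `skewness`, and `simulate`, and the metric are all assumptions for illustration, not ZhouBi's actual method.

```python
from collections import defaultdict, deque

# Hypothetical toy netlist: gate name -> (operator, input names).
# Names that never appear as keys ("a", "b", "c") are primary inputs.
NETLIST = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["b", "c"]),
    "n3": ("NOT", ["c"]),
    "n4": ("XOR", ["n1", "n2"]),
    "out": ("AND", ["n4", "n3"]),
}

# Boolean evaluation of each (hypothetical) gate type.
OPS = {
    "AND": lambda xs: all(xs),
    "OR": lambda xs: any(xs),
    "XOR": lambda xs: sum(xs) % 2 == 1,
    "NOT": lambda xs: not xs[0],
}


def levelize(netlist):
    """Group gates by topological level (Kahn's algorithm).

    A gate's level is 1 + the maximum level of its gate fan-ins, so all
    gates in one level are mutually independent.
    """
    fanout = defaultdict(list)
    indeg = {}
    for gate, (_, ins) in netlist.items():
        gate_ins = [i for i in ins if i in netlist]  # ignore primary inputs
        indeg[gate] = len(gate_ins)
        for i in gate_ins:
            fanout[i].append(gate)
    level = {g: 0 for g, d in indeg.items() if d == 0}
    queue = deque(level)
    processed = 0
    while queue:
        g = queue.popleft()
        processed += 1
        for succ in fanout[g]:
            level[succ] = max(level.get(succ, 0), level[g] + 1)
            indeg[succ] -= 1
            if indeg[succ] == 0:
                queue.append(succ)
    if processed != len(netlist):
        raise ValueError("netlist contains a combinational cycle")
    groups = defaultdict(list)
    for g, lv in level.items():
        groups[lv].append(g)
    return [sorted(groups[lv]) for lv in sorted(groups)]


def skewness(levels):
    """Hypothetical skew metric: widest level divided by the mean level width."""
    widths = [len(lv) for lv in levels]
    return max(widths) * len(widths) / sum(widths)


def simulate(netlist, levels, inputs):
    """Evaluate gates level by level, starting from the primary-input values."""
    values = dict(inputs)
    for lv in levels:
        # Gates within one level are independent; a parallel runtime could
        # dispatch this inner loop across cores. Here it runs sequentially.
        for g in lv:
            op, ins = netlist[g]
            values[g] = OPS[op]([values[i] for i in ins])
    return values


if __name__ == "__main__":
    levels = levelize(NETLIST)
    print(levels)  # [['n1', 'n2', 'n3'], ['n4'], ['out']]
    print(skewness(levels))  # 1.8
    print(simulate(NETLIST, levels, {"a": False, "b": True, "c": False})["out"])  # True
```

In this toy circuit the first level holds three independent gates while the later levels hold one each, which is the kind of uneven level width that a skew-aware partitioner would need to account for when mapping work onto cores.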