The increasing complexity of convolutional neural networks (CNNs) has fueled a huge demand for compression. Nonetheless, network pruning, the most effective knob to date, fails to deliver Pareto-optimal networks. To tackle this issue, we introduce a novel pruning-free compression framework dubbed Domino, which pioneers revisiting the accuracy-efficiency trade-off dilemma from a fresh perspective of linearity and non-linearity. Specifically, Domino leverages two predictors, a vanilla latency predictor and a meta-accuracy predictor, to identify the less important non-linear building blocks, which are then grafted with linear counterparts. Next, the grafted network is trained on the target task to obtain decent accuracy, after which each grafted linear building block, which contains multiple consecutive linear layers, is reparameterized into a single linear layer to boost efficiency on the target hardware without degrading accuracy on the target task. Extensive experiments on two popular Nvidia Jetson embedded platforms (i.e., Xavier and Nano) and two representative networks (i.e., MobileNetV2 and ResNet50) clearly demonstrate the superiority of Domino. For example, Domino-Aggressive achieves +10.6%/+8.8% higher top-1/top-5 accuracy on ImageNet than MobileNetV2$\times$0.2, while bringing $1.9\times$/$1.3\times$ speedup on Xavier/Nano.
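The key enabler of the pruning-free approach above is that a stack of consecutive linear layers (with no non-linearity in between) is mathematically equivalent to one linear layer, so fusing them removes latency without changing the function. A minimal sketch of this reparameterization for two fully-connected layers, using hypothetical weight shapes for illustration:

```python
import numpy as np

# Hypothetical example: two consecutive linear layers with no activation
# in between, y = W2 @ (W1 @ x + b1) + b2, collapse exactly into a single
# layer y = W @ x + b with W = W2 @ W1 and b = W2 @ b1 + b2.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 32)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((8, 16)), rng.standard_normal(8)

# Reparameterized weights of the fused single layer.
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.standard_normal(32)
y_two_layers = W2 @ (W1 @ x + b1) + b2
y_fused = W @ x + b
print(np.allclose(y_two_layers, y_fused))  # the two outputs match
```

The same algebra extends to convolutions, since a convolution is also a linear operator; only the non-linear blocks block this fusion, which is why replacing them with linear counterparts unlocks the speedup.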