Lightweight neural network models are commonly designed for real-time scenarios to meet the requirement of fast processing. During deployment of the inference flow, lightweight models are frequently built on existing frameworks such as TensorFlow and OpenVINO. However, these frameworks are heavyweight and must construct a deep call stack from the program entry point to model execution, which incurs considerable overhead. As a result, inference speed cannot be effectively improved, especially under strict latency requirements. To address this problem, we propose a novel lightweight model deployment pipeline that enables efficient inference on hardware. Our method optimizes the primitives of executable operations to take full advantage of the hardware, and then assembles them into an executable graph, significantly reducing the time spent on stack calls. Experimental results demonstrate that our method outperforms TensorFlow and OpenVINO in inference speed and can be applied to the construction of lightweight models.
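To make the core idea concrete, the following is a minimal sketch (not the paper's actual implementation) of how a pre-built executable graph can replace per-inference framework dispatch: each node is bound ahead of time to an already-optimized primitive, so inference reduces to a flat loop over nodes rather than a call stack descending through framework layers. The names `ExecutableGraph` and `ExecNode`, and the placeholder kernels, are hypothetical.

```cpp
#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical node of a pre-built executable graph: a closure bound at
// deployment time to a hardware-tuned primitive.
struct ExecNode {
    std::function<void()> kernel;
};

class ExecutableGraph {
public:
    void add(std::function<void()> kernel) {
        nodes_.push_back(ExecNode{std::move(kernel)});
    }

    // Run all primitives in their pre-determined order; no graph traversal
    // or dynamic dispatch through framework layers happens per inference.
    void run() const {
        for (const auto& node : nodes_) node.kernel();
    }

private:
    std::vector<ExecNode> nodes_;
};

int main() {
    ExecutableGraph graph;
    // Placeholder kernels standing in for optimized conv/activation primitives.
    graph.add([] { std::puts("conv2d (optimized primitive)"); });
    graph.add([] { std::puts("relu   (optimized primitive)"); });
    graph.run();  // a single flat pass replaces the deep framework call stack
}
```

Under this assumption, the cost of locating and invoking each operation is paid once when the graph is built, rather than on every inference call.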