In mobile speech communication, speech quality can be severely degraded when users are in a noisy acoustic environment. To suppress environmental noise, deep-learning-based monaural speech separation methods have made remarkable progress in improving separation accuracy. However, their high computational complexity leads to latency and computational cost that remain far from adequate for mobile devices, and performance and power constraints still make such methods challenging to deploy there. In this paper, we present VoiceBit, an efficient and lightweight human voice separation framework for real-time speech separation on mobile devices. Specifically, we propose a lightweight speech separation network that segregates human voice from interfering noise directly from time-domain signals, reducing computational complexity and memory footprint with minimal compromise in accuracy. Furthermore, we present a set of parallel optimizations to accelerate the operations in VoiceBit. Our experimental results show that VoiceBit achieves significant speedup and energy efficiency improvements compared with state-of-the-art frameworks.