We studied the real-time implementation of a continuous speech recognition algorithm on a mobile CPU platform, the Intel PXA270. In particular, the fast likelihood computation, which accounts for the largest share of the processing in most speech recognizers, is optimized by employing SIMD (Single Instruction Multiple Data) programming and software pipelining. The overhead of extensive memory accesses is also minimized by placing frequently used acoustic model data in the fast internal SRAM. The number of execution cycles for the fast likelihood computation on the 1000-word-vocabulary Resource Management (RM) task has been reduced by 56.42%. The resulting system runs approximately four times faster than the real-time requirement on a 520 MHz Intel XScale-based system.
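To illustrate the kind of inner loop that SIMD programming targets, the following is a minimal sketch in portable C, not the implementation evaluated here. It assumes the likelihood kernel reduces to a weighted squared distance between a fixed-point feature vector and a model mean, which is a common form but is not stated in this abstract. The function names (`dist_scalar`, `dist_4lane`), the 16-bit data layout, and the four-lane width are all hypothetical choices made for illustration. The second function only mimics SIMD lanes with independent accumulators; real code would use the target's SIMD instructions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum over i of (x[i] - mean[i])^2 * inv_var[i].
   All names and the fixed-point layout are illustrative assumptions. */
int64_t dist_scalar(const int16_t *x, const int16_t *mean,
                    const int16_t *inv_var, size_t dim)
{
    int64_t acc = 0;
    for (size_t i = 0; i < dim; ++i) {
        int32_t d = (int32_t)x[i] - mean[i];
        acc += (int64_t)d * d * inv_var[i];
    }
    return acc;
}

/* Four-lane variant: each lane keeps its own accumulator, so the four
   updates in one iteration are independent of each other, as they would
   be in a SIMD register. The lanes are summed once at the end. */
int64_t dist_4lane(const int16_t *x, const int16_t *mean,
                   const int16_t *inv_var, size_t dim)
{
    assert(dim % 4 == 0); /* sketch assumes a padded dimension */
    int64_t acc[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < dim; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            int32_t d = (int32_t)x[i + lane] - mean[i + lane];
            acc[lane] += (int64_t)d * d * inv_var[i + lane];
        }
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because the arithmetic is done in integers, both functions return identical results for the same input, which makes the lane-structured version easy to check against the scalar reference before moving to hardware-specific instructions.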