Continuous phase modulation (CPM) is widely used in telecommunication and aeronautical telemetry systems because of its high power and spectral efficiency. Non-coherent detection is a practical way to detect CPM when large phase noise or a residual carrier frequency offset is present. Near-optimal non-coherent sequence detection, which is based on the Viterbi algorithm, performs close to maximum-likelihood sequence detection (MLSD). However, near-optimal non-coherent detection of CPM is complex to implement, and high speed is even harder to achieve because of the feedback involved in calculating the branch metrics, performing add-compare-select (ACS), and updating the beginning phase and the phase reference symbol. In this paper, we first review near-optimal non-coherent detection of CPM, and then present a high-speed FPGA implementation of near-optimal non-coherent detection for a 64-state, 4-ary, length-3T raised cosine (3RC) CPM on a Xilinx XC7VX690T device. In our scheme, frequency pulse truncation is used to simplify CPM detection. We propose processing the feedback for the different states of the CPM trellis in parallel to achieve low latency. Furthermore, we develop a recursive implementation architecture for updating the beginning phase. We achieve a CPM bit rate of 25 Mbps with an on-chip processing clock of 100 MHz. Simulation results show that near-optimal non-coherent detection performs much better than coherent detection when a residual carrier frequency offset or phase noise exists. The implementation loses as little as 0.3 dB in Eb/N0 relative to computer simulation at an average BER of 10^-5.
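As a rough illustration of the waveform and throughput figures quoted above, the following minimal sketch evaluates the textbook raised-cosine frequency pulse of length 3T and works out how many clock cycles are available per symbol. The function names, the normalization T = 1, and the closed-form phase pulse are assumptions made for illustration; they are not taken from the implementation described here, and the cycles-per-symbol value is simple arithmetic on the stated 25 Mbps and 100 MHz, not a figure reported by the design.

```python
import math


def rc_frequency_pulse(t, L=3, T=1.0):
    """Textbook raised-cosine frequency pulse g(t), nonzero on [0, L*T].

    L and T are illustrative parameters (L = 3 gives the 3RC pulse).
    """
    if t < 0.0 or t > L * T:
        return 0.0
    return (1.0 - math.cos(2.0 * math.pi * t / (L * T))) / (2.0 * L * T)


def rc_phase_pulse(t, L=3, T=1.0):
    """Phase pulse q(t), the integral of g from 0 to t, in closed form.

    q(t) rises from 0 to 1/2 over the pulse duration L*T.
    """
    if t <= 0.0:
        return 0.0
    if t >= L * T:
        return 0.5
    return t / (2.0 * L * T) - math.sin(2.0 * math.pi * t / (L * T)) / (4.0 * math.pi)


def clocks_per_symbol(bit_rate_hz, clock_hz, M):
    """Clock cycles available per M-ary symbol at the given bit rate."""
    bits_per_symbol = math.log2(M)
    symbol_rate_hz = bit_rate_hz / bits_per_symbol
    return clock_hz / symbol_rate_hz


if __name__ == "__main__":
    # Peak of the 3RC frequency pulse sits at the pulse midpoint t = 1.5 T.
    print("g(1.5T) =", rc_frequency_pulse(1.5))
    print("q(3T)   =", rc_phase_pulse(3.0))
    # 4-ary symbols carry 2 bits, so 25 Mbps corresponds to 12.5 Msymbol/s.
    print("clocks per symbol =", clocks_per_symbol(25e6, 100e6, 4))
```

Under these assumptions, each 4-ary symbol has a budget of 8 clock cycles, which gives a sense of how tightly the feedback paths in the branch metric, ACS, and phase update steps must be scheduled.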