Modern graphics processing units (GPUs) are well suited to the computations involved in RF pulse design for MR image reconstruction, which could make reconstruction faster and yield images with a higher signal-to-noise ratio (SNR) and higher resolution. This paper applies GPU techniques such as multithreading, shared memory, coalesced memory access, and constant memory to optimize the performance of the conjugate gradient least squares (CGLS) algorithm for reconstructing MR images. Compared to a CPU-only implementation, a version running the CGLS algorithm on the NVIDIA Tesla C1060 is significantly faster. The total image reconstruction time, including all CPU-to-GPU setup and overhead, also improves dramatically.
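
For orientation, the CGLS iteration named above can be sketched in a few lines. This is a minimal CPU reference in plain Python, not the GPU implementation described in this paper. The function names (`matvec`, `rmatvec`, `dot`, `cgls`), the dense real-valued matrix, and the stopping tolerance are illustrative assumptions; MR data are complex-valued, and a practical system matrix would be far larger. The two matrix-vector products per iteration dominate the cost, which is presumably where GPU threads and the memory optimizations listed above would be applied.

```python
# Minimal CGLS sketch: solves min_x ||A x - b||_2 for a small dense real matrix.
# All names and the tolerance are illustrative assumptions, not from the paper.

def matvec(A, x):
    """Compute A @ x for a list-of-rows matrix A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Compute A^T @ y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cgls(A, b, iters=50, tol=1e-12):
    x = [0.0] * len(A[0])      # initial guess x = 0
    r = list(b)                # residual r = b - A x
    s = rmatvec(A, r)          # normal-equations residual s = A^T r
    p = list(s)                # first search direction
    gamma = dot(s, s)
    for _ in range(iters):
        if gamma <= tol:       # stop once ||A^T r||^2 is negligible
            break
        q = matvec(A, p)
        qq = dot(q, q)
        if qq == 0.0:          # guard against division by zero
            break
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        beta = gamma_new / gamma
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x

# Usage: a consistent 3x2 system whose exact solution is [1, 2].
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [1.0, 4.0, 3.0]
x = cgls(A, b)
```

In a GPU port, `matvec`, `rmatvec`, and the vector updates would each become kernels or library calls, while the scalar recurrences for `alpha` and `beta` stay the same.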