Data prefetching, in both its hardware and software forms, is an effective way to hide memory access latency and is widely used to alleviate the "memory wall" problem. Current software prefetching typically brings data into the L1 cache only. However, this strategy suffers from problems such as poor prefetch timeliness and over-prefetching. To address these problems, we propose CSPM, a coordinated software prefetching mechanism for multi-level caches. Instead of inserting prefetch instructions that target only the L1 cache, CSPM inserts prefetch instructions for multiple cache levels according to the access pattern and cache utilization, so that data is prefetched into different cache levels in a coordinated manner, further improving memory performance. We implement CSPM on top of the software prefetching framework in the GCC compiler and evaluate its effectiveness with the STREAM and SPECfp2006 benchmark suites. Results show that, compared with prefetching data only to the L1 cache, CSPM delivers average speedups of 1.37x, 2.49x, and 1.04x for STREAM on a single core, STREAM on multiple cores, and SPECfp2006, respectively.
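To make the idea of prefetching to different cache levels concrete, the following is a minimal hand-written sketch, not CSPM's compiler pass. It applies two prefetch distances to a STREAM-style triad loop using GCC's `__builtin_prefetch(addr, rw, locality)`. The function name `triad_multilevel` and the distances `DIST_L1` and `DIST_L2` are hypothetical and chosen only for illustration. The `locality` argument is a temporal-locality hint; which cache level a given value actually targets depends on the architecture, so treating the value 2 as "an outer cache level" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distances, in elements. Real values would come
   from tuning or from a compiler cost model. */
#define DIST_L1 16   /* near distance: data should reach L1 just before use */
#define DIST_L2 128  /* far distance: data is staged further ahead */

/* STREAM-style triad: a[i] = b[i] + scalar * c[i]. */
void triad_multilevel(double *a, const double *b, const double *c,
                      double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Far prefetch with a lower locality hint (2). Assumption: the
           target maps this to a prefetch that fills an outer cache level. */
        if (i + DIST_L2 < n) {
            __builtin_prefetch(&b[i + DIST_L2], 0, 2);
            __builtin_prefetch(&c[i + DIST_L2], 0, 2);
        }
        /* Near prefetch with the highest locality hint (3), which asks
           for the data to be kept in all cache levels, including L1. */
        if (i + DIST_L1 < n) {
            __builtin_prefetch(&b[i + DIST_L1], 0, 3);
            __builtin_prefetch(&c[i + DIST_L1], 0, 3);
        }
        a[i] = b[i] + scalar * c[i];
    }
}
```

For simplicity the sketch issues prefetches on every iteration; one prefetch per cache line would be enough in practice, and a compiler-inserted scheme would normally unroll the loop to achieve that. The prefetches are hints only, so the computed result is the same with or without them.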