We present a baseline MPEG-4 AVC (Advanced Video Coding) decoder built with an optimized platform-based design methodology that jointly optimizes the software and hardware of the decoder. Overall decoding throughput is increased by synchronizing the software with dedicated co-processors through macroblock-level pipelining. In addition, we optimize the decoder software by improving frame buffer management, boundary padding, and the content-aware inverse transform. To speed up motion compensation and the inverse transform, which are the most computationally intensive modules, we implement two dedicated hardware accelerators. For comparison, the proposed prototype decoder and the MPEG-4 AVC reference decoder are both evaluated on an ARM platform, one of the most popular processor platforms for portable devices. Our experiments show that the throughput of the MPEG-4 AVC reference decoder can be improved by a factor of 6 to 7. On an ARM966 board, the optimized software without hardware acceleration achieves a decoding rate of up to 5 frames per second (fps) for QCIF video sequences. With the dedicated accelerators, the overall throughput increases by about 30%, reaching 6.6 fps on average and up to 10.3 fps for slow-motion video sequences.
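
To illustrate the boundary padding step mentioned above, the following is a minimal C sketch, not the implementation evaluated in this work. It assumes a single 8-bit sample plane stored in a buffer that already reserves `pad` samples on every side; the function name `pad_frame_borders` and its parameters are hypothetical. Replicating the edge samples into the border lets motion compensation read reference blocks that extend past the picture edge without clamping coordinates for every sample.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not from the paper).
 * buf    : start of the padded buffer, (width + 2*pad) x (height + 2*pad) samples
 * width  : visible picture width in samples
 * height : visible picture height in samples
 * pad    : border size in samples on each side
 */
static void pad_frame_borders(uint8_t *buf, int width, int height, int pad)
{
    assert(buf != NULL && width > 0 && height > 0 && pad > 0);

    const int stride = width + 2 * pad;

    /* Left and right borders: replicate the first and last visible sample
     * of every visible row. */
    for (int y = 0; y < height; ++y) {
        uint8_t *row = buf + (pad + y) * stride + pad;
        memset(row - pad, row[0], (size_t)pad);
        memset(row + width, row[width - 1], (size_t)pad);
    }

    /* Top and bottom borders: copy the first and last visible rows, which
     * now include their left and right borders, so corners are filled too. */
    uint8_t *top = buf + pad * stride;
    uint8_t *bottom = buf + (pad + height - 1) * stride;
    for (int y = 0; y < pad; ++y) {
        memcpy(buf + y * stride, top, (size_t)stride);
        memcpy(bottom + (y + 1) * stride, bottom, (size_t)stride);
    }
}
```

In this sketch the padding would be applied once per reconstructed reference frame, so its cost is paid per frame rather than per motion-compensated block. For a QCIF luma plane (176 x 144), a call might look like `pad_frame_borders(luma, 176, 144, 16)`, where the border size of 16 is an illustrative choice.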