Autonomous driving is a complex function in which multiple AI tasks run in parallel for sensing, information fusion, and decision making. To handle these computation-intensive tasks, an autonomous vehicle is typically equipped with heterogeneous processing units, such as CPUs, GPUs, and FPGAs, with different computing capabilities. Since the AI tasks place different demands on computing resources, a fundamental issue is how to optimally allocate real-time computation tasks to the on-board processing units (known as device placement) so as to maximize utility for autonomous driving. To address this issue, this paper develops a reinforcement learning algorithm based on proximal policy optimization (PPO) to find the optimal device placement for running a neural network model. A sequence-to-sequence model is proposed to assign the operations of the neural network to appropriate computing units in the vehicle. The execution time and energy consumption of the resulting placement are used as the reward signal to optimize the model parameters. We evaluate our algorithm on several benchmarks and compare it with multiple baselines. Experiments demonstrate that our algorithm finds high-quality device placements and outperforms a previous state-of-the-art RL algorithm as well as traditional methods.
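To make the reward formulation concrete, the following is a minimal sketch of how a placement could be scored and how PPO's clipped surrogate objective looks. The operation names, per-device cost table, and the `energy_weight` trade-off parameter are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical per-operation costs: (execution time, energy) on each
# candidate unit. These numbers are illustrative, not measured.
OP_COSTS = {
    "conv1":  {"cpu": (9.0, 2.0), "gpu": (2.0, 5.0), "fpga": (3.0, 1.5)},
    "conv2":  {"cpu": (8.0, 1.8), "gpu": (1.5, 4.5), "fpga": (2.5, 1.2)},
    "fusion": {"cpu": (4.0, 1.0), "gpu": (3.0, 3.0), "fpga": (2.0, 0.8)},
}

def placement_reward(placement, energy_weight=0.5):
    """Reward = -(total execution time + weighted energy), so faster,
    more energy-efficient placements receive higher reward."""
    time = sum(OP_COSTS[op][dev][0] for op, dev in placement.items())
    energy = sum(OP_COSTS[op][dev][1] for op, dev in placement.items())
    return -(time + energy_weight * energy)

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate: clip the new/old policy probability
    ratio to [1-eps, 1+eps] and take the pessimistic minimum."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

# A placement maps each operation to a device; the seq2seq policy would
# emit one such assignment per operation.
fast = {"conv1": "gpu", "conv2": "gpu", "fusion": "fpga"}
slow = {"conv1": "cpu", "conv2": "cpu", "fusion": "cpu"}
```

Here `placement_reward(fast)` exceeds `placement_reward(slow)`, so gradient updates under the clipped objective push the policy toward the faster, lower-energy assignment.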