In this paper, we consider massive multiple-input multiple-output (MIMO) communication systems with a uniform planar array (UPA) at the base station (BS) and investigate the downlink precoding with imperfect channel state information (CSI). By exploiting both instantaneous and statistical CSI, we aim to design precoding vectors to maximize the ergodic rate subject to a total transmit power constraint. By maximizing an upper bound of the ergodic rate instead, we leverage the corresponding Lagrangian formulation and identify the structural characteristics of the optimal precoder as the solution to a generalized eigenvalue problem. As such, the high-dimensional precoder design problem turns into a low-dimensional power control problem. The Lagrange multipliers play a crucial role in determining both precoder directions and power parameters, yet are challenging to be solved directly. To figure out the Lagrange multipliers, we develop a deep learning approach underpinned by a properly designed neural network that learns directly from CSI. With the offline pre-trained neural network, the online computational complexity of precoding is substantially reduced compared with the existing iterative algorithm while maintaining nearly the same performance.