One of the important tasks in auditory attention decoding is to identify the attended speaker's direction from the listener's EEG signals. Compared with rule-based methods, deep neural networks (DNNs) have recently shown significantly better identification accuracy, especially with short decision windows. However, existing DNN-based solutions directly apply convolutional neural networks and attention mechanisms, so the spatial information of the EEG electrodes is not fully exploited. In this paper, we propose a learnable spatial mapping (LSM) mechanism that transforms the EEG channels into a 2D form, which can be combined with a spatial attention mechanism to better extract the inherent coherence among the electrodes. Besides validating the benefit of the proposed method on the public KUL dataset, we also test its performance on our newly collected NJU dataset, which is more challenging as it offers more candidate directions for the competing speakers.
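To illustrate the idea, the following is a minimal NumPy sketch of mapping EEG channels onto a 2D grid and applying spatial attention over that grid. All shapes, the mapping matrix, and the attention scoring are illustrative assumptions for exposition, not the paper's actual LSM architecture or trained parameters.

```python
import numpy as np

# Illustrative sketch only: a learnable linear mapping projects C EEG
# channels onto an H x W spatial grid, then a softmax spatial attention
# pools the grid. Shapes and scoring are assumptions, not the paper's model.
rng = np.random.default_rng(0)

C, T = 64, 128   # EEG channels, time samples per decision window (assumed)
H, W = 8, 8      # target 2D grid size (assumed)

x = rng.standard_normal((C, T))             # one EEG segment, (channels, time)
M = rng.standard_normal((H * W, C)) * 0.1   # "learnable" mapping weights
                                            # (would be trained in practice)

# 1) Learnable spatial mapping: project the C channels onto an H x W grid,
#    giving a 2D spatial layout with a feature vector over time per cell.
grid = (M @ x).reshape(H, W, T)             # (H, W, T)

# 2) Spatial attention: score each grid cell, softmax over the H*W cells,
#    then pool the grid into one attended time series.
scores = grid.mean(axis=-1)                 # (H, W) summary score per cell
weights = np.exp(scores - scores.max())
weights /= weights.sum()                    # softmax over all grid cells
attended = (grid * weights[..., None]).sum(axis=(0, 1))   # (T,)

print(attended.shape)
```

In a real model, `M` and the attention scorer would be learned end-to-end so that electrodes that are physically or functionally related land near each other on the grid, letting 2D spatial attention exploit their coherence.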