Video anomaly detection aims to identify events that deviate from normal behavior, an essential but challenging task. Existing deep learning methods mainly learn normality from normal data with autoencoders and identify anomalies by their large reconstruction or prediction errors. However, owing to the strong generalization ability of deep autoencoders, some abnormal samples can still be reconstructed well. Moreover, previous methods do not fully exploit appearance and motion information and ignore spatial-temporal consistency. To address these problems, we propose an Appearance-Motion United Memory Autoencoder (AMUM-AE) framework. The proposed method adopts a two-stream network to dissociate appearance and motion features and applies the prediction paradigm in each branch. To better capture diverse normal patterns, a united memory module is introduced to bridge appearance and motion information. We also adopt the RGB difference method instead of optical flow to reduce computation time. Extensive experiments on two benchmark datasets demonstrate the effectiveness of the AMUM-AE framework: our method outperforms state-of-the-art methods with AUCs of 96.6% and 86.2% on the UCSD Ped2 and CUHK Avenue datasets, respectively.
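The RGB difference mentioned above replaces optical flow with a much cheaper motion cue: subtracting consecutive frames. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the function name and tensor shapes are assumptions for illustration.

```python
import numpy as np

def rgb_difference(frames):
    """Cheap motion cue: difference of consecutive RGB frames.

    frames: float array of shape (T, H, W, 3), a short clip of T frames.
    Returns an array of shape (T-1, H, W, 3), where entry t is
    frames[t+1] - frames[t]. Unlike optical flow, this needs no
    iterative estimation, only one subtraction per pixel.
    """
    frames = np.asarray(frames, dtype=np.float32)
    return frames[1:] - frames[:-1]

# Hypothetical 5-frame clip of 4x4 RGB images.
clip = np.random.rand(5, 4, 4, 3).astype(np.float32)
motion = rgb_difference(clip)
print(motion.shape)  # (4, 4, 4, 3)
```

In a two-stream setup of this kind, the raw frames would feed the appearance branch while such difference maps feed the motion branch.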