Most of the current best-performing Acoustic Scene Classification (ASC) systems use Mel-scale spectrograms with Convolutional Neural Networks (CNNs). The Mel scale is a common way to approximate the frequency warping of human hearing, with frequency resolution that strictly decreases from low to high frequencies. However, we find that for some acoustic scenes, such as travelling by bus, tram, or train, the significant frequency bins lie in the mid-to-high frequency range. In this paper, we show that a better frequency warping scale for ASC can be learned automatically from raw spectrograms, using a scale based on Kullback-Leibler (KL) divergence. Our method, which combines KL-scale spectrograms with a CNN, is evaluated on two public ASC datasets. The results show that it outperforms the Mel-scale method on both datasets. In addition, we employ a Conditional Generative Adversarial Network (Conditional-GAN) model for data augmentation, to prevent overfitting and allow further improvements in ASC.
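
As a rough illustration of why the Mel scale gives less resolution to higher frequencies, the sketch below splits a frequency range into bands of equal Mel width and reports their widths in Hz. It assumes the widely used HTK-style conversion m = 2595 log10(1 + f/700); which Mel variant the compared systems use is not stated here, and the helper names, band count, and 0 to 8000 Hz range are illustrative choices only.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Hz -> Mel conversion (an assumed variant, see lead-in).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    # Band edges in Hz that are equally spaced on the Mel axis.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / n_bands
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands + 1)]


# Hypothetical setting: 8 bands covering 0 Hz to 8000 Hz.
edges = mel_band_edges(8, 0.0, 8000.0)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for lo, hi, w in zip(edges, edges[1:], widths):
    print(f"{lo:8.1f} Hz to {hi:8.1f} Hz  (width {w:7.1f} Hz)")
```

Each successive band is wider in Hz than the one before it, so a fixed number of Mel bins describes the mid-to-high range more coarsely than the low range.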