Spiking neural networks (SNNs) are considered biologically plausible and can achieve high energy efficiency when implemented on neuromorphic hardware, owing to their highly sparse, asynchronous, binary, event-driven nature. Recently, surrogate gradient (SG) approaches have enabled SNNs to be trained from scratch with backpropagation (BP) algorithms in a deep learning framework. However, a popular SG approach known as the straight-through estimator (STE), which simply passes the output gradient through unchanged, does not account for the activation differences between the membrane potentials and the output spikes. To address this issue, we propose surrogate gradient scaling (SGS), which scales the gradient of the membrane potential up or down according to the sign of the gradient of the spiking neuron's output and the difference between the membrane potential and that output. The SGS approach can also be applied to unimodal surrogate functions that propagate modified gradient information from the output spikes to the input membrane potential. In addition, SNNs trained directly from scratch suffer from poor generalization, so we introduce Lipschitz regularization (LR), incorporated into the loss function, which not only improves the generalization performance of SNNs but also makes them more robust to noise. Extensive experiments on several popular benchmark datasets (CIFAR10, CIFAR100, and CIFAR10-DVS) show that our approach not only outperforms state-of-the-art (SOTA) methods but also achieves lower inference latency. Remarkably, our SNNs yield 34×, 29×, and 17× computational energy savings compared with standard artificial neural networks (ANNs) on the above three datasets.
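To make the SGS idea concrete, below is a minimal PyTorch-style sketch of a spiking activation whose backward pass rescales the STE gradient using the sign of the output gradient and the gap between the membrane potential and the emitted spike. The exact scaling rule and the hyperparameter `alpha` are illustrative assumptions for exposition, not the paper's precise definition.

```python
import torch

class SGSSpike(torch.autograd.Function):
    """Heaviside spike with an SGS-style backward pass.

    A plain straight-through estimator (STE) would return grad_output
    unchanged; here the gradient w.r.t. the membrane potential is scaled
    up or down based on the sign of the output gradient and the difference
    between the membrane potential and the emitted spike. The scaling rule
    and `alpha` below are illustrative assumptions, not the paper's rule.
    """

    @staticmethod
    def forward(ctx, mem, threshold=1.0, alpha=0.5):
        spike = (mem >= threshold).float()  # binary spike output
        ctx.save_for_backward(mem, spike)
        ctx.alpha = alpha
        return spike

    @staticmethod
    def backward(ctx, grad_output):
        mem, spike = ctx.saved_tensors
        gap = mem - spike  # activation difference the plain STE ignores
        # Scale the gradient up when the output gradient and the gap share
        # a sign, down when they disagree, and leave it unchanged when
        # either is zero.
        scale = 1.0 + ctx.alpha * torch.sign(grad_output) * torch.sign(gap)
        return grad_output * scale, None, None


# Usage: spikes are binary in the forward pass; backward applies the
# rescaled STE gradient to the membrane potential.
mem = torch.randn(8, requires_grad=True)
spikes = SGSSpike.apply(mem)
spikes.sum().backward()
print(mem.grad)
```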