Videos have higher dimensionality than images, which makes adversarial video attacks more challenging. We propose a gradient-based method for self-adaptive white-box keyframe selection and adversarial video example generation, exploiting the fact that perturbations are transferable between video frames. More specifically, a gradient-based measure determines each video frame’s contribution to the classification result. Based on these frame weights and the given boundary values, the proposed method adaptively selects a subset of frames as keyframes to perturb. Experimental results from attacking two widely used video classification models on the UCF-101 and HMDB-51 datasets show that the proposed method improves both the generation efficiency and the imperceptibility of adversarial video examples, reducing the required number of iterations by more than 21% and the average perturbation size by more than 25% for the untargeted attack.
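
The gradient-based keyframe scoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_keyframes`, the use of an L1 gradient norm as the per-frame weight, and the fixed `top_k` selection rule (standing in for the paper's boundary-value criterion) are all assumptions.

```python
import numpy as np

def select_keyframes(frame_grads, top_k):
    """Score each frame by the L1 norm of its loss gradient and
    return the indices of the top_k highest-scoring frames.

    frame_grads: array of shape (T, H, W, C) -- gradient of the
    classification loss w.r.t. each of the T video frames.
    """
    # One scalar weight per frame; frames with larger gradient
    # magnitude are assumed to contribute more to the prediction.
    scores = np.abs(frame_grads).sum(axis=(1, 2, 3))
    # Indices of the top_k frames, highest score first.
    return np.argsort(scores)[::-1][:top_k]

# Toy example: 8 random "gradient" frames, with frame 3 made dominant.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4, 4, 3)) * 0.1
grads[3] += 5.0
keyframes = select_keyframes(grads, top_k=2)
```

Perturbing only the selected keyframes, rather than every frame, is what reduces both the iteration count and the overall perturbation size reported above.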