With the development of generative AI technologies, video style transfer has become a popular extension of style transfer. Compared with traditional image style transfer, video style transfer poses new challenges in maintaining temporal consistency while producing desirable stylized results. To address these problems, we propose a two-stage network combining Style Attention and Contrastive learning (SACNet). In the first stage, we propose an adaptive filter with an attention mechanism to suppress texture and edge artifacts; the filter adjusts features both within and across channels. In the second stage, we propose a Temporal-consistency Contrastive Loss Module (TCLM) to improve stability and reduce flicker artifacts in stylized videos. The TCLM is defined over single-frame local patches with looser constraints. In addition, a color loss is introduced to adjust the color saturation of stylized results. Compared with current state-of-the-art models for video style transfer, SACNet achieves an average LPIPS of 0.152 and a temporal-consistency loss of 0.034 on the MPI-Sintel dataset, corresponding to average reductions of 5% and 20%, respectively. Experiments verify that SACNet performs well on various style transfer tasks for both images and videos.
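The abstract states that TCLM is a contrastive loss defined over single-frame local patches. A minimal NumPy sketch of a generic InfoNCE-style patch contrastive loss conveys the idea; note this is an illustration under stated assumptions (the patch sampling scheme, feature extractor, and temperature `tau` are not specified in the abstract), not the authors' exact formulation:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize features so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def patch_contrastive_loss(anchors, positives, negatives, tau=0.07):
    """InfoNCE-style loss over local patch features (illustrative only).

    anchors:   (N, D) features of stylized-frame patches
    positives: (N, D) features of the corresponding patches (e.g. same
               spatial location in the content frame)
    negatives: (N, K, D) features of K non-corresponding patches
    tau:       temperature (assumed value; not given in the abstract)
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    n = l2_normalize(negatives)

    pos_sim = np.sum(a * p, axis=-1, keepdims=True) / tau   # (N, 1)
    neg_sim = np.einsum('nd,nkd->nk', a, n) / tau           # (N, K)

    # Softmax cross-entropy with the positive patch as the target class.
    logits = np.concatenate([pos_sim, neg_sim], axis=1)     # (N, 1+K)
    logits = logits - logits.max(axis=1, keepdims=True)     # numeric stability
    log_denom = np.log(np.exp(logits).sum(axis=1))
    loss = -(logits[:, 0] - log_denom)
    return loss.mean()
```

Pulling patches toward their corresponding positives while pushing them away from non-corresponding negatives is a looser constraint than pixel-wise temporal losses, which is consistent with the abstract's description of TCLM.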