In recent years, the growing demand for video summarization has driven substantial research, but there is still room to improve its accuracy. This paper proposes a video summarization method that uses semantic segmentation models to improve accuracy. The method employs semantic segmentation models such as U-Net and SegNet to estimate per-frame importance, exploiting the structural similarity between video summarization and semantic segmentation. We train on the video summarization datasets TVSum and SumMe, and compare the proposed method with the conventional methods FCSN and vsLSTM, using the F-score for quantitative evaluation and the output videos for qualitative evaluation. The F-score shows that the proposed model outperforms the existing methods by about 10%. Comparing the generated videos also shows that the proposed model produces qualitatively better summaries.
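As a minimal sketch of the quantitative evaluation mentioned above: the F-score commonly used on TVSum and SumMe is the harmonic mean of precision and recall computed from the frame overlap between a predicted summary and a ground-truth summary. The binary selection vectors below are illustrative, not taken from the paper's experiments.

```python
def f_score(pred, gt):
    """F-score between a predicted and a ground-truth binary frame selection.

    pred, gt: lists of 0/1 flags, one per video frame (1 = frame selected).
    """
    overlap = sum(p * g for p, g in zip(pred, gt))  # frames selected by both
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)  # fraction of predicted frames that are correct
    recall = overlap / sum(gt)       # fraction of ground-truth frames recovered
    return 2 * precision * recall / (precision + recall)

# Illustrative 6-frame example: 2 frames agree out of 3 selected on each side
pred = [1, 1, 0, 0, 1, 0]
gt   = [1, 0, 0, 1, 1, 0]
print(round(f_score(pred, gt), 3))  # → 0.667
```

With multiple annotators per video (as in TVSum and SumMe), the per-annotator F-scores are typically averaged or the maximum is taken, depending on the protocol.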