In today's digital world, video summarization has become an important task in multimedia analysis, driven largely by the exponential growth in on-demand consumption of multimedia content, encompassing audio, video, and images, across digital platforms. Automatic video summarization is the process of creating a brief synopsis that presents a video's most useful and relevant elements, so that viewers can quickly grasp its main idea without watching the entire recording. Currently, the video segments included in the final summary are selected in a variety of ways; the core task is to analyze the video's multimedia data for relevant cues that aid decision-making. The method proposed in this paper, TAVM (text, audio, and video mode), generates a video summary from three multimedia elements: text, audio, and frames. TAVM comprises three stages. First, Video Processing employs the BEiT vision transformer to recognize objects within the selected frames. Second, Audio Processing transcribes the audio content using a speech-to-text converter. Finally, the Summary Builder uses the GPT-3-based OpenAI API to generate a summary of the content. Experimental analysis on the benchmark SumMe dataset demonstrates the effectiveness of the proposed approach.
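The three-stage pipeline described above can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the function names are hypothetical, and the bodies are stand-ins for the actual BEiT model, speech-to-text converter, and GPT-3 call.

```python
# Illustrative sketch of the TAVM pipeline structure (hypothetical names).
# In a real system, each stand-in below would be replaced by the actual
# component: BEiT object recognition, speech-to-text, and a GPT-3 request.

def recognize_objects(frame):
    # Stand-in for BEiT-based object recognition on a sampled frame.
    return ["person", "bicycle"]

def transcribe_audio(audio_track):
    # Stand-in for speech-to-text transcription of the audio content.
    return "a cyclist rides through the park"

def build_summary(objects, transcript):
    # Stand-in for the GPT-3-based Summary Builder: in practice the
    # visual and audio cues would be fused into a prompt sent to the API.
    return f"Scene with {', '.join(objects)}: {transcript}"

def tavm_summarize(frame, audio_track):
    # Video Processing -> Audio Processing -> Summary Builder.
    objects = recognize_objects(frame)
    transcript = transcribe_audio(audio_track)
    return build_summary(objects, transcript)

print(tavm_summarize(frame=None, audio_track=None))
```

The sketch only shows how the three stages compose; selecting which frames and segments to process is the selection problem the abstract refers to.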