Automatic authoring of MTV-style home video using music tempo and visual tempo analysis is investigated in this research. The music tempo is extracted using onset analysis. The frame-level visual tempo is detected from the degree of motion between consecutive frames. The object-level visual tempo is estimated from a tension analysis of facial expressions. Finally, the authoring methodology is presented, which matches music and visual tempo to produce MTV-style video. Experimental results using baby home video are given to demonstrate the performance of the proposed algorithm.
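As a rough illustration of the frame-level visual tempo mentioned above, the following minimal sketch scores motion as the mean absolute pixel difference between consecutive grayscale frames. The abstract does not define its motion measure, so this difference metric, the `Frame` type, and the function names `motion_degree` and `visual_tempo` are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def motion_degree(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two same-sized frames.

    This is a stand-in for the paper's motion measure, which the
    abstract does not specify.
    """
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def visual_tempo(frames: List[Frame]) -> List[float]:
    """One motion score per pair of consecutive frames."""
    return [motion_degree(frames[i - 1], frames[i]) for i in range(1, len(frames))]


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    bright = [[10, 10], [10, 10]]
    print(visual_tempo([still, still, bright]))
```

A curve like this could then be compared against a music tempo curve, although how the two are matched is not described in the abstract.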
Intelligent video pre-processing and authoring techniques that help people create MTV-style music video clips are investigated in this research. First, we present an automatic approach to detect and remove bad shots that often occur in home video, such as shots with poor lighting or motion blur. Then, we consider the generation of MTV-style video clips by performing video and music tempo analysis and seeking an effective way to match these two tempos. Experimental results are given to demonstrate the feasibility and efficiency of the proposed techniques for home video editing.
A skimming system for movie content exploration is proposed using story units extracted via general tempo analysis of audio and visual data. Quite a few schemes have been proposed to segment video data into shots with low-level features, yet the grouping of shots into meaningful units, called story units here, is important and challenging. In this work, we detect similar shots using key frames and include these similar shots as a node in the scene transition graph. Then, an importance measure is calculated based on the total length of each node. Finally, we select sinks and shots according to this measure. Based on these semantic shots, meaningful skims can be successfully generated. Simulation results are presented to show that the proposed video skimming scheme can preserve the essential and significant content of the original video data.
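The length-based importance measure described above can be sketched as follows. This is only an illustration under stated assumptions: shots are taken to be already grouped into nodes, and the `Shot` tuple layout, the function names, and the greedy selection under a time budget are hypothetical choices, not the selection rule used in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: (shot index, node label, duration in seconds).
Shot = Tuple[int, str, float]


def node_importance(shots: List[Shot]) -> Dict[str, float]:
    """Importance of a node = total duration of the similar shots it groups."""
    totals: Dict[str, float] = defaultdict(float)
    for _, node, duration in shots:
        totals[node] += duration
    return dict(totals)


def select_skim(shots: List[Shot], budget: float) -> List[Shot]:
    """Greedily pick shots from the most important nodes within a time budget.

    The greedy policy is an illustrative assumption.
    """
    importance = node_importance(shots)
    # Most important node first; ties broken by original shot order.
    ranked = sorted(shots, key=lambda s: (-importance[s[1]], s[0]))
    chosen: List[Shot] = []
    used = 0.0
    for shot in ranked:
        if used + shot[2] <= budget:
            chosen.append(shot)
            used += shot[2]
    # Restore temporal order so the skim plays chronologically.
    return sorted(chosen, key=lambda s: s[0])


if __name__ == "__main__":
    shots = [(0, "A", 4.0), (1, "B", 2.0), (2, "A", 5.0), (3, "C", 1.0)]
    print(node_importance(shots))
    print(select_skim(shots, budget=7.0))
```

Restoring the original shot order at the end keeps the skim chronological, which seems a reasonable default for movie content.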
Story units are extracted in this research by general tempo analysis, which covers the tempos of both audio and visual information. Although many schemes have been proposed to successfully segment video data into shots using basic low-level features, how to group shots into meaningful units, called story units, is still a challenging problem. By focusing on a certain type of video such as sports or news, we can explore models that use domain-specific knowledge. For movie content, many heuristic rules based on audiovisual cues have been proposed with limited success. We propose a method to extract story units using general tempo analysis. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.
KEYWORDS: Video, Visualization, Information visualization, Image segmentation, Feature extraction, Data processing, Sensors, Cameras, System integration, Visual information processing
A robust TV commercial detection system is proposed in this research. Although several methods have been investigated for the TV commercial detection problem with interesting results, most previous work focuses on features within a short temporal window. These methods are suitable for on-line detection, but often result in higher false alarm rates as a trade-off. To reduce the false alarm rate, we explore audiovisual features in a larger temporal window. Specifically, we group shots into scenes using audio data processing, and then obtain features related to commercial characteristics from these scenes. Experimental results are given to demonstrate the effectiveness of the proposed system.