KEYWORDS: Video, Visualization, Feature extraction, Transformers, Video coding, Education and training, Visual process modeling, Information visualization, Data modeling, Video processing
Video description has become a research hotspot in recent years because of its wide application value. A single visual feature cannot adequately guide the generation of an accurate video description, leading to mismatches between the generated text and the video content. To solve this problem, a video description text generation algorithm combining visual and audio features is proposed, which improves the accuracy of the generated description by fusing the two modalities. First, a vision transformer model is used to extract the visual feature vector, and Mel-frequency cepstral coefficients (MFCCs) are used to extract the audio feature vector; the two feature vectors are concatenated and then average-pooled to obtain global feature information. Second, the processed feature information is fed into a transformer encoder. Finally, the encoded result is passed to a transformer decoder, which generates the video description text. The transformer framework contains a multi-head self-attention mechanism that can focus on the most important video features while capturing temporal information, making the generated description more accurate. The proposed method has been tested on the public datasets MSRVTT and MSVD and achieves good results on four different evaluation metrics.
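The fusion step described above (concatenating per-frame visual and audio feature vectors, then average-pooling over time) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions (a 768-d ViT-like visual vector and a 13-coefficient MFCC vector per frame) and the function name `fuse_features` are assumptions for demonstration.

```python
import numpy as np

def fuse_features(visual_feats, audio_feats):
    """Concatenate per-frame visual and audio feature vectors along the
    feature axis, then average-pool over time to get one global vector.

    Assumed shapes: visual_feats (T, Dv), audio_feats (T, Da),
    where T is the number of frames.
    """
    fused = np.concatenate([visual_feats, audio_feats], axis=-1)  # (T, Dv + Da)
    global_feat = fused.mean(axis=0)                              # (Dv + Da,)
    return global_feat

# Toy example: 8 frames, 768-d visual features (ViT-like), 13 MFCCs per frame
visual = np.random.randn(8, 768)
audio = np.random.randn(8, 13)
g = fuse_features(visual, audio)  # global feature of dimension 781
```

In the full pipeline, a sequence of such fused features would be passed to the transformer encoder rather than a single pooled vector; the pooling here only illustrates the "global feature information" step of the abstract.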
Current ECG classification models focus on classification performance rather than on the interpretability of the classification results. This paper proposes an interpretation method for deep-learning-based ECG classification. The method identifies key heartbeats and key ECG time segments by replacing individual heartbeats with a normal heartbeat, setting fixed-width ECG data segments to zero, and analyzing the resulting changes in the classification result. Each perturbation yields a contribution value of that heartbeat or segment to the classification result, and the heartbeats and time segments with the largest contribution values are taken as the key heartbeats and key time segments for the classification. Experimental results show that the explanations produced by this method are highly consistent with doctors' explanations, which partially solves the interpretation problem of deep-learning-based ECG classification.
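The perturbation idea above (zero out a fixed-width segment, then measure how much the classifier's score drops) can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's code: the function name `segment_contributions` and the toy amplitude-based "classifier" are invented for demonstration, and a real use would pass in the trained deep-learning model's class score.

```python
import numpy as np

def segment_contributions(signal, classify, width):
    """Estimate each fixed-width segment's contribution to the
    classification by zeroing it out and measuring the score change.

    signal   : 1-D ECG signal (numpy array)
    classify : callable mapping a signal to a scalar class score
    width    : segment width in samples
    Returns one contribution value per segment; larger values mark
    segments more important to the classification (the "key" segments).
    """
    base_score = classify(signal)
    contribs = []
    for start in range(0, len(signal), width):
        perturbed = signal.copy()
        perturbed[start:start + width] = 0.0  # set the segment to zero
        contribs.append(base_score - classify(perturbed))
    return np.array(contribs)

# Toy classifier standing in for the deep model: mean absolute amplitude.
classify = lambda x: float(np.mean(np.abs(x)))
sig = np.zeros(100)
sig[40:50] = 1.0  # the only non-zero activity lies in segment 4
contribs = segment_contributions(sig, classify, width=10)
# The segment covering samples 40-49 gets the largest contribution value.
```

The same loop structure applies to the heartbeat-replacement variant: instead of zeroing a window, each heartbeat would be replaced by a template normal heartbeat before re-scoring.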