KEYWORDS: Visual process modeling, Visualization, Image processing, Data modeling, Information visualization, Image understanding, Digital image processing
The significant progress of vision-language pre-trained models (VLMs) and large language models (LLMs) has provided a feasible new paradigm for image captioning, in which a VLM processes the image and an LLM then generates the caption, simplifying and lightening the generation pipeline. Building on this paradigm, and to address three problems of previous Chinese image captioning models (captions that deviate from the actual image content, incomplete descriptions, and heavy resource consumption), we propose DDVP, a Chinese image captioning model that uses double decoding based on Visual Prompts. The model employs CLIP as the encoder and GPT2 as the decoder, and introduces Visual Prompts, which are keywords related to the image content. Decoding proceeds in two passes: the first pass generates the Visual Prompts, and the second pass generates the final caption conditioned on them. Evaluation shows that our model achieves competitive results on the AIC-ICC dataset; while remaining fluent, the captions generated by DDVP also cover the information in the image more comprehensively and accurately.
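To make the two-pass decoding flow concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the dimensions, the small Transformer decoder standing in for GPT2, the projected feature tensor standing in for CLIP output, and greedy decoding are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper's actual CLIP/GPT2 sizes differ.
IMG_DIM, EMB_DIM, VOCAB = 512, 768, 5000

class DoubleDecodingCaptioner(nn.Module):
    """Sketch of the DDVP double-decoding control flow (not the authors' code)."""
    def __init__(self):
        super().__init__()
        # Stand-in projection from frozen CLIP image features.
        self.img_proj = nn.Linear(IMG_DIM, EMB_DIM)
        self.tok_emb = nn.Embedding(VOCAB, EMB_DIM)
        layer = nn.TransformerDecoderLayer(EMB_DIM, nhead=8, batch_first=True)
        # Stand-in for the GPT2 decoder, reused for both passes.
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(EMB_DIM, VOCAB)

    def decode(self, memory, bos_id, max_len):
        """Greedy autoregressive decoding against a memory sequence."""
        ys = torch.full((memory.size(0), 1), bos_id, dtype=torch.long)
        for _ in range(max_len - 1):
            h = self.decoder(self.tok_emb(ys), memory)
            next_id = self.lm_head(h[:, -1]).argmax(-1, keepdim=True)
            ys = torch.cat([ys, next_id], dim=1)
        return ys

    def forward(self, clip_feats, bos_id=1):
        memory = self.img_proj(clip_feats)          # image tokens
        # Pass 1: decode the Visual Prompts (content keywords).
        prompts = self.decode(memory, bos_id, max_len=8)
        # Pass 2: condition on image features AND the prompt embeddings.
        memory2 = torch.cat([memory, self.tok_emb(prompts)], dim=1)
        return prompts, self.decode(memory2, bos_id, max_len=30)

with torch.no_grad():
    feats = torch.randn(2, 10, IMG_DIM)            # e.g. CLIP patch features
    prompts, caption_ids = DoubleDecodingCaptioner()(feats)
print(prompts.shape, caption_ids.shape)
```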
Recurrent neural network models based on the attention mechanism have achieved good results on abstractive text summarization, but such models suffer from limited parallelism and exposure bias. To address these problems, this paper proposes a two-stage Chinese text summarization method based on the Transformer and a temporal convolutional network (TCN). The first stage uses a summary generation model that fuses the Transformer with a TCN and produces multiple candidate summaries via beam search at the decoder. The second stage introduces contrastive learning: the candidate summaries are scored and ranked with a RoBERTa model, and the top-ranked candidate is selected as the final summary. Experiments on the Chinese short text summarization dataset LCSTS, with ROUGE as the evaluation metric, verify the effectiveness of the proposed method.
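The stage-2 re-ranking step can be sketched as follows; this is a hedged illustration, assuming a common Chinese RoBERTa checkpoint (hfl/chinese-roberta-wwm-ext) and cosine similarity to the source as the score, which is a simplification of the paper's contrastive scoring.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; the paper's exact RoBERTa variant is not specified here.
name = "hfl/chinese-roberta-wwm-ext"
tok = AutoTokenizer.from_pretrained(name)
enc = AutoModel.from_pretrained(name).eval()

@torch.no_grad()
def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = enc(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)     # mean over real tokens only
    return (hidden * mask).sum(1) / mask.sum(1)

def rerank(source, candidates):
    """Score each beam-search candidate against the source, keep the best."""
    src = embed([source])
    cand = embed(candidates)
    scores = torch.nn.functional.cosine_similarity(cand, src)
    return candidates[int(scores.argmax())], scores.tolist()

# candidates would come from beam search over the stage-1 Transformer+TCN model
summary, scores = rerank("源文本……", ["候选摘要一", "候选摘要二"])
```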
As one of the most important components on a transmission tower, the insulator provides both electrical insulation and wire fixing, and it directly affects the operation of the power system. Insulator defects can shorten the service life of transmission lines. UAV aerial images of power towers pose several difficulties: few defective insulator samples, small defect areas, insulator strings with large aspect ratios and variable inclination angles, plus environmental factors such as lighting, interference, and shooting distance, all of which lead to low detection accuracy for insulator defects. To address these problems, an improved YOLOv5 insulator defect detection algorithm is proposed. First, the aerial images are screened and data augmentation is applied to obtain a sufficient number of defective insulator images, enriching the dataset and avoiding model overfitting. Second, the convolutional block attention module (CBAM) is introduced to improve the representation of defective insulator features and strengthen the network's ability to identify targets. Finally, the Leaky ReLU activation function in the hidden layers of the original YOLOv5 is replaced with the Mish function to improve the generalization ability of the network. Experimental results show that, compared with the original YOLOv5, the improved algorithm increases mean average precision (mAP, IoU=0.5) by 7.8% and effectively alleviates the false-detection and missed-detection problems of the original algorithm. Compared with other mainstream object detection algorithms, the proposed algorithm detects insulator defects more effectively.
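A standard CBAM block (channel attention followed by spatial attention) can be sketched in PyTorch as below; where exactly it and the Mish activation are inserted into the YOLOv5 backbone is specific to the paper, so this shows only the generic modules.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1, bias=False),
        )
    def forward(self, x):
        # Shared MLP over global average- and max-pooled descriptors.
        avg = self.mlp(x.mean((2, 3), keepdim=True))
        mx = self.mlp(x.amax((2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)
    def forward(self, x):
        # Channel-wise mean and max maps, fused by a 7x7 convolution.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    """Channel attention, then spatial attention, applied sequentially."""
    def __init__(self, ch):
        super().__init__()
        self.ca, self.sa = ChannelAttention(ch), SpatialAttention()
    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)

mish = nn.Mish()  # drop-in replacement for the hidden layers' LeakyReLU
y = mish(CBAM(64)(torch.randn(1, 64, 32, 32)))
```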
Sequence-to-sequence models provide a feasible new approach to abstractive text summarization, but they cannot reliably reproduce factual details and topic information. To address the unconstrained and uncontrollable content generation of abstractive summarization models, this paper proposes KGIT, an abstractive summarization method that uses the Transformer as its skeleton and incorporates both the BERT pre-trained model and keyword information. The model uses a comprehensive keyword extraction algorithm: the results extracted by an LSTM and by TextRank serve as two vocabularies, a pointer mechanism selects keywords from them, and the selected keywords act as guiding information for summary generation. The KGIT model can associate the source text with the keywords and thus avoid generating summaries on irrelevant topics. With ROUGE as the evaluation criterion, the summaries generated by KGIT contain more key information and are more accurate and readable than those of mainstream summarization models on the NLPCC2017 Chinese news summarization dataset.
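The keyword-guidance idea can be illustrated with the short Python sketch below. It uses jieba's built-in TextRank for one keyword branch and reduces the learned pointer selection to a simple deduplicated merge; the LSTM extractor, the [SEP] convention, and build_guided_input are illustrative assumptions, not the paper's exact mechanism.

```python
import jieba.analyse  # TextRank branch; the paper's LSTM extractor is omitted

def textrank_keywords(text, k=5):
    # jieba's TextRank over word co-occurrence within the source text
    return jieba.analyse.textrank(text, topK=k)

def build_guided_input(text, lstm_keywords, sep="[SEP]"):
    """Hedged sketch: fuse the two keyword vocabularies, then prepend them
    to the source so the decoder can attend to them as guiding information."""
    merged = list(dict.fromkeys(textrank_keywords(text) + lstm_keywords))
    return " ".join(merged) + f" {sep} " + text

src = "……"  # source news article
print(build_guided_input(src, lstm_keywords=["关键词"]))
```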
Poetry and couplets, as a valuable part of human cultural heritage, carry traditional Chinese culture, and their automatic generation is a challenging NLP task. This paper proposes a new multi-task neural network model for the automatic generation of poems and couplets. The model uses a seq2seq encoder-decoder structure that combines the attention mechanism, the self-attention mechanism, and multi-task parameter sharing. The encoder consists of two BiLSTM networks that learn the similar characteristics of ancient poems and couplets, one encoding the keywords and the other encoding the previously generated poem or couplet sentences. The decoder parameters are not shared: two separate LSTM networks decode poems and couplets respectively, preserving the different semantic and grammatical features of the two forms. Because poems and couplets share many characteristics, multi-task learning can acquire more features from the related tasks and makes the model generalize better; accordingly, the multi-task model generates poems and couplets significantly better than a single-task model. The model also introduces a self-attention mechanism to learn the dependencies and internal structure of words within sentences. Finally, the effectiveness of the method was verified by both automatic and manual evaluation.
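The parameter-sharing scheme (shared BiLSTM encoders, task-specific LSTM decoders) can be sketched as follows; the dimensions, the attention wiring, and the mean-pooled context vector are simplifying assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiTaskGenerator(nn.Module):
    """Sketch of hard parameter sharing: shared encoders for both tasks,
    separate decoders for poems vs. couplets."""
    def __init__(self, vocab=6000, emb=256, hid=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        # Shared encoders: one for keywords, one for generated context
        self.kw_enc = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.ctx_enc = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        # Task-specific decoders: parameters NOT shared across tasks
        self.dec = nn.ModuleDict({
            "poem": nn.LSTM(emb + 2 * hid, hid, batch_first=True),
            "couplet": nn.LSTM(emb + 2 * hid, hid, batch_first=True),
        })
        self.attn = nn.MultiheadAttention(2 * hid, 4, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, keywords, context, tgt, task):
        k, _ = self.kw_enc(self.emb(keywords))
        c, _ = self.ctx_enc(self.emb(context))
        mem = torch.cat([k, c], dim=1)
        mem, _ = self.attn(mem, mem, mem)   # self-attention over encodings
        ctx_vec = mem.mean(1, keepdim=True).expand(-1, tgt.size(1), -1)
        h, _ = self.dec[task](torch.cat([self.emb(tgt), ctx_vec], -1))
        return self.out(h)

m = MultiTaskGenerator()
kw = torch.randint(0, 6000, (2, 4))
logits = m(kw, kw, kw, task="poem")    # switch task="couplet" for couplets
```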