Paper
29 April 2022 Chinese-Vietnamese cross-language topic discovery method based on generative adversarial networks
LinJie Xia, Zhengtao Yu, Shengxiang Gao
Proceedings Volume 12247, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2022); 122471J (2022) https://doi.org/10.1117/12.2636826
Event: 2022 International Conference on Image, Signal Processing, and Pattern Recognition, 2022, Guilin, China
Abstract
The cross-language news topic discovery task aims to cluster news texts in different languages that describe the same topic and to describe each topic in the form of keywords. At present, most cross-language topic discovery methods rely on machine translation or on external resources such as bilingual dictionaries and parallel sentences to bridge the language gap. However, Vietnamese is a low-resource language, and manually annotating Chinese-Vietnamese bilingual aligned corpora is difficult and expensive. To address this problem, this paper proposes a Chinese-Vietnamese cross-language topic discovery method based on generative adversarial networks (GAN). First, news texts are represented as vectors by BERT; the bilingual vectors are then mapped into the same semantic space by the GAN. Finally, the k-means clustering algorithm is used to cluster the representation vectors and extract the topics. Experiments on the Chinese-Vietnamese bilingual news topic discovery corpus show that the proposed method outperforms the baseline.
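The final clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BERT encoding and the GAN mapping are not reproduced, and the 2-D vectors below are hypothetical hand-made placeholders standing in for document vectors that have already been mapped into the shared space. The abstract does not say how topic keywords are extracted, so picking the most frequent tokens per cluster is an assumption, as are the helper names `kmeans` and `topic_keywords` and the farthest-first initialisation (chosen only to keep the toy example deterministic).

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    # Farthest-first initialisation (an assumption, used for determinism).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topic_keywords(docs_tokens, labels, k, top_n=2):
    """Assumed keyword step: the top_n most frequent tokens in each cluster."""
    keywords = []
    for c in range(k):
        counts = Counter(
            tok for toks, lab in zip(docs_tokens, labels) if lab == c for tok in toks
        )
        keywords.append([word for word, _ in counts.most_common(top_n)])
    return keywords


# Hypothetical toy corpus: (language, shared-space vector, tokens).
docs = [
    ("zh", [0.1, 0.0], ["地震", "救援", "地震"]),
    ("vi", [0.0, 0.2], ["động đất", "cứu hộ", "động đất"]),
    ("zh", [5.0, 5.1], ["足球", "比赛", "足球"]),
    ("vi", [5.2, 4.9], ["bóng đá", "trận đấu", "bóng đá"]),
]
vectors = [vec for _, vec, _ in docs]
tokens = [toks for _, _, toks in docs]

labels = kmeans(vectors, k=2)
keywords = topic_keywords(tokens, labels, k=2)
for cluster_id, words in enumerate(keywords):
    print(cluster_id, words)
```

In this toy setup the Chinese and Vietnamese documents about the same event fall into the same cluster only because their placeholder vectors were placed close together. In the paper's setting, that closeness is what the GAN-based mapping of BERT vectors is meant to provide.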
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
LinJie Xia, Zhengtao Yu, and Shengxiang Gao "Chinese-Vietnamese cross-language topic discovery method based on generative adversarial networks", Proc. SPIE 12247, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2022), 122471J (29 April 2022); https://doi.org/10.1117/12.2636826
KEYWORDS
Gallium nitride

Performance modeling

Computer programming

Model-based design

Associative arrays

Distributed interactive simulations

Roads
