Chinese spelling error correction by multi-task learning with pronunciation gap predictor
2 May 2023
Hao Pan, Junmin Wu
Proceedings Volume 12642, Second International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2023); 126421J (2023) https://doi.org/10.1117/12.2674737
Event: Second International Conference on Electronic Information Engineering, Big Data and Computer Technology (EIBDCT 2023), 2023, Xishuangbanna, China
Abstract
Chinese Spell Check (CSC) aims to detect and correct spelling errors in Chinese text, almost all of which stem from phonetic or visual similarity. Large-scale pre-trained language models (PLMs) have recently made substantial progress on the CSC task. However, when correcting errors, PLMs tend to select words that are semantically plausible or fluent, sometimes at the expense of pronunciation similarity. Moreover, these models lack explicit knowledge of pronunciation differences. To address this problem, we propose a multi-task learning model that enhances the CSC task with an auxiliary task: estimating, word by word, the degree of pronunciation gap between the original input and the corresponding correct text. Specifically, we measure this pronunciation discrepancy with the edit distance between Pinyin sequences, using a modified edit distance scheme that accounts for the particular structure of Pinyin. Experiments on an openly available benchmark dataset demonstrate the effectiveness of our strategy.
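To make the auxiliary target concrete, the sketch below computes a per-character pronunciation-gap label as the edit distance between the Pinyin of each input character and the Pinyin of the corresponding correct character. It is an illustration under stated assumptions, not the authors' implementation: the abstract says their edit distance is modified to suit Pinyin structure but does not specify how, so plain unit-cost Levenshtein distance is used here as a stand-in. The names `levenshtein`, `pronunciation_gap_labels`, and `TOY_PINYIN` are hypothetical, the toy table uses toneless Pinyin, and source/target pairs are assumed to be length-aligned, as is common in CSC datasets.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (unit-cost insert/delete/substitute).

    Stand-in only: the paper uses a modified scheme not described in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pronunciation_gap_labels(source: str, target: str,
                             pinyin: Dict[str, str]) -> List[int]:
    """Hypothetical auxiliary labels: Pinyin edit distance per aligned character."""
    if len(source) != len(target):
        raise ValueError("source and target are assumed to be length-aligned")
    return [levenshtein(pinyin[s], pinyin[t]) for s, t in zip(source, target)]


# Hypothetical toy lookup table (toneless Pinyin); a real system would use a
# full character-to-Pinyin dictionary.
TOY_PINYIN = {"我": "wo", "门": "men", "们": "men",
              "见": "jian", "现": "xian", "在": "zai"}

if __name__ == "__main__":
    print(pronunciation_gap_labels("见在", "现在", TOY_PINYIN))
    print(pronunciation_gap_labels("我门", "我们", TOY_PINYIN))
```

In this toy setup, the misspelling 见 (jian) for 现 (xian) receives a gap of 1, while the homophone error 门 for 们 receives 0. Such small gaps for likely errors illustrate the kind of signal the auxiliary task is meant to expose to the model.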
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hao Pan and Junmin Wu "Chinese spelling error correction by multi-task learning with pronunciation gap predictor", Proc. SPIE 12642, Second International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2023), 126421J (2 May 2023); https://doi.org/10.1117/12.2674737
KEYWORDS
Machine learning

Education and training

Ablation

Semantics
