Presentation + Paper
21 April 2020
A comparison of language representation models on small text corpora of scientific and technical documents
Michael T. Gorczyca, Tavish M. McDonald, Thadeous A. Goodwyn, Peter F. David
Abstract
Text mining for the identification of emerging technology is becoming increasingly important as the number of scientific and technical documents grows. However, algorithms for developing text mining models require large amounts of training data, which carries heavy data-annotation and model-development costs. The need to avoid these costs has in part motivated recent work in text mining, which indicates the value of leveraging language representation models (LRMs) on domain-specific text corpora for domain-specific tasks. However, these results have been demonstrated predominantly on large text corpora, leaving open the question of how well LRMs transfer to domains where training data are scarce. We therefore benchmarked the performance of LRMs at identifying quantities and units of measure in text when the number of training samples is small.
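To make the benchmarking task concrete, the sketch below shows one common way to fine-tune a pretrained LRM for token classification, here tagging quantities and units of measure. This is a minimal illustration using the Hugging Face transformers library, not the authors' code; the model checkpoint, tag set, example sentence, and hyperparameters are all assumptions for illustration.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical tag set: outside, quantity, unit of measure.
    labels = ["O", "B-QTY", "B-UNIT"]

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(labels)
    )

    # One hypothetical annotated sentence; a small corpus would hold few of these.
    words = ["The", "laser", "emits", "5", "mW", "at", "632.8", "nm", "."]
    tags = ["O", "O", "O", "B-QTY", "B-UNIT", "O", "B-QTY", "B-UNIT", "O"]

    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

    # Align word-level tags to subword tokens: only each word's first subword
    # receives a label, and -100 is ignored by the cross-entropy loss.
    label_ids, prev = [], None
    for wid in enc.word_ids():
        if wid is None or wid == prev:
            label_ids.append(-100)
        else:
            label_ids.append(labels.index(tags[wid]))
        prev = wid

    # One gradient step; a real experiment would loop over the training set.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
    model.train()
    loss = model(**enc, labels=torch.tensor([label_ids])).loss
    loss.backward()
    optimizer.step()

With few annotated sentences, the choice of pretrained checkpoint matters far more than the fine-tuning loop itself, which is the comparison the abstract describes.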
Conference Presentation
© 2020 Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Michael T. Gorczyca, Tavish M. McDonald, Thadeous A. Goodwyn, and Peter F. David "A comparison of language representation models on small text corpora of scientific and technical documents", Proc. SPIE 11413, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II, 114131T (21 April 2020); https://doi.org/10.1117/12.2557891
KEYWORDS
Data modeling
Machine learning
Analytics
Neural networks
