Presentation + Paper
21 April 2020
A comparison of language representation models on small text corpora of scientific and technical documents
Michael T. Gorczyca, Tavish M. McDonald, Thadeous A. Goodwyn, Peter F. David
Abstract
Text mining for the identification of emerging technology is becoming increasingly important as the number of scientific and technical documents grows. However, algorithms for developing text mining models require large amounts of training data, which carries heavy data-annotation and model-development costs. The need to avoid these costs has in part motivated recent work in text mining, which indicates the value of leveraging language representation models (LRMs) on domain-specific text corpora for domain-specific tasks. However, these results have been demonstrated predominantly on large text corpora, leaving open the question of how well LRMs transfer to domains where training data are scarce. We therefore benchmarked the performance of LRMs at identifying quantities and units of measure in text when the number of training samples is small.
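To make the benchmarking task concrete, the sketch below shows one common way to fine-tune a pretrained LRM for token classification, here tagging quantities and units of measure. This is a minimal illustration using the Hugging Face transformers library, not the authors' code; the model checkpoint, tag set, example sentence, and hyperparameters are all assumptions for illustration.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical tag set: outside, quantity, unit of measure.
    labels = ["O", "B-QTY", "B-UNIT"]

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(labels)
    )

    # One hypothetical annotated sentence; a small corpus would hold few of these.
    words = ["The", "laser", "emits", "5", "mW", "at", "632.8", "nm", "."]
    tags = ["O", "O", "O", "B-QTY", "B-UNIT", "O", "B-QTY", "B-UNIT", "O"]

    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

    # Align word-level tags to subword tokens: only each word's first subword
    # receives a label, and -100 is ignored by the cross-entropy loss.
    label_ids, prev = [], None
    for wid in enc.word_ids():
        if wid is None or wid == prev:
            label_ids.append(-100)
        else:
            label_ids.append(labels.index(tags[wid]))
        prev = wid

    # One gradient step; a real experiment would loop over the training set.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
    model.train()
    loss = model(**enc, labels=torch.tensor([label_ids])).loss
    loss.backward()
    optimizer.step()

With few annotated sentences, the choice of pretrained checkpoint matters far more than the fine-tuning loop itself, which is the comparison the abstract describes.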
Conference Presentation
© 2020 Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Michael T. Gorczyca, Tavish M. McDonald, Thadeous A. Goodwyn, and Peter F. David "A comparison of language representation models on small text corpora of scientific and technical documents", Proc. SPIE 11413, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II, 114131T (21 April 2020); https://doi.org/10.1117/12.2557891
KEYWORDS
Data modeling
Machine learning
Analytics
Neural networks
