Human-machine collaboration to disambiguate entities in unstructured text datasets

Jack H. Davenport

doi:10.1117/12.2304929

27 April 2018

Proceedings Volume 10653, Next-Generation Analyst VI; 106530M (2018) https://doi.org/10.1117/12.2304929
Event: SPIE Defense + Security, 2018, Orlando, FL, United States

Abstract

Creating network graphs is a manual, time-consuming process for an intelligence analyst. Beyond the traditional big data problem, individuals are often referred to by shifting titles and multiple names as they advance in their organizations over time, which makes simple string or phonetic comparison insufficient for entity search. Conversely, automated methods for relationship extraction and entity disambiguation typically present questionable results as ground truth, with no way for users to vet results, correct mistakes, or influence the algorithm’s future results. We present an entity disambiguation tool, DAC Resolution and DISambiguation (DRADIS), which aims to bridge this gap between human-centric and machine-centric methods. DRADIS automatically extracts entities from multi-source datasets and models them as a complex set of attributes and relationships. Entities are disambiguated across the corpus using a hierarchical model executed in Spark, allowing it to scale to operational data volumes. Resolution results are presented to the analyst with sourcing information for each mention and relationship, so analysts can quickly vet the results and correct resolution mistakes by splitting and merging clusters. The system uses vetted results to refine the underlying model for future runs, allowing analysts to course-correct the general model to better handle their operational data. Giving analysts the ability to validate and correct the model yields a system they can trust and lets them focus their time on producing higher-quality analysis products.
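
The split-and-merge correction loop described in the abstract can be illustrated with a minimal sketch. It is not the DRADIS implementation, which the abstract does not detail. The names `Mention`, `ClusterStore`, `must_link`, and `cannot_link` are hypothetical, and recording analyst corrections as pairwise constraints is an assumption about how vetted results might be fed back to a model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """One occurrence of an entity name, kept with its source for vetting."""
    text: str
    source_doc: str


@dataclass
class ClusterStore:
    """Hypothetical container for resolved entities and analyst feedback."""
    clusters: dict = field(default_factory=dict)      # cluster id -> set of Mention
    must_link: list = field(default_factory=list)     # pairs an analyst merged
    cannot_link: list = field(default_factory=list)   # pairs an analyst split
    _next_id: int = 0

    def add_cluster(self, mentions):
        """Store a new cluster and return its id."""
        cid = self._next_id
        self._next_id += 1
        self.clusters[cid] = set(mentions)
        return cid

    def merge(self, a, b):
        """Analyst asserts clusters a and b refer to the same entity."""
        for m in self.clusters[a]:
            for n in self.clusters[b]:
                self.must_link.append((m, n))
        self.clusters[a] |= self.clusters.pop(b)
        return a

    def split(self, cid, moved):
        """Analyst asserts the mentions in `moved` do not belong in cluster cid."""
        moved = set(moved)
        if not moved <= self.clusters[cid]:
            raise ValueError("can only split out mentions already in the cluster")
        kept = self.clusters[cid] - moved
        for m in kept:
            for n in moved:
                self.cannot_link.append((m, n))
        self.clusters[cid] = kept
        return self.add_cluster(moved)
```

In this sketch, every merge or split leaves behind pairwise constraints that a later resolution run could treat as training signal, and each `Mention` carries its `source_doc` so an analyst can trace a result back to the text it came from.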

Citation

Jack H. Davenport, "Human-machine collaboration to disambiguate entities in unstructured text datasets," Proc. SPIE 10653, Next-Generation Analyst VI, 106530M (27 April 2018); https://doi.org/10.1117/12.2304929
