Presentation + Paper
12 April 2021 Emergent reinforcement learning behaviors through novel testing conditions
Erin Zaroukian, Anjon Basak, Piyush K. Sharma, Rolando Fernandez, Derrik E. Asher
Abstract
In adversarial multi-agent paradigms, it is often difficult to describe what reinforcement-learning agents have learned and to measure the robustness of their learned policies. To gain a clearer view, we explore the effect of manipulating agent capabilities and policies in a predator-prey pursuit task. In these experiments, we trained a single prey with three slower predators using multi-agent reinforcement learning, then tested the prey against three faster predators that used fixed “interceptor” strategies (head to the closest possible intersection with the prey, assuming the prey maintains its current velocity) instead of their learned policies. Although the prey’s performance under these novel conditions was impressive, it varied widely. Initial locations and velocities (randomized during training and testing) had limited power to explain differences in the prey’s performance across test conditions. Nevertheless, visual inspection indicates that more successful prey quickly begin a circling pattern, whereas less successful prey often become cornered and double back into predator collisions. To quantify this behavior, we computed windowed entropy measures of the prey’s angle relative to the arena origin, which show when an agent transitioned into and out of the unsuccessful behavior. Ultimately, these transitions suggest that circling is triggered by coming close to the center of the arena. By varying agents’ capabilities and predator policies at evaluation, we achieve a more comprehensive view of the prey’s learned policy, and we suggest that these windowed entropy measures, along with correlations between entropy and performance, offer a way to quantify the learned policy.
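The two mechanisms named in the abstract, the fixed “interceptor” heading and the windowed entropy of the prey’s angle, can be illustrated with a minimal Python sketch. This is not the authors’ implementation: the function names, the 2-D constant-speed model, the window length, the number of angle bins, and the use of Shannon entropy over a binned histogram are all assumptions made here for illustration.

```python
import math
from collections import Counter


def intercept_point(pred_pos, pred_speed, prey_pos, prey_vel):
    """Hypothetical helper: earliest point at which a predator moving at
    constant speed pred_speed can meet a prey that keeps its current velocity.
    Returns None when no interception is possible."""
    rx, ry = prey_pos[0] - pred_pos[0], prey_pos[1] - pred_pos[1]
    vx, vy = prey_vel
    # Solve |r + v t| = s t, i.e. (v.v - s^2) t^2 + 2 (r.v) t + r.r = 0.
    a = vx * vx + vy * vy - pred_speed ** 2
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry
    if abs(a) < 1e-12:
        # Equal speeds: the equation degenerates to b t + c = 0.
        if abs(b) < 1e-12:
            candidates = [0.0] if c < 1e-12 else []
        else:
            candidates = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return None
        root = math.sqrt(disc)
        candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    times = [t for t in candidates if t >= 0]
    if not times:
        return None
    t = min(times)
    return (prey_pos[0] + vx * t, prey_pos[1] + vy * t)


def windowed_angle_entropy(positions, window=20, n_bins=8):
    """Hypothetical helper: Shannon entropy (in bits) of the prey's binned
    angle about the arena origin, computed over each sliding window.
    window and n_bins are illustrative defaults, not values from the paper."""
    angles = [math.atan2(y, x) for x, y in positions]
    # Map each angle from [-pi, pi] to a bin index in [0, n_bins - 1].
    bins = [min(int((a + math.pi) / (2.0 * math.pi) * n_bins), n_bins - 1)
            for a in angles]
    entropies = []
    for start in range(len(bins) - window + 1):
        counts = Counter(bins[start:start + window])
        h = -sum((n / window) * math.log2(n / window) for n in counts.values())
        entropies.append(h)
    return entropies
```

For example, `intercept_point((0, 0), 2.0, (3, 0), (1, 0))` returns `(6.0, 0.0)`: a predator at the origin with speed 2 catches a prey that starts 3 units ahead and moves away at speed 1 after 3 time units. In this sketch, a prey that stays within one angular bin (as when cornered) yields a windowed entropy of 0, while a prey that sweeps evenly around the origin approaches the maximum of `log2(n_bins)` bits.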
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Erin Zaroukian, Anjon Basak, Piyush K. Sharma, Rolando Fernandez, and Derrik E. Asher "Emergent reinforcement learning behaviors through novel testing conditions", Proc. SPIE 11746, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, 117460T (12 April 2021); https://doi.org/10.1117/12.2585627