We develop a hierarchical approach for controlling a team of aircraft in multi-agent adversarial environments. Each aircraft is equipped with a high-level agent that is solely responsible for target assignment decisions and a low-level agent that generates actions based only on the selected target. We use distributed deep reinforcement learning to train the high-level agents and neuroevolution to train the low-level agents. This approach leverages centralized training for decentralized execution, enabling individual autonomy when communication is limited. Simulation results show that our approach outperforms non-hierarchical multi-agent reinforcement learning methods.
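The division of labor in this hierarchy can be sketched in a few lines. This is a minimal illustrative sketch only: the class names are hypothetical, and the nearest-target heuristic and single-layer forward pass stand in for the learned deep RL and neuroevolved policies, which the abstract does not specify.

```python
# Minimal sketch of the two-level control hierarchy described above.
# All names are hypothetical; the simple heuristics stand in for the
# learned high-level (deep RL) and low-level (neuroevolved) policies.
import numpy as np

class HighLevelAgent:
    """Responsible only for target assignment (trained with deep RL)."""
    def select_target(self, own_state, target_states):
        # Stand-in policy: choose the nearest target by position.
        dists = [np.linalg.norm(t[:3] - own_state[:3]) for t in target_states]
        return int(np.argmin(dists))

class LowLevelAgent:
    """Generates control actions from the selected target only
    (trained with neuroevolution)."""
    def __init__(self, weights):
        self.weights = weights  # flat parameter vector of the evolved network

    def act(self, own_state, target_state):
        # Stand-in for the evolved network's forward pass.
        x = np.concatenate([own_state, target_state])
        return np.tanh(self.weights[:x.size] * x)[:3]

def decide(high, low, own_state, target_states):
    """Decentralized execution: each aircraft acts on local information only."""
    idx = high.select_target(own_state, target_states)
    return low.act(own_state, target_states[idx])
```

Because the low-level agent sees only its own state and the selected target, the team can execute without inter-aircraft communication at run time, which is the point of the centralized-training, decentralized-execution design.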
Next-generation autonomous vehicles will require a level of team coordination that cannot be achieved using traditional Artificial Intelligence (AI) planning algorithms or Machine Learning (ML) algorithms alone. We present a method for controlling teams of military aircraft in air battle applications using a novel combination of deep neuroevolution with an allocation-based task assignment algorithm. We describe the neuroevolution techniques that enable a deep neural network to evolve an effective policy, including a novel mutation operator that enhances the stability of the evolution process. We also compare this new method to the policy gradient Reinforcement Learning (RL) techniques used in our previous work and explain why neuroevolution offers several benefits in this application domain. The key analytical result is that neuroevolution makes it easier to select long sequences of actions that follow a consistent pattern, such as the continuous turning maneuvers that occur frequently in air engagements. We additionally describe multiple ways in which this neuroevolution approach can be integrated with allocation algorithms such as the Kuhn-Munkres Hungarian algorithm, and we explain why gradient-free methods are particularly amenable to this hybrid approach and open up new algorithmic possibilities. Since neuroevolution requires thousands of training episodes, we also describe an asynchronous parallelization scheme that yields an order-of-magnitude speedup by evaluating multiple individuals from the evolving population simultaneously. Our deep neuroevolution approach outperforms human-programmed AI opponents with a win rate greater than 80% in multi-agent Beyond Visual Range air engagement simulations developed using the Advanced Framework for Simulation, Integration, and Modeling (AFSIM).
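Two components named above have standard reference forms that can be sketched briefly. SciPy's linear_sum_assignment implements the Kuhn-Munkres algorithm used for the allocation step; the Euclidean-distance cost matrix and the baseline Gaussian mutation below are illustrative assumptions, since the abstract gives neither the actual cost model nor the novel stability-enhancing mutation operator.

```python
# Hedged sketches of the allocation step and a baseline mutation operator.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_targets(friendly_pos, enemy_pos):
    """Assign friendly aircraft to enemy targets with the Kuhn-Munkres
    (Hungarian) algorithm. friendly_pos: (m, 3) array; enemy_pos: (n, 3)
    array. Returns (friendly_index, enemy_index) pairs minimizing total
    cost. The Euclidean-distance cost is an assumption for this sketch."""
    cost = np.linalg.norm(friendly_pos[:, None, :] - enemy_pos[None, :, :],
                          axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

def mutate(weights, sigma=0.02, rng=None):
    """Baseline Gaussian mutation: perturb every weight by N(0, sigma^2).
    The paper's novel stability-enhancing operator is not reproduced here."""
    rng = rng if rng is not None else np.random.default_rng()
    return weights + sigma * rng.standard_normal(weights.shape)
```

Because a mutated individual's fitness is scored simply by running simulation episodes, candidate networks and candidate target allocations can be evaluated by the same gradient-free loop; this is one reason such allocation hybrids pair naturally with neuroevolution, and it also makes the asynchronous evaluation of many individuals at once straightforward.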