Collision-free navigation is an important research direction for multi-robot systems, whose two core problems are navigating to the target point and avoiding other robots. Many researchers use deep reinforcement learning as the navigation strategy to achieve multi-robot collision-avoidance navigation. However, most such methods feed the agent's raw sensor readings or global state information into the neural network, which makes it difficult to extend the learned strategy to larger spaces. This paper proposes an improved deep reinforcement learning navigation strategy that enables robots to learn navigation and collision-avoidance policies more accurately. The strategy converts the interactive environment state from a global coordinate representation to a relative vector representation and attenuates the influence of irrelevant agents behind the robot on the collision-avoidance policy. Experimental results show that the proposed method outperforms existing learning-based methods on three indicators: success rate, extra time to reach the target, and model convergence speed.
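The abstract does not give implementation details, but the core state transformation it describes can be illustrated with a minimal sketch. The function below is a hypothetical construction (names, state layout, and the `rear_decay` factor are all assumptions, not the authors' code): it rotates neighbouring agents' global positions and velocities into the ego robot's body frame and down-weights agents located behind the robot.

```python
import numpy as np

def relative_observation(ego_pos, ego_vel, ego_heading, others, rear_decay=0.2):
    """Hypothetical sketch: express neighbouring agents' states relative to the
    ego robot and attenuate agents behind it.

    ego_pos, ego_vel : (2,) arrays in the global frame
    ego_heading      : ego orientation in radians
    others           : list of (pos, vel) pairs in the global frame
    rear_decay       : assumed attenuation factor for agents behind the robot
    """
    # Rotation from the global frame into the ego robot's body frame.
    c, s = np.cos(-ego_heading), np.sin(-ego_heading)
    rot = np.array([[c, -s], [s, c]])

    features = []
    for pos, vel in others:
        rel_pos = rot @ (np.asarray(pos) - ego_pos)   # position relative to ego
        rel_vel = rot @ (np.asarray(vel) - ego_vel)   # velocity relative to ego
        # Agents behind the robot (negative x in the body frame) contribute less
        # to the collision-avoidance decision, so their features are scaled down.
        weight = 1.0 if rel_pos[0] >= 0.0 else rear_decay
        features.append(weight * np.concatenate([rel_pos, rel_vel]))
    return np.concatenate(features) if features else np.zeros(0)
```

The resulting vector could serve as the policy-network input in place of raw global coordinates; how the paper actually parameterises the attenuation is not specified in the abstract.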
Emotion is an essential aspect of human life, and effectively identifying emotions in different scenarios helps advance human-computer interaction systems. Emotion classification has therefore gradually become a challenging and popular research field. Compared with text-based emotion analysis, emotion analysis of audio data is still relatively immature. Traditional audio sentiment analysis relies on acoustic features such as MFCC and MFSC, combined with sequence models such as LSTMs and RNNs. With the rapid development of transformers and attention mechanisms, many researchers have shifted from the RNN family to the transformer family or to deep learning models equipped with attention. This paper therefore proposes a method that converts audio data into spectrograms and uses a vision transformer model with transfer learning for emotion classification. Experiments are conducted on the IEMOCAP and MELD datasets. The results show that the vision transformer reaches emotion classification accuracies of 56.18% on IEMOCAP and 37.1% on MELD.
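A pipeline of this kind (waveform to log-mel spectrogram to ImageNet-pretrained ViT with a replaced classification head) can be sketched as follows. This is an assumed reconstruction using torchaudio and timm, not the authors' implementation; the sample rate, mel resolution, label set, and file path are placeholders.

```python
import torch
import torchaudio
import timm

N_CLASSES = 4  # assumed emotion label set, e.g. angry / happy / sad / neutral

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128)
to_db = torchaudio.transforms.AmplitudeToDB()

def waveform_to_image(waveform):
    """Turn a mono waveform of shape (1, T) into a 3-channel 224x224 tensor
    that a vision transformer pretrained on images can consume."""
    spec = to_db(mel(waveform))                         # (1, n_mels, frames)
    spec = (spec - spec.mean()) / (spec.std() + 1e-6)   # per-sample normalisation
    spec = spec.unsqueeze(0)                            # (1, 1, n_mels, frames)
    spec = torch.nn.functional.interpolate(
        spec, size=(224, 224), mode="bilinear", align_corners=False)
    return spec.repeat(1, 3, 1, 1)                      # replicate to 3 channels

# Transfer learning: load an ImageNet-pretrained ViT and swap its head
# for the emotion classes.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=N_CLASSES)

waveform, sr = torchaudio.load("utterance.wav")         # placeholder audio file
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)
logits = model(waveform_to_image(waveform.mean(dim=0, keepdim=True)))
pred = logits.argmax(dim=-1)
```

Fine-tuning would then proceed as ordinary image classification over batches of such spectrogram "images"; the abstract does not state which layers are frozen during transfer.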