Effective grasp detection method based on Swin transformer

Jing Zhang; Yulin Tang; Yusong Luo; Yukun Du; Mingju Chen

doi:10.1117/1.JEI.33.3.033008

Journal of Electronic Imaging, Vol. 33, Issue 3, 033008 (May 2024). https://doi.org/10.1117/1.JEI.33.3.033008

Abstract

Grasp detection in unstructured environments suffers from reduced grasp success rates owing to object uncertainty, random object positions, and differences in viewing perspective. This work proposes a grasp detection framework, Swin-transNet, which treats graspable objects as a single generalized category and distinguishes graspable from non-graspable objects. The Swin transformer module in this framework strengthens feature extraction and captures global relationships within images. A decoupled head with attention mechanisms then refines the channel and spatial representation of the features. This combination markedly improves the system’s adaptability to uncertain object categories and random positions and yields precise grasp information. We also elucidate the roles of these components in grasping tasks. We evaluate the framework on the Cornell grasp dataset under both image-level and object-level splits. The experiments indicate a detection accuracy of 98.1% at a detection time of 52 ms. Swin-transNet also generalizes well to the Jacquard dataset, attaining a detection accuracy of 95.2%, and achieves an 87.8% success rate in real-world grasping tests on a visual grasping system, confirming its effectiveness for robotic grasping tasks.
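
The abstract evaluates on the Cornell grasp dataset at image and object levels. As a rough illustration of what that distinction usually means, the sketch below builds both kinds of split on toy data. It assumes each sample carries an object identifier. The function names, the 80/20 ratio, and the toy samples are hypothetical and are not taken from the paper, whose exact protocol is not given here.

```python
import random
from collections import defaultdict


def image_wise_split(samples, test_fraction=0.2, seed=0):
    """Shuffle individual images; the same object may land in both subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def object_wise_split(samples, test_fraction=0.2, seed=0):
    """Shuffle object ids; every image of a held-out object goes to the test set."""
    by_object = defaultdict(list)
    for image_id, object_id in samples:
        by_object[object_id].append((image_id, object_id))
    object_ids = sorted(by_object)
    rng = random.Random(seed)
    rng.shuffle(object_ids)
    n_test = max(1, round(len(object_ids) * test_fraction))
    train = [s for oid in object_ids[n_test:] for s in by_object[oid]]
    test = [s for oid in object_ids[:n_test] for s in by_object[oid]]
    return train, test


# Hypothetical toy data: 10 objects, 4 images each, as (image_id, object_id).
samples = [(f"img_{o}_{k}", f"obj_{o}") for o in range(10) for k in range(4)]

iw_train, iw_test = image_wise_split(samples)
ow_train, ow_test = object_wise_split(samples)

# Object-level split: no object appears in both subsets.
shared = {o for _, o in ow_train} & {o for _, o in ow_test}
print(len(iw_train), len(iw_test), len(ow_train), len(ow_test), len(shared))
# prints: 32 8 32 8 0
```

An image-level split probes robustness to new poses of objects already seen in training, whereas an object-level split probes generalization to objects never seen in training.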

Citation

Jing Zhang, Yulin Tang, Yusong Luo, Yukun Du, and Mingju Chen "Effective grasp detection method based on Swin transformer," Journal of Electronic Imaging 33(3), 033008 (8 May 2024). https://doi.org/10.1117/1.JEI.33.3.033008

Received: 14 November 2023; Accepted: 18 April 2024; Published: 8 May 2024

JOURNAL ARTICLE
22 PAGES

KEYWORDS

Transformers

Object detection

Education and training

Feature extraction

Detection and tracking algorithms

Data modeling

Windows
