Global contextual features are essential in computer vision tasks. Traditional convolutional networks are limited by the size of the convolutional kernel, so each layer has a restricted receptive field. To address this issue, transformers introduced global attention, which has demonstrated excellent performance in natural language processing and has since been widely applied to visual tasks. However, both convolutional networks and transformer models operate on data arranged in Euclidean space, making it difficult to extract features of irregular objects and complex scenes effectively. We propose a graph neural network-based module (Graphion) for extracting global contextual features. Graphion maps feature maps into a non-Euclidean space, establishes adjacency relationships among image patches, and applies graph neural network message passing and aggregation to the resulting node features, enabling the module to learn correlation information from the graph structure. This facilitates more efficient feature extraction for irregular object instances. Graphion is flexible and portable, allowing integration into visual feature extraction backbones. Extensive experiments were conducted on multiple datasets, including MS COCO, ADE20K, and ImageNet64, where the proposed Graphion method demonstrates outstanding performance in object detection, segmentation, and classification.
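The abstract describes a patches-to-graph pipeline: image patches become graph nodes, adjacency is established among them, and a graph neural network performs message passing and aggregation. The sketch below illustrates that general idea under explicit assumptions, since the paper does not specify its adjacency rule or aggregation: here adjacency is k-nearest-neighbour by cosine similarity between patch features, and aggregation is a mean over neighbours followed by a residual projection. All function names (`knn_adjacency`, `message_pass`) and parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

def knn_adjacency(x, k=3):
    # x: (N, D) node features, one row per image patch.
    # Cosine similarity between every pair of patches.
    norm = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T
    np.fill_diagonal(sim, -np.inf)  # exclude self from neighbour search
    # Keep the k most similar patches as graph neighbours (assumed rule).
    idx = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros_like(sim)
    adj[np.arange(x.shape[0])[:, None], idx] = 1.0
    return adj

def message_pass(x, adj, w):
    # One round of message passing: mean-aggregate neighbour features,
    # add a residual connection, project with a learned weight, apply ReLU.
    deg = adj.sum(axis=1, keepdims=True)
    agg = (adj @ x) / np.maximum(deg, 1.0)
    return np.maximum((x + agg) @ w, 0.0)

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))   # e.g. a 4x4 grid of 8-dim patch features
w = rng.normal(size=(8, 8)) * 0.1    # stand-in for a learned projection
adj = knn_adjacency(patches, k=3)
out = message_pass(patches, adj, w)
print(out.shape)  # (16, 8): one updated feature vector per patch
```

In a backbone, a block like this would sit after a convolutional or transformer stage, letting distant but feature-similar patches exchange information regardless of their spatial arrangement.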
Keywords: Feature extraction, Object detection, Visualization, Image segmentation, Neural networks, Performance modeling, Data modeling