Deep learning has driven substantial progress in crowd density estimation, but existing methods still struggle with high crowd density, background interference, and scale variation, all of which make it difficult to count people accurately. To address these problems, we propose a crowd counting method based on a cross-column fusion attention mechanism. First, the first ten layers of VGG16, which offer good transferability and feature extraction ability, serve as the front-end network to extract preliminary human head features. Next, a cross-column fusion attention module is designed. In this module, feature maps are fused across columns so that the network captures richer deep and shallow features. To alleviate background interference, an attention mechanism guides the network to focus on head positions in the image: different weights are assigned to different positions according to the attention score map, highlighting the crowd and suppressing the background so that a high-quality density map is obtained. In addition, a shallow convolution module is designed as a separate branch. Its output feature map is fused with the output feature map of the cross-column fusion attention module to handle scale variation effectively. Finally, in the last layer of the network, a 1 × 1 convolution layer replaces the fully connected layer, which reduces the number of parameters and the amount of computation, and the crowd density map is regressed. Experimental results show that the mean absolute error and mean square error of the proposed algorithm are significantly lower than those of the comparison algorithms.
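The attention weighting and the final 1 × 1 convolution described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`apply_attention`, `conv1x1`, `count_from_density`), the use of a sigmoid to turn attention scores into weights, and the toy 2 × 2 inputs are all assumptions made for illustration. The sketch only shows the general idea that a score map re-weights each spatial position, that a 1 × 1 convolution is a per-pixel weighted sum over channels, and that a crowd count is conventionally read off a density map by summing it.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apply_attention(features, scores):
    """Re-weight a C x H x W feature map by an H x W attention score map.

    Assumption: scores are squashed to (0, 1) with a sigmoid, so positions
    with high scores (likely heads) are kept and low scores are suppressed.
    """
    return [
        [
            [value * sigmoid(score) for value, score in zip(f_row, s_row)]
            for f_row, s_row in zip(channel, scores)
        ]
        for channel in features
    ]


def conv1x1(features, weights, bias=0.0):
    """1 x 1 convolution to one output channel.

    Each output pixel is a weighted sum of the channel values at that pixel,
    so the layer needs only one weight per input channel.
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            bias + sum(w * channel[i][j] for w, channel in zip(weights, features))
            for j in range(width)
        ]
        for i in range(height)
    ]


def count_from_density(density):
    """Estimated crowd count: the sum over all density map pixels."""
    return sum(sum(row) for row in density)


# Toy example (hypothetical values): 2 channels, 2 x 2 spatial grid.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
scores = [[0.0, 0.0], [0.0, 0.0]]  # sigmoid(0) = 0.5 at every position

attended = apply_attention(features, scores)
density = conv1x1(attended, weights=[1.0, 2.0])
print(count_from_density(density))
```

In a real network the weights of the 1 × 1 layer and the attention scores would be learned, and the feature maps would come from the convolutional front end rather than from hand-written lists.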
Convolution
Head
Image fusion
Image processing
Feature extraction
Distortion
Performance modeling