Crowd counting remains a challenging task because of large scale variation with distance, crowd occlusion, and complex background information. Deep convolutional neural networks, however, have proven effective in addressing these problems. Given an input image, the network generates a predicted density map, and the mean absolute error between the predicted density maps and the given ground truth (GT) maps is a standard measure of the quality of the network. We propose a mask-based generative adversarial network (MBGAN) structure to generate accurate predicted density maps. The network consists of two parts: the generator and the discriminator. In the generator, we embed a fundamental feature extraction module, multilevel dilated convolution blocks, a predicted mask, and shortcut connections. The discriminator is mainly used to distinguish whether a density map comes from the generator or from the GT, which pushes the generator to produce density maps that can confuse the discriminator. The proposed MBGAN model is trained with a joint density loss and adversarial loss. In the training strategy, we train the generator and the discriminator alternately. In experiments on five available datasets, MBGAN achieved state-of-the-art performance, outperforming other advanced methods.
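The evaluation measure and the joint loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the convention of taking a crowd count as the sum over a density map, the mean-squared-error form of the density loss, the non-saturating form of the adversarial term, and the weight `adv_weight` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def count_mae(pred_maps, gt_maps):
    """Mean absolute error between predicted and GT crowd counts.

    Assumption: the count for an image is the sum over its density map,
    a common convention in crowd counting.
    """
    pred_counts = np.array([np.sum(m) for m in pred_maps])
    gt_counts = np.array([np.sum(m) for m in gt_maps])
    return float(np.mean(np.abs(pred_counts - gt_counts)))


def generator_loss(pred_map, gt_map, disc_score_on_pred, adv_weight=0.01):
    """Hypothetical joint loss: density loss plus weighted adversarial loss.

    disc_score_on_pred is the discriminator's probability (0..1) that the
    predicted map is a GT map. adv_weight is an illustrative value only.
    """
    # Density loss: pixel-wise mean squared error (assumed form).
    density_loss = np.mean((pred_map - gt_map) ** 2)
    # Adversarial loss: small when the discriminator is fooled (score near 1).
    eps = 1e-12  # avoids log(0)
    adversarial_loss = -np.log(disc_score_on_pred + eps)
    return float(density_loss + adv_weight * adversarial_loss)
```

With alternating training, a loss of this kind would be minimized in the generator step, while the discriminator step would separately be trained to score GT maps high and generated maps low.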
Convolution
Gallium nitride
Head
Performance modeling
Network architectures
Neural networks
Image quality