Estimating human body pose and shape from a single-view image has seen great success, but most existing methods rely on models with so many parameters that they are difficult to run on low-performance devices. Lightweight networks struggle to extract sufficient information for human pose and shape estimation, making accurate prediction challenging. In this paper, we propose a lightweight model for predicting the shape and pose parameters of a parametric human body model. Our method comprises a lightweight multi-stage encoder based on Lite-HRNet and ShuffleNet, and a decoder composed of cascaded MLPs following the human kinematic tree; it achieves performance comparable to HMR at only one-ninth of HMR's model size. In addition, our model reaches an inference speed of 19.2 inferences per second on the Qualcomm Snapdragon 888+.
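The cascaded, kinematic-tree-ordered decoder can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the joint tree, feature size, 6D rotation output, and the tiny per-joint MLPs are placeholders, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical kinematic tree: joint index -> parent index (-1 = root).
# The joint set, feature size, and MLP shapes are illustrative assumptions.
PARENTS = [-1, 0, 0, 1, 2]   # pelvis, left hip, right hip, left knee, right knee
FEAT_DIM = 64                # encoder feature size (assumed)
ROT_DIM = 6                  # 6D rotation representation per joint (assumed)

rng = np.random.default_rng(0)

def make_mlp(in_dim, out_dim, hidden=32):
    """One small two-layer MLP: weight matrices for hidden and output layers."""
    return (rng.standard_normal((in_dim, hidden)) * 0.1,
            rng.standard_normal((hidden, out_dim)) * 0.1)

def mlp_forward(params, x):
    w1, w2 = params
    return np.maximum(x @ w1, 0.0) @ w2   # ReLU hidden layer

# One MLP per joint; a child joint also sees its parent's predicted rotation.
mlps = [make_mlp(FEAT_DIM + (0 if p < 0 else ROT_DIM), ROT_DIM)
        for p in PARENTS]

def decode(feature):
    """Predict per-joint rotations by walking the kinematic tree root-first."""
    rots = [None] * len(PARENTS)
    for j, p in enumerate(PARENTS):
        inp = feature if p < 0 else np.concatenate([feature, rots[p]])
        rots[j] = mlp_forward(mlps[j], inp)
    return np.stack(rots)          # (num_joints, ROT_DIM)

poses = decode(rng.standard_normal(FEAT_DIM))
print(poses.shape)                 # (5, 6)
```

Conditioning each joint on its parent's prediction is what makes the cascade follow the kinematic chain rather than regressing all joints independently.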
KEYWORDS: Feature extraction, Image restoration, Visual process modeling, Image compression, Object detection, Machine vision, Education and training, Semantics, Image segmentation, Human vision and color perception
Latent representation features in deep learning (DL) show excellent potential for visual data applications. For example, in traffic monitoring and video surveillance, the same features can serve both image analysis for machine vision and image reconstruction for human viewing. However, existing deep features that serve machine and human receivers are typically combinations of separate, task-specific features. Because these features are extracted from different branches of collaborative frameworks, the inherent relations between machine and human vision remain insufficiently explored. Therefore, to obtain one set of representative and generic features, we propose a dynamic group-wise splitting network based on image content that explores and extracts generic features for the two different receivers. First, we analyze the characteristics of the latent features and adopt intermediate features as the base features. Then, a feature classification and transformation mechanism based on image content is proposed to enhance the base features for subsequent image reconstruction and analysis. Consequently, an end-to-end model with multi-model cascading and multi-stage training realizes both machine and human vision tasks. Extensive experiments show that our human–machine vision collaboration framework has high practical value and performance.
Objective quality assessment plays a vital role in the evaluation and optimization of panoramic video. However, most current methods consider only the structural distortion caused by the projection format and ignore the effect of clarity on quality evaluation. For this reason, we propose a new objective quality assessment method for panoramic video. First, the source image and the distorted image are down-sampled to obtain five image pairs at different scales. Second, WS-SSIM is calculated at each scale. Finally, according to the degree of influence of each scale on subjective evaluation, a different coefficient is assigned to each scale's WS-SSIM, and the overall score is computed as their weighted sum. Experiments on a database established in our laboratory demonstrate the method's effectiveness through comparison.
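The multiscale weighted scoring described above can be sketched as follows. This is a minimal illustration: the per-scale coefficients are hypothetical, and true WS-SSIM is approximated by a latitude-weighted similarity that keeps only the cos-latitude weighting of the equirectangular projection, not the full SSIM computation.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 averaging (simple stand-in for filtering)."""
    h, w = img.shape
    return img[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

def ws_similarity(ref, dist):
    """Latitude-weighted similarity for an equirectangular frame.
    Pixels near the poles are over-represented by the projection, so each
    row is weighted by cos(latitude), as in WS-SSIM / WS-PSNR. The final
    1/(1+MSE) mapping is a stand-in, not true SSIM."""
    h, w = ref.shape
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2
    wts = np.cos(lat)[:, None] * np.ones((1, w))
    mse = (((ref - dist) ** 2) * wts).sum() / wts.sum()
    return 1.0 / (1.0 + mse)   # in (0, 1]

def multiscale_score(ref, dist, coeffs=(0.05, 0.1, 0.2, 0.3, 0.35)):
    """Weighted sum over five scales; the coefficients are hypothetical."""
    total = 0.0
    for c in coeffs:
        total += c * ws_similarity(ref, dist)
        ref, dist = downsample(ref), downsample(dist)
    return total / sum(coeffs)

rng = np.random.default_rng(1)
ref = rng.random((64, 128))
score_same = multiscale_score(ref, ref)                                  # 1.0
score_noisy = multiscale_score(ref, ref + 0.1 * rng.standard_normal((64, 128)))
print(score_same, score_noisy)     # identical pair scores 1.0; noisy pair lower
```

An identical pair scores exactly 1.0 at every scale, so the weighted sum degenerates to 1.0; any distortion lowers the per-scale similarities and hence the overall score.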
Virtual reality (VR) refers to technology that allows people to experience a virtual world in an artificial environment. As one of the most important forms of VR media content, panoramic video provides viewers with a 360-degree free viewing angle. However, the acquisition, stitching, transmission, and playback of panoramic video may damage video quality and seriously degrade the viewer's quality of experience. Therefore, how to improve display quality and provide users with a better visual experience has become a hot topic in this field. When watching videos, people pay attention to salient areas, especially in panoramic videos, where viewers can freely choose their regions of interest. Considering this characteristic, saliency information should be utilized when performing quality assessment. In this paper, we use two cascaded networks to calculate the quality score of a panoramic video without a reference video. First, a saliency prediction network computes the saliency map of the image, and patches with higher saliency are selected using this map. In this way, we can exclude areas of the panoramic image that contribute nothing to the quality assessment task. Then, we feed the selected salient patches into the quality assessment network for prediction and obtain the final image quality score. Experimental results show that, owing to its specialized network structure, the proposed method achieves more accurate quality scores for panoramic videos than state-of-the-art works.
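The two-stage pipeline above (saliency-guided patch selection, then patch-level quality prediction) can be sketched as below. Both networks are replaced by simple placeholder functions, and the patch size and top-k count are assumptions for illustration.

```python
import numpy as np

PATCH = 32   # patch size (assumed)
TOP_K = 4    # number of salient patches kept (assumed)

def saliency_net(frame):
    """Placeholder for the saliency prediction network: local contrast
    against the global mean as a proxy saliency map."""
    return np.abs(frame - frame.mean())

def quality_net(patch):
    """Placeholder for the quality assessment network (higher = better)."""
    return 1.0 / (1.0 + patch.var())

def select_salient_patches(frame, sal, k=TOP_K, p=PATCH):
    """Tile the frame into p x p patches and keep the k most salient ones,
    excluding regions that contribute nothing to the quality task."""
    h, w = frame.shape
    scored = []
    for y in range(0, h - p + 1, p):
        for x in range(0, w - p + 1, p):
            scored.append((sal[y:y+p, x:x+p].mean(), frame[y:y+p, x:x+p]))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [patch for _, patch in scored[:k]]

def predict_quality(frame):
    sal = saliency_net(frame)                      # stage 1: saliency map
    patches = select_salient_patches(frame, sal)   # keep salient regions only
    return float(np.mean([quality_net(pt) for pt in patches]))  # stage 2

rng = np.random.default_rng(2)
frame = rng.random((128, 256))
score = predict_quality(frame)
print(0.0 < score <= 1.0)   # True
```

The cascade matters because only the selected patches ever reach the second network, which is what makes the method no-reference and keeps the quality network's input small.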
Collaborative intelligence is a new strategy for deploying deep neural network models on AI-based mobile devices: part of the model runs on the device to extract features, while the rest runs in the cloud. In this case, feature data rather than the raw image is transmitted to the cloud, and the uploaded features must generalize across multiple tasks. To this end, we design an encoder-decoder network to extract intermediate deep features from an image and propose a method that enables these features to serve different tasks. Finally, we apply a lossy compression method to the intermediate deep features to improve transmission efficiency. Experimental results show that the features extracted by our network support input reconstruction and object detection simultaneously. Moreover, with the deep-feature compression method proposed in our work, the reconstructed images are of good quality both visually and by quantitative metrics, and object detection accuracy also remains high.
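The device/cloud split with lossy feature compression can be sketched as follows. The layer shapes and the simple 8-bit uniform quantizer are illustrative assumptions standing in for the paper's actual network and compression method.

```python
import numpy as np

rng = np.random.default_rng(3)
W_enc = rng.standard_normal((256, 32)) * 0.1   # device-side encoder weights
W_dec = rng.standard_normal((32, 256)) * 0.1   # cloud-side decoder weights

def device_encode(x):
    """Runs on the mobile device: raw input -> compact intermediate feature."""
    return np.maximum(x @ W_enc, 0.0)

def compress(feat, levels=256):
    """Lossy uniform quantization to 8-bit symbols for transmission."""
    lo, hi = feat.min(), feat.max()
    scale = (hi - lo) / (levels - 1) or 1.0   # guard against a constant feature
    return np.round((feat - lo) / scale).astype(np.uint8), lo, scale

def decompress(q, lo, scale):
    return q.astype(np.float64) * scale + lo

def cloud_decode(feat):
    """Runs in the cloud: reconstruct the input (one of several task heads)."""
    return feat @ W_dec

x = rng.standard_normal(256)
feat = device_encode(x)            # on-device feature extraction
q, lo, scale = compress(feat)      # 8-bit payload replaces the raw image
recon = cloud_decode(decompress(q, lo, scale))
err = np.abs(decompress(q, lo, scale) - feat).max()
print(recon.shape, err <= scale / 2)   # quantization error bounded by half a step
```

Transmitting the quantized 32-value feature instead of the 256-value input is what saves bandwidth; the cloud can attach additional task heads (e.g. a detector) to the same decompressed feature.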