Industry 4.0 marks a shift toward fully automated digital production, where intelligent systems manage processes in real time and interact continuously with their environment. Central to this evolution is robotic technology, which enhances productivity and precision in manufacturing. A key aspect of this advanced production model is human-robot interaction, where operators and robots work together on complex tasks; ensuring that this collaboration is safe is a primary objective. This paper proposes a method for human gesture recognition based on multi-sensor data fusion: by incorporating data from multiple sensors, we achieve a more complete and robust representation of gestures. Our approach involves an algorithm that classifies human movements in real time using visual data. The process consists of several steps: data preprocessing, feature extraction, data integration, and gesture classification. By employing machine learning and deep learning techniques for feature extraction and analysis, we aim to achieve high accuracy in gesture recognition.
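To make the four-step pipeline concrete, here is a minimal sketch (not the authors' implementation) of feature-level fusion followed by classification: toy per-sensor features are extracted from synthetic clips, concatenated, and fed to a small MLP. The sensor layout, feature extractor, and all sizes are illustrative assumptions.

```python
# Minimal sketch of a multi-sensor gesture pipeline: preprocessing,
# per-sensor feature extraction, feature-level fusion, classification.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extract_features(frames):
    """Toy feature extractor: per-channel mean and std over a clip."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Synthetic stand-in for two sensors observing 100 gesture clips of
# 5 classes; real input would be camera/depth/IMU streams.
n_clips, n_classes = 100, 5
labels = rng.integers(0, n_classes, n_clips)
X = []
for y in labels:
    cam = rng.normal(loc=y, scale=1.0, size=(30, 8))  # sensor 1 clip
    imu = rng.normal(loc=y, scale=2.0, size=(30, 4))  # sensor 2 clip
    # Feature-level fusion: concatenate per-sensor descriptors.
    X.append(np.concatenate([extract_features(cam), extract_features(imu)]))
X = np.asarray(X)

Xtr, Xte, ytr, yte = train_test_split(X, labels, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(Xtr, ytr)
print("gesture accuracy:", clf.score(Xte, yte))
```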
Automatic restoration of damaged or missing pixels is a key problem in image reconstruction for applications such as retouching, image restoration, image coding, and computer vision. This paper presents a novel approach for reconstructing texture and edge regions, focusing on fine detail in image completion. The proposed method performs spatial reconstruction based on a geometric model that incorporates contour analysis and exemplar-based texture analysis. We propose a technique for restoring object boundaries in images by constructing composite curves from cubic splines and anisotropic gradients; these shape-dependent gradients exploit the distinct forms of the structural pattern to encode both textural and contour information. In addition, we search for similar patches, fuse them, and apply a deep neural network to the fused result. We evaluate our model end-to-end on publicly available datasets, demonstrating that it outperforms current state-of-the-art techniques both quantitatively and qualitatively.
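The core of the contour-restoration idea can be illustrated with a short sketch: known boundary samples on either side of a missing region are bridged with a cubic spline. This shows the principle only, not the paper's full composite-curve method with anisotropic gradients; all coordinates are made up.

```python
# Sketch: bridge a hole in an object boundary with a cubic spline.
import numpy as np
from scipy.interpolate import CubicSpline

# Known boundary samples (x, y) outside the hole x in (4, 8).
x_known = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0])
y_known = np.array([0.0, 0.8, 1.5, 1.9, 2.0, 1.2, 0.6, 0.0])

spline = CubicSpline(x_known, y_known)  # C2-continuous composite curve
x_hole = np.linspace(4.0, 8.0, 9)       # positions inside the hole
print(np.round(spline(x_hole), 3))      # interpolated boundary points
```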
A system for determining the distance from the robot to the scene is useful for object tracking, and 3-D reconstruction is desirable for many manufacturing and robotic tasks. While the robot is processing materials (welding, milling, drilling, etc.), fragments of material fall onto the camera mounted on the robot, introducing spurious information into the depth map and creating new lost areas, which leads to incorrect estimation of object sizes. The result is a loss of accuracy in trajectory planning, caused by erroneous sections of the depth map that stem from incorrect distance measurements to objects. We present an approach that combines defect detection and depth reconstruction algorithms. The first step, image defect detection, is based on a convolutional autoencoder (U-Net). The second step is depth map reconstruction using spatial reconstruction based on a geometric model with contour and texture analysis; we apply contour restoration and texture synthesis for image reconstruction. A method is proposed for restoring object boundaries in an image by constructing a composite curve from cubic splines. Our technique quantitatively outperforms state-of-the-art methods in reconstruction accuracy on an RGB-D benchmark for evaluating manufacturing vision systems.
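For the defect-detection step, the following is a minimal PyTorch sketch of the kind of U-Net-style convolutional autoencoder described: an encoder, a downsampling path, and a decoder with a skip connection producing a per-pixel defect mask. The layer sizes are placeholders, not the paper's architecture.

```python
# Tiny U-Net-style encoder-decoder producing a per-pixel defect mask.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # Skip connection: concatenate encoder features with upsampled ones.
        self.dec = nn.Sequential(nn.Conv2d(16 + 8, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 1, 1), nn.Sigmoid())

    def forward(self, x):
        e1 = self.enc1(x)               # full-resolution features
        e2 = self.enc2(self.pool(e1))   # downsampled features
        u = self.up(e2)
        return self.dec(torch.cat([u, e1], dim=1))  # defect probability map

model = TinyUNet()
mask = model(torch.randn(1, 1, 64, 64))  # dummy camera frame
print(mask.shape)                        # torch.Size([1, 1, 64, 64])
```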
Images captured by surveillance systems often suffer from low contrast and faded color. Recently, many dehazing algorithms have been proposed to enhance visibility and restore color. We present a new image enhancement algorithm based on multi-scale block-rooting processing. The basic idea is to apply frequency-domain image enhancement at different image block scales. The coefficient-enhancement parameter for each block is obtained by optimizing a measure of enhancement, the premise being that increasing an image's contrast produces more high-frequency content in the enhanced image than in the original. The performance of the proposed algorithm is tested on the public O-HAZE database.
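Block-rooting is commonly realized as alpha-rooting of transform coefficients; the sketch below (an illustration under assumed parameter ranges, not the paper's exact algorithm) enhances each block in the DCT domain by raising normalized coefficient magnitudes to a power alpha < 1, and chooses alpha per block by maximizing a simple measure of enhancement (EME).

```python
# Sketch of per-block alpha-rooting with EME-driven parameter choice.
import numpy as np
from scipy.fft import dctn, idctn

def eme(block, eps=1e-3):
    """Simple measure of enhancement: log ratio of max to min intensity."""
    b = np.clip(block, 0.0, None)
    return 20.0 * np.log10((b.max() + eps) / (b.min() + eps))

def alpha_root(block, alpha, eps=1e-6):
    c = dctn(block, norm="ortho")
    mag = np.abs(c)
    # Exponent (alpha - 1) < 0 boosts small (high-frequency) coefficients.
    c_enh = c * ((mag + eps) / (mag.max() + eps)) ** (alpha - 1.0)
    return idctn(c_enh, norm="ortho")

def enhance(image, bsize=16, alphas=np.linspace(0.6, 1.0, 9)):
    out = np.zeros_like(image, dtype=float)
    for i in range(0, image.shape[0], bsize):
        for j in range(0, image.shape[1], bsize):
            blk = image[i:i+bsize, j:j+bsize].astype(float)
            # Keep the alpha that maximizes EME of the enhanced block.
            best = max((alpha_root(blk, a) for a in alphas), key=eme)
            out[i:i+bsize, j:j+bsize] = best
    return np.clip(out, 0, 255)

img = (np.random.rand(64, 64) * 255).astype(np.uint8)  # stand-in hazy frame
print(enhance(img).shape)
```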
Augmented reality (AR) applications demand realistic rendering of virtual content in a variety of environments and therefore require an accurate description of the 3-D scene. In most cases, an AR system is equipped with Time-of-Flight (ToF) cameras that provide real-time scene depth maps, but these cameras suffer from artifacts that degrade the quality of the depth data and ultimately make it difficult to use for AR. Such defects arise from poor lighting and from specular or fine-grained object surfaces. As a result, object boundaries appear enlarged, and overlapping objects become impossible to distinguish from one another. This article presents an approach based on a modified algorithm for finding similar blocks using the concept of an anisotropic gradient. The proposed modified exemplar block-based algorithm uses an autoencoder-learned local image descriptor for image inpainting: an encoding network extracts image features, and a decoding network reconstructs the depth image. The encoder consists of a convolutional layer and a dense block, itself composed of convolutional layers. We also demonstrate an application of the proposed vision system, using depth inpainting for virtual content reconstruction in augmented reality. Analysis of the results shows that the proposed method correctly restores object boundaries in the depth map. Our system quantitatively outperforms state-of-the-art methods in reconstruction accuracy on both real and simulated benchmark datasets.
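A sketch of the exemplar-matching step follows: candidate source patches are compared to the target patch in a learned descriptor space rather than in raw pixels. Here an untrained tiny convolutional encoder stands in for the paper's autoencoder-learned encoder; the matching logic, not the weights, is the point.

```python
# Exemplar search in descriptor space (encoder weights are placeholders).
import torch
import torch.nn as nn

encoder = nn.Sequential(                    # stand-in for learned encoder
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten())  # patch -> 128-D descriptor

def best_exemplar(target, candidates):
    """Index of the source patch closest to `target` in descriptor space."""
    with torch.no_grad():
        d_t = encoder(target.unsqueeze(0))
        d_c = encoder(candidates)
        dist = torch.cdist(d_t, d_c)        # L2 distances between descriptors
    return int(dist.argmin())

target = torch.randn(1, 9, 9)               # patch at the hole boundary
candidates = torch.randn(50, 1, 9, 9)       # patches from the known region
print("best match:", best_exemplar(target, candidates))
```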
Automation of production processes using robots is a development priority for many industrial enterprises. Robotization aims to free people from dangerous or routine work. At the same time, robots can perform some tasks more efficiently than humans, and human-robot collaboration combines the strengths and effectiveness of robots with human cognitive ability in a single flexible system, enabling flexible automation and reconfiguration of production processes. In this work, we focus on implementing human-robot interaction based on the recognition of gesture commands given by a human operator. As the action recognition method, we propose an approach based on extracting a human skeleton and classifying it with a neural network. To test the effectiveness of the proposed algorithm, the transmission of gesture commands to the robot, and the resulting contactless robot control method, simulation modeling was carried out in ROBOGUIDE, Fanuc's simulation environment for industrial robots.
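The skeleton-plus-classifier idea can be sketched briefly: joint coordinates are normalized relative to a root joint and fed to a small neural network. The joint count, class count, and synthetic skeletons below are assumptions for illustration, not the paper's setup.

```python
# Sketch: normalize a skeleton, then classify gestures with a small MLP.
import numpy as np
from sklearn.neural_network import MLPClassifier

N_JOINTS, N_CLASSES = 18, 4  # e.g., an OpenPose-style 2-D skeleton
rng = np.random.default_rng(1)

def normalize(skeleton):
    """Center on the root joint (index 0) and scale to unit size."""
    s = skeleton - skeleton[0]
    return (s / (np.linalg.norm(s, axis=1).max() + 1e-8)).ravel()

labels = rng.integers(0, N_CLASSES, 200)
X = np.array([normalize(rng.normal(y, 1.0, (N_JOINTS, 2))) for y in labels])

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
clf.fit(X[:150], labels[:150])
print("held-out accuracy:", clf.score(X[150:], labels[150:]))
```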
The article presents a noise reduction method based on minimizing a multicriteria objective function. The technique performs minimization over two criteria: the root-mean-square difference between adjacent pixel-value estimates (vertical and horizontal) and the mean-square difference between the input pixels and the resulting estimates. The first criterion reduces the noise component in locally stationary regions of the image; the second preserves the boundaries between objects. The choice of the processing parameter is adapted using a trained neural network, trained on standard test images from widely used databases (Kodak, MS COCO, etc.). Tables compare the effectiveness of the proposed adaptation algorithm with the previously applied approach.
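The two criteria amount to a fidelity term plus a neighbor-difference smoothness term; a minimal sketch of such an objective, minimized here by plain gradient descent with a fixed weight lambda (whereas the article adapts the processing parameter with a trained network), is shown below.

```python
# Sketch: minimize ||x - y||^2 + lam * sum of squared neighbor differences.
import numpy as np

def denoise(y, lam=1.0, steps=300, lr=0.05):
    x = y.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * (x - y)            # fidelity term: keep boundaries
        dh = np.diff(x, axis=1)         # horizontal neighbor differences
        dv = np.diff(x, axis=0)         # vertical neighbor differences
        grad[:, :-1] -= 2.0 * lam * dh  # smoothness term gradient
        grad[:, 1:] += 2.0 * lam * dh
        grad[:-1, :] -= 2.0 * lam * dv
        grad[1:, :] += 2.0 * lam * dv
        x -= lr * grad
    return x

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 32), (32, 1))
noisy = clean + rng.normal(0, 0.1, clean.shape)
print("MSE before:", np.mean((noisy - clean) ** 2))
print("MSE after: ", np.mean((denoise(noisy) - clean) ** 2))
```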
Simultaneous localization and mapping (SLAM) systems are useful for camera tracking, and 3-D reconstruction is desirable for many robotic tasks. A known problem is the loss of accuracy in trajectory planning caused by incorrect sections of the depth map, which stem from erroneous distance measurements to objects. Such defects arise from poor lighting and from specular or fine-grained object surfaces. As a result, the boundaries of objects (obstacles) appear enlarged, and overlapping objects become impossible to distinguish from one another. In this paper, we propose a multisensor SLAM system capable of recovering a globally consistent 3-D structure. The proposed method takes two main steps. The first is to fuse images from visible-light cameras and depth sensors using the PLIP model (parameterized model of logarithmic image processing), which is close to the human visual system's perception. The second is image reconstruction: we present an approach based on a modified exemplar block-based algorithm that uses an autoencoder-learned local image descriptor for image inpainting, with the descriptors learned by a convolutional autoencoder network. A 3-D point cloud is then generated from the reconstructed data. Our system quantitatively outperforms state-of-the-art methods in reconstruction accuracy on a benchmark for evaluating RGB-D SLAM systems.
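For the fusion step, the sketch below shows PLIP-style combination of a visible image with a depth-derived image: PLIP replaces ordinary addition with operations on "gray tone" values g = M - f, which better matches human brightness perception. The parameter M and the equal weighting are assumptions, not the paper's settings.

```python
# Sketch of PLIP (logarithmic image processing) fusion of two images.
import numpy as np

M = 256.0  # PLIP model parameter (range of gray tones); an assumption

def plip_add(g1, g2):
    """PLIP addition of two gray-tone images."""
    return g1 + g2 - g1 * g2 / M

def plip_scale(c, g):
    """PLIP multiplication of a gray-tone image by scalar c."""
    return M - M * (1.0 - g / M) ** c

def fuse(f_visible, f_depth, w=0.5):
    g1 = M - f_visible.astype(float)  # to gray-tone domain
    g2 = M - f_depth.astype(float)
    g = plip_add(plip_scale(w, g1), plip_scale(1.0 - w, g2))
    return M - g                      # back to intensity domain

vis = np.random.randint(0, 255, (16, 16))
dep = np.random.randint(0, 255, (16, 16))
print(fuse(vis, dep).round(1)[0, :4])
```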
In this paper, we consider the problem of image inpainting in infrared image analysis, where the objective is to reconstruct missing or deteriorated parts of an image. We develop image inpainting using quaternion representation concepts and a modified exemplar-based technique, exploiting a sparse representation that takes a group of nonlocal patches with similar textures, rather than a single patch, as the basic unit. As a result, the proposed method provides plausible restoration while propagating edge information into the target region. Experimental inpainting results demonstrate the effectiveness of the proposed method in reconstructing thermal images.
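The quaternion representation concept can be illustrated briefly: a multichannel pixel is stored as a pure quaternion (0, c1, c2, c3) so that its channels are transformed jointly rather than independently, here via the Hamilton product. This shows the representation only, not the paper's sparse exemplar-based inpainting pipeline; the pixel values are made up.

```python
# Sketch: a multichannel pixel as a pure quaternion, transformed jointly.
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

pixel = np.array([0.0, 0.8, 0.3, 0.1])  # pure quaternion pixel
theta = np.pi / 4                       # rotate channels about the k axis
r = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
r_conj = r * np.array([1.0, -1.0, -1.0, -1.0])
rotated = qmul(qmul(r, pixel), r_conj)  # joint transform of all channels
print(rotated.round(3))
```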
The fusion of data captured in different electromagnetic ranges is an important task in many areas of research. Combined data are needed in security systems (e.g., when searching for people in difficult weather such as snow, fog, rain, or dust), automated control systems (autonomous driving, UAVs), and elsewhere. Data analysis begins with identifying basic features: detecting and selecting edges, saliency maps (maps of human visual attention), corners, color gradients, and so on. Most often, the features detected in images recorded in one range do not coincide with those obtained in other ranges, because different electromagnetic ranges respond to different physical characteristics of the objects in a frame. This paper presents an approach based on extracting the basic descriptive characteristics of objects and finding their correspondences across images of the same object recorded in different electromagnetic ranges. These characteristics include gradient directions, object edges and corners, locally stationary regions, object centers of mass, and the midlines of stationary regions with included structures. The feature search is carried out on data obtained at various scales. Images are simplified using an algorithm that analyzes stationary regions and replaces the current intensity with the regional average. The applicability of the proposed approach is shown on a set of test data comprising visible-range, near- and far-infrared images, and depth maps. As an example application, we show the stitching of a pair of images obtained in different electromagnetic ranges.
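One of the listed characteristics, gradient direction, tends to survive a change of electromagnetic band better than raw intensity; the sketch below (an illustration under synthetic data, not the paper's matcher) compares two patches by the agreement of their gradient orientations, using a doubled-angle cosine so that the contrast inversion typical of infrared imagery does not penalize the match.

```python
# Sketch: cross-spectral patch similarity from gradient orientations.
import numpy as np

def gradient_orientation(img):
    gy, gx = np.gradient(img.astype(float))
    return np.arctan2(gy, gx)           # per-pixel gradient direction

def orientation_similarity(a, b):
    """Mean cosine of doubled angle differences (contrast-sign invariant)."""
    return float(np.mean(np.cos(2.0 * (gradient_orientation(a)
                                       - gradient_orientation(b)))))

rng = np.random.default_rng(2)
scene = rng.random((32, 32))
visible = scene + 0.05 * rng.standard_normal(scene.shape)
infrared = 1.0 - scene + 0.05 * rng.standard_normal(scene.shape)  # inverted

print("vis vs IR, same scene:", orientation_similarity(visible, infrared))
print("vis vs unrelated:     ",
      orientation_similarity(visible, rng.random((32, 32))))
```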