Antanas Verikas,1 Dmitry P. Nikolaev,2 Petia Radeva,3 Jianhong Zhou4
1 Halmstad Univ. (Sweden); 2 Institute for Information Transmission Problems (Russian Federation); 3 Univ. de Barcelona (Spain); 4 Changchun Univ. of Science and Technology (China)
This PDF file contains the front matter associated with SPIE Proceedings Volume 11041 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
The proposed algorithm matches the local coordinates of each image with the coordinates of the map by extracting road line segments from the image and finding a geometric transformation that matches the extracted road segments to those of the map. Parameter estimation is based on the RANSAC algorithm, which analyses the segments' locations and orientations.
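The RANSAC-based transform estimation can be illustrated with OpenCV. This is a minimal sketch, not the authors' implementation: it assumes road segments have already been extracted and put into corresponding order, and it uses segment midpoints as point correspondences.

```python
import cv2
import numpy as np

def match_segments_to_map(img_segments, map_segments):
    """Estimate a 2D similarity transform aligning image road segments to map segments.

    img_segments, map_segments: (N, 4) arrays of matched segments (x1, y1, x2, y2),
    assumed to be in corresponding order (a hypothetical pre-matching step).
    """
    # Use segment midpoints as point correspondences for RANSAC.
    img_mid = (img_segments[:, :2] + img_segments[:, 2:]) / 2.0
    map_mid = (map_segments[:, :2] + map_segments[:, 2:]) / 2.0

    # RANSAC-based estimation of rotation, scale and translation.
    M, inliers = cv2.estimateAffinePartial2D(
        img_mid.astype(np.float32), map_mid.astype(np.float32),
        method=cv2.RANSAC, ransacReprojThreshold=3.0)
    return M, inliers
```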
In order to plan precise treatment or accurate tumor-removal surgery, brain tumor segmentation is critical for detecting all parts of the tumor and its surrounding tissues. To visualize brain anatomy and detect abnormalities, we use multi-modal Magnetic Resonance Imaging (MRI) as input. This paper introduces an efficient, automated algorithm for brain tumor segmentation based on the 3D bit-plane neighborhood concept and a rule-based learning algorithm. In the proposed approach, in addition to using intensity values in each slice, we consider sets of three consecutive slices to extract information from the 3D neighborhood. We construct a rule base using a sequential covering algorithm. Through a rule-ordering method and a reward/penalty policy, we assign a weight to each rule such that the largest weight is assigned to the strongest (most frequently invoked) rule. Finally, the rules are ranked from strongest to weakest, and those with the highest weights are selected for voxel labeling. The algorithm is tested on the BRATS 2015 training database of high- and low-grade tumors. Dice and Jaccard indices are calculated, and a comparative analysis is performed. Experimental results indicate competitive performance compared with state-of-the-art methods.
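The evaluation metrics cited here have standard definitions; a minimal sketch for binary segmentation masks:

```python
import numpy as np

def dice_jaccard(pred, truth):
    """Compute Dice and Jaccard indices for two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * intersection / (pred.sum() + truth.sum() + 1e-8)
    jaccard = intersection / (union + 1e-8)
    return dice, jaccard
```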
This paper proposes an improvement to an existing and widely used approach to panorama stitching for images of planar objects. The proposed method is based on adjusting the graph of projective transformations. Evaluation is presented on a heterogeneous dataset containing images of the surfaces of Earth and Mars, images taken with a microscope, and handwritten and printed text documents. The quality enhancement of the stitching method is illustrated on this dataset and shows a more than twofold reduction in the accumulated error of the projective transformations.
In this paper a method of image alignment based on maximization of the average image sharpness is proposed. The algorithm for the global-shift model is investigated, and its efficiency when applying the FFT is shown. For the projective model, an approach to image alignment that uses local shifts and RANSAC to obtain the final transform is considered. Experimental results are demonstrated for a system of document reconstruction in a video stream that increases the quality of the output image.
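One common FFT-based way to estimate a global shift between two frames is phase correlation. The sketch below is only an illustration of FFT-based shift estimation; the paper's actual criterion is average sharpness maximization.

```python
import numpy as np

def estimate_global_shift(img_a, img_b):
    """Estimate the integer translation between two grayscale frames by phase correlation."""
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    cross_power = fa * np.conj(fb)
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Wrap shifts larger than half the image size to negative values.
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dx, dy
```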
Binocular cameras have gained increasing attention because they can capture high-resolution images at a lower cost than monocular cameras. However, many existing binocular camera technologies require accurate depth estimation. To address this problem, this paper presents a new image enhancement method based on monochrome-colored camera pairs. Our method replaces depth estimation with dense matching of feature points, thereby effectively reducing the computational complexity. After image matching, matrix completion is used to recover the color information of the monochrome image. Consequently, our method produces a high-quality image under low-light conditions. We built a real-image database for the experiments, and the results reveal that our method exhibits superior performance over existing methods.
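The feature-matching step can be illustrated with OpenCV; ORB is an assumption here, not necessarily the detector used in the paper, and the dense matching and matrix completion stages are not shown.

```python
import cv2

def match_mono_color(mono_gray, color_bgr):
    """Match feature points between a monochrome frame and the gray version of a color frame."""
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(mono_gray, None)
    kp2, des2 = orb.detectAndCompute(gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return kp1, kp2, matches
```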
In this paper, a novel edge detector for synthetic aperture radar (SAR) images is proposed by introducing the Bhattacharyya coefficient (BC) combined with a rotated bi-window configuration. Based on the quantized input image, the BC is computed from the two sample histograms of the local regions supported by the sub-windows on opposite sides of the pixel to be detected. With bi-windows of different directions sliding through the image, multiple directional Bhattacharyya coefficient matrices are obtained and used to extract the edge strength map (ESM), which characterizes the intensity variation in SAR images. Subsequent non-maximum suppression and hysteresis thresholding refine the extracted ESM into thin edges. Experimental results show that the proposed edge detector can accurately extract edges. Moreover, the BC-based ESM can act as a good precursor to guide SAR image segmentation based on region merging.
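The Bhattacharyya coefficient between the two sub-window histograms has a standard definition; a minimal sketch for 8-bit intensity data (bin count is an arbitrary choice here):

```python
import numpy as np

def bhattacharyya_coefficient(region_a, region_b, bins=32):
    """Bhattacharyya coefficient between the intensity histograms of two local regions.

    Values close to 1 indicate similar distributions (no edge); low values suggest an edge.
    """
    hist_a, edges = np.histogram(region_a, bins=bins, range=(0, 255))
    hist_b, _ = np.histogram(region_b, bins=edges)
    p = hist_a / (hist_a.sum() + 1e-12)
    q = hist_b / (hist_b.sum() + 1e-12)
    return np.sum(np.sqrt(p * q))
```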
The paper considers the problem of cropping images obtained by projective transformation of source images. The problem is highly relevant to the analysis of projectively distorted images. We propose two cropping algorithms based on estimating the pixel stretching under the transformation: one uses the ratio of pixel neighborhood areas, the other the ratio of their chord lengths. The methods are compared by estimating the relative area of cropped background. The experiment uses a real dataset containing projectively distorted images of pages of Russian civil passports. The method based on the chord-length ratio shows better results on highly distorted images.
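One way to quantify the pixel stretching the paper relies on is the local area scaling factor of the homography, i.e. its Jacobian determinant. This is a minimal sketch of that quantity, not the authors' full cropping algorithm.

```python
import numpy as np

def local_area_stretch(H, x, y):
    """Local area scaling factor of the homography H at pixel (x, y).

    The Jacobian determinant of a projective map equals det(H) / w**3,
    where w is the third homogeneous coordinate of the mapped point.
    """
    w = H[2, 0] * x + H[2, 1] * y + H[2, 2]
    return np.linalg.det(H) / (w ** 3)
```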
The present paper is devoted to the solution of a tomographic reconstruction problem using a regularized algebraic approach for large-scale data. The paper explores issues related to cone-beam polychromatic computed tomography. An algorithm for the regularized solution of the linear operator equation is described: the parametric composite function being minimized is given, and the step of the developed iterative procedure is written out. The reconstructed volumetric image contains about 60 billion voxels, which forces us to divide the reconstruction of the full volume into subtasks for efficient implementation of the algorithm on the GPU. In each subtask, the current solution for a local volume of a given size is calculated. An approach to selecting local volumes and cross-linking their solutions is described. We compare the image quality of the proposed algorithm with the results of the Filtered Back Projection (FBP) algorithm.
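As an illustration of a regularized algebraic iteration (not the paper's exact composite functional), a gradient-descent step for a Tikhonov-regularized least-squares problem looks like this:

```python
import numpy as np

def regularized_iteration(A, b, lam=0.01, tau=1e-3, n_iter=100):
    """Gradient descent for min_x 0.5 * ||A x - b||^2 + 0.5 * lam * ||x||^2.

    A is the projection operator (here a dense matrix for simplicity), b the measured
    sinogram data, and x the vectorized volume. Illustrative only; large-scale CT uses
    matrix-free operators on the GPU.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b) + lam * x
        x -= tau * grad
    return x
```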
We present differentiable implementations of several common image processing algorithms: the Canny edge detector, Niblack thresholding, and the Harris corner detector. The implementations are presented in the form of fully convolutional networks whose structure follows the original algorithms exactly. This form allows the parameters of the algorithms to be tuned with gradient descent. We performed parameter tuning for the edge detection problem and show that our implementation obtains better results on the BSDS-500 dataset. As part of the implementations, we introduce a generalization of the pooling operation that allows an arbitrary structuring element. We also analyze the resulting neural network architectures and show their connections with contemporary approaches.
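To make the idea concrete for one of the listed algorithms, here is a minimal differentiable (soft) Niblack thresholding sketch in PyTorch, assuming box filters and a sigmoid relaxation of the hard comparison; it is not the authors' network.

```python
import torch
import torch.nn.functional as F

def soft_niblack(image, window=15, k=-0.2, temperature=0.05):
    """Differentiable Niblack thresholding: T = mean + k * std over a local window.

    image: tensor of shape (B, 1, H, W) with values in [0, 1]. The hard test I > T is
    relaxed with a sigmoid so that k (and the window weights, if a learnable kernel
    replaces the box filter) can be tuned by gradient descent.
    """
    pad = window // 2
    box = torch.ones(1, 1, window, window, device=image.device) / (window * window)
    mean = F.conv2d(F.pad(image, (pad,) * 4, mode='reflect'), box)
    mean_sq = F.conv2d(F.pad(image ** 2, (pad,) * 4, mode='reflect'), box)
    std = torch.sqrt(torch.clamp(mean_sq - mean ** 2, min=1e-8))
    threshold = mean + k * std
    return torch.sigmoid((image - threshold) / temperature)
```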
Given its causes and effects, it becomes essential to judge the quality of a model after a treatment, compared to the initial one. In fact, the quality of 3D objects has become a paramount criterion for any processing. Many authors have proposed descriptors to evaluate the quality or the natural appearance of images; the key requirement, however, is the best possible correlation between the computed scores and human visual perception. We depart from current trends by avoiding both purely mathematical measures and measures completely inspired by the HVS (Human Visual System). Our new metric, 3DrwPSNR, is based on Weber's law, which accounts for the human visual system by describing the logarithmic perception of light by the eye. This property led us to develop a metric that considers the relative difference between models rather than the absolute difference. Our measurements show that the metric correlates strongly with human visual appreciation of the processed 3D objects.
Recently, image captioning has received much attention from the artificial intelligence (AI) research community. Most current works follow the encoder-decoder machine translation model to automatically generate captions for images, typically using a Convolutional Neural Network (CNN) as the image encoder and a Recurrent Neural Network (RNN) as the decoder that generates the caption. In this paper, we propose a sequence-to-sequence model that follows the encoder-decoder machine translation framework but uses an RNN as the image encoder: the input to the model is a sequence of images representing the objects in the image, ordered according to their order in the captions. We report results on the Flickr30K dataset and compare them with state-of-the-art methods that use the same dataset. The proposed model outperforms the state-of-the-art methods on all metrics.
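A minimal sketch of an RNN encoder over a sequence of per-object features, assuming each object crop has already been embedded by a pretrained CNN; layer names and sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ObjectSequenceEncoder(nn.Module):
    """GRU encoder over a sequence of per-object image features."""

    def __init__(self, feature_dim=2048, hidden_dim=512):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)

    def forward(self, object_features):            # (batch, num_objects, feature_dim)
        outputs, last_hidden = self.rnn(object_features)
        return last_hidden.squeeze(0)               # (batch, hidden_dim) context for the caption decoder
```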
This paper presents a patch-based inpainting algorithm for image block recovery in block-based coded image transmission. The algorithm is based on a geometric model for patch synthesis: the lost pixels are recovered by copying pixel values from the source region according to a similarity criterion, and a trained neural network is used to choose the "best similar" patch. Experimental results show that the proposed method outperforms widely used state-of-the-art methods in both subjective and objective measurements of image block recovery.
In this paper we study the combination of a Viola-Jones classifier with a deep convolutional neural network as an approach to the problem of object detection and classification. It is well known that Viola-Jones detectors are fast and accurate in detecting a wide variety of objects; on the other hand, methods based on neural networks demonstrate high accuracy in image classification problems. The main goal of this paper is to study the viability of the Viola-Jones classifier for image classification. The first part of both algorithms is the same: a Viola-Jones classifier is used to find the object's bounding rectangle in the image. The second part differs: we compare a Viola-Jones classifier with a convolutional neural network-based classifier and provide a speed and accuracy comparison between the two algorithms.
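The shared first stage can be sketched with OpenCV's Viola-Jones cascade; the bundled frontal-face cascade is used here only as an example object class, and the resulting crops would feed either second-stage classifier.

```python
import cv2

# Illustrative first stage: a Viola-Jones cascade yields candidate bounding rectangles
# that a second-stage classifier (Viola-Jones or CNN) can then label.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

image = cv2.imread('frame.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

crops = [image[y:y + h, x:x + w] for (x, y, w, h) in boxes]  # inputs for the second stage
```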
In this work we discuss the known algorithms for linear colour segmentation based on a physical approach and propose a new modification of a segmentation algorithm based on a region adjacency graph framework without a pre-segmentation stage. The proposed edge weight functions are derived from a linear image model with normal noise. A colour space projective transform is introduced as a novel pre-processing technique for better handling of shadow and highlight areas. The resulting algorithm is tested on a benchmark dataset consisting of images of 19 natural scenes selected from Barnard's DXC-930 SFU dataset and 12 natural scene images newly published for common use. The dataset is provided with pixel-by-pixel ground-truth colour segmentation for every image. Using this dataset, we show that the proposed modifications lead to qualitative advantages over other model-based segmentation algorithms, and we also show the positive effect of each proposed modification. The source code and datasets for this work are available for free access at http://github.com/visillect/segmentation.
Driver inattention and drowsiness are among the causes of road accidents in Malaysia: statistics reported by the Malaysian Institute of Road Safety Research (MIROS) in 2016 show about 3 deaths per 10,000 registered vehicles. Hence, an assistant system is needed to monitor the driver's condition, as some car manufacturers have introduced in certain car models. Such an assistant system is part of advanced driver assistance systems (ADAS), which are developed to enhance vehicle systems for safety and better driving. The desire to build safer vehicles and roads to reduce the number of fatalities, together with legislation, creates demand for ADAS. However, there are several challenges in designing, implementing, deploying, and operating ADAS: the system is expected to gather accurate input, process data quickly, accurately predict context, and react in real time, so a suitable approach is needed to fulfil these expectations. There are four types of detection: physiological sensors, driver performance, computer vision, and hybrid systems. This paper describes drowsiness and driver inattention detection and classification using a computer vision approach. We propose a technique to classify drowsiness into three eye-state classes: open, semi-closed, and closed. The classification is done using a feature extraction method, the percentage of eye closure (PERCLOS) technique, and a Support Vector Machine (SVM) classifier. Two types of training and testing data of drivers' eye conditions were used: eyes with spectacles and eyes without spectacles. The results show that the proposed technique can classify the distraction and drowsiness classes with high accuracy. Furthermore, using only one type of eye-condition training data, we are also able to classify the three drowsiness classes regardless of the eye condition.
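A minimal sketch of the two building blocks named above: PERCLOS as a windowed fraction of closed-eye frames, and an SVM eye-state classifier. Feature vectors and labels here are placeholders, not the authors' feature set.

```python
import numpy as np
from sklearn.svm import SVC

def perclos(eye_closed_flags):
    """PERCLOS: fraction of frames within a window in which the eye is (almost) closed."""
    return float(np.mean(eye_closed_flags))

# Illustrative three-class eye-state classifier; X would hold extracted eye features.
X_train = np.random.rand(300, 64)                 # placeholder feature vectors
y_train = np.random.randint(0, 3, 300)            # 0 = open, 1 = semi-closed, 2 = closed
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)

eye_states = clf.predict(np.random.rand(30, 64))  # per-frame eye-state predictions
print(perclos(eye_states == 2))                   # PERCLOS over the 30-frame window
```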
Traffic light recognition (TLR) is an integral component of intelligent vehicles and advanced driver assistance systems (ADAS). At present, most TLR solutions use a vision-based system together with prior knowledge of traffic light positions (map information and height) provided by supporting sensors such as a Global Positioning System (GPS) sensor, in order to obtain high accuracy. In this work, we present a method that performs real-time TLR using only the vision sensor and achieves good results. Our TLR process is divided into three stages: traffic light box (TLB) detection, extraction of the glowing area from the traffic light box, and classification. Traffic light box detection is carried out using a state-of-the-art real-time object detection method, You Only Look Once (YOLO). For extraction, we project the traffic light box region of interest (ROI) into a custom color space and perform blob analysis. To eliminate false positives, we introduce a lightweight, efficient classifier model in the custom color space. For traffic light state classification, we use a support vector machine (SVM) with the RGB histogram of the cropped ROI as the feature. The Bosch Small Traffic Lights Dataset has been used for the empirical validation of our method, achieving an F1 score of 0.94 as a performance benchmark.
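The histogram-plus-SVM classification stage can be sketched as follows; bin count, kernel and labels are illustrative choices, not the paper's exact configuration.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def rgb_histogram(roi_bgr, bins=16):
    """Concatenated per-channel histogram of a cropped traffic-light ROI (feature vector)."""
    hist = [cv2.calcHist([roi_bgr], [c], None, [bins], [0, 256]).flatten() for c in range(3)]
    feature = np.concatenate(hist)
    return feature / (feature.sum() + 1e-8)

def train_state_classifier(rois, labels):
    """Train an SVM on RGB-histogram features of cropped ROIs (labels: red/yellow/green ids)."""
    features = np.stack([rgb_histogram(r) for r in rois])
    return SVC(kernel='rbf', gamma='scale').fit(features, labels)
```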
Failures in pedestrian detection systems can be extremely critical, particularly in driverless driving. In this paper, failures of pedestrian detectors are reduced by re-evaluating the results of state-of-the-art pedestrian detection systems with a fully convolutional neural network. The network is trained on a number of datasets, including a custom-designed occluded-pedestrian dataset, to address the problem of occlusion. Results show that when the proposed network is applied, detectors not only maintain their state-of-the-art performance but even decrease the average false-positive rate per image, especially when pedestrians are occluded.
Stereoscopic vision modules have seen limited success in both the engineering and consumer worlds because of the additional hardware they require (image acquisition, virtual reality headsets, 3D glasses). In recent years, especially the gaming and education sectors have benefited from such specialized headgear, providing virtual or augmented reality. However, many other industrial and biomedical applications, such as computer-aided design (CAD) or tomographic data display, have so far not fully exploited the increased 3D rendering capabilities of present-day computer hardware. We present an approach that uses standard desktop PC hardware (monitor and webcam) to display user-position-aware projections of 3D data without additional headgear. The user position is detected from webcam images, and the rendered 3D data (i.e., the view) is adjusted to match the corresponding user position, resulting in a quasi-virtual-reality rendering, albeit without the 3D effect of proper 3D headgear. The approach has many applications, from medical imaging to construction and CAD, architecture, exhibitions, arts, and performances. Depending on the user location, i.e., the detected head position, the data is rendered differently to account for the user's view angle (zoom) and direction. As the user moves his or her head in front of the monitor, different features of the rendered object become visible; as the user moves closer to the screen, the view angle of the rendered data is decreased, resulting in a zoomed-in version of the rendered object.
Falls are a major cause of serious injuries and even death for elderly people. Fall detectors are usually based on wearable devices such as gyroscopes, accelerometers, etc. Unfortunately, elderly people often forget to wear them, especially those with dementia. In this paper, we present a new vision-based method for automatic fall detection in a smart home environment. First, we efficiently extract the person's silhouette using a background subtraction method and an active contour. Then, motion and shape features are extracted from the body parts and analyzed in order to distinguish falls from other daily activities using rule-based classification. Evaluation results demonstrate the effectiveness of the proposed method in a smart home environment.
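A minimal sketch of the silhouette-extraction step using OpenCV's MOG2 background subtractor; the paper additionally refines the mask with an active contour, which is omitted here.

```python
import cv2

def extract_silhouettes(video_path):
    """Per-frame foreground (silhouette) masks via background subtraction."""
    capture = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    masks = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove small noise blobs
        masks.append(mask)
    capture.release()
    return masks
```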
Pneumatic valve-controlled micro-droplet ejection is a printing technique with potential applications in many fields, especially bio-printing. Ejection is controlled by briefly opening a solenoid valve so that high-pressure gas enters the liquid reservoir, forming a gas pressure pulse that forces the liquid out through a tiny nozzle to form a micro-droplet. For bio-printing applications, the bio-inks are typically non-standard: not only are the initial working parameters difficult to set, but in many cases the working conditions also change over time. In order to maintain a stable single-drop ejection state, a machine vision based ejection monitor was designed to obtain the number, positions, and sizes of the droplets for each ejection, and feedback control is realized by adjusting the conduction time of the solenoid valve or the gas pressure at its front end.
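The droplet measurements (count, positions, sizes) can be obtained with simple blob analysis; the sketch below uses OpenCV's blob detector with illustrative thresholds, not the authors' exact pipeline.

```python
import cv2

def detect_droplets(gray_frame):
    """Detect droplets in a grayscale frame and return their centers and diameters."""
    params = cv2.SimpleBlobDetector_Params()
    params.filterByArea = True
    params.minArea = 20            # tune to the expected droplet size in pixels
    params.filterByCircularity = True
    params.minCircularity = 0.6
    detector = cv2.SimpleBlobDetector_create(params)
    keypoints = detector.detect(gray_frame)
    return [(kp.pt, kp.size) for kp in keypoints]   # (x, y) center and diameter per droplet
```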
We have developed a prototype of a miniature prism-based optical system for the simultaneous acquisition of two stereoscopic images on a single image sensor. The scheme and optical characteristics of the system are presented. We show that, after proper geometric calibration and image processing, it is possible to calculate the three-dimensional (3-D) shape of inspected objects. Devices based on this system may be effectively used for 3-D machine vision applications and remote visual inspection.
Video slide shows constitute a very frequent video type: typically, a series of still images are processed to produce such a video. We present a method to solve the video slide show inversion problem, where we are given a video slide show and wish to extract the sequence of still images used to produce it. Our approach relies on a fast and efficient key-frame extraction method that partitions the video into content-homogeneous segments and extracts a representative key-frame for each segment; an important characteristic of this method is that the number of key-frames is determined automatically. Next, the set of key-frames is further processed to handle effects such as mixing of images, fade-in/fade-out, and zooming, which produce key-frames not belonging to the original image sequence and must therefore be detected and discarded. We provide illustrative examples of applying the method to several video slide shows with different characteristics.
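As a simple baseline for content-based segmentation (not the paper's automatic key-frame selection method), one can flag a new segment whenever the colour histogram of a frame differs strongly from the previous key-frame; the threshold below is an arbitrary choice.

```python
import cv2

def keyframe_candidates(video_path, threshold=0.4):
    """Collect a frame as a key-frame candidate whenever its HSV histogram differs
    strongly (Bhattacharyya distance) from the last collected key-frame."""
    capture = cv2.VideoCapture(video_path)
    keyframes, prev_hist = [], None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        if prev_hist is None or cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            keyframes.append(frame)
            prev_hist = hist
    capture.release()
    return keyframes
```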
In content-based video search in an arbitrary metric space, one has to deal with problems arising from various aspects of Big Data. Since most of the available data do not match a query (usually given 'ad exemplum'), we construct elimination regions for such data, which allows a significant reduction in the number of comparisons required for metric search. We propose using only pivot points to calculate distances to the query; this approach makes it possible to exclude entire clusters from consideration without computationally expensive operations. The general case of the elimination region scheme is considered.
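The core pruning idea behind pivot-based elimination can be stated with the triangle inequality; a minimal sketch of the test (the paper's elimination regions generalize this to clusters):

```python
def can_eliminate(dist_query_pivot, dist_obj_pivot, radius):
    """Triangle-inequality pruning with a single pivot p: if |d(q, p) - d(o, p)| > r,
    the object o cannot lie within radius r of the query q, so it (or a cluster with
    that property) is eliminated without ever computing d(q, o)."""
    return abs(dist_query_pivot - dist_obj_pivot) > radius
```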
Person re-identification is a very challenging problem nowadays because of the large number of video surveillance systems in use. The data from such systems is processed to analyze events or emergency situations, find specific people, etc. One way to approach area security is to develop person re-identification algorithms. In this paper we propose an algorithm for person re-identification based on RGB histogram features. In the first stage, a HOG descriptor is used to detect a person in an image; then the k-means++ clustering algorithm is used to remove the background from the person image. Finally, Bayes and SVM classification methods are used for re-identification. Experimental results showed that the proposed solution can be used for person re-identification with high precision (not less than 82%). The 3D People Surveillance Dataset was used for the experiments.
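The first (HOG detection) stage can be sketched with OpenCV's default people detector; parameters here are common defaults, not necessarily the authors' settings.

```python
import cv2

def detect_people(frame_bgr):
    """HOG-based pedestrian detection; returns bounding boxes (x, y, w, h)."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, weights = hog.detectMultiScale(frame_bgr, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    return boxes
```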
Detecting small objects is challenging because of their low resolution and noisy representation. This paper focuses on localizing bullet holes on a 4 m × 4 m target surface and determining the shot time and position of new bullet holes from surveillance videos of the target. Under such conditions, bullet holes are extremely small compared with the target surface. An improved model based on Faster R-CNN is proposed that uses two networks in series: the first network is trained on original video frames and produces coarse locations of bullet holes, and the second network is trained on the candidate locations obtained by the first network to produce accurate locations. Experimental results show that the series Faster R-CNN algorithm improves average precision by 20.3% over the original Faster R-CNN algorithm on our bullet-hole dataset. To determine the shot time and further improve detection accuracy, several additional algorithms are proposed; using them, the detection accuracy for shot times and new shot points reaches the same level as a human observer.
Measuring the heart rate (HR) is an important tool for monitoring a person's health. When the heart beats, the influx of blood to the head causes slight involuntary movement and subtle skin color changes, which cannot be seen by the naked eye but can be tracked from facial videos using computer vision techniques and analyzed to estimate the HR. However, current state-of-the-art solutions encounter increasing complications when the subject moves voluntarily or when the lighting conditions change in the video, so the accuracy of computer-vision HR estimation is still inferior to that of a physical electrocardiography (ECG) based system. The aim of this work is to improve non-invasive HR measurement by fusing the motion-based and color-based HR estimation methods and using them on multiple input modalities, e.g., RGB and thermal imaging. Our experiments indicate that late fusion of the results of these methods applied to the different modalities produces more accurate results than existing solutions.
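A minimal sketch of the colour-based branch, assuming a per-frame mean green-channel trace has already been extracted from a facial ROI; the band limits are common physiological assumptions, and the motion branch and fusion step are not shown.

```python
import numpy as np

def estimate_hr_from_green(green_trace, fps):
    """Dominant frequency of the mean green-channel signal, restricted to a plausible
    heart-rate band (0.7-4 Hz, i.e. 42-240 bpm), converted to beats per minute."""
    signal = np.asarray(green_trace, dtype=float)
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0
```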
In this paper we propose an original approach to the design of optical video stream recognition systems, introducing new modules for image quality assessment, image quality correction, and feedback. The main novelty of the proposed approach lies in combining image quality assessment results with global dynamic object saliency maps, which indicate the importance, or informative value, of the corresponding image regions. The approach is applied to an identity document video stream recognition system, where the saliency maps are initially provided by document templates and change dynamically over time, for example according to per-field stopping rules. Experiments demonstrated an increase in such essential recognition system characteristics as accuracy, reliability, and performance.
Due to the recent boom in artificial intelligence (AI) research, including computer vision (CV), it has become impossible for researchers in these fields to keep up with the exponentially increasing number of manuscripts. In response to this situation, this paper proposes the paper summary generation (PSG) task, together with a simple but effective method to automatically generate an academic paper summary from raw PDF data. We realize PSG by combining a vision-based supervised component detector with a language-based unsupervised important-sentence extractor, which is applicable to manuscripts in the format it was trained on. We present a quantitative evaluation of the simple vision-based component extraction and a qualitative evaluation showing that our system can extract both visual items and sentences that are helpful for understanding. The 979 manuscripts accepted at the Conference on Computer Vision and Pattern Recognition (CVPR) 2018, processed via our PSG, are available. We believe that the proposed method will provide a better way for researchers to keep up with important academic papers.
In this paper, we propose a novel method for the 3D reconstruction of urban scenes using a front-facing stereo camera in a vehicle. Our point-based approach warps an active region of the reconstructed point cloud to the next frame and uses an extended information filter for the temporal fusion of clustered disparity estimates in pixel bins. We splat the information of projected pixels according to subpixel weights and discard uncertain points. This allows us to remove redundant points from the reconstruction while at the same time producing a significantly denser model than competing approaches, with improved disparity estimates. Our approach avoids common visual artifacts such as spurious objects in the reconstruction, resulting in a reconstruction with higher visual fidelity than other approaches, which is important for immersive applications. We compare the proposed system to other approaches in a quantitative and qualitative evaluation on the KITTI odometry dataset.
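The per-frame disparity estimates that feed such a pipeline can be sketched with OpenCV's semi-global matcher; the parameter values are illustrative, and the temporal fusion with an extended information filter is not shown.

```python
import cv2

def compute_disparity(left_gray, right_gray, block_size=5):
    """Per-frame disparity map with semi-global block matching (rectified input assumed)."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                    blockSize=block_size,
                                    P1=8 * block_size * block_size,
                                    P2=32 * block_size * block_size)
    # OpenCV returns fixed-point disparities scaled by 16.
    return matcher.compute(left_gray, right_gray).astype('float32') / 16.0
```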
Due to the impressive progress in digital imaging over past decades, the importance of analog video systems as primary instruments for receiving, transmitting, and storing data has been greatly reduced. However, a large amount of data is still stored in analog formats on media affected by aging. In this paper, a new three-stage method for detecting and restoring blotches in a video sequence is developed. The new method, which includes motion compensation, LBP calculation, and data classification using a neural network, is shown to be more efficient than the commonly used ROD, SROD, and SDI methods.
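A minimal sketch of the LBP feature step using scikit-image; the uniform variant and the patch-level histogram are common choices, not necessarily the authors' exact configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_patch, radius=1, n_points=8):
    """Uniform LBP histogram of a patch, usable as the texture feature fed to a
    blotch / no-blotch classifier."""
    lbp = local_binary_pattern(gray_patch, n_points, radius, method='uniform')
    hist, _ = np.histogram(lbp, bins=np.arange(0, n_points + 3), density=True)
    return hist
```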
Glaucoma is one of the most dangerous causes of blindness, resulting in permanent blindness within a few years if left untreated, and it is very hard to diagnose, particularly in early stages. In ophthalmological images, the vasculature of the blood vessels is the most valuable factor for detecting glaucoma; it can be segmented by image processing techniques, which helps in early diagnosis. In this research, the vasculature found within the optic disc is segmented and then used to calculate its ratio in the ISNT quadrants. On the basis of the ISNT rule, we find the ratio of blood vessels in each quadrant and evaluate whether the blood vessels are nasalized, i.e., whether they violate or obey the ISNT rule. The proposed methodology is examined on 50 images collected from the FAU, DMED, and MESSIDOR image databases to test for nasalization of vessels in retinal images.
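A minimal sketch of the quadrant-ratio computation, assuming a binary vessel mask inside the optic disc is already available; the angular convention used to assign quadrants here is a simplification (it ignores left/right eye orientation) and is not necessarily the authors' exact one.

```python
import numpy as np

def isnt_vessel_ratios(vessel_mask, disc_center):
    """Fraction of segmented vessel pixels in each ISNT quadrant of the optic disc.

    vessel_mask: binary vessel segmentation inside the disc; disc_center: (cx, cy).
    """
    ys, xs = np.nonzero(vessel_mask)
    angles = np.degrees(np.arctan2(disc_center[1] - ys, xs - disc_center[0])) % 360
    counts = {
        'superior': ((angles >= 45) & (angles < 135)).sum(),
        'nasal':    ((angles >= 135) & (angles < 225)).sum(),
        'inferior': ((angles >= 225) & (angles < 315)).sum(),
        'temporal': ((angles < 45) | (angles >= 315)).sum(),
    }
    total = max(len(xs), 1)
    return {quadrant: count / total for quadrant, count in counts.items()}
```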
In this article we present a method to extract the human airway tree from MR images. The assessment of morphometric data of the bronchi is useful for estimating the severity of many diseases. Nowadays, the measurements are performed on CT scans, which expose the patient to ionizing radiation; MRI, a radiation-free technique, has so far never been used to extract the human bronchi because of insufficient signal and contrast. We propose a processing chain to perform 3D segmentation of the human airway tree from an MR image. From this segmentation we can extract quantitative measurements of the airway, similar to those obtained from CT-segmented airway trees. Finally, we present the results of a clinical study showing that these measurements are relevant. Replacing CT by MRI to assess airway diseases is essential for many applications (imaging of young people, repeated measurements to follow up the response to a treatment, ...) in which ionizing acquisition should be avoided. Thus, the results presented in this article open many perspectives for the study of lung diseases.
In this work a dimension estimation system is developed for logistics applications. The purpose of the system is to find the oriented minimum bounding box of a package; the volume of the bounding box and the volumetric weight of the package are then determined. An Intel® RealSense™ Depth Camera D415 was used to obtain a point cloud of the package view. Smoothing and filtering algorithms were applied to eliminate noise and distortion, the object was isolated from the background, and the minimum bounding box was determined. Different geometric shapes were tested, including a hardboard calibrated cube, a cube with an uneven top, an uncalibrated box, a cube with a sloped side, a small cylinder, a tube, and a cylinder with an irregular top. Statistical analysis of the measurements revealed average errors of less than 0.5 cm under normal working conditions, which is acceptable for most logistics operations.
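One simple way to approximate an oriented bounding box from a segmented point cloud is PCA: rotate the points into their principal axes and take the axis-aligned box there. This is an illustrative approximation, not necessarily the exact minimum-box algorithm used in the paper.

```python
import numpy as np

def oriented_bounding_box(points):
    """Approximate oriented bounding box of an (N, 3) point cloud via PCA.

    Returns the box extents (edge lengths along the principal axes) and the rotation
    matrix whose rows are the principal axes; the box volume is extents.prod().
    """
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotated = centered @ vt.T                     # points expressed in principal axes
    extents = rotated.max(axis=0) - rotated.min(axis=0)
    return extents, vt
```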
In this paper, we propose a new digital watermarking method for the protection of vector maps in geographic information systems. The method is based on a combination of two novel approaches: firstly, the watermark is embedded into the polygon objects of the map by cyclically shifting the vertex list of each polygon; secondly, a raster image superimposed on the map is used as the watermark. The major advantage of this method is that, unlike most existing watermarking techniques, it does not distort the map by altering coordinate values. Experimental results demonstrate the efficiency of the proposed method, as well as the robustness of the embedded watermark against common geometric transformations: translation, scaling, rotation, and cropping.
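The cyclic-shift embedding idea can be sketched as follows; this is a minimal illustration of encoding a symbol in the starting vertex of one polygon, not the authors' complete scheme (the raster-image watermark and extraction are omitted).

```python
def embed_symbol(polygon_vertices, symbol):
    """Embed a watermark symbol into one polygon by cyclically shifting its vertex list.

    polygon_vertices: list of (x, y) tuples; the geometry is unchanged (no coordinates
    are altered), only the starting vertex encodes the symbol, which must be smaller
    than the number of vertices.
    """
    shift = symbol % len(polygon_vertices)
    return polygon_vertices[shift:] + polygon_vertices[:shift]
```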
Estimating the head pose of pedestrians is a crucial task in autonomous driving systems. It plays a significant role in many research fields, such as pedestrian intention judgment and human-vehicle interaction. While most current studies focus on driver's-view images, we argue that surveillance images are also worthy of attention, since more global information can be obtained from them than from driver's-view images. In this paper, we propose a method for head pose estimation from surveillance images. The approach consists of two stages: head detection and pose estimation. Since the head of a pedestrian occupies very few pixels in a surveillance image, a two-step strategy is used to improve head detection: first, we train a model to extract body regions from the source image; second, a head detector is trained to locate the head position within the extracted body regions. We use YOLOv3 as the detection network for both body and head detection. Head pose estimation is treated as a classification task with 10 categories; we use ResNet-50 as the backbone of the classifier, whose input is the result of head detection. A series of experiments demonstrates the good performance of the proposed method.
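A minimal sketch of the classification stage: a torchvision ResNet-50 with its final layer replaced by a 10-way head-pose classifier. Detected head crops are assumed to be resized to the network's input size; training details are not reproduced.

```python
import torch.nn as nn
from torchvision import models

def build_pose_classifier(num_classes=10):
    """ResNet-50 backbone with the final fully connected layer replaced by a
    num_classes-way head-pose classifier."""
    model = models.resnet50(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```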
Many techniques, such as cryptography and watermarking, are used to solve security problems on the Internet. In this context, watermarking is a way of protecting copyright and proving the authenticity of digital data. In this paper, a non-blind digital watermarking scheme is proposed, based on the Discrete Cosine Transform (DCT), Singular Value Decomposition (SVD), and a Beta Chaotic Map (BCM). The experimental results show that this scheme is robust against several attacks compared to other algorithms.
In this paper, we demonstrate how an existing deep learning architecture for automatic lip reading can be adapted so that it becomes speaker independent, and how, by doing so, improved accuracies can be achieved across a variety of speakers. The architecture is multi-layered, consisting of a convolutional neural network; if an initial edge-detection-based stage is applied to pre-process the image inputs so that only the contours remain, the architecture becomes less speaker-dependent. The neural network achieves good accuracy rates when trained and tested on some of the same speakers in the "overlapped speakers" phase of the simulations, where word error rates of just 1.3% and 0.4% are achieved for two individual speakers, with character error rates of 0.6% and 0.3%. The "unseen speakers" phase fails to achieve comparable accuracy, with word error rates of 20.6% and 17.0% for the two speakers and character error rates of 11.5% and 8.3%. The variation in the size and colour of different people's lips results in different outputs at the convolution layer of a convolutional neural network, because the output depends on the pixel intensities of the red, green, and blue channels of the input image, so a convolutional neural network naturally favours observations of the individuals it was trained on. This paper proposes an initial "contour mapping stage" that makes all inputs uniform so that the system can be speaker independent.
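One possible realisation of such an edge-based pre-processing step is a Canny contour map of the lip region; the thresholds below are illustrative, and the exact contour mapping used in the paper may differ.

```python
import cv2

def contour_map(lip_roi_bgr, low=50, high=150):
    """Edge-based pre-processing of a lip ROI so that the network sees contours
    rather than colour intensities."""
    gray = cv2.cvtColor(lip_roi_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress noise before edge detection
    return cv2.Canny(gray, low, high)
```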
Sign language is a medium of communication for people with auditory and verbal disabilities; it is therefore essential to understand their hand gestures easily in order to enable effortless and improved communication. Hand gesture detection is a challenging task. In this paper, we propose an efficient method to recognize and classify images containing hand gestures, using image segmentation and bottleneck features from a pre-trained deep neural network. Our model achieves an accuracy of over 96% and can therefore be used to build an efficient system that works as an interpreter between the disabled person and the other party. A comparison between a conventional CNN (Convolutional Neural Network) model and our model is also presented to measure the effectiveness of the proposed method.
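Bottleneck features from a pre-trained backbone can be extracted as in the sketch below; VGG16 is used here only as an example backbone and is not necessarily the network used in the paper.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

def bottleneck_features(images):
    """Bottleneck features for segmented hand-gesture images.

    images: float array of shape (N, 224, 224, 3); returns an (N, 512) feature matrix
    that can be fed to a small classifier on top.
    """
    backbone = VGG16(weights='imagenet', include_top=False, pooling='avg')
    return backbone.predict(preprocess_input(np.asarray(images, dtype='float32')))
```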
The paper considers the bimodal person identification problem of analyzing the speaker's face and voice. Two speaker identification algorithms are developed and compared. The first algorithm extracts features from the speech signal in the form of mel-frequency cepstral coefficients and, on this basis, forms a speaker model using Gaussian mixtures. The second approach is based on a universal background model obtained from the recordings of a large number of speakers. For face identification, a neural network with 13 convolutional layers is used. For training and testing, databases of speech signals and face images of 100 people were formed. The final bimodal identification system shows an identification accuracy of more than 95%. The results of this experiment demonstrate the possibility of applying the proposed algorithms to the person identification problem in real-life systems.
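A minimal sketch of the first (MFCC + GMM) speaker-modelling algorithm; the number of mixture components, MFCC settings and sampling rate are illustrative choices, and the UBM and face branches are not shown.

```python
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_model(wav_path, n_components=16):
    """Fit a Gaussian mixture to the MFCC frames of one speaker's recording."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T      # frames x 13
    return GaussianMixture(n_components=n_components, covariance_type='diag').fit(mfcc)

def score_speaker(model, wav_path):
    """Average log-likelihood of an utterance under a speaker's GMM (higher = better match)."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T
    return model.score(mfcc)
```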
The use of image processing and classification for agricultural applications has been widely studied and has led to work such as automatic grading of fruit and vegetables, yield approximation, and defect detection. Image segmentation is one of the first steps in identifying the region of interest within an image. This paper presents an approach to the automatic segmentation and classification of healthy and defective Carabao mangoes. K-means, range filtering, and color-channel segmentation were utilized so that the varying texture and color of mangoes due to surface defects can be taken into account. Results show that the proposed technique performs better than classical K-means segmentation, and the performance of the segmentation step has a considerable influence on the precision of the classification model. Segmented and unsegmented images were used to train KNN, SVM, MLP, and CNN models; the experiments showed that the models performed better when trained with segmented images.
Despite major advances, the accuracy of modern face verification systems depends on the lighting conditions. The variability of illumination can be compensated for either by performing image preprocessing or by training more robust verification models. Nowadays, great priority is given to the development of neural network classifiers, while the importance of image preprocessing is undeservedly neglected. This article proposes a method for spatially weighted brightness normalization of grayscale face images that preserves the relevant image information. An experimental study demonstrates the effect of various brightness normalization methods on the accuracy of a neural network classifier in the face verification task. It is shown that brightness normalization can improve face verification accuracy for images captured in complex illumination conditions, i.e., it can compensate for samples that were not fully represented in the training data.
This work focuses on the Fast Hough Transform (FHT) algorithm proposed by M. L. Brady. We propose a way to modify the standard FHT to calculate sums along lines within any given range of their inclination angles. We also describe a new way to visualize the Hough image based on regrouping the accumulator space around its center. Finally, we prove that the Brady parameterization transforms any line into a figure of the "angle" type.
Camera Calibration and Positioning and Visual Attention
Extensive research has been carried out in the field of visual attention modelling over the past years; however, egocentric visual attention in real environments has still not been thoroughly studied. We introduce a method for conducting automated user studies of egocentric visual attention in a laboratory. The goal of our method is to study the distance of objects from the observer (their depth) and its influence on egocentric visual attention. User studies based on the proposed method were conducted on a sample of 37 participants, and our own egocentric dataset was created. The whole experimental and evaluation process was designed and realized using advanced methods of computer vision. The results of our research are ground-truth values of egocentric visual attention and their relation to the depth of the scene, approximated as a depth-weighting saliency function. The depth-weighting function was applied to state-of-the-art models and evaluated; our enhanced models provided better results than the current depth-weighting saliency models.
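A minimal sketch of how a depth-weighting function can be applied to an existing saliency map; the weighting function itself is learned from the user-study data in the paper, so any function passed in here is only a placeholder.

```python
import numpy as np

def depth_weighted_saliency(saliency, depth, weight_fn):
    """Re-weight a saliency map by a depth-dependent function and renormalize.

    saliency, depth: arrays of the same shape; weight_fn maps per-pixel depth to a weight.
    """
    weighted = saliency * weight_fn(depth)
    return weighted / (weighted.max() + 1e-8)
```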
The procedure of camera color calibration ensures accurate and repeatable acquisition of scene colors. The core of the calibration procedure is the color transformation from the camera RGB color space to a device-independent color space, such as CIEXYZ or CIELAB. A typical calibration procedure assumes uniform irradiance across the whole scene, but such an assumption is difficult to satisfy. Spatial changes in irradiance are typical for both indoor and outdoor lighting conditions. The problem of color calibration under non-uniform lighting was researched by, e.g., B. Funt and P. Bastani. In the present article, their calibration procedure was tested together with the classic approach to camera calibration. Based on these experiments, a modification is proposed in which RGB image values are scaled according to the results of additional measurements. Under spatially non-uniform scene illumination, this modification yields lower color differences ΔE*ab and ΔE00 than both the classic method and the Funt-Bastani method.
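The following sketch illustrates the idea of the proposed modification under stated assumptions: RGB values are rescaled by the ratio of white measurements taken at the calibration spot and at the patch location, and calibration quality is then judged with the CIE76 color difference. The actual measurement procedure and scaling used in the paper may differ.

import numpy as np

def scale_rgb_for_irradiance(rgb, local_white, reference_white):
    """Scale camera RGB values by the ratio of the reference white
    measurement to the white measured at the patch location.

    A sketch of per-location irradiance compensation; the paper's
    measurement setup is not reproduced here.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb * (np.asarray(reference_white, dtype=np.float64) /
                  np.asarray(local_white, dtype=np.float64))

def delta_e_cie76(lab1, lab2):
    """CIE76 color difference ΔE*ab between two CIELAB triplets."""
    lab1 = np.asarray(lab1, dtype=np.float64)
    lab2 = np.asarray(lab2, dtype=np.float64)
    return float(np.linalg.norm(lab1 - lab2))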
In this work, we present a camera geopositioning system based on matching a query image against a database with panoramic images. For matching, our system uses memory vectors aggregated from global image descriptors based on convolutional features to facilitate fast searching in the database. To speed up searching, a clustering algorithm is used to balance geographical positioning and computation time. We refine the obtained position from the query image using a new outlier removal algorithm. The matching of the query image is obtained with a recall@5 larger than 90% for panorama-to-panorama matching. We cluster available panoramas from geographically adjacent locations into a single compact representation and observe computational gains of approximately 50% at the cost of only a small (approximately 3%) recall loss. Finally, we present a coordinate estimation algorithm that reduces the median geopositioning error by up to 20%.
In this paper, we address the particularly challenging problem of calibrating a stereo pair of low resolution (80 × 60) thermal cameras. We propose a new calibration method for such a setup, based on sub-pixel image analysis of an adequate calibration pattern and bootstrap methods. The experiments show that the method achieves robust calibration with a quarter-pixel re-projection error for an optimal set of 35 input stereo pairs of the calibration pattern, notably outperforming the standard OpenCV stereo calibration procedure.
The goal of our work is to use visual attention to enhance autonomous driving performance. We present two methods of predicting visual attention maps. The first method is a supervised learning approach in which we collect eye-gaze data for the task of driving and use it to train a model for predicting the attention map. The second method is a novel unsupervised approach in which we train a model to learn to predict attention as it learns to drive a car. Finally, we present a comparative study of our results and show that the supervised approach to attention prediction, when incorporated, performs better than the other approaches.
In the past decades, the Vector Quantization (VQ) model has been very popular across different pattern recognition areas, especially for feature-based tasks. However, the classification or regression performance of VQ-based systems is always confronted by the feature mismatch problem, which heavily degrades their performance. In this paper, we propose a two-stage iterative Procrustes algorithm (TIPM) to address the feature mismatch problem for VQ-based applications. At the first stage, the algorithm removes mismatched feature vector pairs from a pair of input feature sets. The second stage then collects the correctly matched feature pairs that were discarded during the first stage. To evaluate the effectiveness of the proposed TIPM algorithm, speaker verification is used as the case study in this paper. The experiments were conducted on the TIMIT database and the results show that TIPM improves VQ-based speaker verification performance under the clean condition and all noisy conditions.
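For reference, the alignment step that Procrustes-based matching builds on can be sketched as follows; this is a minimal orthogonal Procrustes solution plus a hypothetical residual-based mismatch test, not the full two-stage TIPM procedure.

import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal Procrustes: find the rotation R minimizing ||A @ R - B||_F.

    A and B are (n_pairs, dim) arrays of corresponding feature vectors.
    """
    M = A.T @ B
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def mismatch_mask(A, B, threshold):
    """Flag feature-vector pairs whose residual after alignment is large.

    The threshold is an illustrative assumption; TIPM's two-stage
    removal/re-collection of pairs is not reproduced here.
    """
    R = procrustes_rotation(A, B)
    residuals = np.linalg.norm(A @ R - B, axis=1)
    return residuals > threshold  # True marks a mismatched pair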
Optical Character Recognition (OCR) of Tamil characters is harder than that of Latin characters due to the greater variety of character shapes. We propose a new technique for optical character recognition of printed Tamil characters using a hierarchical Convolutional Neural Network (CNN) which recognises the similarities between characters while identifying the differences between fine-grained characters. The performance of the developed system is comparable to state-of-the-art techniques for OCR.
In this paper, the task of correcting (post-processing) machine-readable zone recognition results is discussed. A survey of existing recognition error correction methods is presented, and an algorithm is proposed for applying these methods to machine-readable zone post-processing. Experimental results are reported for the described methods.
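One standard constraint that machine-readable zone post-processing can exploit is the ICAO 9303 check digit: a recognized field whose check digit does not match is a candidate for correction. The sketch below shows only this well-known validation step, not the correction methods surveyed in the paper.

def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for an MRZ field.

    Digits keep their value, letters map to 10..35, and the filler '<'
    counts as 0; positions are weighted cyclically by 7, 3, 1.
    """
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == '<':
            value = 0
        else:
            value = ord(ch) - ord('A') + 10
        total += value * weights[i % 3]
    return total % 10

# Example: the ICAO 9303 specimen passport number has check digit 6.
assert mrz_check_digit("L898902C3") == 6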
In this paper we study real-time augmentation, a method for increasing the variability of the training dataset during the learning process. We consider the most common label-preserving deformations, which can be useful in many practical tasks. Due to limitations of existing augmentation tools, such as increased learning time or dependence on a specific platform, we developed our own real-time augmentation system. Experiments on the MNIST and SVHN datasets demonstrate the effectiveness of the suggested approach: the quality of the trained models improves, while learning time remains the same as if augmentation were not used.
In this paper, we consider the problem of detecting counterfeit identity documents in images captured with smartphones. As a number of documents contain special fonts, we study the applicability of convolutional neural networks (CNNs) for checking the conformance of the fonts used with the ones specified by government standards. Here, we use multi-task learning to differentiate samples by both fonts and characters and compare the resulting classifier with its analogue trained for binary font classification. We train neural networks for authenticity estimation of the fonts used in machine-readable zones and ID numbers of the Russian national passport and test them on samples of individual characters acquired from 3238 images of the Russian national passport. Our results show that the use of multi-task learning increases the sensitivity and specificity of the classifier. Moreover, the resulting CNNs demonstrate high generalization ability as they correctly classify fonts which were not present in the training set. We conclude that the proposed method is sufficient for authentication of the fonts and can be used as part of a forgery detection system for images acquired with a smartphone camera.
The paper presents an algorithm for document image recognition that is robust to projective distortions. The algorithm is based on a similarity metric learned with a Siamese architecture. The idea of training Siamese networks is to build a function mapping images into a space where a distance function corresponding to a pre-defined metric approximates the similarity between objects of the initial space. During learning, the loss function tries to minimize the distance between pairs of objects from the same class and maximize it between objects from different classes. A convolutional network is used to map the initial space to the target one. This network makes it possible to construct a feature vector in the target space for each class. Classification of objects is performed using the mapping function and finding the nearest feature vector. The proposed algorithm achieved recognition quality comparable to a classification convolutional network on the open dataset of document images MIDV-500 [1]. Another important advantage of this method is the possibility of one-shot learning, which is also shown in the paper.
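A common loss for training such Siamese embeddings is the contrastive loss; the sketch below shows this generic formulation together with nearest-feature-vector classification, and is not claimed to be the exact loss used by the authors.

import numpy as np

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss commonly used to train Siamese embeddings.

    Pairs from the same class are pulled together (distance term) and
    pairs from different classes are pushed apart up to the margin.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    same = same_class.astype(np.float64)
    loss = same * d ** 2 + (1.0 - same) * np.maximum(0.0, margin - d) ** 2
    return loss.mean()

def classify(query_emb, class_embs):
    """Assign the class whose feature vector is nearest in the target space."""
    dists = np.linalg.norm(class_embs - query_emb, axis=1)
    return int(np.argmin(dists))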
This paper considers problems regarding the development of stochastic models consistent with the results of character image recognition in a video stream. Assumptions about the structure and properties of the constructed models are formulated. The model components are described using the Dirichlet distribution and its generalizations. The parameters of these distributions are determined using statistical estimation methods. The Akaike information criterion is used to rank the models. The agreement of the proposed theoretical distributions with the sample data is verified.
This paper proposes a novel self-learning framework, which converts a noisy, pre-labeled multi-class object dataset into a purified multi-class object dataset with object bounding-box annotations by iteratively removing noise samples from the low-quality dataset, which may contain a high level of inter-class noise samples. The framework iteratively purifies the noisy training datasets for each class and updates the classification model for multiple classes. The procedure starts with a generic single-class object model which changes into a multi-class model in an iterative procedure, whose F-1 score is evaluated until a sufficiently high score is reached. The proposed framework is based on learning the used models with CNNs. As a result, we obtain a purified multi-class dataset and, as a spin-off, the updated multi-class object model. The proposed framework is evaluated on maritime surveillance, where vessels need to be classified into eight different types. The experimental results on the evaluation dataset show that the proposed framework improves the F-1 score by approximately 5% and 25% at the end of the third iteration, when the initial training datasets contain 40% and 60% inter-class noise samples (erroneously classified labels of vessels and samples without annotations), respectively. Additionally, the recall rate increases by nearly 38% (for the more challenging 60% inter-class noise case), while the mean Average Precision (mAP) rate remains stable.
In the paper we consider computational optimization of a recognition system on a Very Long Instruction Word (VLIW) architecture. Such an architecture is aimed at broad parallel execution and low energy consumption. We discuss VLIW features on the example of an Elbrus-based computational platform. As the example, we consider a system for 2D art recognition. This system identifies a painting in an acquired image as a painting from the database, using local image features constructed from YACIPE keypoints and their RFD-based binary color descriptors, created as a concatenation of RFD-like descriptors for each channel. The descriptors are computed quickly, while the 2D art database is quite large, so in our case more than half of the execution time is consumed by descriptor comparison using the Hamming distance during image matching. This operation can be accelerated with low-level optimization exploiting special architecture features. In the paper we show efficient usage of intrinsic functions for the Elbrus-4C processor and memory access with the array prefetch buffer, which is specific to the Elbrus platform. We demonstrate a speedup of up to 11.5 times for large arrays and about 1.5 times overall speedup for the system, without any changes in the intermediate computations.
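The dominant operation being optimized is Hamming-distance comparison of packed binary descriptors. The Elbrus intrinsics and prefetch-buffer access are platform specific; the NumPy sketch below only illustrates the XOR/popcount computation itself.

import numpy as np

# Popcount lookup table for all byte values (0..255).
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_distances(query, database):
    """Hamming distance between one packed binary descriptor and a database.

    `query` has shape (n_bytes,), `database` has shape (n_desc, n_bytes),
    both dtype uint8. The inner XOR/popcount loop is what the paper
    accelerates with intrinsics; here it is plain vectorized NumPy.
    """
    xored = np.bitwise_xor(database, query[np.newaxis, :])
    return _POPCOUNT[xored].sum(axis=1)

def best_match(query, database):
    """Return the index and distance of the closest reference descriptor."""
    d = hamming_distances(query, database)
    return int(np.argmin(d)), int(d.min())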
Local feature description is widely used in micro-expression (ME) recognition. However, contemporary low-level handcrafted features are insufficient for representing MEs due to their insignificant and subtle motion, which results in low recognition rates. This paper presents a novel handcrafted feature to represent MEs based on intensity-level difference mapping, namely the Center-Symmetric Local Mapped Pattern (CS-LMP). Due to its capability of capturing subtle pixel changes, CS-LMP is proposed to capture subtle ME motions, resulting in better accuracy. In this paper, CS-LMP features are extracted from public ME datasets and the results are compared to other state-of-the-art approaches, with classification performed using a support vector machine and K-nearest neighbours. The results show that our approach produces prominent results, as high as 79.59%, compared to competing approaches.
With the development of Artificial Neural Networks (ANNs), they are becoming key components in many computer vision systems. However, to train ANNs or other machine learning programs it is necessary to create large and representative datasets, which can be a costly, hard and sometimes even impossible task. Another important problem with such programs is data drift: in real-world applications input data can change with time, and the quality of a machine learning system trained on a fixed dataset may deteriorate. To combat these problems, we propose a model of an ANN-based machine learning classification system that can be trained during its operation. The system both classifies input examples and trains on the data gathered during its operation. We assume that besides the ANN there is an external module in the system that can estimate the confidence of the answers given by the ANN. In this paper we consider two examples of such an external module: a separate, uncorrelated classifier and a module that assesses the ANN output by searching for the recognized words in a dictionary. We conduct numerical experiments to study the properties of the proposed system and compare it to ANNs trained offline.
Hand gesture recognition (HGR) is a natural way of Human Machine Interaction and has been applied in different areas. In this paper, we discuss work on applications of HGR to industrial robots, focusing on the processing steps and techniques in gesture-based Human Robot Interaction (HRI), which can provide useful information for other researchers. We review several related works in the area of HGR based on different approaches, including the sensor-based approach and the vision-based approach. After comparing the two approaches, we find that 3D vision-based HGR is a challenging but promising research area. Works concerning the implementation of HGR in industrial scenarios are then discussed in detail. Pattern recognition algorithms that are effectively used in HGR, such as k-means and DTW, are briefly introduced as well.
Spatial relations are important ingredients for interpreting the global meaning of structured objects as well as for resolving the uncertainty caused by ambiguities in the feature extraction stage. In this paper, we present a fuzzy rule-based system that accomplishes the task of automated linguistic description of spatial relationships between each neighboring pair of on-line handwritten stroke characters. We introduce fuzzy logic in order to evaluate the possible interpretation and precision of the relation itself. Multiclass SVM classifiers are used in our experiment to classify the obtained spatial relations. Experiments using the MAYASTROUN database showed that the proposed method produces intuitive results with a recognition rate of 94.82%. In fact, the experimental results highlight that our approach outperforms other approaches reported in the literature.
One possible approach to tackle the class imbalance in classification tasks is to resample a training dataset, i.e., to drop some of its elements or to synthesize new ones. There exist several widely-used resampling methods. Recent research showed that the choice of resampling method significantly affects the quality of classification, which raises the resampling selection problem. Exhaustive search for optimal resampling is time-consuming and hence it is of limited use. In this paper, we describe an alternative approach to the resampling selection. We follow the meta-learning concept to build resampling recommendation systems, i.e., algorithms recommending resampling for datasets on the basis of their properties.
In this paper we propose a dynamic programming solution to the template-based recognition task in the OCR case. We formulate a problem of optimal position search for complex objects consisting of parts forming a sequence. We limit the distance between every two adjacent elements with predefined upper and lower thresholds. We choose the sum of penalties for each part in its position as the function to be minimized. We show that such a choice of restrictions allows a faster algorithm than the one for the general form of deformation penalties. We named this algorithm Dynamic Squeezeboxes Packing (DSP) and applied it to two OCR problems: text field extraction from an image of a document's Visual Inspection Zone (VIZ) and license plate segmentation. The quality and performance of the resulting solutions were experimentally shown to meet the requirements of state-of-the-art industrial recognition systems.
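A plain dynamic-programming formulation of the constrained position search reads as follows; this straightforward sketch only illustrates the problem being solved (positions of sequential parts with bounded gaps and minimal total penalty) and does not reproduce the faster DSP recurrence.

import numpy as np

def best_positions(penalties, min_gap, max_gap):
    """Place a sequence of parts to minimize the total penalty.

    `penalties[i][x]` is the cost of placing part i at position x, and
    the gap between adjacent parts is constrained to [min_gap, max_gap].
    This is an O(n * width * gap) illustration, not the DSP algorithm.
    """
    n, width = penalties.shape
    dp = np.full((n, width), np.inf)
    back = np.zeros((n, width), dtype=int)
    dp[0] = penalties[0]
    for i in range(1, n):
        for x in range(width):
            lo, hi = max(x - max_gap, 0), x - min_gap
            if hi < lo:
                continue
            window = dp[i - 1][lo:hi + 1]
            if not np.isfinite(window.min()):
                continue
            j = lo + int(np.argmin(window))
            dp[i, x] = penalties[i, x] + dp[i - 1, j]
            back[i, x] = j
    # Backtrack from the best final position to recover the placement.
    positions = [int(np.argmin(dp[-1]))]
    for i in range(n - 1, 0, -1):
        positions.append(int(back[i, positions[-1]]))
    return positions[::-1]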
In this work we extend WGAN-GP in order to achieve better generation of synthesized images for finger spelling classification. The main difference between ordinary WGAN-GP and the proposed algorithm is that during training we employ both training samples and training labels. These training labels are fed to the generator, which generates the synthetic images using both the randomized latent input and the input label. In ordinary WGAN-GP, latent input variables are usually sampled from an unconditional prior. In the proposed algorithm the latent input vector is a concatenation of a random part, the class labels and additional variables drawn from Gaussian distributions representing hand poses or gesture attributes. The JSL dataset for Hiragana sign recognition has been balanced using samples rendered from a 3D hand model as well as samples from the extended WGAN-GP.
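The construction of the conditional latent input can be sketched as below; the dimensions and the standard-normal prior for the attribute variables are illustrative assumptions rather than the paper's configuration.

import numpy as np

def build_latent(batch_size, n_classes, labels, n_noise=100, n_attr=8, rng=None):
    """Assemble a conditional latent input: random part, one-hot class
    label, and Gaussian attribute variables (standing in for hand-pose /
    gesture attributes).

    All dimensions here are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((batch_size, n_noise))
    one_hot = np.zeros((batch_size, n_classes))
    one_hot[np.arange(batch_size), labels] = 1.0
    attributes = rng.standard_normal((batch_size, n_attr))
    return np.concatenate([noise, one_hot, attributes], axis=1)

# The concatenated vector is fed to the generator so that generated
# samples can be steered per class when balancing the dataset.
z = build_latent(batch_size=4, n_classes=46, labels=np.array([0, 1, 2, 3]))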
In team sports scenes recorded during training and lessons, it is common to have many players on the court, each with his own ball, performing different actions. Our goal is to detect all players on the handball court and determine the leading player who performs a given handball technique such as shooting at the goal, catching a ball or dribbling. This is a very challenging task which requires, apart from an accurate object detector able to deal with cluttered scenes containing many partially occluded objects and bad illumination, additional information to determine the leading player. Therefore, we propose a leading player detection method combining the Mask R-CNN object detector and spatiotemporal interest points, referred to as MR-CNN+STIPs. The performance of the proposed leading player detector is evaluated on a custom sports video dataset acquired during handball training lessons, and its performance under different conditions is discussed.
Recently, convolutional neural networks (CNNs) have been widely applied in computer vision, natural language processing and autonomous driving. However, it is difficult to employ such networks in embedded systems because of the limited memory storage and computation bandwidth. To address these limitations, we explore a two-stage approach to neural network compression for the object detection scenario. In this paper, we first propose an effective pruning approach for a trained neural network, achieving a total sparsity of 81.86%-91.54% with an accuracy loss of 1-3%. Then we explore a quantization method applied to the pruned neural network, and propose an adaptive codebook to store the quantized weight parameters and their indices. Using this two-stage model compression approach, model pruning and weight quantization, on tiny-YOLO, a state-of-the-art object detection model, we achieve a total compression rate of 41.9-62.7X with an accuracy loss of less than 3.3%.
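As a rough sketch of the quantization stage, non-zero (unpruned) weights can be clustered into a shared codebook so that each weight is stored as a small index; the adaptive codebook construction in the paper is more elaborate than this plain k-means illustration.

import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(weights, n_codes=16):
    """Cluster non-zero weights into a codebook and store centroid indices.

    Pruned (zero) weights stay zero; every remaining weight is replaced by
    the index of its nearest centroid. The codebook size is illustrative.
    """
    flat = weights.reshape(-1, 1)
    mask = flat[:, 0] != 0.0
    kmeans = KMeans(n_clusters=n_codes, n_init=10, random_state=0).fit(flat[mask])
    codebook = kmeans.cluster_centers_.ravel()
    indices = np.zeros(flat.shape[0], dtype=np.int64)
    indices[mask] = kmeans.predict(flat[mask])
    return codebook, indices.reshape(weights.shape), mask.reshape(weights.shape)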
With the development of marine resources, the USV (Unmanned Surface Vehicle) has been widely used as a platform for autonomous navigation in the marine environment. In order to ensure the safe navigation of USVs, this paper proposes a sea surface obstacle detection method based on a probabilistic graphical model and the sea-sky line. Our method utilizes the SLIC algorithm to segment the sea surface image for pre-processing. Then, we propose a superpixel-based probabilistic graphical model to segment the image, dividing it into three main semantic regions and an obstacle region. Finally, we propose a sea-sky-line detection algorithm, on the basis of which obstacles within the sea-sky line are detected. The accuracy of this method reaches 82.1%, and the recall rate reaches 92.0%. The method effectively avoids interference from sea surface reflections and objects such as clouds in the sky, and performs well in obstacle detection.
In this paper we present a modification of the Viola-Jones approach for solving the problem of detecting the government seal stamp of the Russian Federation. The main contributions of the proposed modification are combining brightness and edge features and using the L1 norm of the image gradient for calculating the edge features. This modification allows building classifiers that are more robust to noise and to the absence of a characteristic structure of contrasts and object boundaries. The modification is experimentally compared to the original Viola-Jones algorithm and shows better quality on different test sets.
Current state-of-the-art object detectors are based on supervised deep learning approaches. These methods require a large amount of annotated training data, which hinders their wider use in industry. We propose a method for generating synthetic training data for the task of detecting which objects in a pile can be picked up by a robot arm. The method requires few input images, which are used to create annotated images of piles. After training a state-of-the-art detector on the synthetic data, we test it on real images. The results show that the model trained in this way does not rival the best object detectors trained on large datasets of real images, but it performs well for the specific task of detecting pickable objects in piles. The main advantage of the proposed training approach is that existing models can easily be re-trained to work with piles of different objects by personnel who do not specialize in machine learning.
Digitization of many documents in public institutions is performed with optical scanners with a fixed scanning area. Therefore, the scanned document does not always fill the whole image area and may contain additional marginal noise. A major difficulty is the appearance of border areas, which can have varying pixel intensity with additional distortions. In this paper, a novel algorithm for fast detection of the document area in scanned images is proposed. It consists of three main stages: preprocessing, edge density projections and black run analysis. Experimental results on scanned images with different document sizes demonstrate 86.1% accuracy in detecting the document content with fast processing speed.
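A minimal sketch of the edge-density-projection stage is given below; the thresholding scheme is an assumption, and the preprocessing and black-run-analysis stages are omitted.

import numpy as np

def document_bounds(gray, threshold_ratio=0.2):
    """Locate the document area from edge-density projections.

    A simple gradient-magnitude edge map is projected onto the rows and
    columns, and the span where the projections exceed a fraction of
    their maximum is kept. The threshold ratio is illustrative.
    """
    gy, gx = np.gradient(gray.astype(np.float32))
    edges = np.abs(gx) + np.abs(gy)
    rows = edges.sum(axis=1)
    cols = edges.sum(axis=0)

    def span(profile):
        idx = np.flatnonzero(profile > threshold_ratio * profile.max())
        return int(idx[0]), int(idx[-1])

    top, bottom = span(rows)
    left, right = span(cols)
    return top, bottom, left, right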
Gesture recognition deals with non-verbal motions used as a means of communication in Human Computer Interaction (HCI). It is one of the significant aspects of HCI, both in device interfaces and interpersonally. In a virtual reality system, gestures can be used to navigate, control or interact with a computer. The aim of gesture recognition is to capture gestures that are formed in a certain way and detected by a device such as a camera. Hand gesture recognition is one of the logical ways to provide a convenient and highly adaptable interface between devices and users. In this paper, a system is created for hand gesture recognition using image processing tools, namely the Wavelet Transform (WT) and Empirical Mode Decomposition (EMD) methods, together with Artificial Neural Networks (ANNs) and a Convolutional Neural Network (CNN) for gesture classification. These methods are evaluated on many factors such as execution time, accuracy, sensitivity, specificity, positive and negative predictive value, likelihood, receiver operating characteristic, area under the ROC curve and root mean square. Preliminary results indicate that WT has a shorter execution time than EMD and CNN. The CNN is able to extract distinct features and classify data accurately, while EMD and WT are less effective; hence the classification accuracy is improved dramatically.
In this paper the authors compare the accuracy of several stereo matching algorithms using problem-oriented metrics developed by the authors earlier for obstacle detection. For comparison we chose the most computationally efficient open-source algorithms, suitable for use in autonomous systems with limited processing capacity. The quality of the algorithms was compared on the public dataset KITTI Stereo Evaluation 2015. The hypothesis that the problem-oriented metric of stereo matching quality would lead to a different ranking than the universal metric was not confirmed. At the same time, our measurements of the algorithms' execution time showed results significantly different from those stated on the KITTI portal.
Extraction of a salient object from an image with a blurred and similarly colored background is a very difficult task. Many image segmentation methods have been proposed to overcome this problem, but their performance is unsatisfactory when the target object and background have a similar color appearance. In this paper, we propose a technique to overcome this problem with fast fuzzy c-means membership maps. These maps are blended using the Porter-Duff compositing method. The compositing is performed under different blending modes, where the foreground element of one map is blended onto the backdrop element of the second map. The blended maps contain some outliers, which are removed by applying morphological operations. Finally, an image mask, which is the composite of the frequency prior, color prior and location prior of the image, is used to extract the final saliency map from the blended maps. Experiments on four well-known datasets (MSRA, MSRA-1000, THUR15000 and SED) are conducted; the results indicate the efficiency of the proposed method. Our approach produces more accurate segmentation where the background and foreground have similar color appearance.
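For reference, the Porter-Duff 'over' operator used for blending two maps can be written as follows; the paper combines fuzzy c-means membership maps under several blending modes, of which this is only one example.

import numpy as np

def porter_duff_over(fg, fg_alpha, bg, bg_alpha):
    """Porter-Duff 'over' compositing of a foreground map onto a backdrop.

    Returns the blended values and the resulting alpha. All inputs are
    arrays of the same shape with alphas in [0, 1].
    """
    fg, bg = fg.astype(np.float64), bg.astype(np.float64)
    out_alpha = fg_alpha + bg_alpha * (1.0 - fg_alpha)
    out = (fg * fg_alpha + bg * bg_alpha * (1.0 - fg_alpha)) / np.maximum(out_alpha, 1e-8)
    return out, out_alpha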
Deep convolutional neural networks have led to significant improvement over previous salient object detection systems. Existing deep models are trained end-to-end and predict salient objects by regressing pixel values, so the resulting saliency maps are typically blurry. Our Pixel-wise Binary Classification Network (PBCN) instead treats salient object detection as binary classification at the pixel level: saliency versus background. In order to increase the resolution of the output feature maps and obtain denser feature maps, Hybrid Dilated Convolution (HDC) is incorporated into PBCN. Then, Hybrid Dilation Spatial Pyramid Pooling (HDSPP) is proposed to extract denser multi-scale image representations. HDSPP contains one 1×1 convolution and several dilated convolutions with different rates, and the output feature maps of these convolutions are fused. Finally, softmax is used to implement the binary classification instead of a sigmoid. Experiments on four datasets show that PBCN significantly improves on the state of the art.
This paper presents a method for classification and localization of road signs in 3D space, performed with the help of a neural network and a point cloud obtained from a laser range finder (LIDAR). In addition, to accomplish this task and train the neural network (based on the Faster-RCNN architecture), a dataset was collected. The trained convolutional network is used as a part of a ROS node which fuses the obtained classifications, the camera data and the LIDAR measurements. The output of the system is a set of images with bounding boxes and point clouds corresponding to real signs on the road. The introduced method was tested and performed well on a dataset acquired from a self-driving car under different road conditions.
An array of fluid flow sensors can be used to detect and track underwater objects via the fluid flow field these objects create. The sensed flows combine to a spatio-temporal velocity profile, which can be used to solve the inverse problem; determining the relative position and orientation of a moving source via a trained model. In this study, two training strategies are used: simulated data resulting from continuous motion in a path and from vibratory motion at discrete locations on a grid. Furthermore, we investigate two sensing modalities found in literature: 1D and 2D sensitive flow sensors; all while varying the sensor detection threshold via a noise level. Results show that arrays with 2D sensors outperform those with 1D sensors, especially near and next to the sensor array. On average, the path method outperforms the grid method with respect to estimating the location and orientation of a source.
The paper presents an analysis of various approaches to constructing descriptions of gradient fields of digital images. The analyzed approaches are based on well-known methods for reducing data dimensionality, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Linear Discriminant Analysis (LDA). We apply these methods not to the original image, represented as a two-dimensional brightness field (a halftone image), but to its secondary representation in the form of a two-dimensional gradient field, i.e., a complex-valued image. Approaches using both the entire gradient field and only its phase part are analyzed. Two independent ways of forming the final object description are also considered: using the gradient field expansion coefficients in a derived basis, and using the authors' original method called model-oriented descriptors. The latter halves the number of real coefficients used in the object description. The study is conducted on the face recognition problem. The effectiveness of the analyzed methods is demonstrated on images from the Extended Yale Face Database B. The comparison is made using a nearest-neighbor classifier.
Recognizing human body activities from video sequences directly depends on feature extraction for motion analysis, since each activity can be represented by certain motion features. Therefore, by using the corresponding features, we can classify different activities. This idea inspires us to formulate activity recognition as a classification problem and verify its feasibility. In this work, two main goals are pursued. The first is extracting motion and texture features from RGBD sequences: we propose a feature extraction method that builds feature vectors from the Gray-Level Co-occurrence Matrices (GLCM) of the dense optical flow pattern and the well-known Haralick features of these matrices, measuring meaningful properties such as energy, contrast, homogeneity, entropy, sum average and correlation to capture local spatial and temporal characteristics of the motion through the neighboring optical flow fields (orientation and magnitude). The second is a performance comparison of five different classifiers: Artificial Neural Networks, the Naive Bayes classifier, Random Forest, K-Nearest Neighbors and Support Vector Machine. Numerical experiments are carried out on four well-known public datasets (Gaming Datasets, Cornell Activity Datasets, MSR Daily Activity 3D and Online RGBD Datasets) to verify the effectiveness of these classification algorithms. The classifiers show different performance depending on the computed features and the set of activity classes, and the results demonstrate that all five algorithms achieve satisfactory activity recognition performance.
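A minimal sketch of the GLCM and a few Haralick-style properties is shown below; in the described pipeline the input would be a quantized optical-flow orientation or magnitude field rather than raw intensities, and the property set here is only a subset of those listed in the abstract.

import numpy as np

def glcm(image, levels=8, dx=1, dy=0):
    """Gray-Level Co-occurrence Matrix for one displacement (dx, dy).

    `image` is expected to be quantized to integers in [0, levels).
    """
    h, w = image.shape
    m = np.zeros((levels, levels), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    ys2, xs2 = ys + dy, xs + dx
    valid = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    a = image[ys[valid], xs[valid]]
    b = image[ys2[valid], xs2[valid]]
    np.add.at(m, (a, b), 1.0)
    return m / m.sum()

def haralick_props(p):
    """A few Haralick-style properties of a normalized GLCM."""
    i, j = np.mgrid[0:p.shape[0], 0:p.shape[1]]
    energy = np.sum(p ** 2)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return {"energy": energy, "contrast": contrast,
            "homogeneity": homogeneity, "entropy": entropy}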
In this paper, we propose an accurate lane-level map building method using low-cost sensors such as cameras, GPS and in-vehicle sensors. First, we estimate the ego-motion from the stereo camera and the in-vehicle sensors, and globally optimize the vehicle positions by fusion with the GPS data. Next, we perform lane detection on every image frame from the camera. Lastly, we repeatedly accumulate and cluster the detected lanes based on the accurate vehicle positions, and apply a polyline fitting algorithm. The polyline fitting algorithm follows a variant of the Random Sample Consensus (RANSAC) algorithm, which specifically solves the multi-line fitting problem. The algorithm can expand the lane area and improve the accuracy at the same time as the same road is driven repeatedly. We evaluated the lane-level map building on two types of roads: a proving ground and a real driving environment. The lane map accuracy verified at the proving ground was a CEP error of 9.9982 cm.
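A basic RANSAC line fit, the building block that a multi-line polyline fitting variant repeats on residual points, can be sketched as follows; the thresholds and the refit step are illustrative assumptions, not the paper's parameters.

import numpy as np

def ransac_line(points, n_iter=200, inlier_tol=0.1, rng=None):
    """Fit a single 2D line to lane points with RANSAC.

    `points` is an (n, 2) array. Returns a point on the line, its
    direction, and the inlier mask of the best hypothesis.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        a, b = points[rng.choice(len(points), size=2, replace=False)]
        direction = b - a
        norm = np.linalg.norm(direction)
        if norm == 0:
            continue
        normal = np.array([-direction[1], direction[0]]) / norm
        dist = np.abs((points - a) @ normal)
        inliers = dist < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the inlier set with a total-least-squares (PCA) direction.
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid)
    return centroid, vt[0], best_inliers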
In many of today's big data analytics applications, there is a need to analyze social media feeds and visualize users' opinions, which provides a viable alternative source for establishing new metrics of our digital life. Social interaction on Twitter is open-ended, which makes media analysis on Twitter easier than on other social media, where interaction is often private. This work therefore focuses on Twitter and lies within the confines of data mining. It is concerned with Natural Language Processing (NLP)-based sentiment analysis for Twitter opinion mining. The objective of this work is to use a data mining approach of text feature extraction, classification and dimensionality reduction, with sentiment analysis, to analyze and visualize Twitter users' opinions. The utilized methodology applies NLP sentiment analysis to a large number of tweets in order to obtain word scores for each tweet and thus exploit public tweeting for knowledge discovery; this can moreover serve fake news detection. The mechanism involves several consecutive steps: dataset collection, pre-processing, NLP, sentiment analysis, and prediction and classification using a BNN. The U.S. Airlines Sentiment Analysis Twitter dataset, provided with Data for Everyone, has been utilized. The presented system monitors Twitter streams from both the media and the public. It is capable of extracting meaningful data from tweets in real time, storing them in a relational model for analysis, and then applying our dimensionality reduction method. This helps people discover the leading-role correlations between media focus and public interest. The system achieves better accuracy and efficiency compared with some similar works. It is convenient for a wide application spectrum including big data analytics solutions, predicting e-commerce customer behavior, improving marketing strategy, gaining competitive market advantages, and visualization in various data mining applications.
Computational models predicting stimulus-driven human visual attention usually incorporate simple visual features, such as intensity, color and orientation. However, the saliency of shapes and their contour segments influences attention too. Therefore, we built 30 shape saliency models of our own based on existing shape representation and matching techniques and compared them with 5 existing saliency methods. Since available fixation datasets were usually recorded on natural scenes where various factors of attention are present, we performed a novel eye-tracking experiment that primarily focuses on shape and contour saliency. Fixations from 47 participants who looked at silhouettes of abstract and real-world objects were used to evaluate the accuracy of the proposed saliency models and investigate which shape properties attract the most attention. The results showed that visual attention integrates local contour saliency, saliency of global shape features and shape dissimilarities. The fixation data also showed that intensity and orientation contrasts play an important role in shape perception. We found that humans tend to fixate first on irregular geometrical shapes and on objects whose similarity to a circle differs from that of the other objects.
We propose a novel and efficient initialization method for generalized facial landmark localization with an unsupervised roll-angle estimation based on B-spline models. We first show that the roll angle is crucial for accurate landmark localization. Therefore, we develop an unsupervised roll-angle estimation by adopting a joint 1st-order B-spline model, which is robust to intensity variations and generic enough for application to various face detectors. The method consists of three steps. First, the scale-normalized Laplacian of Gaussian operator is applied to a bounding box generated by a face detector to extract facial feature segments. Second, a joint 1st-order B-spline model is fitted to the extracted facial feature segments using an iterative optimization method. Finally, the roll angle is estimated from the aligned segments. We evaluate four state-of-the-art landmark localization schemes with the proposed roll-angle estimation initialization on a benchmark dataset. The proposed method boosts the performance of landmark localization in general, especially for cases with large head poses. Moreover, the proposed unsupervised roll-angle estimation outperforms standard supervised methods, such as random forest and support vector regression, by 41.6% and 47.2%, respectively.
Long-range relations play a key role in tasks like human pose estimation that require dense prediction. We propose an additional module containing a process called feature translation to gather long-range information at early stages. We show that such a module is related to dilated convolution and is more efficient. The module significantly improves performance in pose estimation, and we show that most of the improvement is contributed by the feature translation process.
The paper considers the problem of 2D art identification in photos acquired with mobile devices under the conditions of museum exhibition. The proposed approach is based on a compact description of an image with a constellation of keypoints and corresponding local descriptors. A two-step comparison scheme is described for finding the best reference image matching the query. Bag-of-features approach is used as a first step, then mutual disposition of points is analyzed. Rejection of the query is performed if no suitable matches are found. Geometrical normalization of the query image is proposed to achieve higher robustness against scale and viewpoint variations. After the normalization, mutual disposition of points is estimated using a simplified geometric model. Advantages of the described approach over state-of-the-art solutions are considered. The results of the experiments conducted on the open WikiArt dataset are presented along with processing times for different hardware platforms.
Many domain-specific challenges for feature matching and similarity learning in computer vision have relied on labelled data, using either heuristic or more recent deep learning approaches. While aiming for precise solutions, we need to process a larger number of features, which may result in higher computational complexity. This paper proposes a novel method of similarity learning through a two-part cost function, performed in an unsupervised manner as could be done with heuristic approaches in the original feature space, while also reducing feature complexity. The features are encoded on a lower-dimensional manifold that preserves the original structure of the data. This approach takes advantage of Siamese networks and autoencoders to obtain compressed features while maintaining the same distance properties as in the original feature space. This is done by introducing a new loss function with two terms, which aims for good reconstruction as well as learning the data-point neighborhoods from the encoded and reconstructed feature spaces.
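A minimal sketch of such a two-term cost, assuming a mean-squared reconstruction error and a pairwise distance-preservation term, is given below; the weighting and the exact neighborhood term in the paper may differ.

import numpy as np

def two_part_loss(x_a, x_b, z_a, z_b, r_a, r_b, alpha=0.5):
    """Two-part cost: reconstruction fidelity plus distance preservation.

    `x_*` are original features, `z_*` their encodings, `r_*` the decoder
    reconstructions. The first term keeps the autoencoder faithful; the
    second asks the pairwise distance in the encoded space to match the
    distance in the original space, so similarity can be judged on the
    compressed features. The weighting alpha is an illustrative choice.
    """
    reconstruction = np.mean((x_a - r_a) ** 2) + np.mean((x_b - r_b) ** 2)
    d_original = np.linalg.norm(x_a - x_b)
    d_encoded = np.linalg.norm(z_a - z_b)
    similarity = (d_original - d_encoded) ** 2
    return alpha * reconstruction + (1.0 - alpha) * similarity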
Neural networks are a powerful technology for classifying character patterns and object images, and a huge number of training samples is very important for classification accuracy. A novel method for recognizing handwritten hiragana characters is proposed that combines a pre-trained convolutional neural network (CNN) and support vector machines (SVMs). The training samples are augmented by pattern distortion such as cosine translation and elastic distortion. A pre-trained CNN, AlexNet, is used as the pattern feature extractor; AlexNet is pre-trained on a large-scale object image dataset. An SVM is used as a trainable classifier. The original hiragana samples of 46 classes from the ETL9B database are divided two-fold by odd and even dataset numbers. Samples with odd dataset numbers and augmented patterns from the ETL9B database are used to train the SVM, with the feature vectors of the character patterns passed to the SVM from AlexNet. The average error rate was 1.130% over five test runs on 100 test patterns for each of the 46 classes, and the lowest error rate was 0.978% with 506138 training patterns of distorted hiragana characters. Experimental results show that the proposed method is effective in recognizing handwritten hiragana characters.
Recent developments in remote sensing systems and image processing have made it possible to propose a new method for object classification and for the detection of specific changes in series of satellite Earth images (so-called targeted change detection). In this paper we develop a formal problem statement that allows the deep learning approach to be used effectively for analyzing time-dependent series of remote sensing images. We also introduce a new framework for the development of deep learning models for targeted change detection and demonstrate several business applications it can be used for.
An end-to-end deep learning (DL) control model is proposed to solve the autonomous landing problem of a quadrotor by means of supervised learning. Traditional methods mainly obtain the relative position of the quadrotor through GPS signals, which are not always reliable, or through position-based visual servoing (PBVS) methods. In this paper, we construct a deep neural network based on a convolutional neural network (CNN) whose input is the raw image. A monocular camera is used as the only sensor to capture a downward-looking image that contains the landing area. To train our deep neural network, we use a self-built image dataset. After the training phase, the trained control model is tested and performs well. Light changes and background interference have little influence on the model's performance, which shows the robustness and adaptability of our deep learning model.
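As a rough sketch of what an end-to-end image-to-command network of this kind can look like in PyTorch, the snippet below maps a raw down-looking image to a small command vector; the layer sizes and the interpretation of the outputs as three velocity commands are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class LandingControlNet(nn.Module):
        """Maps a raw down-looking camera image to control commands."""
        def __init__(self, n_commands=3):           # e.g. vx, vy, vz (assumed)
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, n_commands))

        def forward(self, image):
            return self.head(self.conv(image))

    # Supervised training: regress recorded controller commands, e.g.
    # model = LandingControlNet()
    # loss = nn.functional.mse_loss(model(images), recorded_commands)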
Convolutional Neural Networks (CNNs) have evolved into the state-of-the-art technique for machine learning tasks. However, CNNs bring a significant increase in computation and parameter storage costs, which makes them difficult to deploy on embedded devices with limited hardware resources and a tight power budget. In recent years, much work has focused on reducing these overheads by compressing CNN models, for example by pruning weights or pruning filters. Compared with pruning weights, pruning filters does not result in sparse connectivity patterns and is therefore conducive to parallel acceleration on hardware platforms. In this paper, we propose a new method for judging the importance of filters. To make the judgement more accurate, we use the standard deviation of a filter's weights to represent the amount of information extracted by the filter. In the pruning process, unimportant filters can be removed directly without loss in test accuracy. We also propose a multilayer pruning method that avoids setting the pruning rate layer by layer; this holistic pruning method improves pruning efficiency. To verify the effectiveness of our algorithm, we conduct experiments with the simple network VGG16 and the more complex networks ResNet18/34. We re-train the pruned CNNs to compensate for the accuracy loss caused by the pruning process. The results show that our pruning method can reduce inference cost by up to 50% for VGG16 and 35% for ResNet18/34 on CIFAR10 with little accuracy loss.
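The core idea of ranking filters by the standard deviation of their weights and dropping the lowest-ranked ones can be sketched as follows in PyTorch; the pruning ratio and the simplified way a smaller layer is rebuilt (subsequent layers and batch-norm adjustments omitted) are assumptions for illustration.

    import torch
    import torch.nn as nn

    def filters_to_keep(conv: nn.Conv2d, prune_ratio=0.5):
        # Score each output filter by the standard deviation of its weights:
        # filters with low variation are assumed to extract little information.
        scores = conv.weight.data.view(conv.out_channels, -1).std(dim=1)
        n_keep = max(1, int(conv.out_channels * (1 - prune_ratio)))
        return torch.argsort(scores, descending=True)[:n_keep]

    def prune_conv(conv: nn.Conv2d, keep_idx):
        # Build a smaller layer containing only the selected filters
        new_conv = nn.Conv2d(conv.in_channels, len(keep_idx), conv.kernel_size,
                             stride=conv.stride, padding=conv.padding,
                             bias=conv.bias is not None)
        new_conv.weight.data = conv.weight.data[keep_idx].clone()
        if conv.bias is not None:
            new_conv.bias.data = conv.bias.data[keep_idx].clone()
        return new_conv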
We present a vehicle re-identification method for a parking lot toll system. Given a probe image captured by a camera installed at the entrance of a parking lot, re-identification is the task of identifying a matching image in a gallery set constructed from different cameras in the exit region. This method is especially useful when license plate recognition fails. Our method is based on a convolutional neural network (CNN), a variant of the multilayer perceptron (MLP). The input image to the CNN model is cropped by a license plate detection (LPD) algorithm to eliminate the background of the original image. To train the vehicle re-identification model, we adopt pre-trained models that showed outstanding results in the ImageNet [1] challenge from 2014 to 2015. We then fine-tune one of these models (GoogLeNet [2]) for a car model recognition task using a large-scale car dataset [3]. This fine-tuned model is utilized as a feature extractor. The cosine function is used to measure the similarity between a probe and a gallery image. To evaluate the performance of our method, we create two datasets: ETRI-VEHICLE-2016-1 and ETRI-VEHICLE-2016-2. The experimental results reveal that the proposed technique achieves promising results.
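The matching step itself reduces to cosine similarity between CNN feature vectors; a minimal NumPy sketch of ranking a gallery against a probe is shown below, assuming the feature vectors have already been produced by the fine-tuned network applied to license-plate-cropped images.

    import numpy as np

    def cosine_similarity(probe, gallery):
        # probe: feature vector of shape (D,); gallery: matrix of shape (N, D)
        probe = probe / np.linalg.norm(probe)
        gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        return gallery @ probe

    def rank_gallery(probe_feature, gallery_features, gallery_ids):
        sims = cosine_similarity(probe_feature, gallery_features)
        order = np.argsort(-sims)                  # most similar first
        return [(gallery_ids[i], float(sims[i])) for i in order]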
In this paper, we investigate the classification of two soft tissue sarcoma (STS) subtypes within a multi-modal medical dataset based on three pre-trained deep convolutional networks from the ImageNet challenge. We use multiparametric MRIs with histologically confirmed liposarcoma and leiomyosarcoma. Furthermore, the impact of network depth on fine-tuning for medical imaging is highlighted. To this end, we fine-tune AlexNet along with deeper VGG architectures; two configurations with 16 and 19 learned layers are fine-tuned. Experimental results reveal a classification accuracy of 97.2% with the AlexNet CNN, while better performance is achieved with the VGG models: 97.86% and 98.27% for VGG-16-Net and VGG-19-Net, respectively. We demonstrate that depth is favorable for STS subtype differentiation. Additionally, deeper CNNs converge faster than shallow ones, and fine-tuned CNNs can be used as computer-aided diagnosis (CAD) tools to help radiologists in decision making.
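The fine-tuning setup generally amounts to loading ImageNet weights and replacing the final layer with a two-way classifier; a minimal torchvision sketch is given below, with the detail that all layers remain trainable being an assumption rather than the authors' stated training schedule.

    import torch.nn as nn
    import torchvision.models as models

    def build_finetune_model(arch='vgg16', n_classes=2):
        # Start from ImageNet weights and replace the last layer with a
        # 2-way classifier (liposarcoma vs. leiomyosarcoma)
        model = {'alexnet': models.alexnet,
                 'vgg16': models.vgg16,
                 'vgg19': models.vgg19}[arch](pretrained=True)
        in_features = model.classifier[-1].in_features
        model.classifier[-1] = nn.Linear(in_features, n_classes)
        return model

    # All layers stay trainable here, so the whole network is fine-tuned on
    # the MRI slices; a reduced learning rate for pre-trained layers is typical.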
Recently, various studies have applied deep learning methods to brain tumor data. There are two principal tasks in brain tumor classification: feature extraction and classification itself. The aim of this study is to analyze images of the BraTS 2015 dataset using deep learning algorithms and to classify brain tumors into different region types: whole, core and enhanced. We address the issue using deep belief networks, a kind of neural network that detects features in images and classifies them. This paper introduces a three-step framework for classifying multiclass radiography images. The first step utilizes a de-noising technique based on vanilla data preprocessing to remove noise and insignificant features from the images. In the second step, an unsupervised deep belief network (DBN) is used to learn the unlabelled features. At small scale, DBNs have demonstrated significant potential, but when scaling to large networks the computational cost of training the restricted Boltzmann machines becomes a major issue. The discriminative feature subsets obtained in the first two steps serve as inputs to classifiers in the third step for evaluation. Our goal is a machine capable of recognizing a brain tumor's type; we define a probabilistic model that classifies brain tumors into different region types, taking as input the best vanilla preprocessing (image size 256×256). Training the deep belief networks on the BraTS data yields a classification accuracy of 91.6%.
Convolutional Neural Networks (CNNs) are getting larger and deeper, and thus becoming harder to deploy on systems with limited resources. Although convolutional filters benefit from the concept of the receptive field, they still require substantial resources to store the parameters of their many filters. Therefore, a compression method for pre-trained CNN models based on a "linear representation" of convolutional kernels is introduced in this paper. First, a codebook of template kernels Kt is generated by unsupervised clustering of all convolutional kernels, with the Pearson correlation coefficient used as the distance measure. Then every convolutional kernel is represented by its closest template using the linear fitting function a·Kt + b, which means that only two parameters and a codebook index are needed to represent a kernel. After that, the model is retrained with fixed template kernels, and only the two related parameters need to be fine-tuned for each kernel. Experiments show that the convolutional kernels of a large CNN model can be represented using only a small number of templates. Thus, this method can reach a compression rate of the convolutional layers of nearly 4×, with tiny impact on precision after retraining. Moreover, the proposed method can be combined with other compression approaches to obtain a higher compression rate.
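Given a template kernel, the two parameters of the fit a·Kt + b have a simple closed-form least-squares solution; the sketch below shows that fit in NumPy, with the simplification (an assumption for illustration) that the template is chosen by lowest reconstruction error rather than by the Pearson-correlation clustering described in the paper.

    import numpy as np

    def fit_linear_representation(kernel, template):
        """Find a, b minimizing || a * template + b - kernel ||^2."""
        k = kernel.ravel()
        t = template.ravel()
        A = np.stack([t, np.ones_like(t)], axis=1)   # columns: template, constant
        (a, b), *_ = np.linalg.lstsq(A, k, rcond=None)
        return a, b

    def best_template(kernel, codebook):
        # For illustration, pick the template whose linear fit reconstructs
        # the kernel best; the paper assigns templates via clustering with
        # the Pearson correlation coefficient as the distance.
        errors = []
        for t in codebook:
            a, b = fit_linear_representation(kernel, t)
            errors.append(np.sum((a * t + b - kernel) ** 2))
        return int(np.argmin(errors))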
Tracking the ball in sports videos is one of the most challenging tasks in the computer vision and video processing domain. Recent ball tracking approaches fail to handle the tracking of a small, fast-moving ball, and inaccurate 2D ball detection leads to further deterioration of 3D ball tracking results. This paper presents a soccer ball tracking-by-detection approach using a pre-trained Convolutional Neural Network (CNN). The proposed algorithm uses the CNN to distinguish the ball from the background and other moving objects such as players and referees. The 2D ball detection results are refined to identify true ball positions. True ball positions from cameras shooting the scene from different angles are then mapped onto the ground plane, and the actual ball movement is tracked in 3D from the top view. Experiments show that the proposed algorithm can tackle challenges such as small ball size, shape changes, occlusion and tracking of high-speed balls.
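The mapping of a detected 2D ball position onto the ground plane is commonly done with a pre-computed image-to-ground homography; a minimal OpenCV sketch is given below, where the homography H_image_to_ground is assumed to have been calibrated beforehand from known field points (the paper's exact camera model is not specified here).

    import numpy as np
    import cv2

    def image_to_ground(ball_xy, H_image_to_ground):
        # Project the detected 2D ball position from image coordinates
        # onto the pitch's ground plane using the calibrated homography.
        pt = np.array([[ball_xy]], dtype=np.float32)        # shape (1, 1, 2)
        ground = cv2.perspectiveTransform(pt, H_image_to_ground)
        return ground[0, 0]

    # With two or more cameras, the ground-plane positions from each view
    # can be combined to recover the ball trajectory as seen from the top.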
Floods are natural disasters that can cause serious damage to property, roads, vehicles and even people. These damages impose a huge burden on both individuals and governments. Thus, a system that can detect floods at an early stage and warn the relevant authorities immediately would be very useful to the public. Detecting flooding early can save lives, time and money for the government, and is an important step towards smarter cities. In this paper, we propose the use of a deep learning architecture to detect floods in susceptible areas. We use the FCN-AlexNet deep learning architecture to train and test on our dataset. The images in our dataset are collected from two PTZ cameras with different view angles. According to the experimental results, the system achieves above 95% classification accuracy on both cameras.
Internet of Things (IoT) devices, mobile phones, and robotic systems are often denied the power of deep learning algorithms due to their limited computing power. However, to provide time critical services such as emergency response, home assistance, surveillance, etc., these devices often need real time analysis of their camera data. This paper strives to offer a viable approach to integrate high performance deep learning based computer vision algorithms with low-resource and low-power devices by leveraging the computing power of the cloud. By offloading the computation work to the cloud, no dedicated hardware is needed to enable deep neural networks on existing low computing power devices. A Raspberry Pi based robot, Cloud Chaser, is built to demonstrate the power of using cloud computing to perform real time vision tasks. Furthermore, to reduce latency and improve real time performance, compression algorithms are proposed and evaluated for streaming real-time video frames to the cloud.
Convolutional neural networks are able to perform a hierarchical learning process starting with local features. However, limited attention is paid to enhancing such elementary-level features as edges. We propose and evaluate two wavelet-based edge-feature enhancement methods for preprocessing the input images to convolutional neural networks. The first method develops representations by decomposing the input images with the wavelet transform and subsequently performing a limited reconstruction. The second method develops such feature-enhanced inputs to the network using the local modulus maxima of the wavelet coefficients. For each method, we implement a new preprocessing layer and append it to the network architecture. Our empirical evaluations demonstrate that the proposed methods outperform the baselines and previously published work.
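The first idea, decompose and reconstruct with only part of the coefficients, can be sketched with PyWavelets as follows; suppressing the approximation band so that mainly edge-like detail remains, and the choice of wavelet family and decomposition level, are illustrative assumptions rather than the authors' exact configuration.

    import numpy as np
    import pywt

    def wavelet_edge_enhance(image, wavelet='db2', level=2):
        # Decompose, zero the approximation band, and reconstruct so that
        # mostly edge-like detail coefficients remain in the output image.
        coeffs = pywt.wavedec2(image.astype(float), wavelet, level=level)
        coeffs[0] = np.zeros_like(coeffs[0])          # suppress low frequencies
        enhanced = pywt.waverec2(coeffs, wavelet)
        return enhanced[:image.shape[0], :image.shape[1]]

    # The enhanced map can be stacked with (or fed instead of) the raw image
    # as the input to the convolutional network.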
Crowd counting is a challenging task in the computer vision field and has not been well addressed until now. In this paper, we develop an end-to-end multi-scale deep convolutional neural network (CNN) model that can accurately estimate the crowd count from a single image with arbitrary crowd density and perspective. The proposed model extracts multi-scale deep CNN features from the input image and regresses the crowd count directly, without any post-processing. Hence our model can handle multi-scale targets well in various crowd scenes. We evaluate our model on several benchmark datasets, and its performance surpasses some state-of-the-art methods. Moreover, due to its end-to-end nature, our model demonstrates good performance in practical applications.
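A common way to realize multi-scale feature extraction is a multi-column network whose columns use different kernel sizes and whose fused features directly regress the count; the PyTorch sketch below illustrates that generic idea only, with all layer sizes assumed rather than taken from the paper.

    import torch
    import torch.nn as nn

    class MultiScaleCounter(nn.Module):
        def __init__(self):
            super().__init__()
            # Three columns with different receptive fields (kernel sizes 3, 5, 7)
            def column(k):
                return nn.Sequential(
                    nn.Conv2d(3, 16, k, padding=k // 2), nn.ReLU(),
                    nn.Conv2d(16, 32, k, padding=k // 2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.columns = nn.ModuleList([column(k) for k in (3, 5, 7)])
            self.regressor = nn.Linear(3 * 32, 1)     # fused features -> count

        def forward(self, image):
            feats = torch.cat([c(image) for c in self.columns], dim=1)
            return self.regressor(feats)              # predicted crowd count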
Laser SLAM can be implemented using ROS on the Ubuntu system; however, it cannot be run on the Windows operating system, which is more stable than Ubuntu. To implement laser SLAM on Windows, the main laser SLAM program in ROS is carefully analyzed and modified to adapt it to the Windows system. The main programs for laser processing, coordinate transformation and map construction are rewritten and reorganized. To verify the effectiveness of our work, experiments were conducted in real-world environments. The experimental results validate that laser SLAM can be implemented on Windows by rewriting and reorganizing these main programs.
Machine learning has made breakthroughs in areas such as computer vision and natural language processing. In recent years, more and more research has been devoted to applying machine learning to robotic grasping. This article summarizes research progress in machine learning for robotic grasping, covering object-grasping datasets and two main categories of methods, which differ in their criteria for successful grasping and in their use of deep learning or reinforcement learning algorithms. It discusses what current research has achieved and which problems remain unsolved, and aims to inspire new ideas in research on machine-learning-based robotic grasping.
In this paper, we present a precise indoor positioning system for mobile robot pose estimation based on visual edge detection. A set of onboard motion sensors (i.e. a wheel speed sensor and a yaw rate sensor) is used for pose prediction. A schematic plan of the building, stored as a multichannel raster image, is used as prior information. The pose likelihood estimation is performed by matching edges detected in the optical image against the map. Therefore, the proposed method does not require any deliberate changes to the building infrastructure and makes use of the inherent features of man-made structures: the edges between walls and floor. A particle filter algorithm is applied in order to integrate heterogeneous localization data (i.e. motion sensors and detected visual features). Since the particle filter uses probabilistic sensor models for state estimation, precise measurement noise modeling is key to improving positioning quality. A probabilistic noise model of the edge detector, combining geometrical detection noise and false-positive edge detection noise, is proposed in this work. The developed localization system was experimentally evaluated on a car-like mobile robot in a challenging environment. Experimental results demonstrate that the proposed localization system is able to estimate the robot pose with a mean error not exceeding 0.1 m on each of 100 test runs.
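The predict/update cycle of such a particle filter, odometry-based prediction followed by reweighting against an edge-matching likelihood, can be sketched in NumPy as below; the Gaussian motion-noise levels and the edge_likelihood callback are placeholders standing in for the probabilistic sensor and noise models developed in the paper.

    import numpy as np

    def predict(particles, v, yaw_rate, dt, noise=(0.02, 0.01)):
        # Propagate each particle (x, y, theta) with the wheel-speed and
        # yaw-rate measurements plus Gaussian noise (simplified motion model).
        n = len(particles)
        v_n = v + np.random.randn(n) * noise[0]
        w_n = yaw_rate + np.random.randn(n) * noise[1]
        particles[:, 2] += w_n * dt
        particles[:, 0] += v_n * dt * np.cos(particles[:, 2])
        particles[:, 1] += v_n * dt * np.sin(particles[:, 2])
        return particles

    def update(particles, weights, edge_likelihood):
        # edge_likelihood(pose) scores how well edges detected in the camera
        # image match the map edges expected from that pose.
        weights *= np.array([edge_likelihood(p) for p in particles])
        weights /= weights.sum()
        # Systematic resampling keeps particles in high-likelihood regions
        positions = (np.arange(len(weights)) + np.random.rand()) / len(weights)
        idx = np.searchsorted(np.cumsum(weights), positions)
        return particles[idx], np.full(len(weights), 1.0 / len(weights))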
Patrolling is the task of providing uniform coverage of some area with one or several vehicles. Recent scientific developments focus on patrolling with multiple cooperative autonomous agents without a single center of command. Using a group of agents can increase the efficiency of patrolling; however, group algorithms need to govern not only individual movements but also cooperation between agents and their distribution over the area. To achieve cooperation, agents exchange data about their movements and the patrolling area. An agent's decisions can be affected by information received from other agents; however, few studies have considered how the presence of incorrect information affects patrolling efficiency. In this paper we consider the novel problem of counteracting and detecting a sabotaging agent in the context of multi-agent stochastic patrolling. We consider a modified Social Potential Fields approach, propose a model of the sabotaging agent, and develop two algorithms for its counteraction and detection.
In recent decades, many researchers have focused on the development of simultaneous localization and mapping, and SLAM (Simultaneous Localization and Mapping) technology has undergone a substantial technological evolution over the years. The present work gives a chronology of milestones in the evolution of SLAM for mobile robotics up to today, together with a comparison of the main techniques that have been developed, aiming to identify the most suitable algorithm for performing 3D SLAM in real time with improved accuracy, speed and robustness, among other aspects. These features will help to reduce mapping reconstruction errors, improve robot localization, and provide greater reliability of the mapped environment. The analysis is carried out under specific research criteria and as a prelude to a project focused on building a mobile robot capable of providing relevant information for the search and rescue of people in critical situations, situations in which it will be essential to perform 3D SLAM reliably and in real time. Thus, the present work focuses on a systematic analysis of mobile robotics and the need to reason about a world that is sometimes confusing, sometimes dynamic and changing.