Training data is an essential ingredient in supervised learning, but it is time-consuming, expensive, and for some applications impossible to acquire. A possible solution is to use synthetic training data. However, the domain shift of synthetic data makes it challenging to obtain good results when it is used as training data for deep learning models. It is therefore of interest to refine synthetic data, e.g. using image-to-image translation, to improve results. The aim of this work is to compare different methods for image-to-image translation of synthetic thermal IR training images using generative adversarial networks (GANs). Translation is done both using synthetic thermal IR images alone and including pixelwise depth and/or semantic information. For evaluation, we propose a new measure based on the Fréchet Inception Distance, adapted to work for thermal IR images. We show that by adapting a GAN model to also include the corresponding pixelwise depth data for each synthetic IR image, performance is improved compared to using IR images only.
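As a rough illustration of the evaluation measure, the sketch below computes the standard Fréchet distance between two sets of feature vectors. The IR-specific adaptation (which features to extract from thermal images) is not reproduced here, and the function name and interface are assumptions for illustration only.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between two sets of feature vectors (rows = samples).

    Standard FID formula; the paper's IR-adapted variant replaces the
    Inception features with features suited to thermal imagery.
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    # rowvar=False: rows are observations, columns are feature dimensions
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # discard tiny imaginary parts from sqrtm
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)
```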
In recent years, deep neural networks have shown great results in image classification. Most previous work focuses on the classification of fairly large objects in visual imagery. This paper presents a method for detecting and classifying small objects in thermal imagery using a deep learning method based on a RetinaNet network. The results show that a deep neural network can be trained to classify objects in thermal imagery with a relatively small set of labelled images. Objects from the classes with the most training examples (cars, trucks, and persons) can be classified with relatively high confidence given an object size of 32×32 pixels or smaller.
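A minimal sketch of how a RetinaNet detector might be instantiated for thermal frames, using torchvision's off-the-shelf implementation. The class count and the single-channel workaround are assumptions for illustration, not the paper's exact setup.

```python
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Hypothetical class count; the paper's classes include cars, trucks, persons
NUM_CLASSES = 4  # 3 target classes + background

# weights=None so the head is built for our class count; a real setup
# would likely initialize the backbone from pretrained weights
model = retinanet_resnet50_fpn(weights=None, num_classes=NUM_CLASSES)
model.eval()

# Thermal frames are single-channel; replicating to 3 channels is a common
# workaround for RGB-designed backbones (an assumption, not the paper's method)
frame = torch.rand(1, 512, 640)   # (C, H, W) grayscale IR frame in [0, 1]
image = frame.repeat(3, 1, 1)     # pseudo 3-channel input

with torch.no_grad():
    detections = model([image])   # list of dicts: boxes, labels, scores
print(detections[0]["boxes"].shape)
```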
This paper presents components of a sensor management architecture for autonomous UAV systems equipped with IR and video sensors, focusing on two main areas. Firstly, a framework inspired by optimal control and information theory is presented for concurrent path and sensor planning. Secondly, a method for visual landmark selection and recognition is presented. The latter is intended to be used within a SLAM (Simultaneous Localization and Mapping) architecture for visual navigation. Results are presented on both simulated and real sensor data, the latter from the MASP system (Modular Airborne Sensor Platform), an in-house-developed UAV surrogate system containing a gimballed IR camera, a video sensor, and an integrated high-performance navigation system.
We present a two-stage process for target identification and pose estimation. A database of possible target states, i.e. identity and pose, is precomputed by a two-step clustering procedure, reflecting the two stages of the identification process. The current database is based on images generated from 3D CAD models of military ground vehicles to which realistic infrared textures have been applied. At the coarse level, the database is divided into a set of clusters, each represented by a small set of eigenimages obtained through principal component analysis (PCA). Classification at this level is achieved by measuring the orthogonal distance between the region of interest (ROI) and the eigenspace of each cluster. Each cluster itself contains a few subclusters. A support vector machine is employed for pairwise discrimination of subclusters. The likelihood that the target belongs to a particular cluster or subcluster is based on histograms obtained when the system is trained. In addition to the classification of individual images, it is also possible to handle image sequences where the pose of the target may vary between frames. In this situation, the pose is assumed to change according to a first-order Markov process. The overall probability of each target state is accumulated through recursive Bayesian estimation. The performance of the above procedure has been evaluated through the identification of targets in synthetic image sequences, where the targets are placed in realistic backgrounds. Currently, we are able to correctly identify the targets in more than 80 percent of the image sequences. In about 60 (80) percent of the cases, the pose can be estimated within an accuracy of 10 (20) degrees. The accuracy of the pose estimation is limited by the size of the subclusters.
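A minimal sketch of the two numerical ingredients described above: the coarse-level orthogonal-distance test against each cluster's eigenspace, and one step of the recursive Bayesian accumulation under a first-order Markov pose model. The data layout and function names are assumptions, not the paper's implementation.

```python
import numpy as np

def orthogonal_distance(roi, mean, eigenimages):
    """Distance from a flattened ROI to a cluster's eigenspace.

    `eigenimages` is a (k, d) matrix of orthonormal PCA basis vectors;
    the orthogonal distance is the norm of the residual after projection.
    """
    x = roi.ravel().astype(float) - mean
    projection = eigenimages.T @ (eigenimages @ x)  # reconstruct in eigenspace
    return np.linalg.norm(x - projection)

def classify_coarse(roi, clusters):
    """Pick the cluster whose eigenspace is closest to the ROI.

    `clusters` maps a label to (mean, eigenimages); a hypothetical layout.
    """
    return min(clusters, key=lambda c: orthogonal_distance(roi, *clusters[c]))

def bayes_update(prior, likelihood, transition):
    """One step of recursive Bayesian estimation over target states.

    prior: (n,) probabilities over states; likelihood: (n,) per-state
    likelihood of the current frame; transition: (n, n) first-order
    Markov transition matrix with rows summing to 1.
    """
    predicted = transition.T @ prior    # propagate pose through the Markov model
    posterior = likelihood * predicted
    return posterior / posterior.sum()
```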
One way to increase the robustness and efficiency of unmanned surveillance platforms is to introduce an autonomous data acquisition capability. In order to mimic a sensor operator's search pattern, combining wide-area search with detailed study of detected regions of interest, the system must be able to produce target indications in real time. Rapid detection algorithms are also useful for cueing image analysts who process large amounts of aerial reconnaissance imagery. Recently, the use of a sequence of increasingly complex classifiers has been suggested by several authors as a means to achieve high processing rates at low false alarm and miss rates. The basic principle is that much of the background can be rejected by a simple classifier before more complex classifiers are applied to analyse the more difficult remaining image regions. Even higher performance can be achieved if each detector stage is implemented as a set of expert classifiers, each specialised to a subset of the target training set. In order to cope with the increasingly difficult classification problem faced at successive stages, the partitioning of the target training set must be made increasingly fine-grained, resulting in a coarse-to-fine hierarchy of detectors. Most of the literature on this type of detector is concerned with face detection. The present paper describes a system designed for detection of military ground vehicles in thermal imagery from airborne platforms. The classifier components used are trained using a variant of the LogitBoost algorithm. The results obtained are encouraging and suggest that it is possible to achieve very low false alarm and miss rates for this very demanding application.
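The rejection-cascade principle described above can be illustrated with a short sketch. The interface is hypothetical; in the paper, each stage would be a LogitBoost-trained classifier, possibly realized as a set of expert classifiers whose maximum score is taken.

```python
def cascade_detect(window, stages):
    """Run an image window through a sequence of increasingly complex classifiers.

    `stages` is a list of (score_fn, threshold) pairs ordered cheap-to-complex;
    most background windows are rejected by the first, cheapest stages, so the
    expensive stages only see the few windows that survive.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early: no further (costlier) stages run
    return True           # survived every stage: report a detection
```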
This paper describes a framework for image processing and sensor management for an autonomous unmanned airborne surveillance system equipped with infrared and video sensors. Our working hypothesis is that integration of the detection-tracking-classification chain with spatial awareness makes intelligent autonomous data acquisition possible by means of active sensor control. A central part of the framework is a surveillance scene representation suitable for target tracking, geolocation, and sensor data fusion involving multiple platforms. The representation, based on Simultaneous Localization and Mapping (SLAM), takes into account uncertainties associated with sensor data, platform navigation, and prior knowledge. A client/server approach for on-line adaptable surveillance missions is introduced. The presented system is designed to simultaneously and autonomously perform the following tasks: provide wide-area coverage from multiple viewpoints by means of a step-stare procedure, detect and track multiple stationary and moving ground targets, perform a detailed analysis of detected regions of interest, and generate precise target coordinates by means of multi-view geolocation techniques.
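One common multi-view geolocation technique, not necessarily the exact method used here, is least-squares triangulation of bearing rays observed from several viewpoints. A minimal sketch:

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares intersection of bearing rays from multiple viewpoints.

    origins: (n, 3) sensor positions; directions: (n, 3) bearing vectors.
    Solves the normal equations for the point minimizing the summed squared
    distance to all rays, using the projector orthogonal to each ray.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(np.asarray(origins, dtype=float),
                    np.asarray(directions, dtype=float)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```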
In several recent articles it has been suggested that the shape of the correlation peak be used to distinguish between target and clutter. The peak shape is characterized in terms of features, such as geometrical moments, which are then fed into a classifier that decides whether the peak was generated by a target or by clutter. The classification can be facilitated by an appropriate filter design. The maximum average correlation height (MACH) filter was designed to produce similar correlation planes for the target variations present in the training set. In this article we present generalizations of the MACH filter with the intention of decreasing the peak shape variation for targets in severe clutter. We show that by taking into account the non-overlapping character of the background noise and focusing the MACH correlation plane similarity requirement on the peak neighborhood, it is possible to simultaneously achieve a small variation in correlation peak shape and high peak-to-sidelobe ratios for cluttered images.
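For orientation, the sketch below builds a simplified frequency-domain MACH filter from a stack of registered training images, using the common form H = M / (N + S), where M is the mean training spectrum and S the average squared deviation from it. The regularization constant and function names are assumptions; the paper's generalizations (the non-overlapping noise model and the peak-neighborhood similarity requirement) are not reproduced here.

```python
import numpy as np

def mach_filter(training_images, noise_floor=1e-3):
    """Simplified frequency-domain MACH filter from registered training images.

    `noise_floor` is a hypothetical regularizer standing in for the noise
    power spectral density term in the full MACH formulation.
    """
    stack = np.asarray(training_images, dtype=float)      # (n, H, W)
    spectra = np.fft.fft2(stack, axes=(-2, -1))
    mean_spec = spectra.mean(axis=0)                      # M: mean spectrum
    asm = np.mean(np.abs(spectra - mean_spec) ** 2, axis=0)  # S: avg deviation
    return mean_spec / (noise_floor + asm)

def correlate(image, filt):
    """Correlation plane of an image with a frequency-domain filter."""
    spectrum = np.fft.fft2(image)
    # conjugating the filter implements correlation rather than convolution
    return np.real(np.fft.ifft2(spectrum * np.conj(filt)))
```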