Susceptibility to adversarial attacks is an issue that plagues many deep neural networks. One method of protecting against these attacks is adversarial training (AT), which injects adversarially modified examples into the training data in order to achieve adversarial robustness. By exposing the model to malicious examples during training, the model learns not to be fooled by them at inference time. Although AT is accepted as the de facto defense against adversarial attacks, questions remain when applying it in practice. In this work, we address some of these questions: What ratio of original-to-adversarial examples in the training set is needed to make AT effective? Does robustness gained from one type of AT generalize to other attacks? Do the AT data ratio and generalization vary with model complexity? We attempt to answer these questions through carefully crafted experiments on the CIFAR10 dataset with ResNet models of varying complexity.
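As an illustration of the original-to-adversarial ratio studied here, the following minimal sketch mixes clean and adversarial examples within each training batch. FGSM is used only as a stand-in attack, and `adv_fraction`, `epsilon`, and the helper names are hypothetical, not the exact setup of these experiments.

```python
# Sketch of one adversarial-training step with a configurable
# original-to-adversarial mixing ratio (illustrative only).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Generate FGSM adversarial examples in the pixel domain."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

def at_step(model, optimizer, x, y, epsilon=8/255, adv_fraction=0.5):
    """Replace a fraction of the clean batch with adversarial examples."""
    n_adv = int(adv_fraction * x.size(0))
    if n_adv > 0:
        x = torch.cat([fgsm(model, x[:n_adv], y[:n_adv], epsilon), x[n_adv:]])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```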
KEYWORDS: Video, Cameras, Object detection, RGB color model, Education and training, Detection and tracking algorithms, Video surveillance, Image processing, Signal processing, Feature extraction
Video sensors are ubiquitous in the realm of security and defense. Successive image data from those sensors can serve as an integral part of early-warning systems by drawing attention to suspicious anomalies. Using object detection, computer vision, and machine learning to automate some of those detection and classification tasks aids in maintaining a consistent level of situational awareness in environments with ever-present threats. Specifically, the ability to detect small objects in video feeds would help people and systems protect themselves against faraway or small hazards. This work proposes a way to accentuate features in video stills by subtracting pixels from surrounding frames to extract motion information. Features extracted from a sequence of frames can be used alone, or the resulting signal can be concatenated onto the original image to highlight a moving object of interest. Using a two-stage object detector, we explore the impacts of frame differencing on Drone vs. Bird videos from both stationary cameras and cameras that pan and zoom. Our experiments demonstrate that this algorithm is capable of detecting objects that move in a scene regardless of the state of the camera.
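A minimal sketch of the frame-differencing idea described above, assuming grayscale differencing of the current frame against its two neighbors and concatenation of the motion map as an extra channel; the function and variable names are illustrative, not the exact procedure used in this work.

```python
# Subtract neighboring frames to expose motion, then stack the
# difference onto the original image as an extra channel.
import numpy as np

def frame_difference(prev_frame, curr_frame, next_frame):
    """Absolute difference of the current frame against its neighbors."""
    gray = lambda f: f.mean(axis=2) if f.ndim == 3 else f
    d1 = np.abs(gray(curr_frame).astype(np.float32) - gray(prev_frame).astype(np.float32))
    d2 = np.abs(gray(curr_frame).astype(np.float32) - gray(next_frame).astype(np.float32))
    return np.minimum(d1, d2)  # keep motion present in both comparisons

def concat_motion_channel(curr_frame, motion):
    """Append the motion map to the RGB image as a fourth channel."""
    return np.concatenate([curr_frame.astype(np.float32), motion[..., None]], axis=2)
```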
Adversarial training (AT) is considered the most effective strategy for defending a machine learning model against adversarial attacks. There are many different methods to perform AT, but the underlying principle is the same: augment the training data with adversarial examples. In this work, we investigate the efficacy of four different adversarial example generation strategies for AT of a given classification model. The four methods represent different categories of attack and data. Specifically, two of the adversarial generation algorithms perform attacks in the pixel domain, while the other two operate in the latent space of the data. In addition, two of the methods generate adversarial data samples designed to lie near the model's decision boundaries, while the other two generate generic adversarial examples (not necessarily at the boundary). The adversarial examples from these methods are used to adversarially train models on MNIST and CIFAR10. In the absence of a good metric to measure the robustness of a model, capturing the effect of AT with a single number is a challenge. Hence, we evaluate the robustness improvements of the adversarially trained models using a variety of empirical metrics introduced in the literature that measure the local Lipschitz value of a network (CLEVER), the smoothness of decision boundaries, robustness to adversarial perturbations, and defense transferability.
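As an example of a pixel-domain attack of the kind that could supply adversarial examples for AT, the sketch below implements projected gradient descent (PGD); the step size, iteration count, and epsilon are illustrative defaults, not the settings used in this work.

```python
# Iterative L-infinity attack projected back into the epsilon ball.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Generate pixel-domain adversarial examples with PGD."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()
```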
Cooperative autonomous systems, such as swarms, multi-camera systems, or fleets of self-driving vehicles, can better understand a given scene by sharing multiple modalities and varying viewpoints. This superposition of data adds robustness and redundancy to a system typically burdened with obstructions and unrecognizable, distant objects. Collaborative perception is a key component of cooperative autonomous systems, where modalities can include camera sensors, LiDAR, RADAR, and depth images. Meanwhile, the amount of useful information that can be shared between agents in a cooperative system is constrained by current communication technologies (e.g., bandwidth limitations). Recent developments in learned compression can enable the training of end-to-end cooperative systems using deep learning with compressed communication in the pipeline. We explore the use of a deep learning object detector in a cooperative setting with a learned compression model facilitating communication between agents. To test our approach, we focus on object detection in the image domain as a proxy for one of the modalities used by collaborative systems.
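A rough sketch of the learned-compression bottleneck between agents, assuming a simple convolutional encoder/decoder with straight-through rounding standing in for entropy-coded quantization; the layer sizes and class name are placeholders, not the model used in this work.

```python
# One agent encodes and quantizes its feature map for transmission;
# the receiving agent decodes it before fusion with its own features.
import torch
import torch.nn as nn

class LearnedCodec(nn.Module):
    def __init__(self, channels=64, bottleneck=8):
        super().__init__()
        self.encoder = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.decoder = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, features):
        code = self.encoder(features)
        # Straight-through rounding approximates quantization during training.
        code = code + (code.round() - code).detach()
        return self.decoder(code)
```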
Standard object detectors are trained on a wide array of commonplace objects and work out-of-the-box for numerous everyday applications. Training data for these detectors tends to contain objects of interest that appear prominently in the scene, making them easy to identify. Unfortunately, objects seen by camera sensors in real-world scenarios do not always appear large, in focus, or near the center of an image. In the face of these problems, the performance of many detectors lags behind the thresholds necessary for successful deployment in uncontrolled environments. Specialized applications require additional training data to be reliable in situ, especially when small objects are likely to appear in the scene. In this paper, we present an object detection dataset consisting of videos that depict helicopter exercises recorded in an unconstrained, maritime environment. Special consideration was taken to emphasize small instances of helicopters relative to the field of view; the dataset therefore provides a more even ratio of small-, medium-, and large-sized object appearances for training more robust detectors in this specific domain. We use the COCO evaluation metric to benchmark multiple detectors on our data as well as the WOSDETC (Drone vs. Bird) dataset, and we compare a variety of augmentation techniques to improve detection accuracy and precision in this setting. These comparisons yield important lessons learned as we adapt standard object detectors to process data with non-iconic views from field-specific applications.
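For reference, the COCO metric bins objects by pixel area (small below 32^2, medium below 96^2, large otherwise); the helper below shows how a dataset's small/medium/large ratio could be tallied, assuming COCO-style `[x, y, w, h]` boxes. It is a convenience sketch, not part of the dataset tooling described here.

```python
# Tally small/medium/large object counts using COCO area thresholds.
from collections import Counter

def size_distribution(annotations):
    counts = Counter()
    for ann in annotations:
        _, _, w, h = ann["bbox"]   # COCO-style [x, y, w, h]
        area = w * h
        if area < 32 ** 2:
            counts["small"] += 1
        elif area < 96 ** 2:
            counts["medium"] += 1
        else:
            counts["large"] += 1
    return counts
```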
Object detectors on autonomous systems often have to contend with dimly-lit environments and harsh weather conditions. RGB images alone typically do not provide enough information. Because of this, autonomous systems have an array of other specialized sensors to observe their surroundings. These sensors can operate asynchronously, have various effective ranges, and create drastically different amounts of data. An autonomous platform must be able to combine the many disparate streams of information in order to leverage all of the available information while creating the most comprehensive model of its environment. In addition to multiple sensors, deep learning-based object detectors typically require large amounts of labeled data to achieve good performance. Unfortunately, collecting multimodal, labeled data is exceedingly labor-intensive, which necessitates a streamlined approach to data collection. The use of video game graphics engines in the production of images and video has emerged as a relatively cheap and effective way to create new datasets. This helps to close the data gap for computer vision tasks like object detection and segmentation. Another unique aspect of using gaming engines to generate data is the ability to introduce domain randomization, which randomizes certain parameters of the game engine and generation scheme in order to improve generalization to real-life data. In this paper, we outline the creation of a multimodal dataset using domain randomization. Our dataset focuses on the two most popular sensors in autonomous vehicles, LiDAR and RGB cameras. We perform baseline testing of an object detector using a data-fusion deep learning architecture on both our synthetic dataset and the KITTI dataset for comparison.
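A toy sketch of domain randomization as described above: each generated scene draws engine parameters at random. The parameter names and ranges are invented for illustration and do not correspond to a specific engine API or to the generation scheme used for this dataset.

```python
# Sample a randomized set of scene parameters for synthetic data generation.
import random

def sample_scene_params():
    return {
        "time_of_day": random.uniform(0.0, 24.0),     # hours
        "fog_density": random.uniform(0.0, 0.3),
        "sun_azimuth_deg": random.uniform(0.0, 360.0),
        "camera_height_m": random.uniform(1.2, 2.0),
        "num_vehicles": random.randint(0, 20),
        "texture_seed": random.randrange(2 ** 31),
    }
```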
Object detection for computer vision systems continues to be a complicated problem in real-world situations. For instance, autonomous vehicles need to operate with very small margins of error as they encounter safety-critical scenarios such as pedestrian and vehicle detection. The increased use of unmanned aerial vehicles (UAVs) by both government and private citizens has created a need for systems which can reliably detect UAVs in a large variety of conditions and environments. In order to achieve small margins of error, object detection systems, especially those reliant on deep learning methods, require large amounts of annotated data. The use of synthetic datasets provides a way to alleviate the need to collect annotated data. Unfortunately, the nature of synthetic dataset generation introduces a reality and simulation gap that hinders an object detector's ability to generalize on real world data. Domain randomization is a technique that generates a variety of different scenarios in a randomized fashion both to close the reality and simulation gap and to augment a hand-crafted dataset. In this paper, we combine the AirSim simulation environment with domain randomization to train a robust object detector. As a final step, we fine-tune our object detector on real-world data and compare it with object detectors trained solely on real-world data.
The creation of large labeled datasets for optical flow is often infeasible due to the difficulty associated with measuring dynamic objects in real scenes. Current datasets from real-world scenes are often sparse in terms of ground truth. Generating synthetic datasets, where ground truth can be easily obtained, tends to be the easiest way to acquire the large labeled datasets required to achieve good performance. Often, the switch from synthetic to real-world imagery leads to a drop in performance. Recently, with the development of differentiable image warping layers, unsupervised methods, which require no ground truth optical flow, can be applied to train a deep neural network (DNN) model for optical flow tasks, allowing training on unlabeled video. The brightness constancy assumption is the underlying principle that enables unsupervised learning of optical flow. Violations of the brightness constancy assumption, particularly at occlusions, result in large outlier errors that are harmful to the learning process. Robust regression loss functions and outlier prediction methods attempt to alleviate this problem. In this paper, we conduct experiments to compare the performance of various unsupervised optical flow methods by exploring different robust cost functions and outlier handling methods.
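As a concrete example of a robust loss under the brightness constancy assumption, the sketch below uses a Charbonnier penalty with an optional occlusion mask; the differentiable warping step is assumed to have already produced `warped` from the second frame and the predicted flow, and the constants are illustrative rather than the values compared in this paper.

```python
# Robust photometric loss for unsupervised optical flow training.
import torch

def charbonnier_photometric_loss(frame1, warped, mask=None, eps=1e-3, alpha=0.45):
    """Charbonnier penalty on the brightness-constancy residual."""
    diff = frame1 - warped
    penalty = (diff * diff + eps * eps) ** alpha
    if mask is not None:                        # 1 = valid, 0 = occluded/outlier
        return (penalty * mask).sum() / mask.sum().clamp(min=1.0)
    return penalty.mean()
```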
Current challenges in spectrum monitoring include radar emitter state identification and the ability to detect changes in radar activity. Recently, large labeled datasets and better compute power have led to improvements in the classification performance of deep neural network (DNN) models on structured data like time series and images. The reliance on large labeled datasets to achieve state-of-the-art performance is a hindrance for machine learning applications, especially in the area of radar, which tends to have a wealth of noisy and unlabeled data. Due to the abundance of unlabeled data, the problem of radar emitter and activity identification is commonly set up as a clustering problem, which requires no labels. The deep clustering approach uses an underlying deep feature extractor, such as an autoencoder, to learn a low-dimensional feature representation that facilitates a clustering task. In this paper, we evaluate different clustering loss functions, such as K-means, for training DNNs, using radar emitter state and activity identification as our example task.
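A minimal sketch of a joint reconstruction-plus-K-means objective of the kind used in deep clustering, assuming a hypothetical `autoencoder` with `encode`/`decode` methods and a fixed set of `centroids`; the weighting term and assignment scheme are illustrative, not necessarily the loss functions evaluated here.

```python
# Combine autoencoder reconstruction with a K-means-style clustering loss
# that pulls each latent vector toward its nearest centroid.
import torch
import torch.nn.functional as F

def deep_clustering_loss(autoencoder, centroids, x, lambda_=0.1):
    z = autoencoder.encode(x)                  # (batch, latent_dim)
    x_hat = autoencoder.decode(z)
    recon = F.mse_loss(x_hat, x)
    dists = torch.cdist(z, centroids)          # (batch, n_clusters)
    assign = dists.argmin(dim=1)               # hard cluster assignments
    kmeans = F.mse_loss(z, centroids[assign].detach())
    return recon + lambda_ * kmeans
```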
Much progress has been made in recent years in almost every research area within computer vision. This has led to an increased interest in applying computer vision algorithms to real-world problems, such as robot navigation, driver-less cars, and first-person video analysis. However, in each of these real-world applications, there are still significant challenges in processing degraded data, particularly when estimating motion from a single camera, which is commonly solved using optical flow. Previous studies have shown that state-of-the-art optical flow methods fail under realistic conditions of added noise, compression artifacts, and other types of degradations. In this paper, we investigate strategies to improve the robustness of optical flow to these degradations by incorporating the degradations and related data augmentations into the training and fine-tuning stages of deep learning approaches to optical flow. We test these strategies using real and simulated data and attempt to illuminate this important area of research to the community.
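A small sketch of training-time degradations of the kind explored here, applying random Gaussian noise and JPEG compression to an input frame; the parameter ranges are illustrative only and do not reflect the specific augmentation schedules tested.

```python
# Randomly degrade an RGB uint8 frame with noise and JPEG compression.
import io
import random
import numpy as np
from PIL import Image

def degrade(frame):
    noisy = frame.astype(np.float32) + np.random.normal(0, random.uniform(0, 10), frame.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(noisy).save(buf, format="JPEG", quality=random.randint(30, 90))
    buf.seek(0)
    return np.array(Image.open(buf))
```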
In real-world video data, such as full-motion video (FMV) taken from unmanned vehicles, surveillance systems, and other sources, various corruptions to the raw data are inevitable. These can be due to the image acquisition process, noise, distortion, and compression artifacts, among other sources of error. However, we desire methods to analyze the quality of the video to determine whether, and to what extent, the underlying content of the corrupted video can be analyzed by humans or machines. Previous approaches have shown that motion estimation, or optical flow, can be an important cue in automating this video quality assessment. However, there are many different optical flow algorithms in the literature, each with their own advantages and disadvantages. We examine the effect of the choice of optical flow algorithm (including baseline and state-of-the-art) on motion-based automated video quality assessment algorithms.
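As one way such motion cues could feed a quality-assessment model, the sketch below reduces a dense flow field to a few summary statistics; these particular features are illustrative, not the ones used in this study.

```python
# Summarize an (H, W, 2) optical flow field into a small feature vector.
import numpy as np

def flow_features(flow):
    magnitude = np.linalg.norm(flow, axis=2)
    return np.array([
        magnitude.mean(),
        magnitude.std(),
        np.percentile(magnitude, 95),
        (magnitude > 1.0).mean(),   # fraction of "moving" pixels
    ])
```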