This PDF file contains the front matter associated with SPIE Proceedings Volume 11843 including the Title Page, Copyright information, and Table of Contents.
We recently developed a deep learning method that can determine the critical peak stress of a material from scanning electron microscope (SEM) images of the material’s crystals. However, it has been somewhat unclear what image features the network relies on when making its prediction. It is common in computer vision to employ an explainable AI saliency map to indicate which parts of an image are important to the network’s decision. One can usually deduce the important features by looking at these salient locations. However, SEM images of crystals are more abstract to the human observer than natural photographs. As a result, it is not easy to tell what features are important at the most salient locations. To address this, we developed a method that maps features from important locations in SEM images to non-abstract textures that are easier to interpret.
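For readers unfamiliar with saliency maps, the sketch below illustrates the basic gradient-based variant the abstract refers to; the tiny `stress_net` regressor and the random input are placeholders, not the authors' trained model.

```python
# Hedged sketch: a plain gradient saliency map for a CNN regressor, assuming a
# hypothetical model `stress_net` that maps a 1-channel SEM image to peak stress.
import torch
import torch.nn as nn

stress_net = nn.Sequential(          # stand-in for the trained regressor
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)

image = torch.rand(1, 1, 128, 128, requires_grad=True)  # fake SEM patch
pred = stress_net(image).sum()       # predicted critical peak stress (scalar)
pred.backward()                      # gradients w.r.t. input pixels

saliency = image.grad.abs().squeeze()    # |d stress / d pixel|
saliency /= saliency.max() + 1e-8        # normalize to [0, 1] for display
print(saliency.shape)                    # (128, 128) importance map
```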
In this work we demonstrate a method for leveraging high-fidelity, multi-physics simulations of high-speed impacts in a particular manufactured material to encode prior information regarding the impactor material's strength properties. Our simulations involve a material composed of stacked cylindrical ligaments impacted by a high-velocity aluminum plate. We show that deep neural networks of relatively simple architecture can be trained on the simulations to make highly accurate inferences of the strength properties of the impactor material. We detail our neural architectures and the considerations that went into their design. In addition, we discuss how the simplicity of our network architecture lends itself to interpretability of learned features in radiographic observations.
Synthetic image generation using deep learning techniques such as generative adversarial networks (GANs) has drastically improved since its inception. There has been significant research using human face datasets like FFHQ and city-semantic datasets for self-driving car applications. Utilizing latent space distributions, researchers have been able to generate or edit images to exhibit specific traits, such as face aging. However, there has been little GAN research, and few datasets, in the structural infrastructure domain. We propose an inverse-GAN application to embed real structural bridge detail images and incrementally edit them using learned semantic boundaries. Corrosion/non-corrosion and various steel paint colors were among the semantic boundaries discovered using the InterFaceGAN methodology. The novel dataset was procured by extracting hundreds of thousands of images from Virginia Department of Transportation (VDOT) bridge inspection reports and was used to train the StyleGAN2 generator. The trained model offers the ability to forecast deterioration incrementally, which is valuable to inspectors, engineers, and owners because it gives a window into the future on how and where damage may progress. As bridge inspectors typically review bridges every two years, this forecast could reinforce decisions for action or inaction.
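The latent-space editing step can be illustrated with a minimal sketch: a latent code is moved along the unit normal of a learned semantic boundary. The random `w` and `n` below stand in for an embedded bridge image and a learned corrosion boundary; they are not taken from the paper's trained StyleGAN2 model.

```python
# Minimal sketch of latent-space editing along a learned semantic boundary,
# in the spirit of InterFaceGAN; the latent code and boundary normal are random
# placeholders rather than values from the trained model.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=512)                 # embedded latent code of a real image
n = rng.normal(size=512)
n /= np.linalg.norm(n)                   # unit normal of the corrosion boundary

def edit(latent, normal, alpha):
    """Move the latent code a signed distance alpha across the boundary."""
    return latent + alpha * normal

steps = [edit(w, n, a) for a in np.linspace(0.0, 3.0, 7)]  # incremental "forecast"
print(len(steps), steps[0].shape)
```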
For integrated silicon photonics to mature as an industry platform, robust methods for measuring and extracting the geometry of fabricated waveguides are needed. Due to the cost and time required for SEM or AFM imaging, a method of extracting waveguide variability through optical measurements is often preferred. Here, we present a study of regression-based machine learning (ML) techniques that enable such variability extraction while maintaining compatibility with wafer-scale optical measurements. We first explicitly investigate the issue of non-unique effective and group index pairs, which can affect the accuracy of regression-based techniques. Training data is then generated by simulating several geometries of wire waveguides in Lumerical’s MODE solver to emulate defects due to process variance. Finally, a representative set of ML regression techniques is tested for the ability to accurately estimate the geometries of the simulated waveguides. To the best of the authors' knowledge, this work represents the first attempt in the literature to (i) explicitly study the effects of non-uniqueness in optical measurement-based metrology and (ii) present a model that potentially overcomes said non-uniqueness. This work represents an important step toward maturing models of process variation in silicon photonic platforms.
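A hedged sketch of the regression idea, assuming a toy forward model in place of the Lumerical MODE simulations: optical observables (effective index, group index) are mapped back to waveguide geometry with an off-the-shelf regressor.

```python
# Toy regression of waveguide geometry (width, height) from optical observables;
# the data is synthetic noise standing in for the simulated MODE sweep.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
geometry = rng.uniform([400, 180], [550, 260], size=(2000, 2))   # nm (width, height)
# assumed smooth forward model: indices as functions of geometry plus noise
n_eff = 2.2 + 1e-3 * geometry[:, 0] + 2e-3 * geometry[:, 1] + rng.normal(0, 1e-3, 2000)
n_g = 4.2 - 5e-4 * geometry[:, 0] + 1e-3 * geometry[:, 1] + rng.normal(0, 1e-3, 2000)
X = np.column_stack([n_eff, n_g])

X_tr, X_te, y_tr, y_te = train_test_split(X, geometry, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("mean abs geometry error (nm):", np.abs(model.predict(X_te) - y_te).mean(axis=0))
```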
In this work, we leverage deep learning to reproduce and expand Synthetic Aperture Radar (SAR) based deforestation detections generated using a probabilistic Bayesian model. Our Bayesian-updating deforestation detections leverage SAR backscatter and InSAR coherence to perform change detection on forested areas and detect deforestation regardless of cloud cover. However, this model does not capture all deforestation events and is better suited to near-real-time alerting than to accurate forest loss acreage estimates. Here, we use SAR-based probabilistic detections as deforestation labels and Sentinel-2 optical composites as input features to train a neural network to differentiate deforested patches at various stages of regrowth from native forest. The deep learning model predictions demonstrate excellent recall of the original Bayesian labels, and lower precision that is largely due to providing better coverage of deforestation and detecting deforested patches not included in the imperfect Bayesian labels. These results provide an avenue to improve existing deforestation models, specifically with regard to their ability to quantify deforested acreage.
The process of automatically masking objects from complex backgrounds is extremely beneficial when trying to utilize those objects for computer vision research, such as object detection, autonomous driving, pedestrian tracking, etc. Therefore, a robust method of segmentation is imperative to ongoing research between the Digital Imaging and Remote Sensing Laboratory at the Rochester Institute of Technology and the Savannah River National Laboratory directed at the volume estimation of condensed water vapor plumes emanating from mechanical draft cooling towers. Instance segmentation was performed on a custom data set consisting of RGB imagery with the Matterport Mask R-CNN implementation,1 where condensed water vapor plumes were masked out from mixed backgrounds for the purpose of 3D reconstruction and volume estimation. This multi-class Mask R-CNN was trained to detect the cooling tower structure and plumes with and without data augmentation to study the effects on a preliminary data set, in addition to a model trained with a single plume class. The average precision and intersection over union metrics across all models were shown not to be statistically different. While each model is capable of detecting and segmenting plumes in the preliminary data set, all models essentially perform the task with the same efficacy. This indicates some level of bias in the preliminary data set, demonstrating the need for more variance in the form of additional annotated imagery. The single plume class model tested within 7% for mAP, AP, and IoU when compared to the other two models, demonstrating the ability of Mask R-CNN to detect and segment these dynamically changing plumes without any spatial dependence on the stationary cooling tower structure. This ongoing research includes a long-term data collection campaign in which imagery of condensed water vapor plumes will be continuously gathered over an 18-month period so as to include examples under the many different meteorological and environmental conditions, seasonal variations, and illumination changes that occur over an annual cycle. Including this data in future training of the Mask R-CNN implementation is expected to reduce any bias that may exist in the current data set.
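As an illustration of the instance-segmentation step, the sketch below runs inference with torchvision's Mask R-CNN rather than the Matterport implementation cited above; the class count, score threshold, and random input frame are assumptions.

```python
# Sketch of instance-segmentation inference with torchvision's Mask R-CNN,
# shown only to illustrate the same idea as the Matterport-based pipeline.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights=None, num_classes=3)  # background, tower, plume
model.eval()

image = torch.rand(3, 480, 640)            # stand-in RGB frame of a cooling tower
with torch.no_grad():
    output = model([image])[0]             # dict with boxes, labels, scores, masks

keep = output["scores"] > 0.5              # confidence threshold (assumed)
plume_masks = output["masks"][keep]        # (N, 1, H, W) soft masks in [0, 1]
print(plume_masks.shape)
```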
Aerial automatic detection, tracking, and classification of small, fast-moving marine objects in Synthetic Aperture Radar (SAR) data is a real challenge. In particular, for many marine security applications it is imperative to have an automatic capability for identifying anomalous patterns of behavior of target marine vehicles from their ocean wakes. By applying analytics to the cruising patterns of adversarial marine targets, certain anomalous behavioral patterns can be identified and perilous situations can be mitigated. Deep learning classifiers can be trained to learn such complex ocean wake patterns. However, they need rich training datasets that are rarely publicly available. To overcome this shortcoming, there is a need for a large-scale multi-look (i.e., different perspective views) dataset of multi-vehicle TOIs from different stand-off ranges and diverse operating and environmental contexts, including atmospheric conditions. In this study, we used the IRIS electromagnetic modeling and simulation (IRIS-EM) virtual environment system to systematically generate a large-scale synthetic SAR imagery dataset of the test marine scenarios. A typical marine scenario includes physics-based CAD models of the test marine vehicle(s) and their associated wake(s) as well as the ocean layer. In this paper, we present our systematic approach for generating synthetic SAR imagery of marine test scenarios and detail our methodology for annotating the generated imagery methodically. To evaluate and verify the effectiveness of this approach, we benchmarked our simulated marine SAR imagery against similar-context images taken by physical SAR imaging systems.
We present VisHash to address the problem of retrieving copies of images, particularly drawings and diagrams used in technical documents. While these images convey important technical information, it is difficult to search for them. Recent advances in computer vision using deep learning methods have significantly advanced our ability to analyze and retrieve natural images, yet most of these advances do not directly apply to drawings and diagrams due to the very different nature of the low-level features of the images. We find that classic computer vision techniques based on patch-wise extraction of image features such as relative brightness work better on drawings and diagrams than frequency-based similarity-preserving perceptual hashes; yet existing relative-brightness signatures often fail to produce any meaningful signature due to the sparsity of information in technical drawings. We take advantage of the effectiveness of the relative-brightness signature and extend the approach to develop VisHash, a visual similarity-preserving signature that works well on technical diagrams. Importantly, we demonstrate the high precision of VisHash for image retrieval compared with competing image hashes on large sets of real drawings from patents and technical images from the web. VisHash is available as open-source code to incorporate into image search and indexing workflows.
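A minimal sketch of a relative-brightness signature of the kind VisHash builds on, assuming an 8x8 grid and horizontal-neighbor comparisons; these details are illustrative choices, not the published VisHash design.

```python
# Toy relative-brightness signature: average brightness on a coarse grid, signs
# of neighboring-patch differences as the bit string, Hamming distance as the
# similarity score. Grid size and neighbor pattern are assumptions.
import numpy as np

def brightness_signature(img, grid=8):
    h, w = img.shape
    patches = img[: h - h % grid, : w - w % grid]
    patches = patches.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
    diffs = patches[:, 1:] - patches[:, :-1]          # horizontal neighbor differences
    return (diffs > 0).ravel()                        # binary signature

def hamming(a, b):
    return np.count_nonzero(a != b)

rng = np.random.default_rng(2)
drawing = rng.random((256, 256))
copy = np.clip(drawing + rng.normal(0, 0.02, drawing.shape), 0, 1)
print(hamming(brightness_signature(drawing), brightness_signature(copy)))
```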
Producing a better segmentation mask is crucial for scene understanding. Semantic segmentation is a vital task for applications such as autonomous driving, robotics, and medical image understanding. Efficient manipulation of high- and low-level context is key to competent pixel-level classification. The image’s high-level feature map helps capture the spatial configuration of the objects for segmentation, while the low-level features help to discern the boundaries of the objects in the segmentation map. In our implementation, we use a two-bridge network. The first bridge manipulates the subtle differences between images and produces a vector that captures the low-level features of the input images. The second bridge produces global contextual aggregation from the image while gathering a better understanding of the image’s high-level features. The backbone is a dilated residual network, which helps avoid shrinking the image during feature extraction. We train our network on the Cityscapes and ADE20K datasets and compare our results with state-of-the-art models. Initial experiments yielded a mean IoU of 70.1% and pixel accuracy of 94.4% on the Cityscapes dataset and a mean IoU of 34.6% on the ADE20K dataset.
Objective quality metrics provide cost-efficient methods for quality evaluation, as they are, in practice, algorithms or models that avoid the need for subjective assessment, which is a precise but resource-consuming approach. Their ultimate measure of prediction accuracy fundamentally relies on the correlation between the estimated levels of quality and the actual subjective scores of perceived quality, rated by human individuals. Such metrics have already been developed for every emerging technology where quality, in general, is relevant. This applies to stereoscopic 3D imaging as well, which is utilized in industry, healthcare, education, and entertainment. In this paper, we introduce an exhaustive analysis of the practical applications of objective quality metrics for stereoscopic 3D imaging. Our contribution addresses every state-of-the-art objective metric in the scientific literature, separately for image and video quality. The study differentiates the metrics by input requirements and supervision, and examines performance via statistical measures. Machine learning algorithms are particularly emphasized within the paper, such as the Deep Edge and COlor Signal INtegrity Evaluator (DECOSINE) using a Segmented Stacked Auto-Encoder (S-SAE), different Convolutional Neural Network (CNN) frameworks, and transfer-learning-based methods like the Xception model, AlexNet, ResNet-18, ImageNet, Caffe, GoogLeNet, as well as our own transfer-learning-based methods. The paper focuses on the actual practical applications of the predictive models, and highlights relevant criteria, along with general feasibility, suitability, and usability. The analysis of the investigated use cases also addresses potential future research questions and specifies appropriate directives for quality-focused, user-centric development.
Gaussianization is a recently suggested approach for density estimation from data drawn from a decidedly non-Gaussian, and possibly high dimensional, distribution. The key idea is to learn a transformation that, when applied to the data, leads to an approximately Gaussian distribution. The density, for any given point in the original distribution, is then given by the determinant of the transformation's Jacobian at that point, multiplied by the (analytically known) density of the Gaussian for the transformed data. In this work, we investigate the use of distilled machine learning to provide a compact implementation of the Gaussianization transform (which in usual practice is obtained iteratively), thereby enabling faster computation, better controlled regularization, and more direct estimation of the Jacobian. While density estimation underlies many statistical analyses, our interest is in hyperspectral detection problems.
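The change-of-variables formula described above can be written out for a one-dimensional toy case, where the Gaussianizing transform is built from a smoothed empirical CDF; everything below is illustrative rather than the distilled implementation studied in the paper.

```python
# 1-D toy of the Gaussianization density formula: p(x) = N(T(x)) * |dT/dx|,
# where T maps the data to a standard normal. T is built from a KDE-smoothed CDF.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(3)
data = rng.exponential(scale=1.0, size=5000)        # decidedly non-Gaussian sample

kde = gaussian_kde(data)                            # smooth estimate of the data pdf
def cdf(x):                                         # smoothed CDF via the KDE
    return np.array([kde.integrate_box_1d(-np.inf, xi) for xi in np.atleast_1d(x)])

def T(x):                                           # Gaussianizing transform
    return norm.ppf(np.clip(cdf(x), 1e-6, 1 - 1e-6))

x = np.array([0.5, 1.0, 2.0])
eps = 1e-4
jac = (T(x + eps) - T(x - eps)) / (2 * eps)         # numerical dT/dx
density = norm.pdf(T(x)) * np.abs(jac)              # change-of-variables density
print(density, "vs true", np.exp(-x))               # exponential pdf for reference
```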
Quantifying the predictive uncertainty of Neural Network (NN) models remains a difficult, unsolved problem, especially since the ground truth is usually not available. In this work we evaluate several regression uncertainty estimation models and discuss their accuracy using training sets where the uncertainty is known exactly. We compare three regression models, a homoscedastic model, a heteroscedastic model, and a quantile model, and show that: while all models can learn an accurate estimation of the response, accurate estimation of the uncertainty is very difficult; the quantile model has the best performance in estimating uncertainty; model bias is confused with uncertainty and it is very difficult to disentangle the two when we have only one measurement per training point; and improved accuracy of the estimated uncertainty is possible, but the experimental cost of learning uncertainty is very large since it requires multiple estimations of the response almost everywhere in the input space.
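For concreteness, the sketch below writes out loss functions commonly used for two of the compared model families, under my own naming: a heteroscedastic Gaussian negative log-likelihood and a quantile (pinball) loss; the paper's exact formulations may differ.

```python
# Common uncertainty-aware regression losses, shown as a hedged illustration.
import torch

def gaussian_nll(mean, log_var, target):
    """Heteroscedastic NLL: the network predicts both mean and log-variance."""
    return 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).mean()

def pinball(pred, target, tau):
    """Quantile loss for quantile level tau (e.g., 0.05 and 0.95 for an interval)."""
    err = target - pred
    return torch.maximum(tau * err, (tau - 1) * err).mean()

y = torch.randn(64)
print(gaussian_nll(torch.zeros(64), torch.zeros(64), y).item(),
      pinball(torch.zeros(64), y, 0.9).item())
```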
Parametric texture models have been applied successfully to synthesize artificial images. Psychophysical studies show that under defined conditions observers are unable to differentiate between model-generated and original natural textures. In industrial applications the reverse case is of interest: a texture analysis system should decide if human observers are able to discriminate between a reference and a novel texture. Here, we implemented a human-vision-inspired novelty detection approach. Assuming that the features used for texture synthesis are important for human texture perception, we compare psychophysical as well as learnt texture representations based on activations of a pretrained CNN in a novelty detection scenario. Based on a digital print inspection scenario we show that psychophysical texture representations are able to outperform CNN-encoded features.
In this work, we demonstrate that high-accuracy computation of the continuous nonlinear spectrum can be performed using artificial neural networks. We propose an artificial neural network (NN) architecture that can efficiently perform nonlinear Fourier (NF) optical signal processing. The NN consists of sequential convolution layers and fully connected output layers. This NN predicts only one component of the continuous NF spectrum, so two identical NNs have to be used to predict the real and imaginary parts of the reflection coefficient. To train the NN, we precomputed 94,035 optical signals; 9,403 signals were used for validation and excluded from training. The final value of the relative error for the entire validation dataset was less than 0.3%. Our findings highlight the fundamental possibility of using NNs to analyze and process complex optical signals when conventional algorithms can fail to deliver an acceptable result.
Face detection is one of the most important research topics in the field of computer vision, and it is also the premise and an essential part of face recognition. With the advent of deep learning-based techniques, the performance of face detection has been largely improved and more and more daily applications have emerged. However, face detection is greatly affected by environmental illumination. Most existing face detection algorithms neglect harsh illumination conditions such as nighttime conditions where lighting is insufficient or entirely absent. These conditions are often encountered in real-world scenarios, e.g., nighttime surveillance in law enforcement or civil settings. Overcoming the problem of face detection in the darkness has become a critical and urgent demand. In this paper we therefore study face detection in the darkness using infrared (IR) imaging. We build an IR face detection dataset and design a deep learning-based model to study face detection performance. Specifically, the deep learning model is a Single Stage Detector, which has the advantage of faster speed and lower computation cost compared with face detectors that consist of multiple stages. In the experiment, we also compare the performance of our deep learning model with that of a well-known traditional face detection algorithm, AdaBoost. In terms of True Positive Rate (TPR), our model outperforms AdaBoost by 5 percentage points, a boost from 87% to 92%, which suggests our deep learning-based method with IR imaging can indeed meet the requirements of real-world nighttime face detection applications.
Recognition of individual identity using the periocular region (i.e., periocular recognition) has emerged as a relatively new modality of biometrics and is a potential substitute for face recognition when facial occlusion occurs, e.g., when wearing a mask. Moreover, many application scenarios occur at nighttime, such as nighttime surveillance in law enforcement. We therefore study periocular recognition at nighttime using the infrared spectrum. However, the useful and effective area for periocular recognition is quite limited compared to that of face recognition, since only the eyes are exposed. As a result, the performance of periocular recognition algorithms is relatively low. This issue of limited area poses a serious challenge even though many state-of-the-art face recognition algorithms yield high performance. The situation deteriorates further when periocular recognition is performed at nighttime. Thus, we propose an image super-resolution (SR) based technique for nighttime periocular recognition in which we enlarge the small periocular image to obtain a larger effective area while retaining high image quality. Super-resolution of the periocular images is achieved by a CNN model which first interpolates the periocular area to an expected size and then learns a nonlinear mapping between the input low-quality periocular image and the output high-quality periocular image. To validate our method, we compare our deep learning-based SR method with the baseline case in which no SR is applied, as well as with two cases using traditional SR methods, namely bilinear interpolation and bicubic interpolation. In terms of quality metrics such as PSNR and SSIM, as well as recognition metrics such as GAR and EER, our method significantly outperforms the other three methods.
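The "interpolate, then learn a nonlinear mapping" recipe described above resembles SRCNN; the sketch below follows that pattern with classic SRCNN-like layer sizes, which are my assumptions rather than the paper's architecture.

```python
# SRCNN-style sketch: bicubic upsampling followed by a small CNN that learns
# the residual detail. Layer sizes and scale factor are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeriocularSR(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 1, 5, padding=2),
        )

    def forward(self, lr):
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)      # step 1: enlarge to target size
        return up + self.body(up)                    # step 2: learn the residual detail

lr_patch = torch.rand(1, 1, 32, 32)                  # small IR periocular crop
print(PeriocularSR()(lr_patch).shape)                # torch.Size([1, 1, 128, 128])
```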
We demonstrate through unsupervised machine learning that spatiotemporal latent representations optimized for sparse coding are able to support up-sampling of decimated frames, interpolation of missing frames, and extrapolation to future frames of computational fluid dynamics (CFD) simulations via linear regression. Specifically, we optimize an overcomplete basis of convolutional kernels for the sparse reconstruction of local x, y, z velocities, which are then inferred from decimated data streams and used to reconstruct the entire stream. The input data we use here is a series of time steps extracted as 2D slices from 3D CFD simulations. The simulation input is decimated by removing every other pixel in every odd frame and by removing every even frame entirely, resulting in a 75% decimation that exposes the sparse coding model to only 25% of the original data. Reconstructions generated by sparse inference utilize features that capture higher-order structures, using a Locally Competitive Algorithm to learn the corresponding physics. The quality of the reconstructions against the original data is quantified by the absolute difference between pixels via PSNR, in which a higher score indicates a reconstruction that is more faithful to the original input. This study compares the up-sampled, interpolated, and extrapolated reconstructions against their inputs for 4-frame, 6-frame, and 8-frame architectures. Our results suggest that sparse coding networks may offer a competitive method for up-sampling, interpolating, and extrapolating from lower to higher resolution CFD simulations.
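The PSNR score used to quantify reconstruction fidelity is the standard definition; a short reference implementation for images scaled to [0, 1] is given below (not code from the study).

```python
# Standard peak signal-to-noise ratio for images with values in [0, 1].
import numpy as np

def psnr(reference, reconstruction, peak=1.0):
    mse = np.mean((reference - reconstruction) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(4)
frame = rng.random((64, 64))
recon = np.clip(frame + rng.normal(0, 0.01, frame.shape), 0, 1)
print(f"{psnr(frame, recon):.1f} dB")
```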
We demonstrate and analyze a new methodology for detecting and characterizing shocks in high energy density physics (HEDP) experiments using simulated radiographs. Our method consists of simulating many variations of an HEDP experiment using a multi-physics modeling code developed at Los Alamos National Laboratory. These simulations are then used to produce synthetic radiographs that emulate the actual experimental data collection setup. Shock contours are defined by the peaks of the density-derivative values obtained at each radial coordinate of an x-rayed cylindrical object, giving us a ground truth of shock position that would not be available directly from the observed radiograph. Convolutional neural networks are then trained on our simulations, mapping radiograph to shock structure. We investigate four different state-of-the-art deep convolutional neural networks, Xception, ResNet152, VGG19, and U-Net, for regressing the HEDP radiograph to the shock position. We demonstrate that our neural network approach offers a highly accurate shock locator. We find that the different network architectures are better tuned for locating distinct shock characteristics, equivalent to detecting shockwaves at multiple scales. Differences are quantified by ranking the four architectures by their overall performance accuracy. The regression model based on the Xception architecture is found to yield the highest accuracy. To understand the sensitivity of these techniques to external perturbations in the experimental setup, we also apply our trained networks to shock location in the presence of Gaussian and flat-field noise. We find that the network shock locations are surprisingly robust to noise, giving confidence that they will perform well on experimental data.
Multi-physics hydrodynamic direct numerical simulations (DNS) are often computationally intensive, requiring significant computational resources to complete. For simulations requiring thousands of processors, the probability of anomalies occurring during a simulation is not insignificant. Since these simulations often run for a long time without human validation, such undetected anomalies can be costly. We present results of our application of ML-based techniques for early anomaly detection in hydrodynamics simulations. By treating the intermediate output of hydrodynamic simulations as images or videos, we can borrow ML techniques from computer vision for the task of anomaly detection. We generated a training dataset using CLAMR, a cell-based adaptive mesh refinement application which implements the shallow water equations. Modifications were made to the application to obtain a wider range of experiments for our dataset. By varying the mesh resolution, domain size, and the initial state of the simulation, we generated a range of experiments whose states can be learned using computer vision techniques. Additionally, those same experiments could be run with anomalies injected during the simulation, so our models could be trained to differentiate between nominal and anomalous simulation states. We also present ML models using PetaVision, a neuromorphic computing simulation toolkit, as well as other ML frameworks, and demonstrate that they can predict the state of a simulation at a succeeding time step given results from a number of preceding time steps. We further compare the relative performance of these approaches to early anomaly detection and discuss potential next steps in applying these techniques to more complex, multi-physics DNS applications.
This paper examines the utility of a self-directed feedback training method for machine learning models trained on synthetic data. This method aims to improve the speed of data generation and training by generating small batches of training data and observing the classification performance for each class. The classification accuracy is then used to adjust subsequent training classes and data generation limiting the total generation and training time while achieving optimal performance. Synthetic generation of images provides a viable approach to training machine learning models when real data is sparse. Synthetic data removes the intensive and error-prone manual process of human data labeling through automatic tagging. This is particularly valuable for re-identification tasks where unique objects need to be identified from multiple cameras with different orientations, lighting, or focal characteristics. We construct an artificial re-identification scene using 3D modeling software and generate images with a number of human avatar objects taken from different orientations, backgrounds, and lighting conditions. Automatic tagging and bounding generates re-identification metadata allowing unique avatars to be recognized by a metric learning neural network. As the network improves, the classes with lowest performance prompt the generator to supply additional images to improve the classifier accuracy. This allows the rendering engine to focus on the dominant error cases. This process will be compared against the rendering/training time and accuracy of the same system trained without self-directed feedback training.
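A toy sketch of the self-directed feedback loop: per-class accuracies from the last evaluation steer how many new synthetic renders each avatar class receives in the next batch. The allocation rule and the stubbed evaluation numbers are assumptions, not the paper's implementation.

```python
# Hypothetical feedback loop: classes the re-identification network gets wrong
# receive a larger share of the next rendering batch.
import numpy as np

def allocate_batch(class_accuracy, batch_size):
    """Weight the rendering budget by per-class error."""
    error = 1.0 - np.asarray(class_accuracy)
    weights = error / error.sum() if error.sum() > 0 else np.full_like(error, 1 / len(error))
    return np.round(weights * batch_size).astype(int)

rng = np.random.default_rng(5)
accuracy = rng.uniform(0.5, 0.95, size=8)            # stand-in evaluation result
for step in range(3):                                # a few feedback iterations
    counts = allocate_batch(accuracy, batch_size=400)
    print(f"step {step}: renders per avatar class = {counts}")
    accuracy = np.clip(accuracy + 0.02 * counts / counts.max(), 0, 1)  # fake improvement
```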
The interaction of dispersion and nonlinear effects gives rise to a wide variety of pulse dynamics and proves to be a fundamental bottleneck for high-speed communications. Traditionally, time-consuming and computationally inefficient algorithms are used for this purpose,1 and therefore research in nonlinear optics and optical communications is now implementing machine learning based methods.2–5 We present a comprehensive comparison of different neural network (NN) architectures for learning the nonlinear Schrödinger equation (NLSE). We have used an NN-based approach to reconstruct the pulse (in the temporal and spectral domains) at the transmitter from the pulse received through a highly nonlinear fiber (HNLF) without prior knowledge of the fiber parameters. Additionally, the trained network can also predict the dispersion and nonlinear parameters of an unknown fiber. The proposed NN also mitigates the need for iterative reconstruction methods, which are computationally expensive and slow. A detailed comparison of six different NN-based techniques, namely fully connected NN (FCNN), cascade NN (CaNN), convolutional NN (CNN), long short-term memory networks (LSTM), bidirectional LSTM (BiLSTM), and gated recurrent units (GRU), is presented. To our knowledge, the literature does not contain a detailed discussion of which NN architecture is most suitable for learning the transfer function of the fiber. We perform a comprehensive study including all popular NN architectures, which enables the estimation of the pulse profile for arbitrary pulse width, chirp, second- and third-order dispersion, nonlinearity, and fiber length, and can benefit nonlinear optics experiments and coherent optical communications. The growing popularity of NNs is resulting in increased design and development of hardware that is optimized for processing NN architectures. In light of this flexibility and optimized hardware, the popularity of NNs in optics is set to increase.
Smart assistant usage has increased significantly with the AI boom and the growth of IoT. Speech as an input modality brings a level of personalization to the various smart voice assistant products and applications; however, many smart assistants underperform when tasked with interpreting atypical speech input. Dysarthria, heavy accents, and deaf and hard-of-hearing speech characteristics prove difficult for smart assistants to interpret despite the large amounts of diverse data used to train automatic speech recognition models. In this study, we explore the Transformer architecture as an automatic speech recognition model for speech with medium to low intelligibility scores. We utilize a Transformer model pre-trained on the Librispeech dataset and fine-tuned on the Torgo dataset of atypical speech, as well as a subset of the University of Memphis Speech Perception Assessment Laboratory’s (UMemphis SPAL) Deaf speech dataset. We also develop a methodology for performing automatic speech recognition using a Node.js application running on a Raspberry Pi 4, which functions as a pipeline between the user and a Google Home smart assistant device. The highest performing Transformer model shows a 20.2% character error rate with a corresponding 29.0% word error rate on a subset of medium-intelligibility audio samples from the UMemphis SPAL dataset. This study highlights the importance of a large, transcribed dataset, fueling a large atypical-speech data gathering effort through a newly developed web application, My-Voice.
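The word error rate quoted above is a standard edit-distance metric; a short reference implementation is shown below (the example strings are made up, and character error rate follows by splitting into characters instead of words).

```python
# Word error rate via the standard Levenshtein dynamic program.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / max(len(ref), 1)

print(word_error_rate("turn on the kitchen light", "turn on kitchen lights"))
```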
Visual search and similarity can aid an e-commerce platform by providing appropriate recommendations where semantic labeling and associated metadata do not always exist. In this work, we detail the specifics of our system that powers visually similar recommendations. While a common approach leverages representations learned from common classification tasks using DNNs, the crux of the problem is the labels and ontology that must be applied in order for the DNN to perform well. Our approach, in production for a variety of products, is to supply these recommendations based on a defined taxonomy through a hierarchy that has been carefully curated and additionally scaled up through our platform's natural crowd-sourcing interface. To scale the use of these taxonomies in production, we use quantization schemes for retrieving approximate nearest neighbors after applying base transformations on the images using Apache Beam and TensorFlow Transform. The nearest neighbor retrievals are based on a ResNet model architecture, trained from scratch on 3000+ classes. These models are trained daily in a distributed fashion with optimized data throughput. Finally, in order to verify appropriateness, we use an extensive human evaluation pipeline and quality control. In this work, we share our product design learnings from the various attempts and experiments we conducted for a successful launch.
The performance of recommendation systems is highly dependent on candidate matching techniques for scoping users’ information needs. Existing candidate matching methods are based on text embedding and collaborative filtering, which base similarity primarily on semantics or co-occurrences of listings. Unfortunately, they do not leverage valuable user behavior such as recommendation impressions, clicks, or the sequences leading up to a purchase. This rich information reflects accurate user preferences and has been widely used to enhance the ranking stage of a recommendation system. Yet integrating contextual information into the matching stage is challenging because feature dimensionality and sparsity increase as well. Recently, graph representation learning (GRL) has seen much success in industrial applications like item-to-item recommendation systems. GRL learns to embed nodes (and edges) into a low-dimensional space by representing user behavior as an activity graph. The goal is to optimize this mapping so that the learned geometric relationships reflect the structural information of the original graph. The trained embeddings can be used as features for downstream applications like nearest neighbor search and ranking problems. Our work focuses on a GRL framework to enhance the performance of candidate generation. Such an approach inevitably faces the cold start problem; for example, listings with few or no user interactions cannot be learned effectively. To deal with this, side information such as shop, category, and price is integrated into the listing embedding by learning an integrated multi-view embedding.
Every year, 60,000 lives are lost worldwide to disasters.1 Building collapse during earthquakes accounts for the majority of these deaths.2 Unreinforced masonry (URM) buildings are particularly vulnerable during seismic events due to the brittle nature of the construction material. Many communities have undertaken costly and time-consuming mitigation programs to locate and retrofit or replace them before disaster strikes. Automated approaches for identifying seismically vulnerable buildings using street-level imagery have so far met with limited success, with no promising results presented in the literature. We achieve the best overall accuracy reported to date, at 83.6%, in identifying unfinished URM, finished URM, and non-URM buildings. Moreover, an accuracy of 98.8% was achieved for identifying suspected URMs (finished or unfinished URM). We perform extensive empirical analysis to establish synergistic parameters for our deep neural network, namely ResNeXt-101-FixRes. Lastly, we present a visualization of the layers in the network to ascertain and demonstrate how the deep neural network can distinguish between material and geometric features to predict the type of URM building.
This paper presents algorithms to detect trace chemicals using a multi-wavelength camera. Multispectral images of the chemical and the background were collected using the Ocean Thin Films SpectroCam. The camera has an integrated motorized color filter wheel holding 8 interchangeable custom band-pass filters spanning the spectral range of 200–900 nm. Since each chemical has a unique spectral reflectance, a stack of 8-band image data was obtained and subsequently analyzed to develop algorithms that can uniquely identify the area where a chemical is present. In this study, we primarily used RDX (1,3,5-trinitroperhydro-1,3,5-triazine), the explosive component of C4. The aim of this study was to investigate the potential of the multispectral imaging system and the accuracy of the model in detecting the C4 chemical.
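A hedged sketch of per-pixel classification on an 8-band multispectral stack: the cube is flattened to per-pixel spectra and each spectrum is classified. The synthetic data and the simple logistic-regression model are placeholders for the algorithms developed in the paper.

```python
# Toy per-pixel chemical detection on an 8-band multispectral cube.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
cube = rng.random((64, 64, 8))                       # H x W x 8 spectral bands
truth = np.zeros((64, 64), int); truth[20:40, 20:40] = 1   # assumed chemical region
cube[truth == 1] += 0.3 * np.linspace(0.2, 1.0, 8)   # fake spectral signature

X = cube.reshape(-1, 8)                              # one spectrum per pixel
clf = LogisticRegression(max_iter=1000).fit(X, truth.ravel())
detection_map = clf.predict(X).reshape(64, 64)       # pixels flagged as chemical
print("flagged pixels:", int(detection_map.sum()))
```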
Video classification is a crucial aspect of the human-machine interface, as it helps analyze various activities. Transfer learning techniques can help make accurate predictions. The dataset used in this research is a subset of the UCF101 Action Recognition dataset, consisting of 10 classes in total, where each class contains more than 120 videos. Each video is converted into a series of frames at a frame rate of 5. Feature extraction is performed on these frames using InceptionV3. The fine-tuned model architecture is composed of 4 dense layers. These layers use the ReLU activation function with 1024, 512, 256, and 128 neurons, respectively, and a final dense layer uses the softmax activation function with 10 neurons to predict the 10 classes. This technique has a wide range of applications related to the human-machine interface, such as helping visually challenged people classify various activities.
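The described head can be sketched directly in Keras: InceptionV3 as a frame-level feature extractor followed by 1024/512/256/128 ReLU dense layers and a 10-way softmax. Input sizing, pooling, and the frozen backbone are my assumptions, not details from the paper.

```python
# Keras sketch of the InceptionV3 feature extractor plus the dense head above.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights=None,
                                          pooling="avg", input_shape=(299, 299, 3))
base.trainable = False                      # use it purely for feature extraction

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # the 10 action classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```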
The pupillary response to light is commonly used to assess a patient's condition. When medical personnel perform light-reflex examinations, the result depends on the experience of the medical staff; even for the same patient, they may obtain different measurements in different situations. This paper proposes an algorithm that can calculate pupil size based on a Convolutional Neural Network (CNN). The study aims to measure pupil size on a mobile device, making it convenient for medical staff to take measurements in real time. However, the pupil is not perfectly round, and about 50% of pupils are best fit by an ellipse. Therefore, we use the major and minor axes of an ellipse to represent the size of the pupil and use these two parameters as the output of the network. The study compares the mean error while changing the depth of the network and the field of view (FOV) of the convolution filter. The final mean error of the pupil length is 7.63%. The results show that both deepening the network and widening the FOV of the convolution filter can reduce the mean error. In terms of operating speed, the mobile device performs pupil-size prediction at 36 frames per second.
Detecting exoplanets and measuring their masses is a priority for astrophysics. Stellar astrometry is capable of detecting exoplanets and unequivocally measuring their masses, resolving the system-inclination ambiguity that constrains the accuracy of radial velocity techniques. The astrometric signal of an Earth-like planet around nearby stars is around ∼1 micro-arcsecond (μas), while current instrumentation (Hubble, GAIA) reaches 25 μas at best. After photon noise, the main limiting factor in measuring such astrometric signals is the optical distortion that arises from small deformations of the optical system. A novel technique, called the diffractive pupil, makes it possible to obtain distortion-calibrated astrometry vectors from the image. Currently, analytical methods are used to extract the distortion map from the diffractive features; however, different sources of noise limit the accuracy of the algorithm. In this paper, we test the ability of machine learning to detect telescope pointing errors and astrometry signals injected into simulated images, as a first step toward distortion calibration correction using machine learning. Our image simulator generates a star field modified by telescope pointing errors, optical distortion, and common noise sources such as photon noise, flat field, readout, and dark current. We evaluate the performance of legacy analytical algorithms for astrometry and compare it with the results of machine learning runs. We find that the machine learning models outperform the analytical approach by two orders of magnitude when detecting astrometric signals in images with telescope pointing error perturbations, achieving a Mean Absolute Error (MAE) of ∼1.7 × 10^−6 px for the predicted target star translations, compared with the ∼5 × 10^−4 px MAE obtained by the traditional approach.
Research from the WHO shows that a teenager is killed in a road traffic crash every 1.44 minutes worldwide. Our paper suggests the use of deep learning algorithms to help prevent such casualties through a hardware and software implementation of the device in motor vehicles, and to overcome the potential limitations of face recognition. In this paper we propose integrating four models, namely face detection, passive liveness detection (PLD), face recognition, and eye detection. The face detection model comes in two versions, using Haar cascades and the Histogram of Oriented Gradients (HOG), respectively. The PLD model is used for presentation attack detection. The face recognition model is trained using the 68 landmarks of the shape predictor, which are unique to each face. Using these landmarks, eye detection is also performed and sleepiness is monitored, with overall results of 98% accuracy for the face detection model and 94% accuracy for the face recognition model (limited to n=10 faces per model), using an Eye Aspect Ratio threshold of 0.3. In addition, current attack scenarios and limitations of facial recognition that such devices will face are described. Based on these scenarios, preventive methods are elaborated so that the device can fulfill its purpose to the fullest.
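The Eye Aspect Ratio used with the 0.3 threshold is conventionally computed from the six eye landmarks of the 68-point shape predictor; a reference sketch is below, with illustrative landmark coordinates rather than values from the paper.

```python
# Eye Aspect Ratio (EAR) from six eye landmarks, with the 0.3 drowsiness threshold.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of (x, y) landmarks, ordered as in dlib's 68-point model."""
    v1 = np.linalg.norm(eye[1] - eye[5])     # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])      # horizontal distance
    return (v1 + v2) / (2.0 * h)

open_eye = np.array([[0, 3], [2, 5], [4, 5], [6, 3], [4, 1], [2, 1]], float)
ear = eye_aspect_ratio(open_eye)
print(ear, "drowsy" if ear < 0.3 else "alert")
```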
The Foveal Avascular Zone (FAZ) is of clinical importance since the retinal vascular arrangement around the fovea changes with retinal vascular diseases and in highly myopic eyes. It is therefore important to segment and quantify the FAZ accurately. Using a novel location-aware deep learning method, the FAZ boundary was segmented in en-face optical coherence tomography angiography (OCTA) images. The FAZ dimensions were compared with the parameters determined using four methods: (1) the device's in-built software (Cirrus 5000 Angioplex), (2) manual segmentation using ImageJ software by an experienced clinician, and (4) the new location-aware deep-learning method. The parameters were measured from OCTA data from healthy subjects (n=34) and myopic patients (n=66). For this purpose, the FAZ location was manually delineated in en-face OCTA images of 420 x 420 pixels corresponding to 6 mm x 6 mm. A modified UNet segmentation network with an additional input channel, a Gaussian distribution around the likely location of the FAZ, was designed and trained using 100 manually segmented OCTA images. The predicted FAZ and the related parameters were then obtained using a test dataset consisting of 100 images. For the analysis, two strategies were applied. The segmentation of the FAZ was compared using the Dice coefficient and Structural Similarity Index (SSIM) to determine the effectiveness of the proposed deep learning method relative to the other methods. Furthermore, to provide deeper insight, a set of FAZ dimensions, namely area, perimeter, circularity index, eccentricity, major axis, minor axis, inner circle radius, circumcircle radius, the maximum and minimum boundary dimensions, and orientation of the major axis, were compared between the methods. Finally, vessel-related parameters including tortuosity, vessel diameter index (VDI), and vessel avascular density (VAD) were calculated and compared. The highly myopic eyes exhibited a narrowing of the FAZ area and perimeter. The currently developed algorithm does not correct for axial length variations. This analysis should be extended with a larger number of images in each myopia group as well as with correction for axial length variations.
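Two pieces referenced above can be sketched briefly: the Dice overlap used to compare segmentations, and a Gaussian "likely FAZ location" map of the kind added as an extra input channel to the modified UNet. The image size matches the 420 x 420 scans, but the sigma and example masks are assumptions.

```python
# Dice overlap between two binary FAZ masks, plus a Gaussian location prior map.
import numpy as np

def dice(mask_a, mask_b, eps=1e-8):
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum() + eps)

def location_channel(shape=(420, 420), center=None, sigma=40.0):
    h, w = shape
    cy, cx = center if center is not None else (h / 2, w / 2)
    y, x = np.mgrid[0:h, 0:w]
    return np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))

a = np.zeros((420, 420), bool); a[180:240, 180:240] = True   # toy FAZ masks
b = np.zeros((420, 420), bool); b[185:245, 185:245] = True
print(f"Dice = {dice(a, b):.3f}", "prior peak:", location_channel().max())
```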
The Pupillary Light Reflex (PLR) refers to the change in pupil size due to changes in illumination. The PLR is used by clinicians for non-invasive assessment of the pupillary pathway. Typically, infrared (IR) illumination-based pupillometers are used to measure the PLR. Researchers have explored robust pupil detection and reconstruction with algorithms based on traditional computer vision techniques, but these techniques do not generalize well when tested on visible light (VL) images. The current study presents a novel approach to pupillometry that applies a deep learning (DL) methodology to VL images. We used public iris datasets (e.g., UBIRISv2) and data augmentation techniques to train our models for robustness; variability in the images arises from different lighting conditions, iris colors, pupil shapes, etc. Ellipses were fit to the pupil images and their parameters were extracted. We evaluated a UNet model and its quantized version. A non-deep-learning model (PuRe) was also evaluated. This study also reports the accuracy of these models on real-world experimental data. This work is a first step toward a VL smartphone-based pupillometer that is fast, accurate, and relies on on-device computing. Such a device can be useful in areas where internet access is unavailable and, more importantly, can be used in the field by paramedics for telemedicine purposes.
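As an illustration of the ellipse-fitting step, the sketch below fits an ellipse to a predicted pupil mask using OpenCV; the 0.5 threshold, the largest-contour heuristic, and the mask-producing segmentation model are assumptions rather than the study's exact procedure.

```python
# Sketch: extracting ellipse parameters from a predicted pupil mask.
# Assumes `mask` is a 2D probability map from a segmentation model (e.g., UNet);
# the threshold and largest-contour choice are illustrative.
import cv2
import numpy as np

def fit_pupil_ellipse(mask, threshold=0.5):
    binary = (mask > threshold).astype(np.uint8) * 255
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:          # cv2.fitEllipse needs at least 5 points
        return None
    (cx, cy), (major, minor), angle = cv2.fitEllipse(largest)
    return {"center": (cx, cy), "axes": (major, minor), "angle_deg": angle}
```

The fitted axes give the apparent pupil diameter per frame, from which a PLR curve can be assembled over time.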
Recent studies in the field of adversarial machine learning have highlighted the poor robustness of convolutional neural networks (CNNs) to small, carefully crafted variations of the inputs. Previous work in this area has largely been focused on very small image perturbations and how these completely throw off the classifier output and cause CNNs to make high-confidence misclassifications while leaving the image visually unchanged for a human observer. These attacks modify individual pixels of each image and are unlikely to exist in a natural environment. More recent work has demonstrated that CNNs are also vulnerable to simple transformations of the input image, such as rotations and translations. These ‘natural’ transformations are much more likely to occur, either accidentally or intentionally, in a real-world scenario. In fact, humans experience and successfully recognize countless objects under these types of transformations every day. In this paper, we study the effect of these transformations on CNN accuracy when classifying 3D face-like objects (Greebles). Furthermore, we visualize the learned feature representations by CNNs and analyze how robust these learned representations are and how they compare to the human visual system. This work serves as a basis for future research into understanding the differences between CNN and human object recognition, particularly in the context of adversarial examples.
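The sketch below shows one way such rotation and translation sweeps could be applied to measure classification accuracy; the model, data loader, and transform ranges are placeholders and do not reproduce the paper's experimental protocol.

```python
# Sketch: probing a classifier's robustness to rotations and translations.
# Only the idea of sweeping 'natural' transforms and recording accuracy
# follows the abstract; all parameters here are illustrative.
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def accuracy_under_transform(model, loader, angle_deg=0.0, shift_px=(0, 0), device="cpu"):
    model.eval()
    correct = total = 0
    for images, labels in loader:
        # Apply the same rotation/translation to every image in the batch.
        images = TF.affine(images, angle=angle_deg,
                           translate=list(shift_px), scale=1.0, shear=[0.0])
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)
```

Sweeping `angle_deg` and `shift_px` over a grid yields an accuracy surface that can be compared with human recognition performance under the same transformations.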
Siamese deep-network trackers have received significant attention in recent years due to their real-time speed and state-of-the-art performance. However, Siamese trackers suffer from similar-looking confusers, which are prevalent in aerial imagery, and from prolonged occlusions after which the tracked object re-appears under different pose and illumination. Our work proposes SiamReID, a novel re-identification framework for Siamese trackers that incorporates confuser rejection during prolonged occlusions and is well-suited for aerial tracking. The re-identification feature is trained using both a triplet loss and a class-balanced loss. Our approach achieves state-of-the-art performance on the UAVDT single object tracking benchmark.
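A minimal sketch of how a triplet loss and a class-balanced cross-entropy might be combined to train a re-identification feature; the margin, the effective-number weighting, and the equal loss weights are assumptions rather than SiamReID's exact formulation.

```python
# Sketch: joint triplet + class-balanced loss for re-ID embeddings.
# The class-balanced weights follow the effective-number scheme
# (1 - beta) / (1 - beta^n_c); beta, margin, and weights are placeholders.
import torch
import torch.nn as nn

def class_balanced_weights(samples_per_class, beta=0.999):
    n = torch.as_tensor(samples_per_class, dtype=torch.float)
    w = (1.0 - beta) / (1.0 - torch.pow(torch.tensor(beta), n))
    return w / w.sum() * len(samples_per_class)

class ReIDLoss(nn.Module):
    def __init__(self, samples_per_class, margin=0.3, ce_weight=1.0, tri_weight=1.0):
        super().__init__()
        self.triplet = nn.TripletMarginLoss(margin=margin)
        self.ce = nn.CrossEntropyLoss(weight=class_balanced_weights(samples_per_class))
        self.ce_weight, self.tri_weight = ce_weight, tri_weight

    def forward(self, anchor, positive, negative, logits, labels):
        # Triplet term shapes the embedding space; class-balanced CE term
        # counteracts identity imbalance in the training data.
        return (self.tri_weight * self.triplet(anchor, positive, negative)
                + self.ce_weight * self.ce(logits, labels))
```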
We have designed and built a machine vision system for in-line inspection and monitoring of fiber interlock armor cables on the production line. Fiber interlock armor cable is a highly reflective, continuous cylindrical product, which makes the inspection task optically challenging. Our economical design uses a single camera and a tunnel source of uniform illumination in conjunction with two flat mirrors to obtain a 360-degree view of the product. The resolution of the imaging system was chosen based on the smallest feature sizes expected on the cables, so that defects and imperfections can be detected and geometrical measurements made on the order of tens of microns. The measurement and imperfection-detection methods utilize a deep learning algorithm to detect manufacturing defects in the cable in line with production. Our optical system can detect imperfections in real time during the manufacturing process and alert operators while marking the defective region on the cable, which reduces wasted product and ultimately cost on the production line. The vision system can inspect a variety of interlock armor cables of different sizes and shapes, making it uniquely versatile. Our deep-learning system is 78.9% accurate with an initial training set of 10,000 samples. The machine vision solution is highly replicable, maximizing the use of off-the-shelf parts for ease of deployment on multiple manufacturing lines.
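For illustration only, a minimal defect/no-defect classifier of the kind that could run in line with production is sketched below; the backbone, input size, and decision threshold are assumptions and do not describe the system's actual network or its 78.9% accuracy figure.

```python
# Sketch: a minimal binary defect classifier for cable image patches.
# The ResNet-18 backbone, 224x224 input, and 0.5 threshold are placeholders.
import torch
import torch.nn as nn
from torchvision import models, transforms

def build_defect_classifier():
    net = models.resnet18(weights=None)        # or a pretrained weight enum
    net.fc = nn.Linear(net.fc.in_features, 2)  # defect vs. no-defect
    return net

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

@torch.no_grad()
def is_defective(model, pil_patch, threshold=0.5):
    logits = model(preprocess(pil_patch).unsqueeze(0))
    return torch.softmax(logits, dim=1)[0, 1].item() > threshold
```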
The present work deals with the implementation of machine learning algorithms for the analysis of the coupling efficiency of tapers for silicon photonics applications operating in the C band. The analyzed tapers are used to couple a continuous waveguide with a periodic subwavelength waveguide and are composed of several segments of variable length. The training, testing, and validation data sets have been numerically generated by an efficient frequency-domain finite element method that solves the wave equation and determines the spatial distribution of the electromagnetic fields and the coupling efficiency for each taper configuration. Excellent agreement has been observed between the coupling efficiency predicted by the machine learning algorithms and that obtained with the finite element method.
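A minimal sketch of a surrogate regression model that maps taper segment lengths to coupling efficiency using FEM-generated samples; the choice of a gradient-boosted regressor, the train/test split, and the feature layout are assumptions, not the authors' specific algorithms.

```python
# Sketch: training a regression surrogate on FEM-generated taper data.
# segment_lengths: (n_samples, n_segments) array of taper geometries;
# coupling_efficiency: (n_samples,) array of FEM-computed efficiencies.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def train_surrogate(segment_lengths, coupling_efficiency):
    X_train, X_test, y_train, y_test = train_test_split(
        segment_lengths, coupling_efficiency, test_size=0.2, random_state=0)
    model = GradientBoostingRegressor().fit(X_train, y_train)
    # R^2 on held-out configurations gauges agreement with the FEM reference.
    return model, r2_score(y_test, model.predict(X_test))
```

Once trained, the surrogate can evaluate candidate taper geometries far faster than the finite element solver, which is what makes it useful for design-space exploration.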