Reflectance confocal microscopy (RCM) is a non-invasive imaging technique used in biomedical research and clinical dermatology. It provides high-resolution images of the skin and superficial tissues, serving as a virtual biopsy and reducing the need for physical biopsies. RCM employs a laser light source to illuminate the tissue and captures the reflected light to generate detailed images of microscopic structures at various depths. Recent studies have explored AI and machine learning, particularly CNNs, for analyzing RCM images. Our study proposes a segmentation strategy based on textural features to identify clinically significant regions, empowering dermatologists to interpret images effectively and boosting diagnostic confidence. This approach promises to advance dermatological diagnosis and treatment.
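A minimal sketch of the kind of textural feature extraction such a segmentation strategy could build on, using gray-level co-occurrence matrix (GLCM) statistics from scikit-image; the specific properties, patch size, and quantization level here are illustrative assumptions, not the study's actual pipeline:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, levels=32):
    """Textural features of an RCM image patch from a gray-level
    co-occurrence matrix (GLCM)."""
    q = (patch / (patch.max() + 1e-12) * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    # Average each Haralick-style property over distances and angles.
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.array([graycoprops(glcm, p).mean() for p in props])
```

Sliding such a window over the image and classifying each feature vector would yield the region labels a dermatologist could overlay on the RCM stack.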
Whole slide imaging (WSI), also called digital virtual microscopy, is a new imaging modality. It allows AI and machine learning methods to be applied to cancer pathology, helping establish a means for the automatic diagnosis of cancer cases. However, designing machine learning models for WSI is computationally challenging because of the ultra-high resolution required. The current state-of-the-art models use multiple instance learning (MIL), a weakly supervised learning method in which the model aggregates inferences from many smaller instances to make a final classification about the entire set. In the context of WSI, researchers divide the ultra-high-resolution image into many patches, and the model classifies the slide based on the array of patch-level inferences. Among the several ways of making the final classification, attention-based mechanisms have produced superb accuracy scores. The Transformer, one attention-based architecture, has shown substantial improvements on WSI comprehension tasks. In this project, we studied and compared several WSI comprehension algorithms using three datasets: CAMELYON16+17, TCGA-Lung, and TCGA-Kidney. We found that attention-based MIL algorithms performed better than standard MIL algorithms for classifying whole-slide images, achieving higher mean accuracy and AUC. However, no attention-based algorithm performed significantly better than the others, and their accuracy scores varied widely, presumably because of the limited number of training samples in the data corpus. Since it is not easy to increase the number of samples from human subjects, machine learning techniques such as transfer learning could help mitigate this issue.
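For illustration, a minimal attention-based MIL pooling module in PyTorch, in the spirit of attention MIL; the embedding dimension, hidden size, and classifier head are hypothetical placeholders rather than any of the specific algorithms compared:

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """A slide is a bag of patch embeddings; attention weights decide how
    much each patch contributes to the slide-level classification."""
    def __init__(self, dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, patches):                       # (n_patches, dim)
        a = torch.softmax(self.attn(patches), dim=0)  # one weight per patch
        slide = (a * patches).sum(dim=0)              # weighted bag embedding
        return self.head(slide), a.squeeze(-1)        # logits, attention map
```

A Transformer-based variant would additionally let patch embeddings attend to one another before this aggregation step.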
Cancer is the leading cause of death by disease in American children. Each year, nearly 16,000 children in the United States and over 300,000 children globally are diagnosed with cancer. Leukemia is a form of blood cancer that originates in the bone marrow and accounts for one-third of pediatric cancers. The disease is diagnosed when the bone marrow contains 20% or more immature white blood cell blasts. Acute lymphoblastic leukemia is the most prevalent leukemia type found in children, with half of all annual U.S. cases diagnosed in patients under 20 years of age. To diagnose acute lymphoblastic leukemia, pathologists often conduct a morphological bone marrow assessment, which determines whether the immature white blood cell blasts in bone marrow display the relevant morphological characteristics, such as size and appearance of nuclei. Pathologists also use immunophenotyping via multi-channel flow cytometry to test whether certain antigens are present on the surface of blast cells; the antigens are used to identify the cell lineage of acute lymphoblastic leukemia. These manual processes require well-trained personnel and medical professionals and are therefore costly in both time and expense. Computerized decision support via machine learning can accelerate the diagnostic process and reduce its cost. Training a reliable classification model to distinguish between mature and immature white blood cells is essential to such a decision support system. Here, we adopted the Vision Transformer model to classify white blood cells. The Vision Transformer achieved superb classification performance compared to state-of-the-art convolutional neural networks while requiring fewer computational resources for training. Additionally, its self-attention architecture provides attention maps for a given image, offering clues as to which portion(s) of the image were significant in decision-making. We applied the Vision Transformer model and a convolutional neural network model to an acute lymphoblastic leukemia classification dataset of 12,528 samples and achieved accuracies of 88.4% and 86.2%, respectively.
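A hedged sketch of how a Vision Transformer might be fine-tuned for the mature-versus-immature white blood cell task using the timm library; the model variant, learning rate, and two-class head are assumptions for illustration, not the study's configuration:

```python
import timm
import torch

# Binary classification head on a pretrained ViT backbone.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step; images: (B, 3, 224, 224), labels: (B,)."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The per-head attention weights of the final blocks are what would be rolled up into the attention maps mentioned above.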
Detecting suspicious lesions in medical imaging is the critical first step in computer-aided detection (CAD) systems. However, detecting abnormalities in breast tissue is difficult because lesions vary in size, shape, margin, and contrast with the background tissue. We focused on mass segmentation, which provides notable morphological features by outlining the contours of masses; accurate segmentation is crucial for correct diagnosis. Recent advancements in deep learning have improved object detection and segmentation, and these techniques are also being applied to medical imaging studies. We focused on U-net, a recently developed mass segmentation algorithm based on a fully convolutional network. The U-net architecture consists of (1) a contracting path that captures context and (2) a symmetric expanding path that enables precise localization of the region of interest. The performance of a U-net model was tested with 63 digital mammograms from INbreast, a publicly available database. We trained the model with images resized to 40x40 pixels and conducted 10-fold cross-validation to prevent overfitting. The model's performance with respect to breast density and the lesion's BI-RADS rating was also investigated. Dice coefficients (DC) were used as the performance measure to compare the model's predicted segmentation with the ground truth. Logistic regression and an analysis of variance were performed to determine the significance of the DCs with regard to breast density and lesion behavior and to calculate the 95% confidence interval. The average DC was 0.80. The differences between DCs for BI-RADS 2 and 4c and for BI-RADS 2 and 5 were significant, suggesting that the model has more difficulty segmenting benign lesions.
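The Dice coefficient used as the performance measure can be computed directly from binary masks; a minimal implementation (assuming NumPy arrays for the predicted and ground-truth masks):

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    """DC = 2|P intersect T| / (|P| + |T|) for binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)
```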
Medical imaging devices, such as X-ray machines, inherently produce images that suffer from visual noise. Our objectives were to (i) determine the effect of image denoising on a medical image classification task, and (ii) determine whether there is a correlation between image denoising performance and medical image classification performance. We performed the classification task on chest X-rays using the DenseNet-121 convolutional neural network (CNN) and used the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics as image denoising performance measures. We first found that different denoising methods can make a statistically significant difference in classification performance for select labels. We also found that denoising methods affect fine-tuned models more than randomly initialized models and that fine-tuned models have significantly higher and more uniform performance than randomly initialized models. Lastly, we found no significant correlation between PSNR and SSIM values and classification performance for our task.
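Both denoising metrics are available in scikit-image; a minimal sketch (the data_range handling is an assumption about how the images are scaled):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def denoising_scores(reference, denoised):
    """PSNR (dB) and SSIM of a denoised image against its reference."""
    rng = reference.max() - reference.min()
    psnr = peak_signal_noise_ratio(reference, denoised, data_range=rng)
    ssim = structural_similarity(reference, denoised, data_range=rng)
    return psnr, ssim
```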
Deep convolutional neural networks (CNNs) have in recent years achieved record-breaking performance on many image classification tasks and are therefore well suited for computer-aided detection (CAD). The need for uncertainty quantification in CAD motivates a probabilistic framework for deep learning. The best-known probabilistic neural network model is the Bayesian neural network (BNN), but BNNs are notoriously difficult to sample for large, complex network architectures, so their use is restricted to small problems. It is known that the limit of a BNN as its width increases toward infinity is a Gaussian process (GP), and there has been considerable research interest in these infinitely wide BNNs. Recently, this classic result was extended to deep architectures in what is termed the neural network Gaussian process (NNGP) model. In this work, we implement an NNGP model and apply it to the ChestXRay14 dataset at the full resolution of 1024x1024 pixels. Even without any convolutional aspects to the network architecture and without any data augmentation, our five-layer-deep NNGP model outperforms other non-convolutional models and therefore helps to narrow the performance gap between non-convolutional and convolutional models. Our NNGP model is fully Bayesian and therefore offers uncertainty information through its predictive variance, which can be used to formulate a predictive confidence measure. We show that the performance of the NNGP model is significantly boosted after low-confidence predictions are rejected, suggesting that convolution is most beneficial only for these low-confidence examples. Finally, our results indicate that an extremely large fully connected neural network with appropriate regularization could perform as well as the NNGP, were it not for the computational bottleneck resulting from the large number of model parameters.
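A minimal sketch of the standard NNGP kernel recursion for a fully connected ReLU network, propagating the GP covariance layer by layer via the closed-form ReLU (arc-cosine) expectation; the depth and the weight/bias variance hyperparameters below are illustrative, not the paper's fitted values:

```python
import numpy as np

def nngp_kernel(X1, X2, depth=5, sw2=1.6, sb2=0.1):
    """NNGP kernel between two input sets for a deep ReLU network.
    sw2, sb2: weight and bias variances (illustrative values)."""
    d = X1.shape[1]
    K12 = sb2 + sw2 * (X1 @ X2.T) / d                 # cross-covariances
    K11 = sb2 + sw2 * np.sum(X1 * X1, axis=1) / d     # self-covariances
    K22 = sb2 + sw2 * np.sum(X2 * X2, axis=1) / d
    for _ in range(depth):
        norm = np.sqrt(np.outer(K11, K22))
        theta = np.arccos(np.clip(K12 / norm, -1.0, 1.0))
        K12 = sb2 + sw2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        K11 = sb2 + sw2 * K11 / 2                     # E[relu(u)^2] = K/2
        K22 = sb2 + sw2 * K22 / 2
    return K12
```

Exact GP regression with this kernel then yields a predictive mean and variance per test point; the variance supplies the confidence measure used to reject low-confidence predictions.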
Integration of heterogeneous data from different modalities, such as genomics and radiomics, is a growing area of research expected to yield better prediction of clinical outcomes than single-modality approaches. To date, radiogenomic studies have focused primarily on investigating correlations between genomic and radiomic features or on selecting salient features to determine clinical tumor phenotype. In this study, we designed deep neural networks (DNNs) that combine radiomic and genomic features to predict the pathological stage and molecular receptor status of invasive breast cancer patients. Utilizing imaging data from The Cancer Imaging Archive (TCIA) and gene expression data from The Cancer Genome Atlas (TCGA), we evaluated the predictive power of convolutional neural networks (CNNs). Overall, the results suggest superior performance for CNNs leveraging radiogenomic data compared with CNNs trained on single-modality data sources.
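A hedged sketch of one plausible fusion design in PyTorch: a small CNN branch for the imaging data and an MLP branch for gene expression, joined by late fusion; the layer sizes and input dimensions are hypothetical, not the architecture evaluated in the study:

```python
import torch
import torch.nn as nn

class RadiogenomicNet(nn.Module):
    """Late fusion of an imaging branch (CNN) and a genomic branch (MLP)."""
    def __init__(self, n_genes=1000, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mlp = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU())
        self.head = nn.Linear(32 + 128, n_classes)

    def forward(self, image, genes):                  # (B,1,H,W), (B,n_genes)
        fused = torch.cat([self.cnn(image), self.mlp(genes)], dim=1)
        return self.head(fused)
```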
Prior research has shown that physicians' medical decisions can be influenced by sequential context, particularly when successive stimuli exhibit similar characteristics during the analysis of medical images. This type of systematic error is known to psychophysicists as the sequential context effect: judgments are influenced by features of, and decisions about, the preceding case in the sequence of examined cases, rather than being based solely on the peculiarities of the present case. We investigated whether radiologists experience some form of context bias, using screening mammography as the use case. To this end, we explored correlations between perceptual behavior and diagnostic decisions on previous cases and decisions on the current case. We hypothesized that a radiologist's visual search pattern and diagnostic decisions in previous cases are predictive of the radiologist's current diagnostic decisions. To test this hypothesis, we tasked 10 radiologists of varied experience with conducting blind reviews of 100 four-view screening mammograms. Eye-tracking data and diagnostic decisions were collected from each radiologist under conditions mimicking clinical practice. Perceptual behavior was quantified using the fractal dimension of the gaze scanpath, computed with the Minkowski–Bouligand box-counting method. To test the effect of previous behavior and decisions, we conducted a multifactor fixed-effects ANOVA. Further, to examine the predictive value of previous perceptual behavior and decisions, we trained and evaluated a predictive model for radiologists' current diagnostic decisions. The ANOVA tests showed that previous visual behavior (characterized by fractal analysis), previous diagnostic decisions, and the image characteristics of previous cases are significant predictors of current diagnostic decisions. Additionally, predictive modeling of diagnostic decisions showed an overall improvement in prediction error when the model was trained on additional information about previous perceptual behavior and diagnostic decisions.
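A minimal sketch of the Minkowski–Bouligand box-counting estimate of a scanpath's fractal dimension; the grid sizes are illustrative choices:

```python
import numpy as np

def box_counting_dimension(points, grid_sizes=(4, 8, 16, 32, 64)):
    """Box-counting (Minkowski-Bouligand) dimension of a 2-D gaze scanpath.
    points: (n, 2) array of gaze coordinates."""
    pts = (points - points.min(axis=0)) / (np.ptp(points, axis=0) + 1e-12)
    counts = []
    for n in grid_sizes:                              # overlay an n x n grid
        idx = np.minimum((pts * n).astype(int), n - 1)
        counts.append(len({tuple(i) for i in idx}))   # occupied boxes
    # N(1/n) ~ n^D, so D is the slope of log N versus log n.
    slope, _ = np.polyfit(np.log(grid_sizes), np.log(counts), 1)
    return slope
```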
Our objective was to improve understanding of visuo-cognitive behavior in screening mammography under clinically equivalent experimental conditions. To this end, we examined pupillometric data, acquired with a head-mounted eye-tracking device, from 10 image readers (three breast-imaging radiologists and seven radiology residents), along with their corresponding diagnostic decisions for 100 screening mammograms. The corpus of mammograms comprised cases of varied pathology and breast parenchymal density. We investigated the relationships among the pupillometric fluctuations experienced by an image reader during mammographic screening (indicative of changes in mental workload), the pathological characteristics of a mammographic case, and the image reader's diagnostic decision and overall task performance. To answer these questions, we extracted features from the pupillometric data and additionally applied time-series shapelet analysis to extract discriminative patterns in changes in pupil dilation. Our results show that pupillometric measures are adequate predictors of mammographic case pathology and of image readers' diagnostic decisions and performance, with an average accuracy of 80%.
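For illustration, the core primitive of shapelet analysis is the minimum distance between a candidate shapelet and all equal-length windows of a series; a sketch applied to a pupil-diameter time series (the z-normalization is an assumption about preprocessing):

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum z-normalized Euclidean distance between a shapelet and all
    equal-length windows of a 1-D pupil-diameter series."""
    m = len(shapelet)
    s = (shapelet - shapelet.mean()) / (shapelet.std() + 1e-12)
    best = np.inf
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        w = (w - w.mean()) / (w.std() + 1e-12)
        best = min(best, np.linalg.norm(w - s))
    return best
```

These distances, one per discriminative shapelet, become features for the downstream classifier.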
Several researchers have investigated radiologists' visual scanning patterns with respect to features such as total time examining a case, time to initially hit true lesions, and number of hits. The purpose of this study was to examine the complexity of radiologists' visual scanning patterns when viewing 4-view mammographic cases, as they typically do in clinical practice. Gaze data were collected from 10 readers (3 breast imaging experts and 7 radiology residents) while reviewing 100 screening mammograms (24 normal, 26 benign, 50 malignant). The radiologists' scanpaths across the 4 mammographic views were mapped to a single 2-D image plane, and fractal analysis was applied to the composite 4-view scanpaths. For each case, the complexity of each radiologist's scanpath was measured using the fractal dimension, estimated with the box-counting method. The association between the fractal dimension of the radiologists' visual scanpaths, case pathology, case density, and radiologist experience was evaluated using a fixed-effects ANOVA. The ANOVA showed that the complexity of the radiologists' visual search pattern in screening mammography depends on case-specific attributes (breast parenchymal density and case pathology) as well as on reader attributes, namely experience level. Visual scanning patterns are significantly different for benign and malignant cases than for normal cases. There is also substantial inter-observer variability that cannot be explained by experience level alone.
Previously, we demonstrated the potential of using an individual's visual search pattern as a possible biometric. That study focused on viewing images displaying dot patterns with different spatial relationships to determine which pattern is more effective in establishing the identity of an individual. In this follow-up study, we investigated the temporal stability of this biometric. We performed an experiment in which 16 individuals were asked to search for a predetermined feature of a random-dot pattern while we tracked their eye movements. Each participant completed four testing sessions, each consisting of two dot patterns repeated twice. One dot pattern displayed concentric circles shifted to the left or right side of the screen and overlaid with visual noise; participants were asked on which side the circles were centered. The second dot pattern displayed a number of circles (between 0 and 4) scattered on the screen and overlaid with visual noise; participants were asked how many circles they could identify. Each session contained 5 untracked tutorial questions and 50 tracked test questions (200 tracked questions per participant in total). To create each participant's "fingerprint", we constructed a Hidden Markov Model (HMM) from the gaze data representing the underlying visual search and cognitive process. The accuracy of the derived HMM models was evaluated using cross-validation for various time-dependent train-test conditions. Subject identification accuracy ranged from 17.6% to 41.8% across all conditions, significantly higher than random guessing (1/16 = 6.25%). The results suggest that the visual search pattern is a promising, temporally stable, personalized fingerprint of perceptual organization.
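A hedged sketch of the HMM "fingerprint" construction and identification step using the hmmlearn library; the number of hidden states, the diagonal covariance, and the feature layout are assumptions for illustration:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_gaze_hmm(velocity_seqs, n_states=3):
    """Fit one HMM per participant; each sequence in velocity_seqs is a
    (timesteps, features) array of gaze-derived observations."""
    X = np.concatenate(velocity_seqs)
    lengths = [len(s) for s in velocity_seqs]
    return GaussianHMM(n_components=n_states, covariance_type="diag",
                       n_iter=100).fit(X, lengths)

def identify(models, sequence):
    """Attribute a held-out sequence to the participant whose model
    assigns it the highest log-likelihood."""
    return max(models, key=lambda pid: models[pid].score(sequence))
```

Here models would be a dict mapping participant IDs to their fitted HMMs; cross-validation over sessions then tests the temporal stability of the fingerprint.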
Two people may analyze a visual scene in two completely different ways. Our study sought to determine whether human gaze can be used to establish the identity of an individual. To accomplish this objective, we investigated the gaze patterns of twelve individuals viewing still images with different spatial relationships. Specifically, we created 5 visual "dot-pattern" tests to be shown on a standard computer monitor. These tests challenged the viewer's capacity to distinguish proximity, alignment, and perceptual organization. Each test included 50 images of varying difficulty (250 images in total). Eye-tracking data were collected from each individual while taking the tests. The eye-tracking data were converted into gaze velocities and analyzed with Hidden Markov Models to develop personalized gaze profiles. Using leave-one-out cross-validation, we observed that these personalized profiles could differentiate among the 12 users with classification accuracy ranging between 53% and 76%, depending on the test; this is statistically significantly better than random guessing (i.e., 8.3%, or 1 out of 12). Classification accuracy was higher for the tests in which the users' average gaze velocity per case was lower. The study findings support the feasibility of using gaze as a biometric or personalized biomarker and could have implications for radiology training and the development of personalized e-learning environments.
Search involves detecting the locations of potential lesions; classification involves determining whether a detected region is a true lesion. The most commonly used measure of observer performance, the area A under the ROC curve, is affected by both search and classification performance. Our aim was to demonstrate a method for separating these contributions and to apply it to several clinical datasets. Search performance S was defined as the square root of 2 times the perpendicular distance of the end-point of the search-model-predicted ROC from the chance diagonal. Classification performance C was defined as the separation of the unit-variance binormal distributions for signal and noise sites. Eleven datasets were fitted by the search model, and S, C, and the trapezoidal area A were computed for each modality and reader combination. Kendall-tau correlations were calculated between the resulting S, C, and A pairs. The Kendall correlation (S vs. C) was smaller than zero for all datasets, and the average Kendall correlation was significantly smaller than 0 (average = -0.401, P = 8.3 × 10^-6). Also, the Kendall correlation (A vs. S) was larger than zero for 9 of 11 datasets, and the average Kendall correlation was significantly larger than 0 (average = 0.295, P = 2.9 × 10^-3). In contrast, the average Kendall correlation (A vs. C) was not significantly different from zero (average = 0.102, P = 0.25). The results suggest that radiologists may learn to compensate for poor search performance with better classification performance. This study also indicates that efforts at improving net performance, which currently focus almost exclusively on improving classification performance, may be more successful if aimed at improving search performance.
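As a worked example of the definitions above: the perpendicular distance from a point (FPF, TPF) to the chance diagonal y = x is |TPF - FPF|/sqrt(2), so S reduces to TPF_end - FPF_end at the ROC end-point. A minimal sketch with hypothetical per-reader values (the numbers are for illustration only, not from the eleven datasets):

```python
import numpy as np
from scipy.stats import kendalltau

def search_performance(fpf_end, tpf_end):
    """S = sqrt(2) * perpendicular distance of the ROC end-point from the
    chance diagonal = sqrt(2) * (TPF - FPF) / sqrt(2) = TPF - FPF."""
    return tpf_end - fpf_end

# Hypothetical per-reader values, for illustration only:
S = np.array([0.42, 0.31, 0.55, 0.48])   # search performance
C = np.array([1.9, 2.3, 1.5, 1.7])       # binormal separation (classification)
tau, p = kendalltau(S, C)                 # a negative tau suggests a trade-off
```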
Jackknife alternative free-response receiver operating characteristic (JAFROC) is a method for measuring human observer performance in localization tasks. JAFROC is increasingly used to evaluate imaging modalities because it has been shown to have greater statistical power than conventional receiver operating characteristic (ROC) analysis, which neglects location information. JAFROC ignores non-lesion localization marks ("false positives") on abnormal images; JAFROC1 is an alternative method that includes these marks. Both methods are lesion-centric in the sense that they assign equal importance to all lesions: an image with many lesions tends to dominate the performance metric, and clinically less significant lesions are treated identically to more significant ones. In this paper, weighted JAFROC and JAFROC1 analyses are described that treat each abnormal image (not each lesion) as the unit of measurement and account for differing lesion clinical significances (weights). Lesion-centric and weighted methods were tested using a simulator that includes multiple-reader, multiple-case, multiple-modality location-level correlations. For comparison, ROC analysis was also tested, where the rating of the highest-rated mark on an image was taken as its "ROC" rating. The testing involved random numbers of lesions per image, random weights, varying case mixes (ratio of normal to abnormal images), and different correlation structures. We found that for both JAFROC and JAFROC1, lesion-centric and weighted analyses had correct null-hypothesis (NH) behavior and comparable statistical power. For either lesion-centric or weighted analyses, JAFROC1 yielded the highest power, followed by JAFROC, with ROC yielding the least, confirming a recent study that used a less flexible single-reader, dual-modality simulator. Provided the number of normal cases is not too small, JAFROC1 is the preferred method for analyzing human observer free-response data, and for either JAFROC or JAFROC1, weighted analysis is preferable.
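A hedged sketch of the weighted JAFROC figure of merit described above, under the usual convention that each abnormal image's lesion weights sum to 1; the variable names and the tie-handling kernel follow the standard JAFROC formulation, but the data layout is an assumption:

```python
def psi(x, y):
    """Kernel: 1 if the lesion rating beats the false positive, 0.5 for ties."""
    return 1.0 if y > x else 0.5 if y == x else 0.0

def weighted_jafroc_fom(normal_fp_maxima, abnormal_cases):
    """Weighted JAFROC FOM: each abnormal image contributes through its
    lesion ratings, weighted so that the weights on one image sum to 1.
    normal_fp_maxima: highest false-positive rating per normal image
                      (float("-inf") for unmarked normals).
    abnormal_cases: list of [(lesion_rating, weight), ...] per abnormal image."""
    total = 0.0
    for fp in normal_fp_maxima:
        for lesions in abnormal_cases:
            total += sum(w * psi(fp, r) for r, w in lesions)
    return total / (len(normal_fp_maxima) * len(abnormal_cases))
```

Setting all weights on an image equal to 1/(number of lesions) recovers the image-centric unweighted variant, while the lesion-centric FOM would instead pool all lesions across images.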
The directional wavelet used in image processing has orientation selectivity and can provide a sparse representation of edges in natural images. Multiwavelets offer the possibility of better performance in image-processing applications than the scalar wavelet. Applying directionality to multiwavelets may thus gain both advantages. This paper proposes a scheme, named multiridgelets, which is an extension of ridgelets: we apply the balanced multiwavelet transform to the Radon transform of an image. Specifically, we consider its use in image texture analysis. The regular polar-angle method is employed to realize the discrete transform. Three statistical features (standard deviation, median, and entropy) are computed from the multiridgelet coefficients. A comparative study was made against results obtained with 2D wavelets, scalar ridgelets, and curvelets. Classification of mura defects on LCD screens was used to quantify the performance of the proposed texture analysis methods: 240 normal images and 240 simulated defective images were supplied to train a support vector machine classifier, and another 40 normal and 40 defective images were used for testing. We conclude that multiridgelets were comparable to or better than curvelets and performed significantly better than 2D wavelets and scalar ridgelets.
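Balanced multiwavelet transforms are not available in common Python libraries, so the sketch below substitutes a scalar 1-D wavelet on each Radon projection (i.e., a scalar-ridgelet stand-in) purely to illustrate the feature pipeline of Radon transform, then wavelet coefficients, then standard deviation, median, and entropy; the wavelet and decomposition level are assumptions:

```python
import numpy as np
import pywt
from skimage.transform import radon

def ridgelet_texture_features(image, wavelet="db2", level=2):
    """Radon transform, then a 1-D wavelet decomposition of each projection;
    summarize all coefficients by standard deviation, median, and entropy."""
    sino = radon(image, theta=np.arange(180.0), circle=False)
    coeffs = np.concatenate([c.ravel() for proj in sino.T
                             for c in pywt.wavedec(proj, wavelet, level=level)])
    p = np.abs(coeffs) / (np.abs(coeffs).sum() + 1e-12)
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return np.array([coeffs.std(), np.median(coeffs), entropy])
```

These three-feature vectors would then train a support vector machine (e.g., sklearn.svm.SVC), mirroring the experimental setup described.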
We examined the statistical power of three methods for analyzing FROC mark-rating data: ROC, JAFROC, and IDCA. Two classes of observers were simulated: a designer-level CAD algorithm and a human observer. A search-model-based simulator was used, with the average number of false positives per image ranging from 0.21 for the human observer to 10 for CAD. Model parameters were chosen to yield 80% and 85% areas under the predicted ROC curves for both classes of observers, and inter-image and inter-modality correlations of 0.1, 0.5, and 0.9 were investigated. The area under the FROC curve up to abscissa α (ranging from 0.18 to 6.7) was used as the IDCA figure of merit; the other methods used their well-known figures of merit. For IDCA, power increased with α, so α should be chosen as large as possible, consistent with the need for overlap of the two FROC curves in the x-direction. For CAD, the IDCA method yielded the highest statistical power. Surprisingly, JAFROC yielded the highest statistical power for human observers, even greater than IDCA, which, unlike JAFROC, uses all the marks. The largest difference occurred for conservative reporting styles and high data correlation: e.g., 0.3453 for JAFROC vs. 0.2672 for IDCA. One reason is that, unlike IDCA, the JAFROC figure of merit is sensitive to unmarked normal images and unmarked lesions. In all cases the ROC method yielded the least statistical power and entailed a substantial power penalty (e.g., 24% for ROC vs. 41% for JAFROC). For human observers, JAFROC should be used; for designer-level CAD data, IDCA should be used; and use of the ROC method for localization studies is discouraged.
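A minimal sketch of the IDCA figure of merit used here, the trapezoidal area under the FROC curve (LLF versus NLF) up to abscissa α; it assumes operating points sorted by increasing NLF and a curve rising from the origin:

```python
import numpy as np

def froc_partial_auc(nlf, llf, alpha):
    """Trapezoidal area under the FROC curve up to NLF = alpha.
    nlf: non-lesion localizations per image at each operating point (ascending);
    llf: lesion localization fraction at the same operating points."""
    nlf, llf = np.asarray(nlf, float), np.asarray(llf, float)
    llf_at_alpha = np.interp(alpha, nlf, llf)        # interpolate at the cutoff
    keep = nlf <= alpha
    x = np.concatenate([[0.0], nlf[keep], [alpha]])  # start from the origin
    y = np.concatenate([[0.0], llf[keep], [llf_at_alpha]])
    return np.trapz(y, x)
```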
We describe a query-by-content search engine that enables a radiologist to search a large database of diagnostically proven ('benign' or 'malignant') mammographic regions of interest (ROIs). The database search is facilitated by a relational map, a 2D display of all the ROIs in the database in which labeled points represent individual ROIs. The map is constructed from the output of a neural network trained to cluster the ROIs using a measure of perceptual similarity.
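The relational map described here is produced by a neural network trained on perceptual similarity; as a stand-in for illustration only, a comparable 2-D layout can be derived from a pairwise similarity matrix with metric multidimensional scaling:

```python
import numpy as np
from sklearn.manifold import MDS

def relational_map(similarity):
    """2-D map of ROIs from a symmetric pairwise perceptual-similarity
    matrix: convert similarity to dissimilarity, then embed with metric MDS."""
    dissimilarity = similarity.max() - similarity
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(dissimilarity)
```

Each row of the returned array gives the map coordinates of one ROI; a query ROI can then be placed on the map next to its perceptually nearest labeled neighbors.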