This PDF file contains the front matter associated with SPIE Proceedings Volume 12929, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Image perception, observer performance, and technology assessment have driven many of the advances in breast imaging. Technology assessment metrics were used to develop mammography systems, first for screen-film mammography and later for digital mammography and digital breast tomosynthesis. To optimize these systems clinically, it became necessary to determine what type of information a radiologist needed to make a correct diagnosis. Image perception studies helped define what spatial frequencies were necessary to detect breast cancers and how different sources of noise affected detectability. Finally, observer performance studies were used to show that advances in the imaging system led to better detection and diagnoses by radiologists. In parallel with these developments, these three concepts were used to develop computer-aided diagnosis systems. In this talk, I will highlight how image perception, observer performance, and technology assessment were leveraged to produce technologies that allow radiologists to be highly effective in detecting breast cancer.
This paper investigates whether two publicly available Artificial Intelligence (AI) models can detect retrospectively identified missed cancers within a double reader breast screening program and determine whether challenging mammographic cases are reflected in the performance of AI models. Transfer learning was conducted on the Globally-aware Multiple Instance Classifier (GMIC) and Global-Local Activation Maps (GLAM) models using an Australian mammographic dataset. Mammograms were enhanced to improve poor contrast using the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. The sensitivity of the two AI models with pre-trained and transfer learning modes was evaluated on four mammographic case groups: ‘missed’ cancers, ‘prior-visible’ cancers, ‘prior-invisible’ cancers, and ‘current’ cancers from the archives of a double reader breast screening program. The GMIC model outperformed the GLAM model with pre-trained and transfer learning modes in terms of sensitivity for all four cancer groups. The performance of the GMIC and GLAM models was best for ‘prior-visible’ cancers, followed by ‘prior-invisible’ cancers, ‘current’ cancers, and ‘missed’ cancers. The sensitivity of the GMIC and GLAM models on the ‘missed’ cancer cases was 84.2% and 81.5%, respectively, while for the ‘prior-visible’ cancer cases it was 92.7% and 89.2%, respectively. After transfer learning, both the GMIC and GLAM models demonstrated statistically significant improvement (>9.4%) in sensitivity for all cancer groups. The AI models with transfer learning showed significant improvement in malignancy detection in challenging mammographic cases. The study also supports the potential of the AI models to identify missed cancers within a double reader breast screening program.
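For illustration, a minimal sketch of the CLAHE enhancement step named above, using OpenCV; the clip limit and tile size are illustrative assumptions, not the settings used in the study.

```python
import cv2

def enhance_mammogram(path: str):
    """Apply CLAHE to a low-contrast mammogram (illustrative parameters)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # clipLimit and tileGridSize are assumed values, not those from the paper.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)
```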
Early detection of breast cancer through screening mammography is crucial. However, the interpretation of mammograms is prone to high error rates, and radiologists often exhibit common errors specific to their practice regions. It is essential to identify prevalent errors and offer tailored mammography training to address these region-specific challenges. This study investigated the feasibility of leveraging Convolutional Neural Networks (CNN) with transfer learning to identify areas in screening mammograms that may contribute to a high proportion of false positive diagnoses by radiologists from the same geographical region. We collected mammography test sets evaluated by a cohort of Australian radiologists and segmented error-related patches based on their assessments. Each patch was labeled as “easy” or “difficult”, and we then proposed a patch-wise ResNet model to predict the difficulty level of each patch. Specifically, we employed the pre-trained ResNet-18, ResNet-50, and ResNet-101 as feature extractors. During training, we modified and fine-tuned the fully connected layers for our target task while keeping the convolutional layers frozen. The model’s performance was evaluated using 10-fold cross-validation, and the transferred ResNet-50 obtained the highest performance, achieving a Receiver Operating Characteristic Area Under the Curve (AUC) value of 0.975 (±0.011) on the validation sets. In conclusion, our study demonstrated the feasibility of employing CNN-based transfer learning to identify the prevalent errors in specific radiology communities. This approach shows promise in automating the customization of mammography training materials to mitigate errors among radiologists in a region.
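A minimal PyTorch sketch of the transfer-learning setup described here: a pre-trained ResNet with frozen convolutional layers and a re-initialized fully connected head for the binary easy/difficult patch task. The head width and other hyperparameters are assumptions for illustration.

```python
import torch.nn as nn
from torchvision import models

def build_patch_classifier(num_classes: int = 2) -> nn.Module:
    # Pre-trained ResNet-50 used as a frozen feature extractor.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    for p in model.parameters():
        p.requires_grad = False
    # Replace and fine-tune only the fully connected head (assumed layout).
    model.fc = nn.Sequential(
        nn.Linear(model.fc.in_features, 256),
        nn.ReLU(),
        nn.Linear(256, num_classes),
    )
    return model
```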
Breast arterial calcifications (BAC) are increasingly recognized as indicative markers for cardiovascular disease (CVD). In this study, we manually annotated BAC areas on 3,330 mammograms, forming the foundational dataset for developing a deep learning model to automate assessment of BAC. Using this annotated data, we propose a semi-supervised deep learning approach to analyze unannotated mammography images, leveraging both labeled and unlabeled data to improve BAC segmentation accuracy. Our approach combines the U-net architecture, a well-established deep learning method for medical image segmentation, with a semi-supervised learning technique. We retrieved mammographic examinations of 6,000 women (3,000 with confirmed CVD and 3,000 without) from the screening archive to allow for a focused study. Utilizing our trained deep learning model, we accurately detected and measured the severity of BAC in these mammograms. Additionally, we examined the time between mammogram screenings and the occurrence of CVD events. Our study indicates that both the presence and severity (grade) of BAC, identified and measured using deep learning for automated segmentation, are crucial for primary CVD prevention. These findings underscore the value of technology in understanding the link between BAC in mammograms and cardiovascular disease, shaping future screening and prevention strategies for women's health.
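The abstract combines U-Net with a semi-supervised technique without naming it; below is a sketch of one common scheme (pseudo-labeling on confident pixels), offered only as an example of the general approach, not the authors' exact method. The `model` is assumed to output per-pixel BAC probabilities, and `conf` and `lam` are illustrative hyperparameters.

```python
import torch

def pseudo_label_step(model, labeled, unlabeled, optimizer, conf=0.9, lam=0.5):
    """One illustrative semi-supervised step: supervised loss on annotated
    masks plus a pseudo-label loss on confident unlabeled pixels."""
    x_l, y_l = labeled                       # images and annotated BAC masks
    x_u = unlabeled                          # unannotated mammogram patches
    bce = torch.nn.functional.binary_cross_entropy

    loss_sup = bce(model(x_l), y_l)

    with torch.no_grad():
        p_u = model(x_u)
        pseudo = (p_u > 0.5).float()
        # Keep only pixels where the model is confident either way.
        mask = ((p_u > conf) | (p_u < 1 - conf)).float()

    loss_unsup = (bce(model(x_u), pseudo, reduction="none") * mask).mean()
    loss = loss_sup + lam * loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```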
We aim to investigate whether ordering mammograms based on texture features promotes visual adaptation, allowing observers to detect abnormalities in screening mammograms more correctly and/or more rapidly, thereby improving performance. A fully-crossed, multi-reader multi-case evaluation with 150 screening mammograms (1:1, positive:negative) and 10 screening radiologists was performed to test three different orders of mammograms. The mammograms were either randomly ordered, ordered by Volpara density (low to high), or ordered by a self-supervised learning (SSL) encoding. Level of suspicion (0–100) scores and recall decisions were given per examination by each radiologist. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were compared between ordering conditions using the open-access iMRMC software. Median reading times were compared with the Wilcoxon signed rank test. The radiologist-averaged AUC was higher when interpreting screening mammograms from low to high density than when interpreting mammograms in a random order (random 0.924 vs density-ordered 0.936, P=0.013). The radiologist-averaged specificity for the mammograms ordered by density tended to increase (87.3% vs 91.2%, P=0.047) at similar sensitivities (79.9% vs 80.4%, P=0.846) with reduced reading time (29.3 seconds vs 25.1 seconds, P<0.001). For the SSL ordering, no significant difference from the random order was found in screening performance (AUC: 0.924 vs 0.914, P=0.381) or reading time (both 29.3 seconds, P=0.221). In conclusion, this study suggests that ordering screening mammograms from low to high density enables radiologists to improve their screening performance. Studies within a screening setting are needed to confirm these findings.
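A small sketch, with synthetic stand-in data, of two ingredients named above: building a low-to-high density reading order and comparing reading times with the Wilcoxon signed-rank test via SciPy. The per-case pairing is a simplifying assumption; the study compared median reading times across conditions.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
density = rng.uniform(2, 30, size=150)     # synthetic volumetric density (%)
reading_order = np.argsort(density)        # low-to-high presentation order

# Paired reading times (seconds) under two conditions - synthetic values.
t_random = rng.gamma(4, 7, size=150)
t_ordered = t_random - rng.normal(3, 2, size=150)
stat, p = wilcoxon(t_random, t_ordered)
print(f"Wilcoxon signed-rank: W={stat:.0f}, p={p:.3f}")
```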
Undersampling in the frequency domain (k-space) in MRI enables faster data acquisition. In this study, we used a fixed 1D undersampling factor of 5x, with only 20% of the k-space collected. The fraction of fully acquired low k-space frequencies was varied from 0% (all aliasing) to 20% (all blurring). The images were reconstructed using a multi-coil SENSE algorithm. We used two-alternative forced choice (2-AFC) and forced localization tasks with a subtle signal to estimate human observer performance. The 2-AFC average human observer performance remained fairly constant across all imaging conditions. The forced localization task performance improved from the 0% condition to the 2.5% condition and remained fairly constant for the remaining conditions, suggesting that there was a decrease in task performance only in the pure aliasing situation. We modeled the average human performance using a sparse difference-of-Gaussians (SDOG) Hotelling observer model. Because the blurring in the undersampling direction makes the mean signal asymmetric, we explored an adaptation for irregular signals that made the SDOG template asymmetric. To improve the observer performance, we also varied the number of SDOG channels from 3 to 4. We found that despite the asymmetry in the mean signal, both the symmetric and asymmetric models reasonably predicted the human performance in the 2-AFC experiments. However, the symmetric model performed slightly better. We also found that a symmetric SDOG model with 4 channels, implemented using a spatial domain convolution and constrained to the possible signal locations, reasonably modeled the forced localization human observer results.
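A sketch of the channel construction behind this kind of model observer: radially symmetric difference-of-Gaussians channels plus the channelized Hotelling weighting. The channel count, widths, and width ratio below are illustrative assumptions, not the fitted values from the study.

```python
import numpy as np

def sdog_channels(n=64, n_ch=3, sigma0=1.0, alpha=2.0, q=1.67):
    """Radially symmetric sparse difference-of-Gaussians (SDOG) channels
    on an n x n grid; parameter values are illustrative."""
    y, x = np.mgrid[:n, :n] - (n - 1) / 2.0
    r = np.hypot(x, y)
    chans = []
    for j in range(n_ch):
        s = sigma0 * alpha**j
        # Each channel is a wide Gaussian minus a narrower one (bandpass).
        c = np.exp(-0.5 * (r / (q * s))**2) - np.exp(-0.5 * (r / s)**2)
        chans.append(c.ravel())
    return np.array(chans)              # shape (n_ch, n*n)

def hotelling_template(U, K_ch, ds):
    """Channelized Hotelling weights: solve K_ch w = U ds, where U holds the
    channels, K_ch the channel-response covariance, ds the mean signal."""
    return np.linalg.solve(K_ch, U @ ds)
```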
Silicosis is a type of occupational lung disease, or pneumoconiosis, that results from the inhalation of crystalline silica dust and can lead to fatal respiratory conditions. This study aims to develop an online platform and benchmark radiologists' performance in diagnosing silicosis. Fifty readers (33 radiologists and 17 radiology trainees) interpreted a test set of 15 HRCT cases. The median AUROC for all readers combined was 0.92 (0.93 for radiologists and 0.91 for trainees). No statistically significant performance differences were observed between radiologists and trainees. Moderate agreement was recorded among readers for the correct diagnosis of silicosis (κ=0.57); however, there was considerable variability (κ<0.2) in the accurate detection of irregular opacities and ground glass opacities. Our online platform shows promise in providing tailored education to clinicians and facilitating future long-term observer studies and the development of educational solutions to enhance the diagnostic accuracy of silicosis detection.
Satisfaction of Search (SOS), a phenomenon studied by medical imaging and cognitive science researchers, refers to diminished visual search performance for a target in a search image when a prior target has already been detected. Much has been learned about the SOS effect by studying its pervasiveness across many different types of medical images, including chest radiography, abdominal contrast examinations, and breast imaging. Much has also been learned about the SOS effect by using simplified search images with targets that take little training to detect (see Adamo et al., 2021 for a review). In this study, we used simplified 2D and segmented-3D search images and investigated whether observers’ search performance differs between these imaging types. Consistent with research in breast imaging, Adamo et al. (2018) found that when novice and experienced observers searched for a single target, they: 1) made fewer false positives, 2) improved their hit rates, and 3) spent longer searching in segmented-3D images compared to 2D images. Here, we replicated this pattern when observers searched for multiple targets. Importantly, we also found that the SOS effect was reduced in segmented-3D images compared to 2D images, suggesting that segmented-3D imaging can improve search performance for multiple targets (abnormalities) within medical imaging.
Eye tracking in combination with artificial intelligence is a developing area of research with a wide range of applications, as evidenced by the increasing number of studies being conducted in this field. Such studies hold promising results in terms of prognosis and diagnosis, as they provide insight into how doctors interpret images and the factors that influence their decision-making processes. In this study, we investigated whether potential diagnostic errors made by physicians can be recognized through eye movements and artificial intelligence. To achieve this, we engaged four radiologists with varying levels of diagnostic experience to analyze 400 chest X-ray images with a wide range of anomalies, concurrently capturing their eye movements using an eye tracker. For each of the resulting 1546 readings, we computed numerical features extracted from the radiologists’ gaze saccade data. Subsequently, we applied three machine learning algorithms (random forest, support vector machines, and a k-nearest-neighbor classifier), as well as a neural network, to map reading gaze features to radiological errors, resulting in an error prediction accuracy of 0.7. Our experiments demonstrate the existence of a connection between diagnostic errors and gaze, indicating that eye-tracking data can serve as a valuable source of information for human error analysis.
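A scikit-learn sketch of the classifier comparison described above, with synthetic stand-in features; the feature definitions, hyperparameters, and fold count are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: per-reading saccade features (e.g., counts, amplitudes, durations);
# y: 1 if the reading contained a diagnostic error. Synthetic stand-ins here.
rng = np.random.default_rng(0)
X = rng.normal(size=(1546, 12))
y = rng.integers(0, 2, size=1546)

for name, clf in [
    ("random forest", RandomForestClassifier(n_estimators=300, random_state=0)),
    ("SVM", make_pipeline(StandardScaler(), SVC())),
    ("kNN", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```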
Model observers designed to predict human observers in detection tasks are important tools for assessing task-based image quality and optimizing imaging systems, protocols, and reconstruction algorithms. Linear model observers have been widely studied to predict human detection performance, and recently, deep learning model observers (DLMOs) have been developed to improve prediction accuracy. Most existing DLMOs utilize convolutional neural network (CNN) architectures, which learn local features well but are less effective at extracting long-range relations in images. To further improve the performance of CNN-based DLMOs, we investigate a hybrid CNN-Swin Transformer (CNN-SwinT) network as a DLMO for PET lesion detection. The hybrid network combines CNN and SwinT encoders, which can capture both local information and global context. We trained the hybrid network on the responses of 8 human observers, including 4 radiologists, in a two-alternative forced choice (2AFC) experiment with PET images generated by adding simulated lesions to clinical data. We conducted a 9-fold cross-validation experiment to evaluate the proposed hybrid DLMO against conventional linear model observers such as the channelized Hotelling observer (CHO) and the non-prewhitening matched filter (NPWMF). The hybrid CNN-SwinT DLMO predicted human observer responses more accurately than the linear model observers and a DLMO with only the CNN encoder. This work demonstrates that the proposed hybrid CNN-SwinT DLMO has potential as an improved tool for task-based image quality assessment.
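A toy sketch of the hybrid encoder idea using off-the-shelf torchvision and timm building blocks: CNN (local) and Swin (global) embeddings are concatenated and mapped to a scalar observer rating. The specific backbones, feature sizes, and head are assumptions, not the authors' architecture.

```python
import timm
import torch
import torch.nn as nn
from torchvision import models

class CNNSwinObserver(nn.Module):
    """Toy hybrid observer combining CNN and Swin Transformer encoders."""
    def __init__(self):
        super().__init__()
        self.cnn = models.resnet18(weights=None)
        self.cnn.fc = nn.Identity()                      # 512-d CNN features
        self.swin = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False, num_classes=0
        )                                                # 768-d pooled features
        self.head = nn.Linear(512 + 768, 1)

    def forward(self, x):                                # x: (B, 3, 224, 224)
        z = torch.cat([self.cnn(x), self.swin(x)], dim=1)
        return self.head(z)                              # observer rating

scores = CNNSwinObserver()(torch.randn(2, 3, 224, 224))
```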
Convolutional neural networks (CNNs) have been previously used as model observers (MO) for the purpose of defect detection in medical images. Due to their limited generalizability, such CNN observers cannot recognize whether an input image comes from the same distribution as the data they were trained on, i.e., they lack domain awareness. In this paper we propose an adaptive learning approach for training a domain-aware CNN ideal observer (IO). In our approach we use a variant of the U-Net CNN that is trained simultaneously for defect localization prediction and for reconstruction of the input image. We demonstrate that the network's reconstruction mean-squared error (MSE) can serve as an indicator of how well the observer performs in the defect localization task, which is an important step towards developing a domain-aware MO. Furthermore, we propose an adaptive learning approach that automatically selects datasets on which the model in training has poor reconstruction MSE. Our results show that this adaptive training approach can improve model performance both in generalization and in defect localization compared to a non-adaptive approach, particularly for out-of-distribution images, i.e., images that were not seen during training.
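A minimal sketch of the dual-task training step described above, assuming a two-output model (localization logits and reconstruction); the loss choices and weighting `lam` are illustrative. The returned reconstruction MSE is the quantity the paper uses as the domain-awareness indicator and, per the adaptive scheme, for selecting datasets where it is poor.

```python
import torch
import torch.nn.functional as F

def dual_task_step(model, images, defect_maps, optimizer, lam=1.0):
    """One training step for a two-headed U-Net variant: one head predicts
    defect localization, the other reconstructs the input image."""
    loc_pred, recon = model(images)          # assumed two-output model
    loss_loc = F.binary_cross_entropy_with_logits(loc_pred, defect_maps)
    loss_rec = F.mse_loss(recon, images)
    loss = loss_loc + lam * loss_rec
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Track the reconstruction MSE: it doubles as an in-/out-of-distribution
    # signal and as the criterion for adaptive dataset selection.
    return loss_loc.item(), loss_rec.item()
```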
Image quality assessment is crucial for evaluating and optimizing imaging systems, especially in diagnostic tasks involving expert human readers. In this work, we propose a threshold mechanism that may enhance the ability of model observers to predict human performance in diagnostic tasks. The threshold sets lower limits on acceptable feature values extracted from an image. This mechanism was tested with existing prewhitening and nonprewhitening visual-search models for lesion search in lumpy-background images. The images simulated single-pinhole planar imaging with a radionuclide. Pinhole size was a study variable. A localization ROC (LROC) study format was used. A set of three Gabor functions defined the feature space. One study tested the two model observers with and without optimized thresholding. A second study with the prewhitening model examined the effect of thresholding on the training requirements needed to obtain stable estimates of LROC performance. With regard to agreement with human observers for the same search task, thresholding overcame substantial limitations of the nonprewhitening observer. Thresholding also consistently reduced the numbers of training images required to compute stable estimates of performance with the prewhitening visual-search model. Overall, these results point to possible benefits for low-resource application of these model observers.
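A sketch of the two pieces named above: Gabor feature extraction at a candidate location and a lower-limit threshold on the feature values. The kernel parameters and the flooring rule are one plausible reading of the threshold mechanism, not the paper's exact formulation.

```python
import numpy as np

def gabor_kernel(size, wavelength, sigma, theta=0.0):
    """Real Gabor kernel: cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / wavelength))

def thresholded_features(patch, kernels, t_min):
    """Feature values at a candidate location with per-feature lower limits:
    responses below the threshold are floored (illustrative rule)."""
    feats = np.array([np.sum(patch * k) for k in kernels])
    return np.maximum(feats, t_min)

kernels = [gabor_kernel(33, w, s) for w, s in [(8, 4), (12, 6), (16, 8)]]
```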
Recent studies have proposed methods to preserve and enhance signal-detection performance in learning-based CT reconstruction with CNNs. Prior work has focused on optimizing for ideal observer (IO) or Hotelling observer (HO) performance during training. However, the performance of the IO or HO may not correlate well with the performance of human observers on the same task. In this work, we explore modified training procedures to optimize for a variety of model observers, such as the signal-Laplacian and the non-prewhitening model observer with eye filter, that we hypothesize are a better proxy for a human model observer than the IO or HO. We illustrate the proposed training approach on a CNN model used to reconstruct synthetic sparse-view breast CT data. Our results indicate that the proposed modified training allows one to preserve weak signals in the reconstructions while changing the overall noise characteristics in a way that may be beneficial to human observers.
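For context, a sketch of the standard non-prewhitening-with-eye-filter (NPWE) detectability index that such training objectives build on, computed in the frequency domain from the mean signal difference and the noise power spectrum. The eye-filter form E(f) = f^1.3 exp(-cf) is a commonly used choice; the constant c here is an assumption.

```python
import numpy as np

def npwe_dprime(delta_s, nps, pix_per_deg):
    """NPWE detectability d' from the 2D signal difference (spatial domain)
    and the noise power spectrum (in matching FFT ordering)."""
    n = delta_s.shape[0]
    f = np.fft.fftfreq(n, d=1.0 / pix_per_deg)      # cycles/degree
    fx, fy = np.meshgrid(f, f)
    fr = np.hypot(fx, fy)
    eye = fr**1.3 * np.exp(-0.04 * fr)               # assumed eye-filter constant
    S = np.abs(np.fft.fft2(delta_s))**2              # |signal spectrum|^2
    num = np.sum(eye**2 * S)**2
    den = np.sum(eye**4 * S * nps)
    return np.sqrt(num / den)
```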
Purpose: To develop a volumetric channelized Hotelling observer (vCHO) for protocol optimization studies using an anthropomorphic phantom across different types of scanners, including energy-integrating CT (EI-CT) and photon-counting CT (PC-CT), and to validate it against human observer results. Methods: The anthropomorphic abdominal phantom was scanned on a PC-CT and an EI-CT system. The phantom consists of 5 anatomical landmarks and a cylindrical insert containing low-contrast spheres (diameters 4, 6, and 8 mm, 15 of each size) and a sizable signal-free background. The phantom was scanned on both systems with a wide range of CT parameters: tube voltage, dose, reconstruction settings, FOV, and slice thickness (more than 350 scans). An anthropomorphic vCHO was created to assess target detectability. A two-layer approach was used: first, the target locations were estimated using a CHO with Laguerre-Gauss channels; then an anthropomorphic CHO with volumetric difference-of-Gaussians channels assessed the percentage of correctly detected spheres (PC). The threshold diameter (Dtr, in mm) at 62.5% PC was estimated via logistic regression, with lower Dtr representing better detectability. Results: For almost all settings, the PC-CT performed better than the EI-CT. For both scanners, with all other settings held constant, performance increased with tube voltage. As expected, performance also increased with higher dose. Softer reconstruction kernels yielded higher quality. For the EI-CT, FBP outperformed iterative reconstruction, while for the PC-CT the IR strength did not yield statistically significant differences (p>0.1). A smaller FOV and, similarly, a narrower slice thickness improved performance. When the results were compared to the human observer readings, the vCHO outperformed the humans under all conditions, but the same conclusions about image quality could be drawn. The human observers were more sensitive to changes in scanning parameters, with the vCHO detecting smaller changes in Dtr for the same reading conditions. Conclusions: This study introduced an anthropomorphic vCHO for CT image quality assessment using an anthropomorphic phantom and reported the first vCHO trends across different parameters for EI-CT and PC-CT scanners.
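A sketch of the Dtr estimation step: fit a logistic percent-correct curve over sphere diameter and read off the diameter at 62.5% PC. The 25%-to-100% range (for which 62.5% is the midpoint, consistent with a 4-alternative guess rate) is an inference, and the data points below are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def pc_model(d, d0, s):
    """Logistic percent-correct curve rising from an assumed 25% guess
    rate to 100%; 62.5% is then the midpoint, reached exactly at d = d0."""
    return 0.25 + 0.75 / (1.0 + np.exp(-(d - d0) / s))

d = np.array([4.0, 6.0, 8.0])          # sphere diameters (mm)
pc = np.array([0.35, 0.70, 0.95])      # observed fraction correct (synthetic)
(d0, s), _ = curve_fit(pc_model, d, pc, p0=[6.0, 1.0])
print(f"Dtr = {d0:.2f} mm")            # threshold diameter at 62.5% PC
```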
Evaluating medical imaging algorithm performance in a test set may lead to a biased result, especially if the number of images is low. In the case of the Medical Imaging Data and Resource Center (midrc.org) sequestered imaging data commons, developers may seek the evaluation of subsequent iterations of an algorithm using additional test subsets drawn from the sequestered commons, allowing for repeat testing but also possibly resulting in learning the sequestered commons when test samples overlap. We developed a method to measure image reuse in test subsets and to evaluate the impact of the degree of image reuse on over- or under-estimation of performance by using the load factor, a metric from hash-table methodology that can be used to summarize the average test subset pairings per image. We established a relationship between the standard error of the area under the receiver operating characteristic curve (AUC) and the load factor, and compared the relationship to the interquartile range of the AUC for the case of an image-derived predictor for COVID-19 severity on chest radiographs. As expected, AUC variation was inversely related to load factor while image usage increased with load factor, with predicted and actual AUC variation behaving similarly as a function of load factor. Notably, low AUC variation was observed at load factors well above 1, the value typically described in the hash-table literature as optimal. These results extend the use of the load factor to the characterization of stand-alone test sets, supporting future work on operationalizing the use of sequestered test subsets for algorithm evaluation.
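A toy sketch of one plausible reading of the load factor in this setting: the average number of test-subset uses per unique image, by analogy with the hash-table definition (items stored divided by buckets). The paper's exact pairing-based definition may differ.

```python
from collections import Counter

def load_factor(subsets):
    """Average test-subset uses per unique image (illustrative definition)."""
    counts = Counter(img for s in subsets for img in s)
    total_uses = sum(counts.values())
    return total_uses / len(counts)

subsets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(load_factor(subsets))   # 9 uses / 5 unique images = 1.8
```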
Image-to-image translation is a common task in computer vision and has had a rapidly increasing impact on the field of medical imaging. Deep learning-based methods that employ conditional generative adversarial networks (cGANs), such as Pix2PixGAN, have been extensively explored to perform image-to-image translation tasks. However, when noisy medical image data are considered, such methods cannot be directly applied to produce clean images. Recently, an augmented GAN architecture named AmbientGAN has been proposed that can be trained on noisy measurement data to synthesize high-quality clean medical images. Inspired by AmbientGAN, in this work, we propose a new cGAN architecture, Ambient-Pix2PixGAN, for performing medical image-to-image translation tasks by use of noisy measurement data. Numerical studies that consider MRI-to-PET translation are conducted. Both traditional image quality metrics and task-based image quality metrics are employed to assess the proposed Ambient-Pix2PixGAN. It is demonstrated that our proposed Ambient-Pix2PixGAN can be successfully trained on noisy measurement data to produce high-quality translated images in the target imaging modality.
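A sketch of the core AmbientGAN idea that this architecture inherits: the generator produces a clean target-modality image, but the discriminator only ever sees measured (noisy) data, so the generator output is passed through a measurement simulator before discrimination. The loss form and the `measure` operator are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def ambient_adversarial_losses(G, D, x_src, y_meas, measure):
    """Non-saturating GAN losses with a measurement operator in the loop.
    G: source -> clean target image; measure: clean image -> simulated
    noisy measurement; y_meas: real noisy measurements."""
    fake_clean = G(x_src)                  # clean translated image
    fake_meas = measure(fake_clean)        # simulated noisy measurement
    d_loss = (F.softplus(-D(y_meas)).mean()
              + F.softplus(D(fake_meas.detach())).mean())
    g_loss = F.softplus(-D(fake_meas)).mean()
    return d_loss, g_loss
```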
The rapid evolution of deep generative models (DGMs) has highlighted their great potential in medical imaging research. Recently, it has been claimed that a diffusion generative model, the denoising diffusion probabilistic model (DDPM), performs better at image synthesis than the previously popular DGMs, generative adversarial networks (GANs). However, this claim is based on evaluations employing measures intended for natural images and thus does not resolve questions about their relevance to medical imaging tasks. To partially address this problem, we performed a series of assessments to evaluate the ability of a DDPM to reproduce diagnostically relevant spatial context. Our findings show that in all our studies, although context was generally well replicated in DDPM-generated ensembles, it was never perfectly reproduced in the entire ensemble.
This study explored whether a set of global radiomic (i.e., computer-extracted) features derived from mammograms could predict the gist of breast cancer (the holistic perceptual information provided by radiologists’ first impression about the presence of an abnormality after a brief glimpse of the image). A retrospective de-identified dataset was used to collect the gist of breast cancer (i.e., gist scores) from 13 readers interpreting 1100 screening craniocaudal mammograms (659 current “normal” cancer-free images, and 441 “prior” images with no visible signs of cancer, acquired two years before the corresponding cancer mammograms). The collected gist scores from all readers were averaged to reduce the noise of the gist signal, giving one gist score per image. The images were grouped into high- and low-gist based on the 75th and 25th percentiles, corresponding to the images with the highest and lowest gist scores, respectively. A set of 130 handcrafted global radiomic features per image was extracted and used to construct two machine learning random forest classifiers, 1) Normal and 2) Prior, based on the corresponding features computed from the “normal” and “prior” images, for distinguishing high- from low-gist images. The classifiers were trained and validated using the 10-fold cross-validation approach, and their performance was measured by the area under the receiver operating characteristic curve (AUC). The Normal and Prior classifiers resulted in AUCs of 0.83 (95% CI: 0.77-0.85) and 0.84 (95% CI: 0.80-0.87), respectively, suggesting that global mammographic radiomic features can predict the gist of breast cancer on a screening mammogram.
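A scikit-learn sketch of the classifier evaluation described here: a random forest over 130 global radiomic features, scored with 10-fold cross-validated AUC. The features and labels below are synthetic stand-ins, and the forest size is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X: 130 global radiomic features per image; y: 1 = high-gist, 0 = low-gist.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(550, 130)), rng.integers(0, 2, size=550)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
aucs = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(f"AUC = {aucs.mean():.2f} (+/- {aucs.std():.2f})")
```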
The aim of this study is to measure reader agreement for i) lesion classification and ii) breast density in Contrast Enhanced Mammography (CEM). Methods: Two experienced and two inexperienced CEM readers reported 60 examinations. Kappa was used to assess inter-reader agreement between experienced and inexperienced readers for lesion classification (benign/malignant) and breast density (dense/non-dense). Weighted kappa was used to assess agreement for BI-RADS categories (1-5) and BI-RADS density (A-D). The intraclass correlation coefficient (ICC) measured agreement for breast density using a Visual Analog Scale (VAS). Intra-reader agreement for one experienced and one inexperienced reader was measured after a three-month interval. Results: Agreement between experienced readers was substantial (κ=0.66) for benign/malignant, and moderate (κ=0.57) for BI-RADS categories. Agreement for inexperienced readers was moderate for benign/malignant and BI-RADS categories (κ=0.52 and κ=0.47, respectively). Breast density (dense/non-dense) agreement was almost perfect for experienced readers (κ=0.83) and substantial for BI-RADS density (κ=0.70). Inexperienced reader agreement was moderate for dense/non-dense (κ=0.50) and BI-RADS density (κ=0.49). The ICC for VAS was moderate for experienced readers (ICC=0.60) and good for inexperienced readers (ICC=0.84). Intra-reader agreement for benign/malignant classification was almost perfect for both the experienced and inexperienced reader (κ=0.83 and κ=0.91, respectively). Conclusion: Experienced readers showed substantial agreement for lesion classification and almost perfect agreement for breast density. While inexperienced reader agreement was moderate for both lesion classification and breast density, their agreement for VAS was higher than that of experienced readers, suggesting that CEM may have a short learning curve and that radiologists could potentially be trained for CEM interpretation, which would help its implementation in other clinical practices in Saudi Arabia.
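For reference, a minimal sketch of the two kappa statistics used above, via scikit-learn; the ratings are synthetic, and linear weights are an assumption (the study does not state the weighting scheme).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

reader1 = np.array([1, 0, 1, 1, 0, 1])        # benign(0)/malignant(1) calls
reader2 = np.array([1, 0, 0, 1, 0, 1])
print(cohen_kappa_score(reader1, reader2))     # unweighted kappa

birads1 = np.array([2, 3, 4, 5, 1, 4])         # BI-RADS categories 1-5
birads2 = np.array([2, 4, 4, 5, 2, 3])
print(cohen_kappa_score(birads1, birads2, weights="linear"))  # weighted kappa
```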
There is currently substantial interest in understanding batch reading to improve performance in breast-cancer screening, and to identify mechanisms of performance. We evaluated batch reading of digital breast tomosynthesis (DBT) images for breast cancer screening using observational data acquired at the University of Pittsburgh Medical Center (UPMC). We studied batches of screening exams that were defined by completion-time differences between sequentially interpreted cases, in which a completion time exceeding a threshold led to defining a new batch. After exclusions, the data consisted of 121,652 exams from 15 readers, with a total of 1,081 cancers. We found that the inter-exam time threshold used for batch definition introduces a selection bias that had a large impact on the cancer rate in the first case of a batch. For the smallest threshold (< 1 minute), all cases were defined as the first case of a new batch, and the cancer rate was the overall cancer rate of the data, 8.9/1000. As the threshold increased to 4-5 minutes, the cancer rate of the first case in the batch increased to nearly double the overall rate, 16.0/1000. This threshold excluded many non-cancer cases, which are typically read in 2-3 minutes for DBT, while still capturing most cancer cases, which take longer to complete. At a 10-minute completion-time difference, the first-case cancer rate decreased to 12.6/1000 and stabilized. We argue that this increase in cancer rate is likely due to readers terminating batch reading upon encountering a difficult case. Our results demonstrate a clear selection bias in batches defined by inter-exam time, and suggest using the cancer rate for adjustment to reduce the effect of this bias.
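A pandas sketch of the batch definition described above: within each reader, sort by completion time and start a new batch whenever the gap to the previous exam exceeds the threshold. Column names are assumptions for illustration.

```python
import pandas as pd

def assign_batches(df: pd.DataFrame, threshold_min: float) -> pd.Series:
    """Batch index per exam; df needs 'reader_id' and a datetime
    'completed_at' column (assumed names)."""
    def per_reader(times: pd.Series) -> pd.Series:
        gaps = times.sort_values().diff().dt.total_seconds() / 60.0
        # Gap above threshold starts a new batch; first exam (NaN gap) is batch 0.
        return (gaps > threshold_min).cumsum()
    return df.groupby("reader_id")["completed_at"].transform(per_reader)

# Usage: df["batch"] = assign_batches(df, threshold_min=5)
```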
This study aims to investigate how the fluctuation of time intervals between self-assessment test sets influences the performance of radiologists and radiology trainees. The data were collected from 54 radiologists and 92 trainees who completed 260 and 550 readings, respectively, of 9 mammogram test sets between 2019 and 2023. Readers’ performance was evaluated via case sensitivity, lesion sensitivity, specificity, ROC AUC, and JAFROC. There was a significant positive correlation between the intervals of test sets and radiologists' improvement in specificity and JAFROC (P<0.05). For separations between test sets exceeding 90 days, radiologists’ performance improved for sensitivity (5.2%), lesion sensitivity (6.6%), ROC AUC (3.1%), and JAFROC (6.3%), with specificity remaining consistent. For trainees who completed test sets within a single day, a significant positive correlation was recorded between the time intervals of test sets and their improvement in ROC AUC (P=0.008) and JAFROC (P=0.02). However, for trainees who needed more than 1 day to complete a test set, this correlation was reversed for sensitivity (P=0.009) and ROC AUC (P=0.02). The most notable progress of trainees was found in sensitivity (6.15%), lesion sensitivity (11.6%), ROC AUC (3.5%), and JAFROC (4.35%), with specificity remaining unchanged, when the test sets were completed 31-90 days apart.
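A tiny SciPy sketch of the correlation analysis described above, with synthetic stand-in values; the abstract does not state which correlation coefficient was used, so Spearman is an assumption.

```python
from scipy.stats import spearmanr

# interval_days: days between consecutive test sets; delta_jafroc: change in
# JAFROC between those attempts - synthetic stand-in data.
interval_days = [7, 30, 45, 90, 120, 200]
delta_jafroc = [-0.01, 0.00, 0.02, 0.04, 0.05, 0.06]
rho, p = spearmanr(interval_days, delta_jafroc)
print(f"rho={rho:.2f}, p={p:.3f}")
```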
Digital subtraction angiography (DSA) is commonly used in minimally invasive endovascular procedures for clinical decision-making in diagnosis and guidance. During a procedure, angiographic image sequences are taken sequentially, moving to more downstream blood vessels each time, mapping out the vascular structure of a patient. Localizing vascular structures within an image sequence with respect to prior image sequences can be challenging, especially when done in real time. This study introduces a novel unsupervised method to localize DSA images with respect to each other in order to match the same vascular anatomy in different image sequences. The network consists of two parallel encoders that are used for matching and localization. First, images are matched according to the similarity of the encodings. Then, the encodings can be used to find the coordinate at which the images have the highest similarity, thus localizing the vasculature that matches in both images. The network was trained on a synthetic dataset consisting of mother-daughter image pairs, where the daughter was a cropped version of a DSA image frame. The network was tested on a real-world dataset consisting of image pairs that were matched according to anatomically neighboring blood vessels. Results show an AUC of 0.98 for the synthetic dataset and 0.69 for the real-world dataset. To conclude, matching of the blood vessels was feasible with the use of unsupervised deep learning. The code can be found at https://github.com/rooskraaijveld/DSAlocalization.git
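A sketch of the matching and localization steps in their simplest form: cosine similarity between encoder embeddings for matching, and an embedding-versus-feature-map similarity search for the best coordinate. Shapes and names are assumptions; the paper's encoders and similarity measure may differ.

```python
import torch
import torch.nn.functional as F

def match_score(enc_a: torch.Tensor, enc_b: torch.Tensor) -> torch.Tensor:
    """Similarity between two sequence encodings (vectors)."""
    return F.cosine_similarity(enc_a, enc_b, dim=-1)

def localize(daughter_vec: torch.Tensor, mother_map: torch.Tensor):
    """Coordinate in the mother feature map (C, H, W) where the daughter
    embedding (C,) is most similar."""
    sim = torch.einsum("c,chw->hw", daughter_vec, mother_map)
    idx = torch.argmax(sim)
    return divmod(idx.item(), sim.shape[1])    # (row, col) of best match
```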
Myocardial perfusion imaging using single-photon emission computed tomography (SPECT), or myocardial perfusion SPECT (MPS), is a widely used clinical imaging modality for the diagnosis of coronary artery disease. Current clinical protocols for acquiring and reconstructing MPS images are similar for most patients. However, for patients with outlier anatomical characteristics, such as large breasts, images acquired using conventional protocols are often sub-optimal in quality, leading to degraded diagnostic accuracy. Solutions that improve image quality for these patients without increasing dose or total acquisition time remain challenging. Thus, there is an important need for new methodologies that can help improve the quality of the acquired images for such patients, in terms of the ability to detect myocardial perfusion defects. One approach to improving this performance is adapting the image acquisition protocol to each patient. Studies have shown that in MPS, different projection angles usually contain varying amounts of information for the detection task. However, current clinical protocols spend the same time at each projection angle. In this work, we evaluated whether an acquisition protocol optimized for each patient could improve performance on the task of defect detection on reconstructed images for patients with outlier anatomical characteristics. For this study, we first designed and implemented a personalized patient-specific protocol-optimization strategy, which we term precision SPECT (PRESPECT). This strategy integrates the theory of ideal observers with the constraints of tomographic reconstruction to optimize the acquisition time for each projection view, such that performance on the task of detecting myocardial perfusion defects is maximized. We performed a clinically realistic simulation study on patients with outlier anatomies, on the task of detecting perfusion defects on various realizations of low-dose scans, using an anthropomorphic channelized Hotelling observer. Our results show that using PRESPECT led to improved performance on the defect detection task for the considered patients. These results provide evidence that personalization of the MPS acquisition protocol has the potential to improve defect detection performance on reconstructed images, as assessed by anthropomorphic observers, for patients with outlier anatomical characteristics. Thus, our findings motivate further research to design optimal patient-specific acquisition and reconstruction protocols for MPS, as well as the development of similar approaches for other medical imaging modalities.
We investigated the use of equivalent relative utility (ERU) to evaluate the effectiveness of artificial intelligence (AI)-based rule-out algorithms designed to autonomously remove non-cancer patient images from radiologist review. Two evaluation metrics are explored: positive/negative predictive values and ERU. We applied both methods to a recent US study that reported improved specificity when retrospectively applying its AI algorithm to a large mammography dataset. The ERU values were also calculated given the recall and cancer detection rates from a European mammography screening study. Without large prospective studies, ERU may provide insights into the effectiveness of a rule-out algorithm.
Because the conventional binormal ROC curve parameters are in terms of the underlying normal diseased and nondiseased rating distributions, transformations of these values are required for the user to understand what the corresponding ROC curve looks like in terms of its shape and size. In this paper I propose an alternative parameterization in terms of parameters that explicitly describe the shape and size of the ROC curve. The proposed two parameters are the mean-to-sigma ratio and the familiar area under the ROC curve (AUC), which are easily interpreted in terms of the shape and size of the ROC curve, respectively. In addition, the mean-to-sigma ratio describes the degree of improperness of the ROC curve and the AUC describes the ability of the corresponding diagnostic test to discriminate between diseased and nondiseased cases. The proposed parameterization simplifies the sizing of diagnostic studies when conjectured variance components are used and simplifies choosing the binormal a and b parameter values needed for simulation studies.
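A sketch of the conversion implied by this parameterization, under the standard binormal relations AUC = Φ(a/√(1+b²)) and one common definition of the mean-to-sigma ratio, r = a/(1-b) (with the nondiseased distribution N(0,1) and TPF = Φ(a + b·Φ⁻¹(FPF))). The bracket assumes an improper curve with b in (0, 1); whether these match the paper's exact definitions is an assumption.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def binormal_from_shape(auc: float, r: float):
    """Recover binormal (a, b) from (AUC, mean-to-sigma ratio r),
    assuming b in (0, 1)."""
    f = lambda b: norm.cdf(r * (1 - b) / np.sqrt(1 + b * b)) - auc
    b = brentq(f, 1e-6, 1 - 1e-6)
    return r * (1 - b), b

a, b = binormal_from_shape(auc=0.85, r=4.0)
print(f"a = {a:.3f}, b = {b:.3f}")
```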
Double reading of screening mammograms, a feature of many breast cancer screening programs, is impacted by interactions between the two image readers. In this work, we describe how the bivariate binormal (BVBN) model, originally developed for statistical analysis of reader studies, can be used to analyze double reading of screening mammograms. The model posits two bivariate normal distributions that describe the distribution of latent decision variables of the two readers for cancer and non-cancer cases. The BVBN allows for the estimation of correlation coefficients between the decision variables of two readers, independent of performance and the threshold for recall. We contend that these correlation coefficients are a useful way to characterize interactions between readers because they characterize associations at the level of the perceptual response in a way that is consistent with Signal Detection Theory. We describe the BVBN model and show how parameters can be estimated from count data under an assumed multinomial distribution. The analysis presented focuses on two aspects of the BVBN model. For implementation using binary data, an equal-variance assumption on latent decision variables is required. Otherwise, the model is over-parameterized. We characterize and discuss the consequence of this assumption. We also show how disagreement rates, an alternative measure of reader interactions, suffer from base-rate effects making them more difficult to interpret than the correlation coefficients of the BVBN model.
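A sketch of the binary-data estimation step under the equal-variance assumption noted above: with both readers' latent decision variables taken as standard bivariate normal within one truth state, the correlation is the maximum-likelihood value given the 2x2 recall-agreement counts (essentially a tetrachoric correlation). This is an illustrative instance, not the paper's full multinomial machinery.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

def reader_correlation(n11, n10, n01, n00):
    """MLE of the latent correlation from recall counts within one truth
    state: n11 = both readers recall, n10 = only reader 1, etc."""
    n = n11 + n10 + n01 + n00
    # Thresholds set by each reader's marginal recall rate.
    t1 = norm.ppf(1 - (n11 + n10) / n)
    t2 = norm.ppf(1 - (n11 + n01) / n)

    def nll(rho):
        bvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
        p00 = bvn.cdf([t1, t2])              # neither exceeds threshold
        p1_ = 1 - norm.cdf(t1)               # reader 1 recalls (marginal)
        p_1 = 1 - norm.cdf(t2)               # reader 2 recalls (marginal)
        p11 = p00 + p1_ + p_1 - 1            # inclusion-exclusion
        p10, p01 = p1_ - p11, p_1 - p11
        probs = np.clip([p11, p10, p01, p00], 1e-12, 1)
        return -np.dot([n11, n10, n01, n00], np.log(probs))

    return minimize_scalar(nll, bounds=(-0.99, 0.99), method="bounded").x

print(reader_correlation(n11=30, n10=20, n01=15, n00=935))
```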
In this study, we aimed to understand the generalizability of a convolutional neural network (CNN)-based model observer for breast tomosynthesis images with two different (i.e., 30% and 50%) volume glandular fractions (VGFs). A spiculated signal with a volume equivalent to that of a spherical signal with a diameter of 1 mm was inserted at the center to generate signal-present breast volumes. The networks were optimized through brute-force search over depth (i.e., 5, 10, and 15 convolutional blocks) to investigate whether there is any correlation between the detection performance and the difference between the theoretical receptive field (TRF) size of the network and the signal size. For all cases, the optimal detection performance of the CNN-based model observer was achieved when 5 convolutional blocks (i.e., a TRF size of 1.1 mm) were used. To verify whether a nonlinear framework improves the generalizability of the observer, the detection performance of the CNN-based model observer was compared to that of the Hotelling observer (HO). A total of 18 tests were conducted by applying the optimal networks (i.e., N30%, N50%, Nboth) and the Hotelling templates (i.e., HT30%, HT50%, and HTboth) to each of the three testing subsets in order to compare the generalizability of the two observers. The CNN-based model observer showed better generalized detection performance than the HO.
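For reference, the standard recursion for a stacked-convolution theoretical receptive field, which is the quantity compared to the signal size above: r grows by (k-1) times the accumulated stride ("jump") per layer. The kernel size and per-block stride below are assumptions for illustration, not the paper's architecture.

```python
def theoretical_receptive_field(blocks: int, kernel: int = 3,
                                stride_per_block: int = 2) -> int:
    """TRF in input pixels via r_out = r_in + (k - 1) * j, with the jump j
    multiplied by the stride after each block."""
    r, j = 1, 1
    for _ in range(blocks):
        r += (kernel - 1) * j
        j *= stride_per_block
    return r

for depth in (5, 10, 15):
    print(depth, theoretical_receptive_field(depth), "pixels")
```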
3D echocardiography (3DE) is the standard modality for visualizing heart valves and their surrounding anatomical structures. Commercial cardiovascular ultrasound systems commonly offer a set of parameters that allow clinical users to modify, in real time, visual aspects of the information contained in the echocardiogram. To our knowledge, there is currently no work that demonstrates whether the methods currently used by commercial platforms are optimal. In addition, current platforms have limitations in adjusting the visibility of anatomical structures, such as reducing information that obstructs anatomical structures without removing essential clinical information. To overcome this, the present work proposes a new method for 3DE visualization based on “focus + context” (F+C), a concept which aims to present a detailed region of interest while preserving a less detailed overview of the surrounding context. The new method allows clinical users to modify parameter values within a region of interest independently from the adjustment of the contextual information. To validate this new method, a user study was conducted among clinical experts. As part of the user study, clinical experts adjusted parameters for five echocardiograms of patients with complete atrioventricular canal defect (CAVC) using both the method conventionally used by commercial platforms and the proposed method based on F+C. The results support the relevance of the F+C-based method for visualizing 3DE of CAVC patients: users chose significantly different parameter values with the F+C-based method.
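A toy sketch of the core F+C composition idea: apply one set of parameter values inside the region of interest and another outside, blended by a soft mask. Real 3DE pipelines operate on rendering transfer functions rather than a single scalar map, so this only illustrates the independence of focus and context adjustments.

```python
import numpy as np

def focus_plus_context(params_focus, params_context, roi_mask):
    """Blend per-voxel parameter maps: focus values inside the ROI,
    context values outside, with a soft (0-1) mask in between."""
    m = roi_mask.astype(float)
    return m * params_focus + (1.0 - m) * params_context
```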
Skin cancer is a prevalent and potentially fatal disease that requires accurate and efficient diagnosis and treatment. Although manual tracing is the current standard in clinics, automated tools are desired to reduce human labor and improve accuracy. However, developing such tools is challenging due to the highly variable appearance of skin cancers and complex objects in the background. In this paper, we present SkinSAM, a fine-tuned model based on the Segment Anything Model that showed outstanding segmentation performance. The models were validated on the HAM10000 dataset, which includes 10,015 dermatoscopic images. While the larger models (ViT_L, ViT_H) performed better than the smaller one (ViT_b), the fine-tuned model (ViT_b_finetuned) exhibited the greatest improvement, with a mean pixel accuracy of 0.945, a mean Dice score of 0.8879, and a mean IoU score of 0.7843. Among the lesion types, vascular lesions showed the best segmentation results. Our research demonstrates the great potential of adapting SAM to medical image segmentation tasks.
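For reference, a minimal implementation of the two overlap metrics reported above for binary lesion masks.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray):
    """Dice score and IoU for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + 1e-12)
    iou = inter / (union + 1e-12)
    return dice, iou
```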
This study explores the validity of texture-based classification in the early stages of visual search/classification. Initially, we summarize our group's prior findings regarding the prediction of signal detection difficulty based on second-order statistical image texture features in tomographic breast images. Alongside the development of visual search model observers that accurately mimic search and localization in medical images, we continue examining the efficacy of texture-based classification/segmentation methods. We consider both first- and second-order features through a combination of texture maps and a Gaussian mixture model (GMM). Our aim is to evaluate the advantages of integrating these methods at the early stages of the visual search process, particularly in scenarios where target morphological features may be less apparent or known, as in clinical data. By merging knowledge of imaging physics with a texture-based GMM, we enhance classification efficiency and refine the localization of regions suspected of containing target locations.
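A sketch of one standard way to pair second-order texture features with a GMM, via gray-level co-occurrence matrices in scikit-image and unsupervised clustering in scikit-learn. The distances, angles, and feature set are illustrative choices, not the study's configuration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.mixture import GaussianMixture

def glcm_features(patch: np.ndarray) -> np.ndarray:
    """Second-order (GLCM) texture features for one uint8 image patch."""
    g = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                     levels=256, symmetric=True, normed=True)
    return np.array([graycoprops(g, p).mean()
                     for p in ("contrast", "homogeneity", "energy")])

def segment_by_texture(patches, n_components=2):
    """Cluster patch features with a GMM to flag regions suspected of
    containing targets (unsupervised sketch)."""
    X = np.stack([glcm_features(p) for p in patches])
    return GaussianMixture(n_components=n_components,
                           random_state=0).fit_predict(X)
```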
Objective evaluation of quantitative imaging (QI) methods with patient data, while important, is typically hindered by the lack of gold standards. To address this challenge, no-gold-standard evaluation (NGSE) techniques have been proposed. These techniques have demonstrated efficacy in accurately ranking QI methods without access to gold standards. The development of NGSE methods has raised an important question: how accurately can QI methods be ranked without ground truth? To answer this question, we propose a Cramér–Rao bound (CRB)-based framework that quantifies the upper bound on ranking QI methods without any ground truth. We present the application of this framework in guiding the use of a well-known NGSE technique, namely the regression-without-truth (RWT) technique. Our results show the utility of this framework in quantifying the performance of this NGSE technique for different patient numbers. These results provide motivation towards studying other applications of this upper bound.
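For reference, the classical bound the framework builds on (the paper's specific construction for ranking accuracy is more involved): for an unbiased estimator of a scalar parameter,

```latex
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)},
\qquad
I(\theta) \;=\; \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}
\log p(\mathbf{x};\theta)\right)^{\!2}\right],
```

where I(θ) is the Fisher information carried by the data x about θ.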
With the rise of 3D content in medical applications, there is an increasing need for 3D visualization solutions. Medical visualization solutions adhere to very strict quality standards and use advanced calibration schemes to guarantee that content is presented to the observer in the most optimal fashion. One crucial aspect of such calibration is captured by the Contrast Sensitivity Function (CSF). While the CSF has been investigated thoroughly for 2D displays, very little research has quantified it in 3D scenes. We designed a perception experiment to quantify how human observers experience contrast sensitivity on 3D displays. The experiment uses Gabor patches projected onto a 3D scene. Because Gabor patches alone do not create enough depth cues for proper binocular fusion, a stimulus was designed to ensure that the patch was fused correctly and perceived at the intended depth. The results show that human observers perceive contrast somewhat differently on a lenticular-lens 3D display than on 2D screens: observers become more sensitive when objects are placed behind the display plane and less sensitive when objects are placed in front of it. These results allow calibrating the scene so that the CSF is perceived linearly across the content, analogous to the Grayscale Standard Display Function (GSDF) for 2D displays.
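A Gabor patch is a sinusoidal grating windowed by a Gaussian envelope. A minimal sketch of generating one follows; the size, frequency, and envelope width are illustrative assumptions, and the fusion-supporting stimulus described above is not modeled:

```python
import numpy as np

def gabor_patch(size: int, cycles_per_image: float, sigma: float,
                theta: float = 0.0, phase: float = 0.0) -> np.ndarray:
    """Sinusoidal grating times a Gaussian envelope, values in [-1, 1]."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate the grating axis
    grating = np.cos(2 * np.pi * cycles_per_image * xr / size + phase)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return grating * envelope

# Hypothetical stimulus: 256-pixel patch, 4 cycles across the patch.
patch = gabor_patch(size=256, cycles_per_image=4, sigma=40)
```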
Purpose: The aim of this study was to develop an algorithm for automated quality control of structured radiology reports and to automatically obtain the correct invoicing codes for the performed exam. Ultrasound (US) exams of the abdomen, including Doppler exams, were selected as the use case. Method: To build a correct algorithm for automatic billing, the billing tree for ultrasound exams was studied. In Switzerland, TARMED is the tariff structure used for billing outpatient medical services. The 4600 services listed in TARMED are divided into chapters that group together all services with a well-defined common characteristic; for example, Chapter 39 covers all medical imaging services. These chapters are further subdivided into subchapters for greater precision. Using this information, a modular Natural Language Processing (NLP) algorithm based on the Natural Language Toolkit (NLTK) library was developed. A second NLP algorithm based on spaCy was also developed, with the objective of double-validating the first algorithm. To train and test the algorithms, a dataset of 170 US abdominal examinations along with their radiology reports was extracted from our RIS. The results were validated by an experienced technologist, who identified possible discrepancies between the algorithm output and the correct billing. This check was carried out on a batch of 95 samples, and a confusion matrix was used to analyze the results. Results: On all 95 data samples, the NLTK algorithm detected the billing codes correctly 100% of the time. The spaCy algorithm detected the billing codes correctly in 86.3% of cases; it tends to overestimate the type of abdominal examination present in the report. Indeed, the 13 cases in which it erred were cases where it detected a full abdominal ultrasound when the examination covered only the lower or upper abdomen. Conclusion: The NLTK model provides reliable and efficient estimation of billing codes for abdominal ultrasound, facilitating the work of technologists, saving time, and avoiding possible human errors.
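As a rough illustration of the rule-based NLP idea, and emphatically not the authors' algorithm, a keyword-matching sketch over a tokenized report could look as follows; the code strings and keyword sets are placeholders, not real TARMED positions:

```python
from nltk.tokenize import word_tokenize  # may require nltk.download("punkt")

# Hypothetical keyword-to-code rules, most specific first, so that a full
# abdominal exam is not also billed as upper + lower (the over-detection
# error described above).
RULES = {
    "full_abdomen":  ({"full", "abdomen"},  "39.XXXX-full"),
    "upper_abdomen": ({"upper", "abdomen"}, "39.XXXX-upper"),
    "lower_abdomen": ({"lower", "abdomen"}, "39.XXXX-lower"),
}

def billing_codes(report_text: str) -> list[str]:
    """Return candidate billing codes for one ultrasound report."""
    tokens = {t.lower() for t in word_tokenize(report_text)}
    for name, (keywords, code) in RULES.items():
        if keywords <= tokens:        # all rule keywords present in the report
            return [code]
    return []

print(billing_codes("Full abdomen ultrasound with Doppler of the liver."))
```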
Objective assessment of medical image quality can be performed with mathematical model observers matched to radiologists. Foveated channelized Hotelling observers (FCHOs) have been shown to predict human search performance in simulated 3D images more accurately than standard model observers such as the ideal observer or the non-prewhitening observer with eye filter. However, little is known about the performance of FCHOs in the computed tomography (CT) modality or with images extracted from real patients. Patient-extracted images are smaller than simulated images, and their size could be limiting for FCHOs because peripheral vision is modeled by an increasing spatial extent of the channels. This study has two aims: to extend a foveated model observer to 2D anatomical liver CT images, and to find channel parameters enabling the FCHO to match human performance. Regions of interest (ROIs) of 100x100 pixels, a balance between anatomical constraints and the modeling of peripheral vision, were automatically extracted from CT images of five patients' livers. Two radiologist-validated small low-contrast hypodense hepatic metastases were simulated to generate signal-present ROIs; the signals had a diameter of 1 cm relative to the patient and a contrast of -50 HU. The foveated model observer was an FCHO with dense difference-of-Gaussians channels optimized to the size of the extracted ROIs. The optimized FCHO reproduced human performance for a detection task in anatomical liver CT images within standard error up to 9 degrees of visual angle. This study shows that optimized FCHOs could be used in more anthropomorphic assessments of the image quality of CT units.
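Setting the foveation aside (the foveated extension varies channel extent with eccentricity and is not shown), the core channelized Hotelling computation is well established. A minimal sketch, assuming flattened ROI arrays and a precomputed channel matrix:

```python
import numpy as np

def cho_statistics(signal_rois, noise_rois, channels):
    """Channelized Hotelling observer decision variables.

    signal_rois, noise_rois: (n_images, n_pixels) arrays of flattened ROIs
    channels: (n_pixels, n_channels) matrix, e.g. difference-of-Gaussians
              channels as in the abstract (foveation not modeled here).
    """
    vs = signal_rois @ channels              # channel outputs, signal-present
    vn = noise_rois @ channels               # channel outputs, signal-absent
    s_bar = vs.mean(axis=0) - vn.mean(axis=0)
    k = 0.5 * (np.cov(vs, rowvar=False) + np.cov(vn, rowvar=False))
    w = np.linalg.solve(k, s_bar)            # Hotelling template (channel space)
    return vs @ w, vn @ w                    # decision variables per ROI

# Detectability index from the two sets of decision variables:
# d' = (mean_s - mean_n) / sqrt(0.5 * (var_s + var_n))
```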
Hand x-rays are used for tasks such as detecting fractures and investigating joint pain. The choice of x-ray view plays a crucial role in a medical expert's ability to make an accurate diagnosis. This is particularly important for the hand, where the small and overlapping carpal bones can make diagnosis challenging even with proper positioning. In this study, we develop a prototype that uses deep learning models, iterative methods, and a depth sensor to estimate hand and x-ray machine parameters, which are then used to generate feedback that helps ensure proper radiographic hand positioning. The method consists of five steps: detector plane parameter estimation, 2D hand joint landmark prediction, hand joint landmark depth estimation, radiographic positioning parameter extraction, and radiographic protocol constraint verification. Detector plane parameters are estimated by fitting a plane to randomly queried depth points using RANSAC. Google's MediaPipe HandPose model is used for 2D hand joint landmark prediction, and hand joint depth is estimated using the OAK-D Pro sensor. Finally, hand positioning parameters are extracted and evaluated against the selected radiographic viewing protocol. We focus on three commonly used hand positioning protocols: posterior-anterior, oblique, and lateral view. The prototype also has a user interface and a feedback system designed for practical use in the x-ray room. Two evaluations were undertaken to validate the prototype. First, with the help of a radiology technician, we rated the tool's positioning feedback. Second, using a bespoke left-hand x-ray phantom and an x-ray machine, we generated images with and without the prototype's guidance for a double-blind study in which the images were rated by a radiologist.
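RANSAC plane fitting to depth points is a standard building block. A minimal self-contained sketch follows; the iteration count and inlier tolerance are illustrative assumptions, not the prototype's settings:

```python
import numpy as np

def ransac_plane(points: np.ndarray, n_iters: int = 500, tol: float = 5.0,
                 rng=np.random.default_rng(0)):
    """Fit a plane (n, d), with n.p + d = 0, to 3D points (mm) via RANSAC."""
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        dist = np.abs(points @ n + d)        # point-to-plane distances
        inliers = (dist < tol).sum()
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, d)
    return best_model
```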
Manual image segmentations are naturally subject to inaccuracies related to systematic errors (due to the tools used, eye-hand coordination, etc.). This was noted earlier when a simplified accuracy scale was proposed [1]. This scale arbitrarily divides the range of values of the kappa agreement coefficient into classes: almost perfect (>0.80), substantial (0.61-0.80), moderate (0.41-0.60), fair (0.21-0.40), slight (0.00-0.20), and poor (<0.00). However, the determination of threshold values between classes is not entirely clear and seems to be application-dependent. This is particularly important for images in which the tumor-to-normal-tissue boundary can be very indistinct, as observed in ultrasound imaging of the most common cancer in women, breast cancer [2]. In machine learning, new neural network architectures continually compete on performance indicators without accounting for any bias in the ground truth. This raises the question of what relevance, from a segmentation-quality point of view, a gain of a few percentage points has [3] if the reference segmentations carry much greater uncertainty. So far, research on this topic has been limited. The relationship between the segmentations of breast tumors on ultrasound images provided by three radiologists and those obtained using a deep learning model was studied in [4]; unfortunately, the indicated segmentation contours sometimes varied widely across all three readers. A cursory analysis by multiple physicians, focused only on the kappa coefficient in the context of physicians' BI-RADS category assignments, was conducted in [5]. In this article, we present a preliminary analysis of the accuracy of experts' manually prepared binary breast cancer masks on ultrasound images and their impact on performance metrics commonly used in machine learning. In addition, we examine how tumor type and BI-RADS category [6] affect the accuracy of tumor contouring.
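For concreteness, pixel-wise Cohen's kappa between two binary masks, together with the class boundaries quoted above, can be computed along these lines (a sketch, not the authors' evaluation code):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def mask_kappa(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Cohen's kappa between two binary masks, treating pixels as rated items."""
    return cohen_kappa_score(mask_a.ravel().astype(int),
                             mask_b.ravel().astype(int))

def kappa_class(k: float) -> str:
    """Map a kappa value onto the simplified accuracy scale quoted above."""
    if k < 0.0:   return "poor"
    if k <= 0.20: return "slight"
    if k <= 0.40: return "fair"
    if k <= 0.60: return "moderate"
    if k <= 0.80: return "substantial"
    return "almost perfect"
```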
In domains such as biomedical imaging, the evaluation of deep generative models (DGMs) for image-to-image translation tasks is additionally challenged by the need for substantial domain expertise, even for visual evaluation. To partially circumvent this problem, we propose a data-driven, human-interpretable method to evaluate image-conditioned DGMs for the reproducibility of domain-relevant spatial context before the DGMs are considered for diagnostic tasks and real-world deployment.
Medical imaging systems designed to produce diagnostically informative images should be objectively assessed via task-based measures of image quality (IQ). Ideally, the computation of task-based measures of IQ accounts for all sources of randomness in the measurement data, including variability in the ensemble of objects to be imaged. To address this need, stochastic object models (SOMs) that can generate an ensemble of synthesized objects or phantoms can be employed. Various mathematical SOMs and phantoms that synthesize objects interpretably have been developed, such as lumpy object models and parameterized torso phantoms. However, purely mathematically defined SOMs may not comprehensively capture realistic object variations. To establish realistic SOMs, it is desirable to use experimental data. An augmented generative adversarial network (GAN), AmbientGAN, was recently proposed for establishing SOMs from medical imaging measurements; however, it remains unclear to what extent the AmbientGAN-produced objects can be interpretably controlled. This work introduces a novel approach, AmbientCycleGAN, that translates mathematical SOMs into realistic SOMs by use of noisy measurement data. Numerical studies considering clustered lumpy background (CLB) models and real mammograms are conducted. It is demonstrated that the proposed method can stably establish SOMs based on mathematical models and noisy measurement data. Moreover, the ability of AmbientCycleGAN to interpretably control image features in the synthesized objects is investigated.
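As background, a single realization of the classic lumpy object model, a Poisson number of Gaussian lumps at uniformly random positions, can be generated as below; the clustered lumpy background (CLB) used in the paper additionally clusters lump positions and uses oriented blobs, which this sketch omits:

```python
import numpy as np

def lumpy_background(size: int = 128, n_mean: float = 50, amp: float = 1.0,
                     width: float = 5.0, rng=np.random.default_rng(0)):
    """One realization of a simple lumpy object model: a Poisson number of
    isotropic Gaussian 'lumps' at uniformly random positions."""
    img = np.zeros((size, size))
    y, x = np.mgrid[:size, :size]
    for _ in range(rng.poisson(n_mean)):
        cy, cx = rng.uniform(0, size, 2)
        img += amp * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width**2))
    return img
```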
This study investigates radiologists' decision-making processes in breast cancer screening, with the aim of exploring the potential of decision prediction models trained on individual radiologists' decisions. We built decision prediction models from the radiologists' eye-position recordings and the locations where they indicated that a malignant mass was present. We considered 120 mammogram cases read by eight radiologists of different expertise levels. The decisions made were classified into three categories based on the radiologists' marks and the ground truth: true positives (TP), false negatives (FN), and false positives (FP). Notably, each radiologist's data was used to train independent, radiologist-specific models. The marked areas (TPs, FPs) and the false negative areas were cropped and fed into two base models, VGG19 and ResNet50, both pretrained on the ImageNet dataset. We enhanced both base models by incorporating a Gabor filter layer, implemented as a 2D convolutional layer with fixed weights that uses Gabor filters to extract essential Gabor features from the input. As a result, our approach yields four decision prediction models per radiologist: VGG19 and ResNet50, each with and without Gabor filters. The models were analyzed and compared to assess their performance and potential benefits. The results underscored how strongly a radiologist's expertise and decision consistency determine model accuracy: when a radiologist's responses are inconsistent for similar features across different cases, predicting their decisions with an individual model becomes challenging. Consequently, model performance varied with each radiologist's data.
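A fixed-weight Gabor convolution layer of the kind described can be sketched in PyTorch as follows; the kernel size, wavelength, and orientation count are illustrative assumptions, and how the layer is wired into VGG19/ResNet50 is not specified in the abstract:

```python
import math
import torch
import torch.nn as nn

def gabor_kernel(size=11, sigma=3.0, theta=0.0, lam=6.0, psi=0.0):
    """Real part of a 2D Gabor filter as a (size, size) tensor."""
    half = size // 2
    y, x = torch.meshgrid(torch.arange(-half, half + 1),
                          torch.arange(-half, half + 1), indexing="ij")
    x, y = x.float(), y.float()
    xr = x * math.cos(theta) + y * math.sin(theta)
    return (torch.exp(-(xr**2 + (-x * math.sin(theta) + y * math.cos(theta))**2)
                      / (2 * sigma**2))
            * torch.cos(2 * math.pi * xr / lam + psi))

def gabor_layer(n_orientations=4, size=11):
    """Conv2d whose weights are fixed (non-trainable) Gabor kernels."""
    conv = nn.Conv2d(1, n_orientations, size, padding=size // 2, bias=False)
    kernels = torch.stack([gabor_kernel(size, theta=k * math.pi / n_orientations)
                           for k in range(n_orientations)])
    conv.weight.data = kernels.unsqueeze(1)   # shape (out, in=1, H, W)
    conv.weight.requires_grad_(False)         # fixed weights, per the abstract
    return conv
```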
Volume measurement of a paraganglioma (a rare neuroendocrine tumor that typically forms along major blood vessels and nerve pathways in the head and neck region) is crucial for monitoring and modeling tumor growth in the long term. However, in clinical practice, performing these measurements with available tools is time-consuming and suffers from tumor-shape assumptions and observer-to-observer variation. Growth modeling could play a significant role in solving a decades-old dilemma stemming from uncertainty about how a tumor will develop over time: treating paraganglioma patients can prevent severe symptoms, but treating patients who do not actually need it comes at the cost of unnecessary possible side effects and complications. Improved measurement techniques could enable growth-model studies with large amounts of tumor volume data, possibly giving valuable insights into how these tumors develop over time. We therefore propose an automated tumor volume measurement method based on a deep learning segmentation model using no-new-UNet (nnUNet). We assess the performance of the model through visual inspection by a senior otorhinolaryngologist and several quantitative metrics, comparing model outputs with manual delineations, including the variation in manual delineation among multiple observers. Our findings indicate that the automatic method performs at least as well as manual delineation. Finally, using the created model and a linking procedure that we propose for tracking a tumor over time, we show how additional volume measurements affect the fit of known growth functions.
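Once a binary segmentation is available, the volume measurement itself is straightforward. A minimal sketch, assuming a 3D mask and known voxel spacing (not the authors' pipeline):

```python
import numpy as np

def tumor_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Tumor volume in millilitres from a binary 3D mask and voxel spacing.

    mask: (D, H, W) binary segmentation (e.g. a model prediction)
    spacing_mm: voxel size along each axis in millimetres
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0  # mm^3 -> ml
```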
Breast cancer screening is a critical component of healthcare, engaging over 2.23 million women annually in the UK. The National Health Service Breast Screening Programme requires breast screening readers to participate in the PERFORMS scheme for quality assurance and training. This study investigates the potential of a specificity-focused test set within PERFORMS to reduce false positive recalls and enhance readers' image interpretation skills in England. The specificity set comprised 60 challenging breast screening cases: 15 malignant and 45 benign or normal. The 529 readers who participated in the pre-specificity (pre-SP) round, the post-specificity (post-SP) round, or both were categorized into those who undertook the specificity set (n = 317) and those who did not (n = 212). The post-SP recall rate (36.0%) was significantly lower than the pre-SP rate (37.7%) (p < 0.001). This decrease was more pronounced for those who undertook the specificity set (-2.7%) than for those who did not (-0.5%) (p = 0.0018). The correct return-to-screen rate and positive predictive value (PPV) improved in the post-SP set, at 86.4% and 76.0%, respectively, compared with 82.4% and 68.5% in the pre-SP set (p < 0.001). The increases in correct return-to-screen rate and PPV were comparable between those who undertook the specificity set and those who did not (p = 0.0933 and p = 0.2515, respectively). In conclusion, integrating a specificity-focused test set within PERFORMS shows promise in positively impacting breast screening reader performance, offering insights for future training and quality assurance enhancements.
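As an illustration only (the abstract does not state which statistical test produced its p-values), a difference between two recall rates can be tested with a two-proportion z-test; the counts below are made up:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: recalled cases and total reads, pre-SP vs post-SP.
recalls = [3770, 3600]
reads   = [10000, 10000]
stat, p = proportions_ztest(recalls, reads)
print(f"z = {stat:.2f}, p = {p:.4f}")
```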