Large-scale population studies have examined the detection of sinus opacities in cranial MRIs. Deep learning methods, specifically 3D convolutional neural networks (CNNs), have been used to classify these anomalies. However, CNNs are limited in capturing long-range dependencies across low- and high-level features, which can reduce performance. To address this, we propose an end-to-end pipeline using a novel deep learning network called ConTra-Net. ConTra-Net combines the strengths of CNNs with the self-attention mechanism of transformers to classify paranasal anomalies in the maxillary sinuses. Our approach outperforms 3D CNNs and the 3D Vision Transformer (ViT), with relative improvements in F1 score of 11.68% and 53.5%, respectively. Our pipeline with ConTra-Net could serve as an alternative to reduce misdiagnosis rates in classifying paranasal anomalies.
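As a rough illustration of the idea behind combining convolutional feature extraction with transformer self-attention, the following PyTorch sketch classifies a 3D volume by flattening CNN feature maps into a token sequence. This is not the published ConTra-Net architecture; all layer sizes and the class name `Hybrid3DClassifier` are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a hybrid 3D-CNN + self-attention classifier.
# NOT the published ConTra-Net; layer sizes and names are illustrative.
import torch
import torch.nn as nn

class Hybrid3DClassifier(nn.Module):
    def __init__(self, num_classes=2, channels=32, depth=2, heads=4):
        super().__init__()
        # 3D CNN stem extracts local features from the MRI sub-volume.
        self.stem = nn.Sequential(
            nn.Conv3d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
        )
        # Transformer encoder adds long-range dependencies between the
        # flattened spatial positions of the CNN feature map.
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):                      # x: (B, 1, D, H, W)
        f = self.stem(x)                       # (B, C, D', H', W')
        tokens = f.flatten(2).transpose(1, 2)  # (B, N, C) voxel tokens
        tokens = self.encoder(tokens)          # self-attention over positions
        return self.head(tokens.mean(dim=1))   # pooled classification logits

logits = Hybrid3DClassifier()(torch.randn(2, 1, 32, 32, 32))
print(logits.shape)  # torch.Size([2, 2])
```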
The increasing incidence of laryngeal carcinomas requires approaches for early diagnosis and treatment. In clinical practice, white light endoscopy of the laryngeal region is typically followed by biopsy under general anesthesia. Thus, image-based diagnosis using optical coherence tomography (OCT) has been proposed to study sub-surface tissue layers at high resolution. However, accessing the region of interest requires robust miniature OCT probes that can be forwarded through the working channel of a laryngoscope. Typically, such probes generate A-scans, i.e., single-column depth images, which are rather difficult to interpret. We propose a novel approach that uses the endoscopic camera images to spatially align these A-scans. Given the natural tissue motion and the movements of the laryngoscope, the resulting OCT images show a three-dimensional representation of the sub-surface structures, which is simpler to interpret. We present the overall imaging setup and the motion tracking method. Moreover, we describe an experimental setup to assess the precision of the spatial alignment. We study different tracking templates and report root-mean-squared errors of 0.08 mm and 0.18 mm for sinusoidal and freehand motion, respectively. Furthermore, we demonstrate the in-vivo application of the approach, illustrating the benefit of spatially meaningful alignment of the A-scans for studying laryngeal tissue.
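The abstract does not detail the tracking method, but a minimal sketch of the underlying idea, assuming normalized cross-correlation template matching on the camera frames and an assumed pixel-to-millimeter scale, might look like this:

```python
# Hedged sketch: estimate probe motion from endoscopic camera frames via
# template matching (OpenCV), then bin each OCT A-scan by its tracked
# lateral position. The paper's actual tracking pipeline may differ;
# scale_mm_per_px and bin_mm are assumed values.
import numpy as np
import cv2

def track_offsets(frames, template):
    """Return per-frame (x, y) best-match template positions in pixels."""
    offsets = []
    for frame in frames:
        res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(res)  # location of best match
        offsets.append(max_loc)
    return np.asarray(offsets, dtype=float)

def place_ascans(ascans, offsets, scale_mm_per_px=0.01, bin_mm=0.02):
    """Average A-scans into lateral bins to build a B-scan-like image."""
    x_mm = offsets[:, 0] * scale_mm_per_px
    cols = ((x_mm - x_mm.min()) / bin_mm).astype(int)
    image = np.zeros((ascans.shape[1], cols.max() + 1))
    counts = np.zeros(cols.max() + 1)
    for a, c in zip(ascans, cols):             # accumulate A-scans per bin
        image[:, c] += a
        counts[c] += 1
    return image / np.maximum(counts, 1)       # mean per occupied bin
```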
Deep learning (DL) algorithms can be used to automate paranasal anomaly detection from magnetic resonance imaging (MRI). However, previous works relied on supervised learning techniques to distinguish between normal and abnormal samples. This limits the types of anomalies that can be classified, as the anomalies must be present in the training data. Further, many data points from both the normal and the anomaly class are needed for the model to achieve satisfactory classification performance. Experienced clinicians, by contrast, can distinguish normal samples (healthy maxillary sinuses) from anomalous ones after looking at only a few normal samples. We mimic this ability by learning the distribution of healthy maxillary sinuses using a 3D convolutional autoencoder (cAE) and its variant, a 3D variational autoencoder (VAE), and evaluate both architectures for this task. Concretely, we pose paranasal anomaly detection as an unsupervised anomaly detection problem. Thereby, we reduce the clinicians' labelling effort, as we only use healthy samples during training; additionally, we can classify any type of anomaly that differs from the training distribution. We train our 3D cAE and VAE to learn a latent representation of healthy maxillary sinus volumes using an L1 reconstruction loss. During inference, we use the reconstruction error to classify between normal and anomalous maxillary sinuses. We extract sub-volumes from larger head and neck MRIs and analyse the effect of different fields of view on detection performance. Finally, we report which anomalies are easiest and hardest to classify using our approach. Our results demonstrate the feasibility of unsupervised detection of paranasal anomalies from MRIs, with an AUPRC of 85% and 80% for the cAE and VAE, respectively.
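A minimal PyTorch sketch of the described setup: a small 3D convolutional autoencoder trained with an L1 loss on healthy sub-volumes, with the per-volume reconstruction error serving as the anomaly score at inference. The layer configuration is an assumption for illustration, not the paper's architecture.

```python
# Sketch of unsupervised anomaly detection via a 3D conv autoencoder.
import torch
import torch.nn as nn

class CAE3D(nn.Module):
    def __init__(self, c=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(c, 2 * c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(2 * c, c, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(c, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

model, loss_fn = CAE3D(), nn.L1Loss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

healthy = torch.randn(4, 1, 32, 32, 32)   # stand-in for healthy sinus volumes
opt.zero_grad()
loss = loss_fn(model(healthy), healthy)   # train only on healthy samples
loss.backward()
opt.step()

with torch.no_grad():                     # anomaly score = reconstruction error
    test = torch.randn(1, 1, 32, 32, 32)
    score = (model(test) - test).abs().mean().item()
```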
Clinical practitioners consider abnormal cell metabolism a hallmark of carcinogenesis. Cellular energy metabolism is accessible through imaging of the fast autofluorescence decay of the endogenous fluorophore NADH. This technique, called metabolic fluorescence lifetime imaging (metabolic FLIM), is best performed with multiphoton excitation and the rapid, precise, and quantitative TCSPC technology from Becker & Hickl GmbH. However, conventional multiphoton FLIM microscopes rely on surface-layer tissue access or excised tissue samples, whereas imaging inside the body for medical diagnostics is routinely accomplished with endoscopes. Here we present multiphoton metabolic FLIM of NADH performed through an endoscope, with significant applications in oncology and beyond.
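For context, lifetime estimation from a TCSPC decay histogram is typically posed as a bi-exponential fit. The following SciPy sketch on synthetic data illustrates the principle only; real metabolic FLIM analysis (e.g., in the Becker & Hickl software) additionally deconvolves the instrument response function, and all parameters below are assumptions.

```python
# Illustrative bi-exponential fit of a synthetic NADH decay with SciPy.
import numpy as np
from scipy.optimize import curve_fit

def biexp(t, a1, tau1, a2, tau2):
    # Two-component decay: free NADH (short tau1) + bound NADH (long tau2).
    return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2)

t = np.linspace(0, 10, 256)                     # time bins in ns
truth = biexp(t, 0.7, 0.4, 0.3, 2.5)            # synthetic "measured" decay
decay = truth + np.random.default_rng(0).normal(0, 0.005, t.size)

p0 = (0.5, 0.5, 0.5, 2.0)                       # initial parameter guess
(a1, tau1, a2, tau2), _ = curve_fit(biexp, t, decay, p0=p0)
mean_tau = (a1 * tau1 + a2 * tau2) / (a1 + a2)  # amplitude-weighted lifetime
print(f"tau1={tau1:.2f} ns, tau2={tau2:.2f} ns, mean={mean_tau:.2f} ns")
```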
Colorectal cancer (CRC) poses a great risk to public health; it is the third most common cause of cancer in the US. The development of colorectal polyps is one of the earliest signs of cancer, and early detection and resection of polyps can increase the survival rate to 90%. Because polyps vary in color, shape, size, texture, and appearance, manual inspection is prone to misdetections. To this end, computer-aided diagnosis (CADx) systems that detect polyps by processing colonoscopic videos have been proposed. Such a system acts as a secondary check, helping clinicians reduce misdetections so that polyps can be resected before they become cancerous. Despite the prominence of CADx solutions, the miss rate of polyps remains between 6% and 27%, and sessile and flat polyps with a diameter of less than 10 mm are particularly likely to go undetected. Convolutional neural networks (CNNs) have shown promising results in polyp segmentation. However, these works take a supervised approach and are limited by dataset size; smaller datasets have been observed to reduce the segmentation accuracy of ResUNet++. Self-supervision is a strong alternative to fully supervised learning, especially in medical image analysis, since it redresses the limitations posed by small annotated datasets. The self-supervised approach proposed by Jamaludin et al. shows that pretraining a network on a proxy task helps it extract meaningful representations from the underlying data, which can then improve the performance of the final downstream supervised task. In this work, we train a U-Net to inpaint randomly dropped-out pixels as a proxy task, pretraining on the Kvasir-SEG dataset and then training in a supervised manner on the limited Kvasir-Sessile dataset. Our experimental results demonstrate that, with a limited annotated dataset and a larger unlabeled dataset, the self-supervised approach is a better alternative to a fully supervised one. Specifically, our self-supervised U-Net outperforms five segmentation models trained in a supervised manner on the Kvasir-Sessile dataset.
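A hedged sketch of the inpainting proxy task described above: pixels are randomly dropped out and the network learns to reconstruct them. The dropout rate, loss, and the tiny stand-in model are assumptions for illustration, not the paper's exact choices.

```python
# Self-supervised pretraining via pixel-dropout inpainting (PyTorch sketch).
import torch
import torch.nn as nn

def random_pixel_dropout(images, drop_prob=0.25):
    """Zero out a random subset of pixels; return corrupted input and mask."""
    mask = (torch.rand_like(images) > drop_prob).float()
    return images * mask, mask

def pretrain_step(unet, images, opt, loss_fn=nn.MSELoss()):
    corrupted, _ = random_pixel_dropout(images)
    opt.zero_grad()
    loss = loss_fn(unet(corrupted), images)  # reconstruct the original image
    loss.backward()
    opt.step()
    return loss.item()

# Tiny stand-in for a real U-Net so the snippet runs end to end.
unet = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 3, 3, padding=1))
opt = torch.optim.Adam(unet.parameters(), lr=1e-3)
images = torch.rand(4, 3, 64, 64)            # stand-in endoscopy frames
print(pretrain_step(unet, images, opt))
```

After pretraining on the unlabeled images, the learned weights are reused and the network is fine-tuned on the annotated segmentation masks.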
Early detection of head and neck tumors is crucial for patient survival. Often, diagnoses are made based on endoscopic examination of the larynx, followed by biopsy and histological analysis, leading to high interobserver variability due to subjective assessment. In this regard, early non-invasive diagnostics independent of the clinician would be a valuable tool. A recent study has shown that hyperspectral imaging (HSI) can be used for non-invasive detection of head and neck tumors, as precancerous or cancerous lesions show specific spectral signatures that distinguish them from healthy tissue. However, HSI data processing is challenging due to high spectral variations, various image interferences, and the high dimensionality of the data. Therefore, the performance of automatic HSI analysis has been limited, and deep learning has so far mostly been applied in ex-vivo studies. In this work, we analyze deep learning techniques for in-vivo hyperspectral laryngeal cancer detection. For this purpose, we design and evaluate convolutional neural networks (CNNs) with 2D spatial or 3D spatio-spectral convolutions combined with a state-of-the-art Densenet architecture. For evaluation, we use an in-vivo dataset with HSI of the oral cavity or oropharynx. Overall, we present multiple deep learning techniques for in-vivo laryngeal cancer detection based on HSI and show that jointly learning from the spatial and spectral domain notably improves classification accuracy. Our 3D spatio-spectral Densenet achieves an average accuracy of 81%.
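To make the spatio-spectral idea concrete, the following PyTorch sketch shows a Densenet-style 3D convolutional layer that treats the spectral bands as a third input dimension, so a single kernel learns jointly over (bands, height, width). The sizes and growth rate are illustrative assumptions, not the paper's configuration.

```python
# Sketch of Densenet-style 3D spatio-spectral convolution for HSI patches.
import torch
import torch.nn as nn

class DenseLayer3D(nn.Module):
    def __init__(self, in_ch, growth=12):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm3d(in_ch), nn.ReLU(inplace=True),
            nn.Conv3d(in_ch, growth, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Densenet connectivity: concatenate new features onto the input.
        return torch.cat([x, self.block(x)], dim=1)

# Input: batch of HSI patches shaped (B, 1, bands, H, W), e.g. 30 bands.
x = torch.randn(2, 1, 30, 32, 32)
net = nn.Sequential(DenseLayer3D(1), DenseLayer3D(13), DenseLayer3D(25))
print(net(x).shape)  # torch.Size([2, 37, 30, 32, 32])
```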
Here we present a study in which we used in-vivo hyperspectral imaging (HSI) for the detection of upper aerodigestive tract (UADT) cancer. Hyperspectral datasets were recorded in vivo in 100 patients before surgery. We established an automated data interpretation pathway that classifies tissue as healthy or tumorous using different deep learning techniques. Our method is based on convolutional neural networks (CNNs) with 2D spatial or 3D spatio-spectral convolutions combined with a state-of-the-art Densenet architecture. Using both the spatial and spectral domain notably improves classification accuracy. Our 3D spatio-spectral Densenet classification method achieves an average accuracy of over 80%.
Tumors of the upper respiratory tract are the sixth most common tumor entity in humans. Currently, a dedicated screening method enabling direct on-site diagnosis is missing, which can lead to delayed diagnoses and worse patient outcomes. An optical method enabling a direct distinction between healthy, dysplastic, and cancerous tissue would be an ideal tool for detecting tumors of the upper respiratory tract. In this study, we used fluorescence lifetime imaging (FLIM) of NADH and FAD to image the metabolic state in different tissue samples of the upper aerodigestive tract (UADT). Because different metabolic pathways are active in healthy and tumor cells, their metabolic states differ significantly. FLIM datasets of tissue samples from 25 patients were recorded ex vivo directly after surgery, in a special tissue culture medium at 37°C, on a dedicated microscope using multiphoton excitation. By calculating the fluorescence-lifetime redox ratio (FLIRR) from the FLIM measurements, we were able to visualize the metabolic state of the cells. We found that healthy, dysplastic, and cancerous tissue showed significant differences in FLIRR. This study suggests that FLIRR might be a sensitive and robust parameter for differentiating cancerous and pre-cancerous UADT tissue, and that optical metabolic imaging could be a valuable tool for early tumor diagnosis in this area.
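As an illustration, one common definition of FLIRR in the literature (e.g., Wallrabe et al.) is the bound-NAD(P)H amplitude fraction divided by the free-FAD amplitude fraction obtained from bi-exponential FLIM fits; the exact formulation used in this study may differ. A minimal NumPy sketch on synthetic amplitude maps:

```python
# Hedged illustration of a pixelwise FLIRR map from FLIM fit results.
import numpy as np

def flirr_map(a2_nadh, a1_fad, eps=1e-6):
    """FLIRR per pixel from amplitude-fraction maps (values in [0, 1])."""
    return a2_nadh / (a1_fad + eps)

# Synthetic amplitude-fraction maps standing in for real FLIM fits.
rng = np.random.default_rng(1)
a2_nadh = rng.uniform(0.1, 0.5, size=(64, 64))  # bound NAD(P)H fraction
a1_fad = rng.uniform(0.3, 0.8, size=(64, 64))   # free FAD fraction
print(flirr_map(a2_nadh, a1_fad).mean())
```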