The purpose of this study is to develop a method of ROC analysis that evaluates both the ability of individual readers to
detect abnormal findings and the detectability of abnormal findings in individual cases, by applying item response theory
to the 1/0 judgments on the presence of abnormal findings in CT image readings. The validity of the method was
verified as follows. Twenty-four readers searched for abnormal findings in 25 cases of chest CT images with defined
abnormal findings. From the 1/0 judgment matrix for the 25 CT cases (columns) read by the 24 readers (rows), we calculated
each reader's latent ability to detect abnormal findings (θ), the rate of "1" judgments by each reader, P(θ), which serves as
the confidence level for TP and FP, and the response characteristic curve of each image, treating the image as the item.
From these, ROC curves representing each reader's ability to detect abnormal findings were created. In addition, from the
transposed matrix, with the 25 CT cases as rows and the 24 readers as columns, we calculated the latent detectability of the
abnormal finding in each CT image (θ) and the rate of "1" judgments for that image across readers, P(θ), again serving as
the confidence level for TP and FP. From these, ROC curves representing the detectability of the abnormal finding in each
case were created.
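The abstract does not state which IRT model was fitted. As a minimal sketch, assuming a two-parameter logistic (2PL) model whose item parameters (a, b) are already known, a reader's θ can be estimated by maximum likelihood from one row of the 1/0 matrix; applying the same function to the transposed matrix (images as rows, readers as items) gives the per-image θ. The names `icc` and `estimate_theta` and all parameter values are hypothetical.

```python
import math
from typing import List, Tuple

def icc(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability P(theta) of a "1" judgment."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses: List[int], items: List[Tuple[float, float]],
                   theta0: float = 0.0, iters: int = 100,
                   bound: float = 4.0) -> float:
    """Maximum-likelihood theta by Newton-Raphson, with item parameters assumed known.

    responses: one row of the 1/0 judgment matrix (one reader across all images).
    items:     (a, b) for each image, treating each image as an IRT item.
    The estimate is clamped to [-bound, bound] because all-0 or all-1
    response patterns have no finite maximum-likelihood solution.
    """
    theta = theta0
    for _ in range(iters):
        grad = 0.0  # first derivative of the log-likelihood
        info = 0.0  # Fisher information (negative second derivative)
        for x, (a, b) in zip(responses, items):
            p = icc(theta, a, b)
            grad += a * (x - p)
            info += a * a * p * (1.0 - p)
        step = grad / info
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-10:
            break
    return theta

if __name__ == "__main__":
    # Hypothetical item parameters for four images (not data from the study).
    items = [(1.0, -1.5), (1.0, -0.5), (1.0, 0.5), (1.0, 1.5)]
    print(estimate_theta([1, 1, 0, 0], items))  # symmetric pattern, approximately 0
    print(estimate_theta([1, 1, 1, 0], items))  # more "1" judgments, higher theta
```

Newton-Raphson is used here only because the 2PL log-likelihood is concave in θ, so it converges quickly; joint estimation of the item parameters themselves is a larger task not shown.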
In this paper, we propose a methodology for evaluating whether the use of CAD is effective for a given reader or
case. We first analyze the readers' judgments (0 or 1) with a technique called analysis of bias-variance
characteristics (BVC) [1,2], and then combine this with ROC analysis to elucidate the internal structure of the ROC curve.
First, the mean and variance are calculated for the situation in which multiple readers examine the medical image of a
single case, without CAD and with CAD, and assign the value 0 or 1 according to whether they judge abnormal findings to be
absent or present, or the case to be normal or abnormal. The mean of these values represents the degree of bias
from the true diagnosis for that case, and the variance represents the spread of judgments among readers.
When the relationship between the two parameters is examined for several cases with differing degrees of diagnostic
difficulty, the mean (horizontal axis) and variance (vertical axis) show a bell-shaped relation. We have named this
typical phenomenon of image reading the bias-variance characteristic (BVC) of diagnosis. The mean of the
0 and 1 judgments of multiple readers is regarded as a measure of the confidence level for that
case. ROC curves were drawn by conventional methods for diagnoses made without CAD and with CAD. From the difference
between the TPF obtained without CAD and with CAD at the same FPF on the ROC curve, we were able to quantify
the number of cases, the total number of readers, and the total number of cases for which CAD support was beneficial.
To demonstrate its usefulness, we applied this method to data from a reading experiment that evaluated
detection performance for abnormal findings and to data from a reading experiment that evaluated
diagnostic discrimination between normal and abnormal cases. We analyzed the internal structure of the ROC
curve produced when all cases were included, showed that there is a relationship between the degree of diagnostic
difficulty of a case and the benefit of CAD support, and demonstrated that there are patients and readers for whom
CAD is beneficial and those for whom it is not.
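The bell shape has a simple source: for 0/1 judgments the population variance of a case equals m(1 − m), where m is the case mean, so the (mean, variance) points of all cases lie on a parabola peaking at m = 0.5. A minimal sketch, assuming a readers × cases matrix of 0/1 judgments and population rather than sample variance (the abstract does not say which); `bvc_points` and the example judgments are hypothetical.

```python
from typing import List, Tuple

def bvc_points(judgments: List[List[int]]) -> List[Tuple[float, float]]:
    """Per-case (mean, variance) of 0/1 judgments.

    judgments[r][c] is reader r's judgment (0 or 1) for case c.
    Uses the population variance, which for 0/1 data equals m * (1 - m).
    """
    n_readers = len(judgments)
    n_cases = len(judgments[0])
    points = []
    for c in range(n_cases):
        column = [judgments[r][c] for r in range(n_readers)]
        m = sum(column) / n_readers                         # case mean = confidence level
        var = sum((x - m) ** 2 for x in column) / n_readers  # spread among readers
        points.append((m, var))
    return points

if __name__ == "__main__":
    # Hypothetical judgments: 4 readers x 3 cases, without and with CAD.
    without_cad = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
    with_cad = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
    for label, data in (("without CAD", without_cad), ("with CAD", with_cad)):
        for case, (m, v) in enumerate(bvc_points(data)):
            print(label, case, m, v, m * (1 - m))
```

With sample variance (dividing by the number of readers minus one) the curve is only rescaled by a constant factor, so its bell shape is unchanged.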
The purpose of our research is to clarify the mechanism by which a reader (physician or radiological technologist) effectively identifies abnormal findings in CT images of lung cancer screening when using a CAD system. We investigated a method for estimating the 2×2 decision matrix between reader and CAD, and between reader alone and reader with CAD. We assume the following scenario. First, a reader judges whether abnormal findings are present (1) or absent (0), per patient and per CT image, without CAD results. Second, the reader judges whether abnormal findings are present (1) or absent (0) with CAD results. We express the correlation between the diagnoses of a reader and the CAD system, for abnormal cases and for normal cases, by the phi correlation coefficient: $\varphi = (cd - ab)/\sqrt{(a+c)(b+d)(b+c)(a+d)}$, where a, b, c, d are the cells of the 2×2 decision matrix and n = a + b + c + d. If $TPR_1 = (a+c)/n$, $TPR_2 = (b+c)/n$ and $TPR_3 = (a+b+c)/n$ for abnormal cases, then $TPR_3 = TPR_1 + TPR_2 - TPR_1 TPR_2 - \varphi\sqrt{TPR_1(1-TPR_1)\,TPR_2(1-TPR_2)}$. Therefore, $a = n(TPR_3 - TPR_2)$, $b = n(TPR_3 - TPR_1)$, $c = n(TPR_1 + TPR_2 - TPR_3)$ and $d = n(1 - TPR_3)$. This theory was applied to experimental data. Forty-one students interpreted the same CT images [no training]. A second interpretation was performed after they had been instructed on how to interpret CT images [training], and a third was assisted by a virtual CAD [training + CAD]. The mechanism by which the reader and CAD complement each other's strengths when interpreting CT images was investigated theoretically and experimentally. We concluded that estimating the 2×2 decision matrix between a reader and a CAD system, each deciding the "presence" or "absence" of abnormal findings, explains how the CAD system improves diagnostic performance.
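The estimation step can be checked numerically: compute φ from a known 2×2 table, predict TPR_3 from TPR_1, TPR_2 and φ, and recover the four cells. A minimal sketch, reading TPR_1 as the reader alone, TPR_2 as CAD alone and TPR_3 as the reader with CAD (our interpretation of the cell definitions); `phi`, `combined_tpr`, `decision_matrix` and the counts are hypothetical.

```python
import math
from typing import Tuple

def phi(a: float, b: float, c: float, d: float) -> float:
    """Phi coefficient of the 2x2 decision matrix (c = both positive, d = both negative)."""
    return (c * d - a * b) / math.sqrt((a + c) * (b + d) * (b + c) * (a + d))

def combined_tpr(tpr1: float, tpr2: float, phi_value: float) -> float:
    """TPR3 = TPR1 + TPR2 - TPR1*TPR2 - phi * sqrt(TPR1(1-TPR1) TPR2(1-TPR2))."""
    spread = math.sqrt(tpr1 * (1 - tpr1) * tpr2 * (1 - tpr2))
    return tpr1 + tpr2 - tpr1 * tpr2 - phi_value * spread

def decision_matrix(n: int, tpr1: float, tpr2: float,
                    tpr3: float) -> Tuple[float, float, float, float]:
    """Recover (a, b, c, d) from n abnormal cases and the three TPRs."""
    a = n * (tpr3 - tpr2)          # positive by reader (TPR1) only
    b = n * (tpr3 - tpr1)          # positive by CAD (TPR2) only
    c = n * (tpr1 + tpr2 - tpr3)   # positive by both
    d = n * (1.0 - tpr3)           # negative by both
    return a, b, c, d

if __name__ == "__main__":
    a, b, c, d = 10, 5, 20, 65     # hypothetical counts, n = 100 abnormal cases
    n = a + b + c + d
    tpr1, tpr2 = (a + c) / n, (b + c) / n
    tpr3 = combined_tpr(tpr1, tpr2, phi(a, b, c, d))
    print(round(tpr3, 6))          # 0.35, equal to (a + b + c) / n
    print([round(x, 6) for x in decision_matrix(n, tpr1, tpr2, tpr3)])
```

The formula rests on the identity c/n = TPR_1·TPR_2 + φσ_1σ_2, so with φ = 0 (independent reader and CAD) TPR_3 reduces to the familiar union probability TPR_1 + TPR_2 − TPR_1·TPR_2.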
When physicians inspect an image, they form a certain degree of confidence that the image is abnormal, p(t), or normal, n(t) [n(t) = 1 - p(t)]. After an infinitely long inspection, they reach the equilibrium confidence levels p* = p(∞) and n* = n(∞). There is a psychological conflict between the decisions "normal" and "abnormal". We assume that the decision "normal" is distracted by the decision "abnormal" by a factor of k(1 + ap), and in the inverse direction by a factor of k(1 + bn), where k (> 0) is a parameter related to image quality and the skill of the physicians, and a and b are unknown constants. After an infinitely long inspection, the conflict reaches equilibrium, which satisfies k(1 + ap*)n* = k(1 + bn*)p*. We assume that the rate of change of the confidence level with time, dp/dt, is proportional to [k(1 + ap)n - k(1 + bn)p], i.e., k[-cp² + (c - 2)p + 1] with c = a - b. At equilibrium this gives c = (2p* - 1)/[p*(1 - p*)]. Solving the differential equation, we derived t(p) and p(t) as functions of the parameters k, c and S, where S (0 to 1) is an arbitrarily selected value related to the probability of "abnormal" before the image inspection (S = p(0)).
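Expanding k[(1 + ap)(1 − p) − (1 + b(1 − p))p] gives k[−cp² + (c − 2)p + 1] with c = a − b, and its single root in (0, 1) is the equilibrium p*. Because the closed-form t(p) and p(t) are not given in the abstract, the following minimal sketch works with the same equation numerically instead: classical fourth-order Runge-Kutta for p(t), and Simpson's rule for t(p) = ∫ dq / (k f(q)) from S to p. All function names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple

def drift(p: float, k: float, c: float) -> float:
    """f(p) = dp/dt = k[-c p^2 + (c - 2) p + 1], with c = a - b."""
    return k * (-c * p * p + (c - 2.0) * p + 1.0)

def equilibrium(c: float) -> float:
    """Root of -c p^2 + (c - 2) p + 1 = 0 in (0, 1), in a form that is stable near c = 0."""
    return 2.0 / (math.sqrt(c * c + 4.0) + 2.0 - c)

def c_from_equilibrium(p_star: float) -> float:
    """Inverse relation c = (2 p* - 1) / [p* (1 - p*)]."""
    return (2.0 * p_star - 1.0) / (p_star * (1.0 - p_star))

def simulate(k: float, c: float, S: float, t_end: float,
             steps: int = 2000) -> List[Tuple[float, float]]:
    """Integrate p(t) from p(0) = S with classical RK4; returns (t, p) samples."""
    h = t_end / steps
    t, p = 0.0, S
    path = [(t, p)]
    for _ in range(steps):
        s1 = drift(p, k, c)
        s2 = drift(p + 0.5 * h * s1, k, c)
        s3 = drift(p + 0.5 * h * s2, k, c)
        s4 = drift(p + h * s3, k, c)
        p += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
        t += h
        path.append((t, p))
    return path

def time_to_reach(p_target: float, k: float, c: float, S: float,
                  n: int = 1000) -> float:
    """t(p) = integral from S to p_target of dq / f(q), by Simpson's rule (n even).

    Valid only while p_target lies strictly between S and the equilibrium p*.
    """
    h = (p_target - S) / n
    total = 0.0
    for i in range(n + 1):
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight / drift(S + i * h, k, c)
    return total * h / 3.0

if __name__ == "__main__":
    # Hypothetical parameters: k = 1, c = 2, prior confidence S = 0.5.
    path = simulate(k=1.0, c=2.0, S=0.5, t_end=20.0)
    print(path[-1][1], equilibrium(2.0))     # both approach 1/sqrt(2)
    print(time_to_reach(0.7, k=1.0, c=2.0, S=0.5))
```

Since f(0) = k > 0 and f(1) = −k < 0, the root in (0, 1) is always attracting, so any starting confidence S drifts toward p*.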
Image reading studies were carried out with CT images. ROC curves were generated both by the traditional score-based method with a 4-step scale and from the confidence level p estimated from the equation t(p) of the DDC model using the observed judgment times. We concluded that ROC curves can be generated by measuring the time taken for dichotomous judgments, without subjective scores of diagnostic confidence, by applying the DDC model.
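Once a confidence level p has been assigned to every image, whether from 4-step scores or from the DDC model, an empirical ROC curve follows from sweeping a threshold over p. A minimal sketch, assuming the p values are already computed; `empirical_roc`, `trapezoid_auc` and the example values are hypothetical and illustrative, not study data.

```python
from typing import List, Sequence, Tuple

def empirical_roc(abnormal: Sequence[float],
                  normal: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPF, TPF) points obtained by calling an image abnormal when p >= threshold."""
    thresholds = sorted(set(abnormal) | set(normal), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(p >= t for p in abnormal) / len(abnormal)
        fpf = sum(p >= t for p in normal) / len(normal)
        points.append((fpf, tpf))
    return points  # the lowest threshold yields (1.0, 1.0)

def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

if __name__ == "__main__":
    abnormal_p = [0.9, 0.8, 0.4]   # illustrative confidence levels, abnormal images
    normal_p = [0.7, 0.3, 0.2]     # illustrative confidence levels, normal images
    roc = empirical_roc(abnormal_p, normal_p)
    print(roc)
    print(trapezoid_auc(roc))      # 8/9, about 0.889
```

The trapezoidal area of the empirical curve equals the fraction of (abnormal, normal) pairs that are correctly ranked, which is why it needs no distributional assumption; a fitted binormal curve would give a smoothed Az instead.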
The purpose of this study was to evaluate the performance of a computer-aided diagnosis (CAD) system for detecting pulmonary nodules across the various CT image qualities of low-dose CT cancer screening. Sixty-three chest examinations with sixty-four pulmonary nodules, consisting mainly of ground-glass opacities (GGO), were used. All CT images were acquired with a 4-detector-row multi-slice CT scanner (Asteion, Toshiba Medical Systems, Japan) with a 0.75-second rotation time and 30 mA. After each examination, CT images were reconstructed from every data set using seven reconstruction kernels and three slice thicknesses, giving twenty-one data sets per patient: 1,323 data sets in total, comprising about 60,000 CT images (30.1 GB). Nodule detection was carried out with a CAD system developed by Fujitsu Ltd, Japan. The mean nodule size was 0.69 ± 0.28 (SD) cm (range, 0.3-1.7 cm). At a slice thickness of 8 mm, the CAD system identified 42 to 48 of the 64 nodules across the seven reconstruction kernels, a true-positive rate (TPR) of 65% to 75%. At a slice thickness of 5 mm, the TPR ranged from 70% to 80%, and at 10 mm from 50% to 64%. Some kernels yielded a relatively high TPR with many FPs, whereas others showed high sensitivity with relatively few FPs. CT data sets reconstructed under multiple conditions are useful for assessing the robustness of a CAD system that detects pulmonary nodules in multi-slice low-dose CT screening.
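The data-set counts and rates above follow from simple arithmetic: seven kernels × three slice thicknesses give 21 reconstructions per patient, 63 × 21 = 1,323 data sets, and 42/64 to 48/64 detected nodules give the 8-mm TPR range. A minimal sketch; the kernel labels K1 to K7 are placeholders because the abstract does not name the kernels.

```python
from itertools import product

KERNELS = [f"K{i}" for i in range(1, 8)]   # placeholder labels; kernels not named in the abstract
THICKNESSES_MM = [5, 8, 10]
N_PATIENTS = 63
N_NODULES = 64

def tpr(detected: int, total: int = N_NODULES) -> float:
    """True-positive rate = detected nodules / all nodules."""
    return detected / total

conditions = list(product(KERNELS, THICKNESSES_MM))

if __name__ == "__main__":
    print(len(conditions))               # 21 reconstructions per patient
    print(N_PATIENTS * len(conditions))  # 1323 data sets
    print(tpr(42), tpr(48))              # 0.65625 0.75 (8-mm range)
```

Note that 42/64 is 65.6%, which the abstract reports as 65%.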
The increasing number of CT images in mass screening requires radiologists to interpret a huge number of images, so the capacity for screening has been limited by the capacity to read them. To remedy this situation, we considered paramedical staff, especially radiological technologists, as "potential screeners," and investigated their capacity to detect abnormalities in CT images of lung cancer screening with and without the assistance of a computer-aided diagnosis (CAD) system. We then compared their performance with that of physicians. A set of 100 thoracic CT slices from 100 cases (73 abnormal and 27 normal), one slice per case, was interpreted by 43 paramedical college students. A second interpretation was performed after the students had been instructed on how to interpret CT images, and a third interpretation was assisted by a virtual CAD system. We calculated the areas under the ROC curve (Az values) for both students and physicians. In the first set of interpretations, 40% of the students had Az values within the range of the physicians' Az values, which varied from 0.870 to 0.964. After the students had been instructed on CT image interpretation, this proportion rose to 86%, and with the virtual CAD system it reached 95%. The performance of paramedical college students in detecting abnormalities in thoracic CT images proved sufficient to qualify them as "potential screeners."
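Assuming the conventional binormal ROC model, in which Az = Φ(a/√(1 + b²)) for binormal parameters a and b, the proportion of students whose Az falls within the physicians' range (0.870 to 0.964) can be computed as below. A minimal sketch; the functions and the example parameter values are hypothetical, not the study's fitted values.

```python
import math
from typing import Iterable

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binormal_az(a: float, b: float) -> float:
    """Area under the binormal ROC curve: Az = Phi(a / sqrt(1 + b^2))."""
    return normal_cdf(a / math.sqrt(1.0 + b * b))

def fraction_in_range(az_values: Iterable[float],
                      low: float = 0.870, high: float = 0.964) -> float:
    """Share of readers whose Az lies within [low, high] (inclusive)."""
    values = list(az_values)
    return sum(low <= v <= high for v in values) / len(values)

if __name__ == "__main__":
    # Hypothetical binormal (a, b) pairs for four students.
    students = [(1.2, 1.0), (1.8, 0.9), (2.3, 1.1), (0.9, 1.0)]
    az = [binormal_az(a, b) for a, b in students]
    print([round(v, 3) for v in az], fraction_in_range(az))
```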
We carried out an observer performance study to evaluate the performance of 16 radiologists without and with a computer-aided diagnostic (CAD) scheme for determining the likelihood of malignancy of lung nodules on HRCT, using a database of 28 primary lung cancers and 28 benign nodules. The results of our observer study showed that radiologists' performance improved with the CAD scheme, and that their performance with the CAD scheme was greater than that of either the radiologists alone or the computer alone. Our purpose in this study was to analyze radiologists' responses with the CAD scheme in the task of differentiating between malignant and benign nodules on HRCT. Our results indicated that the average change in radiologists' ratings (the difference between radiologists' ratings with the CAD scheme and their initial ratings) was strongly related to (A) the likelihood of malignancy (the computer output) and (B) the difference between their initial ratings and the computer outputs, with correlation coefficients of 0.93 and 0.90, respectively. Our detailed analysis showed that radiologists changed the majority of their ratings in agreement with the computer results, and most of these changes contributed to the improvement in their performance. They were able to maintain some of their correct ratings despite incorrect computer results. For some cases, they increased their confidence in their judgments beyond the computer output. Thus, the improvement in radiologists' performance beyond the computer performance was produced by the synergistic effect of the radiologists' decision making and the computer outputs.
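The two reported coefficients relate the per-case average rating change (with CAD minus initial) to (A) the computer output and (B) the gap between the computer output and the initial rating. A minimal sketch, assuming Pearson's r was used (the abstract does not specify the coefficient); `pearson_r` and all numbers are hypothetical and illustrative.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    # Hypothetical per-case averages over radiologists (likelihood of malignancy, 0-1).
    computer = [0.15, 0.40, 0.55, 0.80, 0.90]
    initial = [0.30, 0.35, 0.60, 0.55, 0.70]
    with_cad = [0.22, 0.38, 0.58, 0.70, 0.82]
    change = [w - i for w, i in zip(with_cad, initial)]  # rating change with CAD
    gap = [c - i for c, i in zip(computer, initial)]     # computer output minus initial rating
    print(round(pearson_r(computer, change), 3), round(pearson_r(gap, change), 3))
```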
In this paper we present two methods of evaluating the effectiveness of double check (by two radiologists, or by a CAD system and a radiologist): one uses ROC analysis and the other uses the phi correlation coefficient (φ). We used the first method to evaluate the effectiveness of double check by two radiologists through discussion (i.e., the radiologists confer; conference system). We used the second method to evaluate the effectiveness of double check in which Reader 2 makes the final assessment by referring to the assessment of Reader 1 (reference system). The results suggest that double check by two radiologists through discussion may not be very effective, whereas double check in which Reader 2 makes the final assessment by referring to the assessment of Reader 1 may be very effective. In addition, we discuss problems that may arise when Reader 2 decides whether to adopt the assessment of Reader 1, and practical models of double check by a CAD system and a radiologist. Further research is needed to establish a double check system that improves diagnostic accuracy in practical situations, i.e., when it is unknown whether assessments are correct.
The objective of this study was to measure the image exploration activity of physicians and thereby contribute to the development of a support system for CRT image interpretation in thoracic CT screening. We examined how the pupil diameters of five physicians changed over time during interpretation of a large quantity of CT images on a CRT monitor, and how this might be related to diagnostic accuracy. When a large quantity of CT images was viewed on a CRT monitor in a dimly lit room, the pupil diameter decreased during the second half of the long interpretation session in three of the five physicians. Furthermore, the pupil diameter frequently became approximately zero because the physician became drowsy. However, when the relationship between these phenomena and diagnostic accuracy was analyzed for one of the physicians, no evidence was found that they led to statistically significant false negatives or false positives. Despite this result, the potential risk of misdiagnosis cannot be ignored. It may be necessary to design both equipment and working conditions that prevent the pupil diameter from becoming approximately zero during interpretation of images on a CRT monitor.
In this study, we investigated a pattern-classification technique based on a massive training artificial neural network (MTANN), which can be trained with a small number of cases, for reducing false positives in computerized detection of lung nodules in low-dose CT (LDCT). The MTANN consists of a modified multilayer artificial neural network (ANN) that can operate on image data directly. The MTANN is trained with a large number of sub-regions extracted from input images, together with teacher images containing the distribution of the "likelihood of being a nodule." The output image is obtained by scanning an input image with the MTANN. In the MTANN, the distinction between nodules and non-nodules is treated as an image-processing task, in other words, as a highly nonlinear filter that performs both nodule enhancement and non-nodule suppression. This allows us to train the MTANN not on a case basis but on a sub-region basis, so the MTANN can be trained with a very small number of cases. Our database consisted of 101 LDCT scans acquired from 71 patients in a lung cancer screening program. The scans consisted of 2,822 sections and contained 121 nodules, including 104 nodules representing confirmed primary cancers. With our current CAD scheme, a sensitivity of 81.0% (98/121 nodules) was achieved with 0.99 false positives per section (2,804/2,822). By using an MTANN trained with a small number of training cases (n = 10), i.e., five pairs of nodules and non-nodules, we were able to remove 55.8% of the false positives without reducing the number of true positives, i.e., with a classification sensitivity of 100%. Thus, the false-positive rate of our current CAD scheme was reduced from 0.99 to 0.44 false positives per section while the current sensitivity (81.0%) was maintained.
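Training on sub-regions rather than whole cases can be illustrated by how one region of interest yields many input/teacher pairs. A minimal sketch, assuming a 2-D Gaussian teacher distribution for nodules, an all-zero teacher for non-nodules, and an odd window size k; it prepares training pairs only and does not implement the network itself. `teacher_image`, `training_pairs` and the dummy pixel values are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]

def teacher_image(height: int, width: int, center: Tuple[int, int],
                  sigma: float, is_nodule: bool) -> Image:
    """Assumed teacher: 2-D Gaussian "likelihood of being a nodule", or all zeros for a non-nodule."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2)) if is_nodule else 0.0
             for x in range(width)] for y in range(height)]

def training_pairs(image: Image, teacher: Image,
                   k: int) -> List[Tuple[List[float], float]]:
    """Pair every k x k sub-region (flattened, k odd) with the teacher value at its centre pixel."""
    r = k // 2
    pairs = []
    for y in range(r, len(image) - r):
        for x in range(r, len(image[0]) - r):
            patch = [image[y + dy][x + dx]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            pairs.append((patch, teacher[y][x]))
    return pairs

if __name__ == "__main__":
    roi = [[float((x * 7 + y * 3) % 10) for x in range(9)] for y in range(9)]  # dummy pixels
    teacher = teacher_image(9, 9, center=(4, 4), sigma=1.5, is_nodule=True)
    pairs = training_pairs(roi, teacher, k=5)
    print(len(pairs), len(pairs[0][0]))   # 25 sub-regions, 25 inputs each
```

With a 9 × 9 region and a 5 × 5 window, a single case already supplies 25 training pairs, which helps explain how ten cases can be enough to train the filter.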