
This presentation is dedicated to all who are not
with us today as a result of negligent technical system failures throughout
the world…
·
September 11th
·
Therac-25
· Kursk tragedy
· Chernobyl

·
The recent Institute of Medicine (IOM) report on the quality of care states that
hospital errors cause between 44,000 and 98,000 deaths every year in American hospitals.1
- Source: Dr. Brennan, New England Journal of Medicine
·
Although the use of computer-aided technology is urged and has gained
wide popularity throughout the medical field, its performance, according to some
physicians, is far from adequate.2
- Source: Dr. Kassirer, New England Journal of Medicine
·
Information technology has become so pervasive that some authors see this
pervasiveness as a clear sign that we have moved from the
industrial revolution to the information revolution.3(p. 1)
- Source: Martin WainWright Information Management System
·
Yet, as sophisticated as we are, there is a great lack of information
on hospital adverse events, and even less data is available on
computer-related adverse events.
·
No organized, standardized system of error reporting is available for
studying mistakes and what can be done about them. This type of analysis is available in the aircraft industry and
has contributed greatly to improving the safety of flying.
·
A plea has been made for the institution of a non-partisan data
collection agency to which fatalities and anonymous error reports could be
sent, and which would analyze and publish this data on a periodic basis.10
- Source: Dr. Myhre, Transfusion Medical Journal
·
Nonetheless, despite all the limitations that we have, there is a universal
attempt to improve the healthcare system as much as we can, not only
within the United States but also abroad.11
- Source: de Dombal FT, New England Journal of Medicine
·
According to Eta S. Berner,
one of the authors of the article "Performance
of Four Computer-Based Diagnostic Systems", computer-based diagnostic
systems are available commercially, but there has been limited evaluation of
their performance.
What were the Research Methods?
Overview
·
10 expert clinicians created a set of 105 diagnostically challenging
clinical case summaries based on actual patients.
·
These experts consisted of nationally recognized consultants in the
fields of general internal medicine, 8
subspecialties of internal medicine, and neurology.
·
Clinical data were entered into each program with the vocabulary
provided by the program’s developer.
·
The group of experts produced a ranked list of possible diagnoses for
each patient.
·
Then, each of the systems produced a ranked list of possible diagnoses.
· When the experts’ lists of possible diagnoses were compared with each computer system’s list, scores were calculated on 5 performance measures for each computer program.
Score Interpretation / 5 Measures Involved
·
The first 2 scores were based on the entire list of diagnoses that the programs
generated:
1) Correct Diagnosis Score - reflected the
proportion of the diagnoses generated by the computer that were correct or
closely related to the diagnosis that was considered to be correct. Ex: Did the program find the correct diagnosis?
2) Rank Score - reflected the average rank of the
correct (or closely related) diagnosis as it appeared on the
computer-generated list. Ex: How high on the list did the correct diagnosis appear?
·
3 other scores were derived by reviewing the first 20 diagnoses listed
by each program:
1) Comprehensiveness Score - reflected the
average proportion of the appropriate diagnoses agreed on by the experts that were included on the
computer-generated list. It reflected
the extent to which the computer suggested the diagnoses that the experts
had not originally listed but, in retrospect, agreed were reasonable to
consider. Ex: How well did the program understand the
patient’s dilemma?
2) Relevance Score - reflected the average proportion of
computer-generated diagnoses that the experts found reasonable to consider, given the
clinical data.9 Ex: Was a particular diagnosis relevant?
3) Additional Diagnosis Score - reflected the average number of additional diagnoses suggested by the computer that the experts considered appropriate after their final review of the cases.9 Ex: Were the additional diagnoses appropriate?
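The five measures above can be sketched in code for a single case and a single program. This is only an illustrative reading of the definitions, not the authors' scoring procedure: the `CaseResult` structure, the field names, and the choice to treat the Correct Diagnosis measure as a per-case yes/no (to be averaged over cases) are assumptions, as is matching diagnoses by exact string.

```python
from __future__ import annotations

from dataclasses import dataclass

TOP_N = 20  # three of the scores look only at the first 20 diagnoses


@dataclass
class CaseResult:
    """One program's output for one case plus the experts' judgments (hypothetical structure)."""
    program_list: list[str]       # ranked diagnoses produced by the program
    correct: set[str]             # correct or closely related diagnoses
    expert_original: set[str]     # diagnoses on the experts' original list
    expert_appropriate: set[str]  # diagnoses the experts judged appropriate after review


def rank_of_correct(case: CaseResult) -> int | None:
    """1-based position of the first correct diagnosis, or None if it is absent."""
    for position, diagnosis in enumerate(case.program_list, start=1):
        if diagnosis in case.correct:
            return position
    return None


def score_case(case: CaseResult) -> dict:
    """Per-case values; averaging them over all cases gives the program's scores."""
    top = set(case.program_list[:TOP_N])
    appropriate_hits = top & case.expert_appropriate
    rank = rank_of_correct(case)
    return {
        # Assumption: scored as found / not found on the entire list.
        "correct_found": rank is not None,
        "rank": rank,
        # Share of the experts' appropriate diagnoses that the program listed.
        "comprehensiveness": (len(appropriate_hits) / len(case.expert_appropriate)
                              if case.expert_appropriate else 0.0),
        # Share of the program's listed diagnoses that the experts found reasonable.
        "relevance": len(appropriate_hits) / len(top) if top else 0.0,
        # Appropriate diagnoses that the experts had not originally listed.
        "additional": len(appropriate_hits - case.expert_original),
    }
```

Calling `score_case` on every case and averaging each field per program would give the mean scores that are compared across programs; the rank would be averaged only over cases where the correct diagnosis was found.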
How did they select their Data?
·
Each expert contributed 15 detailed clinical summaries describing
patients who had been referred for diagnostic consultation.
·
The summaries included data such as the history, findings of the physical
examination, and results of lab tests that were available at the time of
the initial consultation, and
that indicated both normal and abnormal conditions.
·
Information from the definitive tests that confirmed the exact diagnosis was
omitted for the purpose of this study.
·
To ensure that data was optimal, program developers were asked to
indicate how they would enter specific clinical data for their particular
programs.
·
The vocabulary selection might
have been biased if the program developers had chosen terms in the context of a specific case. This was avoided by having them
express, in the language of their program, a master list of discrete
data items. The items had been collected beforehand
from all the cases and listed alphabetically under the general categories of
history, physical examination, and laboratory assessment.9
How did they select their Cases?
· The cases covered the entire field of general medicine, including neurology. They were selected to present a spectrum of diagnostic difficulty, but all were cases in which a physician might be prompted to seek diagnostic help from a colleague.
·
These cases included atypical presentations, rare diseases, multiple
disorders presenting simultaneously, or elements sufficiently complex that the
physician would be likely to request a diagnostic consultation.
·
The group of experts decided
which cases were appropriate to consider and which were not. They categorized each case by the organ system or
systems involved, the cause of the disease, and the diagnostic difficulty.
·
After this review, 121 of the 150 cases were selected for further
consideration.
What was their Procedure?
·
Using each developer’s terms for the clinical data on the master list, the
data from each case was entered into each program.
·
However, because of limitations in some programs, some data
could only be approximated or could not be entered at
all.
·
Each program then analyzed the data and produced a
list of possible diagnoses.
·
The top 20 diagnoses on each list were combined in a master list. The group of experts reviewed the diagnoses for
appropriateness and correctness without any knowledge of which program
had suggested which diagnosis.
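The blinded review step can be illustrated with a short sketch. The source does not say how the master list was ordered or how duplicates were merged, so the alphabetical ordering, the exact-string matching of diagnoses, and the function name `build_blinded_master_list` are all assumptions made for illustration.

```python
TOP_N = 20  # only the top 20 diagnoses from each program go on the master list


def build_blinded_master_list(program_lists):
    """Merge each program's top diagnoses into one list that hides their origin.

    program_lists maps a program name to its ranked list of diagnoses.
    Returns (master, provenance): master is what the reviewers see,
    provenance is kept aside and used only after the review is complete.
    """
    provenance = {}
    for program, ranked in program_lists.items():
        for diagnosis in ranked[:TOP_N]:
            provenance.setdefault(diagnosis, set()).add(program)
    # Alphabetical order removes both the rank and the source of each entry.
    master = sorted(provenance)
    return master, provenance
```

After the experts mark each entry on `master` as appropriate or not, `provenance` lets those judgments be credited back to the programs that suggested the diagnosis.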
Results of Analyses
·
When all cases were considered, the Correct Diagnosis
mean scores for DXplain and Meditel were significantly higher
than the scores for Iliad and QMR.
·
For 9 cases, none of the programs included the correct diagnosis. For the Rank Score,
the significance of the differences among the programs could not be calculated
because the samples varied in size.
·
For the Comprehensiveness Score, the mean scores for DXplain and
Meditel were significantly higher than for Iliad and QMR.
·
Although the Additional Diagnosis Score showed that, on average, each of the
4 programs generated approximately 2 appropriate diagnoses that had not been
originally listed by the experts, there
were no significant differences among the systems with regard to this measure.
·
All programs produced moderately long lists of potential
diagnoses. The lists included many
diagnoses that a knowledgeable physician would not regard as particularly
helpful in explaining the case or guiding further studies.
·
On the other hand, each program suggested some diagnoses that the
experts later agreed were worthy of inclusion among the diagnoses to
consider.
·
Although each program performed better or worse than the others on
some of the performance measures, none performed consistently better or worse
on all the measures.
·
The programs also had additional functions that were not
evaluated. Those functions included
interactivity, display of the signs and symptoms associated with diseases, and
suggestion of potentially relevant laboratory tests.9
·
The increasing popularity of computer-based diagnostic systems suggests
that at least some physicians have found them helpful.
·
However, such anecdotal data
does not permit a systematic assessment of the clinical contexts in which these
programs are most useful or how they actually perform.
·
This study raises the concern that important diagnostic considerations may
be so obscured by other diagnoses that the value of the program may be
significantly decreased, or that the program could lead to excessive or costly
interventions in inexperienced hands.
·
Although some physicians may
use the programs in the manner described in this article, most would probably enter
selected key findings and use some of the other functions of the system to
refine the list of diagnoses.9
·
Medically knowledgeable persons would probably not only decide what
data to enter, but also identify the diagnoses that are worthy of
consideration and dismiss many of the poorly integrated ones.16
·
The developers of these systems intend the programs to serve a
prompting function, reminding physicians of diagnoses they may not have
considered or triggering their thinking about related diagnostic
possibilities.9
·
The results showed that no single computer program scored better
than the others on all performance measures. On average, fewer than half of the diagnoses on the experts’ original lists
were suggested by the 4 programs.
·
Yet, on average, each
program suggested at least 2 additional diagnoses per case that the experts
found relevant but had not originally considered.
·
Clearly, as others indicated, the next step in the evaluation of these
programs will have to include examining the performance of the physician and
the computer together. 9
Discussion