Can a computer aid in the quest to eliminate diagnostic lapses?

Katie Hafner / New York Times News Service

SAN FRANCISCO — The man on stage had his audience of 600 mesmerized. Over the course of 45 minutes, the tension grew. Finally, the moment of truth arrived, and the room was silent with anticipation.

At last he spoke. “Lymphoma with secondary hemophagocytic syndrome,” he said. The crowd erupted in applause.

Professionals in every field revere their superstars, and in medicine the best diagnosticians are held in particularly high esteem. Dr. Gurpreet Dhaliwal, 39, a self-effacing associate professor of clinical medicine at the University of California, San Francisco, is considered one of the most skillful clinical diagnosticians in practice.

The case Dhaliwal was presented, at a medical conference last year, began with information that could have described hundreds of diseases: The patient had intermittent fevers, joint pain, and weight and appetite loss.

To observe him at work is like watching Steven Spielberg tackle a script or Rory McIlroy a golf course.

He was given new information bit by bit — lab, imaging and biopsy results. Over the course of the session, he drew on an encyclopedic familiarity with thousands of syndromes. He deftly dismissed red herrings while picking up on clues that others might ignore, gradually homing in on the accurate diagnosis.

Just how special is Dhaliwal’s talent? More to the point, what can he do that a computer cannot? Will a computer ever successfully stand in for a skill that is based not simply on a vast fund of knowledge but also on more intangible factors like intuition?

The history of computer-assisted diagnostics is long and rich. In the 1970s, researchers at the University of Pittsburgh developed software to diagnose complex problems in general internal medicine; the project eventually resulted in a commercial program called Quick Medical Reference. Since the 1980s, Massachusetts General Hospital has been developing and refining DXplain, a program that provides a ranked list of clinical diagnoses from a set of symptoms and laboratory data.

And IBM, on the heels of its triumph last year with Watson, the “Jeopardy!”-playing computer, is working on Watson for Healthcare.

In some ways, Dhaliwal’s diagnostic method is similar to that of another IBM project: the Deep Blue chess program, which in 1997 defeated Garry Kasparov, the world’s best player at the time, to claim an unambiguous victory in the computer’s relentless march into the human domain.

Although lacking consciousness and a human’s intuition, Deep Blue had millions of moves memorized and could analyze as many each second. Dhaliwal does the diagnostic equivalent, though at human speed.

Since medical school, he has been an insatiable reader of case reports in medical journals and of case conferences from other hospitals. At work he occasionally uses a diagnostic checklist program called Isabel, just to make certain he hasn’t forgotten something. But the program has yet to offer a diagnosis that Dhaliwal missed.

Dhaliwal regularly receives cases from physicians who are stumped by a set of symptoms. At medical conferences, he is presented with one vexingly difficult case and is given 45 minutes to solve it. It is a medical high-wire act; doctors in the audience squirm as the set of facts gets more obscure and all the diagnoses they were considering are ruled out. After absorbing and processing scores of details, Dhaliwal must commit to a diagnosis. More often than not, he is right.

When working on a difficult case in front of an audience, Dhaliwal puts his entire thought process on display, with the goal of “elevating the stature of thinking,” he said. He believes this is becoming more important because physicians are being assessed on whether they gave the right medicine to a patient, or remembered to order a certain test.

Without such emphasis, physicians and training programs might forget the importance of having smart, thoughtful doctors.

“Because in medicine,” Dhaliwal said, “thinking is our most important procedure.”

He added: “Getting better at diagnosis isn’t about figuring out if someone has one rare disease versus another. Getting better at diagnosis is as important to patient quality and safety as reducing medication errors, or eliminating wrong-site surgery.”

Clinical precision

Dhaliwal does half his clinical work on the wards of the San Francisco VA Medical Center, and the other half in its emergency department, where he often puzzles through multiple mysteries at a time.

One recent afternoon in the ER, he was treating a 66-year-old man who was mentally unstable and uncooperative. He complained of hip pain, but routine lab work revealed that his kidneys weren’t working and his potassium was rising to a dangerous level, putting him in danger of an arrhythmia that could kill him — perhaps within hours. An ultrasound showed that his bladder was blocked.

There was work to be done: drain the bladder, correct the potassium level. It would have been easy to dismiss the hip pain as a distraction; it didn’t easily fit the picture. But Dhaliwal’s instinct is to hew to the ancient rule that physicians should try to come to a unifying diagnosis. In the end, everything — including the hip pain — was traced to metastatic prostate cancer.

“Things can shift very quickly in the emergency room,” Dhaliwal said. “One challenge of this, whether you use a computer or your brain, is deciding what’s signal and what’s noise.” Much of the time, it is his intuition that helps figure out which is which.

An expert clinical diagnostician like Dhaliwal might make a decision without being able to explain exactly what is going on in the back of his mind, as his subconscious continuously sifts the wheat from the chaff.

While computers are good at crunching numbers, people are naturally good at matching patterns. To make a decision, physicians must combine logic and knowledge with their pattern-matching instincts.

Isabel, the diagnostic program that Dhaliwal sometimes uses, was created by Jason Maude, a former money manager in London, who named it for his daughter. At age 3, Isabel came down with chickenpox and doctors failed to spot a far more dangerous complication — necrotizing fasciitis, a flesh-eating infection. By the time the disease was identified, Isabel had lost so much flesh that at age 17 she is still having plastic surgery.

Maude said that while someone like Dhaliwal would probably have thought of necrotizing fasciitis, his daughter’s doctors were so fixed on her simple chickenpox, a trap known as anchoring bias, that they couldn’t see beyond it.

Had they entered her symptoms — high fever, vomiting, skin rash — into a diagnostic program, Maude said, the problem would probably have been identified.

Thousands of diseases are known, and many are rare.

“Low-frequency events are hard to put on the brain’s palette, and that’s part of Isabel’s strength,” Maude said. “It’s impossible for any one person to remember how each of those diseases presents, because each presents with a different pattern.”

He added that Isabel was aimed not so much at the Dhaliwals of the world, but at more typical physicians.

Dr. David Brailer, chief executive of Health Evolution Partners, which invests in health care companies, agreed.

“If everyone was a diagnostic genius, we wouldn’t need these decision support tools,” he said.

Diagnostic mistakes account for about 15 percent of errors that result in harm to patients, according to the Institute of Medicine. Yet diagnostic software has been slow to make its way into clinical settings, and Dhaliwal, who uses Isabel as a “second check,” said he could understand why.

Not only is it hard to integrate software into an already busy daily workflow, he said, but “most of us don’t think we need help at diagnosis, especially with routine cases, which account for the majority of our work.”

Dr. Henry Lowe, an internist at Stanford University and director of its Center for Clinical Informatics, doubts that a computer could ever replace a diagnostic wizard like Dhaliwal, or even a competent clinician.

“Designing computer systems that work well with incomplete or imprecise information is challenging,” Lowe said. “Particularly in medicine, where the consequences of defective decision-making may be catastrophic.”

Mimicking human analysis

IBM’s Watson for Healthcare has yet to focus directly on diagnosis. The company is working with Memorial Sloan-Kettering Cancer Center to teach Watson to interpret clinical information and, eventually, help determine treatment. IBM also recently began a collaboration with Cleveland Clinic to broaden Watson’s analytical capabilities into the area of medicine.

Dr. Martin Kohn, chief medical scientist for IBM Research, is careful to point out that Watson for Healthcare is intended to be “neither omniscient nor omnipotent.” Yet, Kohn noted, most physicians set aside five hours or less each month to read medical literature, while Watson can analyze the equivalent of thousands of textbooks every second. The program relies heavily on natural language processing. It can understand the nature of a question and review large amounts of information, such as a patient’s electronic medical record, textbooks and journal articles, then offer a list of suggestions with a confidence level assigned to each.

For physicians, Kohn said, one problem is what he calls “the law of availability.”

“You aren’t going to put anything on a list that you don’t think is relevant, or didn’t know to think of,” he said. “And that could limit your chances of getting a correct diagnosis.”

Dhaliwal agreed, citing the recent outbreak of hantavirus at Yosemite. Ten people contracted the virus, and three died. “It’s a febrile illness that looks like the flu,” he said. “It’s so rare, the last time you might have seen it was your medical school classroom.”

Had Isabel or a similar program been used, the deaths might have been prevented, Dhaliwal said. “You might think you’re in familiar territory, but the computer is here to remind you there are other things.”