My trip home from vacation on Tuesday was interrupted when the airline decided the flight was too heavy. We stayed an extra day in Mexico, which sounds great, except that the day was spent moving four young kids to and from an airport and hanging out in an airport hotel. I am back now and have a backlog of stuff I would like to write about. Without further ado…
Josh Tamayo-Sarver is an ER doc who is worried about using ChatGPT for medical diagnosis. He writes about it on Medium:
I anonymized my History of Present Illness notes for 35 to 40 patients — basically, my detailed medical narrative of each person’s medical history, and the symptoms that brought them to the emergency department — and fed them into ChatGPT.
The specific prompt I used was, “What are the differential diagnoses for this patient presenting to the emergency department [insert patient HPI notes here]?”
The results were fascinating, but also fairly disturbing.
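For anyone curious what that workflow might look like in practice, here is a minimal sketch against the OpenAI API. The model name, the helper function, and the sample note are all my own placeholders, not Josh's actual setup, and of course real patient notes should never be sent to a third-party API without proper de-identification and approval.

```python
# Minimal sketch of the prompt pattern described above, assuming the OpenAI
# Python SDK (v1+). The model name, differential_diagnoses(), and the example
# note are placeholders for illustration, not Josh's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def differential_diagnoses(hpi_notes: str, model: str = "gpt-4") -> str:
    """Ask the model for differential diagnoses given an anonymized HPI note."""
    prompt = (
        "What are the differential diagnoses for this patient presenting "
        f"to the emergency department? {hpi_notes}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Hypothetical, fully de-identified example note:
example_hpi = (
    "24-year-old female with acute onset left lower quadrant abdominal pain, "
    "lightheadedness, and vaginal spotting. Denies the possibility of pregnancy."
)
print(differential_diagnoses(example_hpi))
```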
What Josh found disturbing was that ChatGPT made some mistakes that he would not have made:
OpenAI’s chatbot did a decent job of bringing up common diagnoses I wouldn’t want to miss — as long as everything I told it was precise, and highly detailed. Correctly diagnosing a patient as having nursemaid’s elbow, for instance, required about 200 words; identifying another patient’s orbital wall blowout fracture took the entire 600 words of my HPI on them.
For roughly half of my patients, ChatGPT suggested six possible diagnoses, and the “right” diagnosis — or at least the diagnosis that I believed to be right after complete evaluation and testing — was among the six that ChatGPT suggested.
Not bad. Then again, a 50% success rate in the context of an emergency room is also not good.
He goes on to detail some of the mistakes, including a woman with an ectopic pregnancy. He explains that it is not uncommon for pregnant women to come into the ER not knowing they are pregnant. When they are asked if they are pregnant, they might say "I can't be." What they mean is "It would be really bad if I was, so I deny the possibility," rather than "It is not possible that I could be." Josh does not think GPT has the "experience" to understand that a woman could be pregnant even though she denies it, which explains why ChatGPT's diagnosis did not even suggest that this particular patient could have been pregnant.
Josh argues that ChatGPT passed the Medical Licensing Exam "not because it's 'smart,' but because the classic cases in the exam have a deterministic answer that already exists in its database." He claims, "ChatGPT rapidly presents answers in a natural language format (that's the genuinely impressive part), but underneath that is a knowledge retrieval process similar to Google Search. And most actual patient cases are not classic."
It is possible that Josh is right and that AI is just a nice way to do Google search. It is also possible that Upton Sinclair was right and that “it is difficult to get a man to understand something, when his salary depends on his not understanding it.”
Why Change?
I am working with a company right now that is over-spending on every marketing channel. In almost every case, their CAC (customer acquisition cost) is 4-10x their CLV (customer lifetime value). On every call we examine a new marketing channel and find that the immediate answer should be "cut back on spend now!" Stop wasting money TODAY! They don't. On a successful call they will agree to reduce spend somewhat, or to run an incrementality test, or some other activity. Anything to avoid admitting that they do not have viable marketing spend, and that they need to reduce it by ~90% and start searching for ROI from scratch.
Another company is employing a team of content creators. When I asked them how they were using generative AI, they said they were not. They have legal concerns. They are convinced it would not work for their business. And the content team itself has never even looked at ChatGPT.
It is very hard to change. Most people, most of the time, like what they are doing and they don’t want to change. It is easy to find excuses for why new things aren’t going to work in your special case.
Josh may be right that ChatGPT will not be able to catch edge cases, that a competent doctor will do much better, and that if doctors' diagnoses are replaced, patients will die. But I don't think so.
Josh's post went live on Medium last week, but it was originally published in Fast Company on March 13th, the day before ChatGPT launched with GPT-4. GPT-4 is an order of magnitude better than GPT-3.5, and I would be shocked if GPT-4 would miss an ectopic pregnancy. Google's Med-PaLM-2 is apparently even better than GPT-4 at medical diagnosis. If Josh thinks his job is safe because GPT is not going to be as good at diagnosis, he is in for a surprise.
Josh's job may be safe, but it won't be because AI isn't good at his job. His job is safe for the same reason my client is continuing to spend on ROI-negative marketing channels. Talking to an AI doctor would be weird and different. And most people would rather talk to a "Josh" who makes mistakes than to an AI that saves their life.
Eventually that will change. The company will cut their bad marketing spend. The other company will trim its over-staffed content team. People will be willing to talk to a nurse-specialist sitting in front of GPT-5. But it will take some time. Josh is likely safe until he retires. And he will likely rationalize why the AI can't do his job long after that.
Keep it simple,
Edward
P.S. After about 30 minutes of attempts, I could not get MidJourney to create an image of an ostrich with its head in the sand (even without the doctor scrubs and lab coat). If you can come up with a prompt that makes that happen, I would love to hear from you (and I will update the image). I even tried reverse engineering it with the /describe function, to no avail.