7 Questions for Dr. John Halamka

By Brian Doty, principal, Deloitte Consulting LLP, and Jay Bhatt, D.O., managing director of the Deloitte Health Equity Institute and the Deloitte Center for Health Solutions, Deloitte Services, LP

Generative artificial intelligence (AI) holds enormous promise for health care and could usher in a new generation of tools. But the technology is still evolving, its accuracy is not yet reliable, and few rules or regulatory guardrails exist.

In April, the Coalition for Health AI (CHAI) released its 24-page Blueprint for Trustworthy AI and is finalizing an AI Maturity Evaluation tool, which could help health care organizations determine their readiness and capacity to use AI.¹ John Halamka, M.D., M.S., cofounded CHAI, which includes representatives from federal regulatory agencies, the White House, large technology companies, and leading academic health systems. The coalition aims to provide guidelines for the use of health AI tools, ensure the quality of the information those tools produce, and increase their credibility among users. Dr. Halamka is also president of the Mayo Clinic Platform and the author of several books on the use of technology in health care. We recently had an opportunity to talk with Dr. Halamka about the current state of generative AI and its potential implications for health care. Here is an excerpt from that conversation:

Brian: We have been involved in a lot of discussions about generative AI and the role it might play in health care. What is the difference between predictive AI and generative AI?

Dr. Halamka: AI is an enormous field that spans everything from decision-support rules to Bayesian probabilities. Predictive AI guesses an outcome based on continuous training and observations that may or may not be supervised or reinforced by humans. Predictive AI is math, not magic. Generative AI is a completely different animal. It can be used to generate a picture or text. Large language models (LLMs) can generate human-like communication: the AI anticipates the next word in a sentence based on billions of previous human communications. Generative AI might also be able to predict what a patient’s ECG will look like five minutes into the future. It is compelling technology, but it is often inaccurate. The quality of predictive AI can be measured because it is possible to determine whether a prediction was right or wrong. Measuring the quality of generative AI is much more challenging.
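To make the distinction concrete, here is a deliberately tiny Python sketch, assuming a made-up three-sentence corpus and made-up readmission labels. It is not how production LLMs work (they use neural networks trained on vastly more text); it only shows the two ideas in the answer above: a bigram counter that "anticipates the next word" from previous text, and a predictive check where each guess can be scored right or wrong against what actually happened. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count which word follows each word across a list of sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent word seen after `word`, or None if never seen."""
    counts = follows.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def accuracy(predictions, outcomes):
    """Fraction of predictions that match the observed outcomes."""
    if not predictions:
        return 0.0
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(predictions)


# Hypothetical toy corpus standing in for "previous human communications".
corpus = [
    "the patient has chest pain",
    "the patient has a fever",
    "the patient has chest pain again",
]
follows = train_bigrams(corpus)
print(predict_next(follows, "has"))  # "chest": seen twice, versus "a" once

# Predictive AI: each guess can be checked against the real outcome.
predicted = ["readmitted", "not readmitted", "readmitted"]
observed = ["readmitted", "readmitted", "readmitted"]
print(accuracy(predicted, observed))  # 2 of 3 predictions correct
```

The predictive half has a single observed outcome to compare against; the generative half does not, since many continuations could be reasonable, which is one way to see why measuring generative output is harder.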

Jay: What do you see as some of the top challenges for health system executives, and how do you think generative AI might be used to address them?

Dr. Halamka: Hospital CEOs tend to share three core business challenges: negative margins, recruitment and retention of staff, and staff burnout. What if generative AI could remove some of the administrative and documentation burden, help with the pre-op workflow, and assist with appealing claims denials? There are many potential low-risk use cases that could give clinicians more time to spend with patients. But what if payers start using generative AI to deny claims, and health systems respond by using generative AI to appeal them? Before you know it, generative AI is battling generative AI and a billion transactions are taking place per second! It's a silly example, but it illustrates the power of this technology and the scenarios that need to be considered.

Brian: You recently wrote a blog post about a cardiology patient whose diagnosis was produced by generative AI. While the diagnosis was compelling, it did not match the patient’s actual condition. What do you think needs to happen from ethical, accuracy, and regulatory perspectives to improve the accuracy of generative AI and increase trust in it?

Dr. Halamka: Current generative models were not trained on health care data. They were trained on various news and information websites. It can cost hundreds of millions of dollars to train a sophisticated LLM, and if protected health information (PHI) or personally identifiable information (PII) was included in that training, you might still need to throw the model away. There is some optimism that, given the state of data interoperability and the ability to de-identify information, good health data could be found and used to retrain foundation LLMs. Or you could take an existing LLM and refine its parameters to understand specific areas, like oncology or cardiology. Imagine an LLM connected to a search engine that can read everything about a topic from the National Library of Medicine, Mayo Clinic, and other trusted sources of medical information, and then summarize it. That is probably going to be useful.

Brian: Can you define prompt engineering? How does it work?

Dr. Halamka: Let's talk through a real use case. We fed information about a simple patient case into an LLM: a 59-year-old with substernal chest pain, shortness of breath, hypotension, and left leg pain. The LLM, which wasn't reading medical literature, had information from about a billion people. It concluded the patient had a myocardial infarction. But that diagnosis was incorrect: the patient had a dissecting aortic aneurysm. If the clinical team had accepted that diagnosis and followed the chatbot’s instructions, they would likely have killed the patient. The problem was that we asked the AI for a diagnosis. We redid the experiment and asked, “What is the one condition I shouldn’t miss?” It said, “dissecting aortic aneurysm.” Depending on how a question is asked, the AI can generate valuable, usable information or a hallucination. That's what prompt engineering is all about. [An AI hallucination, or confabulation, occurs when the model fills knowledge gaps with plausible-sounding information that might not be accurate.²]

Jay: You are a Harvard Medical School professor who served in both the Barack Obama and George W. Bush administrations, and you have also worked with the Biden administration. What role do you think government should play, if any, when it comes to regulating AI?

Dr. Halamka: The label on a can of soup lists all of the ingredients and nutrition information. A shopper might decide not to buy it after seeing that it contains 1,000 milligrams of sodium and 5,000 calories. AI currently lacks a soup label. An AI model doesn't disclose the training set that was used, and it might perform better for some groups than others. Regulatory agencies would like to see more transparency. A so-called data card could help people understand how an LLM was trained and how it performs. FAVES (Fair, Appropriate, Valid, Effective, and Safe) is an acronym for the principles used to evaluate AI algorithms. The government should set the standards so that private industry can run the metrics and provide the evidence needed for transparency. Then, when purchasers buy an AI tool, they will have a sense of how useful it is likely to be.
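One way to picture the "soup label" is as a structured record that travels with a model. The Python sketch below is a hypothetical illustration. The `DataCard` fields, the model name, and the per-group accuracy calculation are assumptions, not CHAI's, ONC's, or any vendor's format. It records where the training data came from and reports performance separately for each patient group, since a model "might perform better for some groups than others."

```python
from dataclasses import dataclass, field


@dataclass
class DataCard:
    """Hypothetical 'nutrition label' for a model; field names are illustrative only."""
    model_name: str
    training_data: str
    intended_use: str
    group_accuracy: dict = field(default_factory=dict)


def per_group_accuracy(records):
    """Accuracy per group, from (group, predicted, observed) tuples."""
    totals, correct = {}, {}
    for group, predicted, observed in records:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(predicted == observed)
    return {g: correct[g] / totals[g] for g in totals}


# Hypothetical evaluation results for two patient groups (1 = condition present).
records = [
    ("group A", 1, 1), ("group A", 0, 0), ("group A", 1, 1), ("group A", 0, 1),
    ("group B", 1, 0), ("group B", 0, 0),
]

card = DataCard(
    model_name="example-risk-model",  # hypothetical name
    training_data="de-identified records from one health system (assumed)",
    intended_use="decision support; results require human review",
    group_accuracy=per_group_accuracy(records),
)
print(card.group_accuracy)  # {'group A': 0.75, 'group B': 0.5}
```

A purchaser reading a card like this could see the gap between groups before deploying the model, which is the kind of evidence the answer above suggests industry could supply against government-set standards.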

Jay: A recent study found that most patients thought chatbot responses were more empathetic than, and just as accurate as, responses from clinicians.³ How do you think generative AI might affect the patient-clinician relationship?

Dr. Halamka: Generative AI simply takes billions of human communications and finds patterns in them. It is programmed to produce compelling, highly readable text in plain English. But that text can be flawed. Imagine an 80-year-old patient whose husband died 10 years earlier. The AI might take that information and conclude that the patient is a widow. But that might not be the case: the AI might not know that the patient has remarried. That's why AI results always require human review.

Brian: There is a lot of work and complexity involved in deploying AI at any level. What do you think a typical health care organization needs to have in place before considering an AI solution?

Dr. Halamka: Mayo has 160 predictive algorithms. We have selected a couple of affiliate hospitals and provided them with what might be described as an AI starter pack for early detection. These are small community hospitals that might not have the resources or staff to deploy AI on their own, but an experienced partner can get them started. We picked use cases like a fracture or bleed in the head, or pneumoperitoneum: issues you would not want to miss. AI might be used to evaluate all radiology studies, and that can be done passively, in the background. Predictive models for cardiology could help clinicians understand who needs which services and when. Once a hospital’s clinicians understand AI’s potential with these use cases, they might want to expand to other care areas, such as neurology. The affiliate hospitals are not owned by Mayo and do not use the same infrastructure, which demonstrates that these algorithms can transfer across organizations.

We believe AI has the potential to revolutionize some aspects of health care, such as medical imaging, disease diagnosis, and drug discovery. It could also take on repetitive tasks so that clinicians have more time to spend with patients. However, it is important that health care professionals remain mindful of trust and the ethical use of such technologies.


The executive’s participation in this article is solely for educational purposes based on their knowledge of the subject and the views expressed by them are solely their own. This article should not be deemed or construed to be for the purpose of soliciting business for any of the companies mentioned, nor does Deloitte advocate or endorse the services or products provided by these companies.

This publication contains general information only and Deloitte is not, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor.

Deloitte shall not be responsible for any loss sustained by any person who relies on this publication.

