Monthly Roundup: Summarizing clinical information with LLMs, digital care models bypassing traditional healthcare providers, user centered design for designing AI tools

Takeaways from articles about research, case study spotlights, and industry trends

Mar 07, 2024

Large language models perform better than medical experts in summarizing clinical information

Many outside of healthcare underestimate how many tasks performed by highly trained experts simply involve summarizing and communicating information to another individual. Offloading this chunk of work to AI will have a transformative impact on healthcare.

This study from Stanford describes an analysis comparing the performance of LLMs against medical experts in performing a set of text summarization tasks from the following data sources: radiology reports, progress notes, patient questions, and patient-provider conversations.

The performance metrics included completeness, correctness, conciseness, and prevalence of fabricated information. In 81% of cases, the LLM generated summaries were rated either equivalent (45%) or superior (36%) to those generated by their human counterparts.

While more studies will need to be done to validate this result, I think this is an important milestone in establishing the viability of LLMs for safely and effectively summarizing clinical text, which is a foundational and extensible task that underlies many healthcare workflows. Importantly, the study highlights that the benchmark for LLM performance is not perfection, but rather human experts. As a practicing physician, I can say that we are far from perfect!

Amazon and Eli Lilly launching new programs that directly identify and provide care to patients with chronic diseases

The landscape of chronic disease management is changing. While primary care providers (PCPs) remain the primary frontline access points for healthcare, nontraditional industry players such as Amazon and Eli Lilly are creating their own direct channels to patients. Strategically, these companies are betting that they can deliver care more efficiently by bypassing traditional healthcare delivery channels and employing a more digital first, consumer directed approach. These programs tend to focus on simplicity and efficiency of access rather than solving for complex medical issues. For Eli Lilly and potentially other biopharma companies, this is also an extension of their sales and marketing strategy for blockbuster medications (e.g., semaglutide), since it removes barriers to patients being prescribed those drugs.

These efforts are currently limited and only represent a small sliver of healthcare delivery in the US. However, incumbent health systems need to take notice, if only to observe and assess the demand for such services, as it will likely come from patients disillusioned with the inefficiencies of existing systems and searching for alternatives. I see a likely rise in the number of strategic partnerships in which incumbent health systems outsource certain parts of care pathways to these digital first companies in exchange for maintaining referral channels to complex specialty care.

Yet, the key to success for these efforts is clinical integration. These “direct to consumer” digital first care models often fall short in serving patients with more complex medical histories and needs. Even a patient who is just looking for Ozempic to lose weight likely has other comorbid medical problems that need to be assessed and managed. How these digital first companies integrate their care with the rest of the healthcare ecosystem will be a critical factor in their success. Further, it will be interesting to see how they manage healthcare data. Will they join nationwide efforts around interoperability and data exchange that have made significant progress towards enabling transparency and sharing of medical data among healthcare providers, or will new “walled gardens” be formed where medical data collected by Amazon clinics will be sequestered within the “Amazon health ecosystem”?

A framework for designing an AI guided tool for communicating prognosis

Physicians often navigate conversations with patients laden with complexity, uncertainty, and deep emotions that require more than simply knowing the correct information. For example, communicating prognosis for a patient diagnosed with cancer involves more than just conveying an accurate survival rate; it requires translating and framing statistical outcomes into a narrative that resonates with the broader context of the patient’s life, hopes, and fears.

There exist many machine learning models that predict survival with the intention of eventually being integrated into clinical workflows, such as communicating prognosis, but simply presenting a survival probability often falls short of what is actually useful to a physician. The key lies in designing software that harmoniously blends ML predictions with additional information and design features that help providers create the narratives needed to drive effective conversations.

The authors of this study from the University of Utah developed a tool to assist oncologists in discussing prognosis with patients facing advanced solid tumors. They used a user centered design approach to integrate 6 month survival predictions into a tool that addresses these broader needs. The team conducted several rounds of iterative design sessions with oncologists. What stood out to me was how they intentionally sequenced their design steps in partnership with the end users to build towards the final product:

Use of Initial Interfaces: In rounds 1 and 2, initial interfaces provided a common visual reference and shared terminology for facilitating discussions between the technical team and clinicians.
Content Determination and Model Clarification: These interfaces helped in deciding what content should be presented, clarified the function of the model, and established a suitable threshold for classifying survival risk.
Exposure of Misassumptions: Misassumptions among clinical and technical experts were exposed, such as the belief that the model output represented expected survival rather than a risk classification.
Triggering of Questions and Additional Information: The discussions prompted questions and led to the presentation of additional information by technical experts, enabling further exploration of the model and its applications.
Clinical Relevance and Feature Assessment: Clinicians on the study team assessed the clinical relevance, examined the features of the model, and reviewed the gold standard mortality data.
Advice on Data Pre-processing: Clinicians advised on data pre-processing, highlighting the importance of clinician involvement in AI system design.
Application of Design Decisions: Based on the findings from rounds 1 and 2, design decisions were applied to enhance the model and interface, focusing on trust and transparency and ensuring a match between the system and the real world.
Progression of Interim Interfaces: The evolution of interim interfaces through these rounds was documented, showcasing the iterative design process and improvements made.
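
To make the misassumption above concrete (a risk classification being read as an expected survival time), here is a minimal sketch in Python. It is not the study's model or interface: the function name, the 0.5 threshold, and the display wording are all hypothetical, chosen only to show how a predicted probability of surviving 6 months might be mapped to a risk class and labeled so that it is not mistaken for a survival estimate.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration only; the study's actual threshold
# was set with clinician input and is not reproduced here.
SURVIVAL_THRESHOLD = 0.5


@dataclass(frozen=True)
class RiskDisplay:
    risk_class: str  # what the model output means: a class, not a time estimate
    caption: str     # wording shown next to the class to head off misreading


def classify_six_month_risk(p_survive_6mo: float,
                            threshold: float = SURVIVAL_THRESHOLD) -> RiskDisplay:
    """Map a predicted probability of surviving 6 months to a risk class.

    Assumes the model emits a probability in [0, 1]. The result is a
    classification relative to the threshold, not an expected survival time.
    """
    if not 0.0 <= p_survive_6mo <= 1.0:
        raise ValueError("probability must be between 0 and 1")

    if p_survive_6mo < threshold:
        risk_class = "higher risk"
    else:
        risk_class = "lower risk"

    caption = (
        f"Classified as {risk_class} of death within 6 months. "
        "This is a risk category, not a prediction of how long the patient will live."
    )
    return RiskDisplay(risk_class=risk_class, caption=caption)


if __name__ == "__main__":
    # Example: a patient with a predicted 6 month survival probability of 0.3
    print(classify_six_month_risk(0.3).caption)
```

The explicit caption is the point of the sketch: the same number can be shown with or without wording that tells the oncologist what kind of output they are looking at, and the design sessions described above are where that wording gets decided.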

This iterative, dyadic partnership model facilitates the optimization of critical design and engineering decisions early on, based on what is genuinely useful to oncologists. This approach is crucial as many insights from end users often emerge only when they are presented with a visual representation, allowing spontaneous comments to be transformed into specific technical requirements.

Here are examples of comments from oncologists during design sessions after viewing an early user interface:

“As oncologists we sometimes have rose-colored glasses on and like to overestimate the benefits of second, third, fourth line treatment. And so, I think that this can maybe ground you and bring you back to reality like, hey, look it’s probably not such a good idea. Let’s think about alternatives.” (#7) “I think this would be useful for family members.” (#3) “…helpful in…patients who…see cancer as like a battle that they have to fight…often unwilling to stop treatment despite all evidence that treatment might harm them rather than help them.”

A key insight here is how this prognosis communication tool can add value by helping oncologists recalibrate expectations and confront optimistic biases towards aggressive treatment courses, which is a specific use case that may require tailored design choices.

Communicating with patients in high stakes situations is complex and requires the art of marrying data with human connection so that the patient not only receives the correct information, but also feels genuinely heard and supported. Digital and AI enabled tools that support these interactions need to be designed with a deep understanding of the nuances of patient-provider communication that can only come from a solid dyadic partnership between the end-users and technical teams established from the outset.

References

Van Veen, D., Van Uden, C., Blankemeier, L. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med (2024).

Catherine J Staes, Anna C Beck, George Chalkidis, Carolyn H Scheese, Teresa Taft, Jia-Wen Guo, Michael G Newman, Kensaku Kawamoto, Elizabeth A Sloss, Jordan P McPherson, Design of an interface to communicate artificial intelligence-based prognosis for patients with advanced solid tumors: a user-centered approach, Journal of the American Medical Informatics Association, Volume 31, Issue 1, January 2024, Pages 174–187

Byte to Bedside
