Harnessing Generative AI to Improve Patient Communication and Reduce Disparities

January 22, 2025 | By Traci Farrell
Alicia Fernandez, MD, meets with her long-time patient Ana Luianes at San Francisco General Hospital and Trauma Center (2015). Photo by Susan Merrell.


A Career Rooted in Health Equity

Elaine Khoong, MD, MS

Born to immigrant parents, Elaine Khoong, MD, MS, a general internist and assistant professor of medicine at UCSF, grew up with a strong sense of fairness instilled by her mother. While she believed in equal opportunities for all, her clinical experiences at Zuckerberg San Francisco General Hospital and Trauma Center revealed the persistent inequities many patients experience, particularly those with language barriers.

During her internal medicine residency, she improved her Cantonese language skills through patient interactions. However, written communication remained a challenge. Her patient interactions inspired her to investigate how technology, specifically machine translation tools, could address language discordance—the lack of shared language between patient and provider—and improve health equity. Around this time, in 2016, Google Translate introduced neural machine translation, enabling sentence-level translations.

“I would type my instructions in English,” Dr. Khoong said of initially using Google Translate. “And because I knew Chinese I could eyeball it for accuracy before showing it to patients.”

Her initial research highlighted the limitations of Google Translate, including issues with medical terminology, spelling errors, and literacy levels. For instance, a typo like “low back stain” instead of “low back strain” was mistranslated into phrases describing a spot or patch on the back in both Spanish and Chinese. These challenges underscored the need for more advanced translation tools.

The Potential of Generative AI in Healthcare Translation

Table: Three examples of inaccurate clinical translations and their potential for harm (Khoong et al., JAMA Internal Medicine, 2019).

Generative AI, particularly large language models (LLMs) like ChatGPT, offers a promising alternative. In a recent study, Dr. Khoong and her team compared Google Translate and GPT-4 in translating 50 sets of emergency discharge instructions into Chinese, Spanish, and Russian. Professional translators then back-translated the texts into English, and clinicians assessed them for errors and potential harm.

GPT-4 outperformed Google Translate in Chinese and Russian translations, though both tools showed similar performance in Spanish. Notably, GPT-4 correctly translated nuanced instructions like “hold the kidney medicine,” which Google Translate mistranslated as “continue the medicine” in Russian.

In addition, the team explored whether GPT-4 could rewrite instructions for clarity and lower literacy levels. They tested this with 20 patients—10 English speakers and 10 Spanish speakers—who reviewed both original and GPT-rewritten instructions. Reactions varied: some patients preferred simpler language for certain instructions, while others valued detailed medical information.

“In some cases, patients preferred lower literacy instructions, while in others, they wanted more detailed content,” explained Alicia Fernandez, MD, professor of medicine, one of Dr. Khoong’s research partners. “It’s clear that patient preferences depend on the topic and context.”

Addressing Language Barriers in Real Time

Language discordance in healthcare is associated with worse outcomes. While interpreters and bilingual clinicians mitigate some disparities, written instructions are rarely translated in real time due to resource constraints.

UCSF Health is piloting new workflows to address this gap. For instance, clinicians can now route MyChart messages through certified translators, who provide Spanish translations within 24 hours, before the messages reach patients. Additionally, a quality improvement project led by first-year medical students in the Division of Hospital Medicine significantly increased the percentage of patients at UCSF Health hospitals receiving discharge instructions in their preferred language, from 3% in fall 2023 to 21% by summer 2024.

Despite these advances, verbal communication remains a challenge in low-stakes interactions, such as check-ins or simple clarifications, where interpreters are not always available. Machine translation holds promise in expanding language access during these interactions. While the San Francisco Department of Public Health has no contract with either ChatGPT or Google, technology experts at the DPH and Zuckerberg San Francisco General continue to assess AI platforms for future use.

“That’s what makes this technology so exciting,” Dr. Fernandez said. “It has the potential to make patients feel more included and empowered.”

Barriers to Widespread Adoption

The U.S. Department of Health and Human Services currently prohibits machine translation for critical communications without human quality assurance, limiting its implementation. Additionally, electronic health record vendors like Epic are unlikely to integrate automated workflows until clear policies address liability for translation errors.

“Who’s responsible if there’s a bad translation?” asked Dr. Khoong.

To address these challenges, Dr. Khoong and Dr. Fernandez are focused on further research to understand machine translation’s limitations and build evidence on its error rates.

The Road Ahead: Centering the Patient Perspective

Dr. Khoong’s next steps include mapping the patient journey to identify gaps in language access—from scheduling appointments to filling prescriptions. She also plans to gather insights into how patients perceive tradeoffs between accuracy and accessibility.

“We have plenty of clinician perspectives, but we rarely hear from patients—especially those with limited literacy or who are historically excluded,” Dr. Khoong said. “Understanding their needs is key to shaping the future of language access.”

As research continues, the hope is that generative AI tools can bridge language barriers, empower patients, and reduce health disparities.

“More inclusive technology means more patients will feel seen, heard, and understood,” Dr. Fernandez said.