New Paper Analyzes Health Equity and Disparities From IRS Tax Documentation Submitted by US Nonprofit Hospitals


A new paper by experts at RTI International, a nonprofit research institute, was published in the Journal of Medical Internet Research, the leading peer-reviewed journal for digital medicine and health and health care in the internet age. The paper, “Text Analysis of Trends in Health Equity and Disparities From the Internal Revenue Service Tax Documentation Submitted by US Nonprofit Hospitals Between 2010 and 2019: Exploratory Study,” was authored by Emily Hadley, Laura Marcial, Wes Quattrone, and Georgiy Bobashev.

Many US hospitals are classified as nonprofits and receive tax-exempt status partially in exchange for providing benefits to the community. The RTI authors used text analysis to examine trends in health equity and disparities based on Internal Revenue Service (IRS) tax documentation submitted by these hospitals.

“Hospital community benefits tax documentation has historically been cumbersome for both researchers and the public,” said Emily Hadley, a research data scientist at RTI. “It was exciting to use data science tools to illuminate national trends in health equity and disparities and highlight opportunities for hospitals to seek better alignment with community needs. Our work demonstrates the potential for text analysis to support greater transparency and accountability and facilitate stakeholder-driven research with large amounts of text data.”

The IRS collects proof of compliance using the Schedule H form that nonprofit hospitals submit as part of the annual IRS Form 990 (F990H). This includes a free-response text section known for being ambiguous and difficult to audit. For this reason, the researchers used natural language processing (NLP) to evaluate this text section with a focus on health equity and disparities. This research is among the first to use NLP for text analysis of this form.
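As a rough illustration of this kind of text analysis, the sketch below flags which themes appear in a free-response text and computes the share of filings that mention a theme. It is a minimal sketch, not the authors' method: the theme dictionary, the function names `themes_in_text` and `share_using_theme`, and the sample filings are all hypothetical and were made up for this example.

```python
import re

# Hypothetical theme dictionary. The study's actual term lists are not
# given in this article, so these terms are assumptions for illustration.
THEME_TERMS = {
    "substance use": ["substance use", "opioid"],
    "mental health": ["mental health", "behavioral health"],
}


def themes_in_text(text, theme_terms=THEME_TERMS):
    """Return the set of themes with at least one term found in the text."""
    lowered = text.lower()
    found = set()
    for theme, terms in theme_terms.items():
        for term in terms:
            # Word boundaries avoid matching a term inside a longer word.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(theme)
                break
    return found


def share_using_theme(filings, theme):
    """Fraction of filings whose text mentions the theme at least once."""
    if not filings:
        return 0.0
    hits = sum(1 for text in filings if theme in themes_in_text(text))
    return hits / len(filings)


# Made-up filing excerpts, for illustration only.
sample_filings = [
    "The hospital expanded opioid treatment and mental health services.",
    "Funds supported a community garden.",
]
```

With these two made-up excerpts, `share_using_theme(sample_filings, "substance use")` returns 0.5, since only the first excerpt contains a substance use term. Counting a filing once per theme, rather than counting every mention, matches the article's framing of "hospital reporting entities that used a term."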

When the team analyzed the text, they found an increased use of text related to 29 themes around health equity and disparities. They also found that more than 90% of hospital reporting entities used a term in 2018 and 2019 related to affordability, government organizations, mental health, and data collection. The themes with the largest relative increase were LGBTQ (lesbian, gay, bisexual, transgender, queer; 1676.6%), social determinants of health (SDOH; 958.4%), and environment (522%).

The authors also found that terms related to homelessness varied geographically from 2010 to 2018, and that terms related to equity, health IT, immigration, LGBTQ, oral health, rural, SDOH, and substance use had statistically significant geographic variation in 2018. Terms related to substance use saw the largest raw percentage point increase: only a quarter of hospital reporting entities used any substance use language in 2010, while more than two-thirds used a substance use term in 2019. However, themes like LGBTQ, disability, oral health, and race and ethnicity ranked lower in hospital reporting than in public interest. In addition, for some themes with large increases in use, part of the increase came from hospitals explicitly stating that they had taken no action on those themes.
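The article reports both relative increases and a raw percentage point increase, and the two measures can rank themes very differently. The sketch below shows the arithmetic. The function names are hypothetical, and the input values are illustrative stand-ins for "a quarter" and "more than two-thirds," not figures taken from the paper.

```python
def percentage_point_change(start_pct, end_pct):
    """Raw difference between two percentages, in percentage points."""
    return end_pct - start_pct


def relative_change_pct(start_pct, end_pct):
    """Change expressed as a percentage of the starting value."""
    if start_pct == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (end_pct - start_pct) * 100 / start_pct


# Illustrative values only: roughly "a quarter" rising to "two-thirds".
common_theme = (25.0, 67.0)
# A hypothetical rare theme that starts from a very low baseline.
rare_theme = (1.0, 5.0)
```

With these assumed inputs, the common theme rises by 42.0 percentage points, a relative increase of 168.0%, while the rare theme rises by only 4.0 percentage points but shows a relative increase of 400.0%. This is why a theme that starts from a very low baseline can post a very large relative increase without being widely used.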

Overall, the paper reveals that hospital reporting entities are demonstrating an increasing awareness of health equity and disparity topics in community benefits tax documentation, but this awareness does not necessarily correspond with general population interests or additional action.

Emily Hadley
Laura Haak Marcial
Wes Quattrone 
Georgiy Bobashev

Original article:

Hadley E, Marcial LH, Quattrone W, Bobashev G. Text Analysis of Trends in Health Equity and Disparities From the Internal Revenue Service Tax Documentation Submitted by US Nonprofit Hospitals Between 2010 and 2019: Exploratory Study. J Med Internet Res. 2023;25:e44330.




About JMIR Publications

JMIR Publications is a leading, born-digital, open access publisher of 30+ academic journals and other innovative scientific communication products that focus on the intersection of health and technology. Its flagship journal, the Journal of Medical Internet Research, is the leading digital health journal globally in content breadth and visibility, and is the largest journal in the medical informatics field.

To learn more about JMIR Publications, please visit or connect with us via Twitter, LinkedIn, YouTube, Facebook, and Instagram.

Head office: 130 Queens Quay East, Unit 1100, Toronto, ON, M5A 0P6 Canada

The content of this communication is licensed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, published by JMIR Publications, is properly cited.