RTX BBN Technologies to support ARPA-H AI-powered medical chatbots reliability evaluation effort
BBN developing technology to assess the reliability and accuracy of healthcare responses
RTX BBN Technologies received an award to support the Advanced Research Projects Agency for Health’s (ARPA-H) Chatbot Accuracy and Reliability Evaluation (CARE) Exploration Topic under an Other Transaction Agreement. CARE aims to develop advanced tools and technologies for evaluating medical chatbots in patient-facing applications, addressing the critical need for reliable health information in situations where accuracy may influence patient outcomes.
Despite the potential of medical chatbots, significant limitations threaten their effectiveness. Many AI systems generate factually inaccurate or misleading responses that may cause confusion and pose potential risk to patients. As healthcare evolves, a scalable system is needed to ensure consistent medical chatbot performance in any setting. This need is intensified by ongoing lack of standardization, which continues to undermine confidence.
“Evaluating medical chatbots requires more than simply checking for correct answers; it demands a deep understanding of how these systems address the complex needs of diverse users,” said Dr. Damianos Karakos, BBN principal investigator on the effort.
To address this problem, BBN will use its expertise in machine learning, language-based information processing and large language models to develop the Monitoring, Evaluation and Diagnosing of Intelligent Chatbots (MEDIC) system. This comprehensive solution will function as a technological framework for evaluating medical chatbots, featuring core capabilities such as:
- Integration of insights from caregivers, patients and medical professionals to optimize chatbot interactions and effectively address their concerns and expectations.
- Retrieval of relevant medical texts to validate chatbot responses against evidence-based data sources.
- Advanced prompt engineering to create realistic interactions from various demographic perspectives.
- Detection of missing or inaccurate information in chatbot outputs using multiple evaluation methods that draw on advanced information extraction and machine learning techniques.
“Our goal is to develop an adaptable framework that rigorously assesses chatbot performance in real-world scenarios, focusing on key aspects like bias, fairness and the risk of generating misleading information,” said Karakos. “For example, in prenatal care, it’s crucial that expectant mothers receive accurate dietary guidance to support fetal health. MEDIC will assess the dietary advice given by medical chatbots and escalate any ambiguous responses to healthcare professionals for further review.”
This initiative aims to improve AI-integrated care in a variety of healthcare settings.
The BBN-led team includes Johns Hopkins University (Prof. Mark Dredze), Johns Hopkins University School of Medicine and Howard University Hospital. Work on this effort is being performed in Cambridge, Massachusetts; Washington, D.C.; and Baltimore, Maryland.
This research was, in part, funded by the Advanced Research Projects Agency for Health (ARPA-H). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Government.