A novel evaluation framework for medical LLMs: Combining fuzzy logic and MCDM for medical relation and clinical concept extraction

Dianes, David Emmanuel

Type :	Article
Subject :	T Technology (General)
ISSN :	0148-5598
Main Author :	Dianes, David Emmanuel
Additional Authors :	Garfan, Salem Abdullah
Title :	A novel evaluation framework for medical LLMs: Combining fuzzy logic and MCDM for medical relation and clinical concept extraction

Place of Production :	Tanjung Malim
Publisher :	Fakulti Komputeran & Meta-Teknologi
Year of Publication :	2024
Notes :	Journal of Medical Systems
Corporate Name :	Universiti Pendidikan Sultan Idris

Abstract :

Artificial intelligence (AI) has become a crucial element of modern technology, especially in the healthcare sector. This is apparent in the continuous development of large language models (LLMs), which are used in various domains, including medicine. However, using these LLMs in the medical domain requires an evaluation platform to determine their suitability and to guide future development efforts. To that end, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach designed specifically to evaluate medical LLMs. The success of AI, particularly LLMs, in the healthcare domain depends on their efficacy, safety, and ethical compliance, so a robust evaluation framework is essential for their integration into medical contexts. This study proposes using the Fuzzy-Weighted Zero-InConsistency (FWZIC) method, extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS), for weighting the evaluation criteria. This extension enables the handling of uncertainties inherent in medical decision-making processes. By incorporating fuzzy logic principles, the approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria. The Multi-Attributive Ideal-Real Comparative Analysis (MAIRCA) method is employed to assess the medical LLMs used in the case study. The results show that the “Medical Relation Extraction” criterion, with its sub-levels, received a higher importance weight (0.504) than “Clinical Concept Extraction” (0.495). Of the 6 LLM alternatives evaluated, (A4) “GatorTron S 10B” ranked 1st, whereas (A1) “GatorTron 90B” ranked 6th. The implications of this study extend beyond academic discourse, directly impacting healthcare practices and patient outcomes. The proposed framework can help healthcare professionals make more informed decisions regarding the adoption and utilization of LLMs in medical settings. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
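
As an illustration of the ranking step named in the abstract, the following is a minimal sketch of the standard crisp MAIRCA procedure, not the authors' implementation. It assumes equal prior preference for every alternative, min-max normalisation per criterion, and ranking by smallest total gap. The function name `mairca_rank`, the three placeholder models, and their scores are hypothetical; only the two criterion weights (0.504 and 0.495) come from the abstract, and the p, q-QROFS FWZIC weighting step is not reproduced here.

```python
def mairca_rank(matrix, weights, benefit):
    """Rank alternatives with crisp MAIRCA.

    matrix  : rows are alternatives, columns are criteria scores
    weights : one weight per criterion
    benefit : True where higher is better, False for cost criteria
    Returns (gaps, order), where order lists row indices best-first.
    """
    m = len(matrix)
    n = len(weights)
    prior = 1.0 / m                       # equal preference for each alternative
    columns = list(zip(*matrix))
    gaps = []
    for row in matrix:
        total_gap = 0.0
        for j in range(n):
            lo, hi = min(columns[j]), max(columns[j])
            theoretical = prior * weights[j]          # theoretical rating t_p
            if hi == lo:
                norm = 1.0                            # criterion does not discriminate
            elif benefit[j]:
                norm = (row[j] - lo) / (hi - lo)
            else:
                norm = (row[j] - hi) / (lo - hi)
            real = theoretical * norm                 # real rating t_r
            total_gap += theoretical - real           # gap g = t_p - t_r
        gaps.append(total_gap)
    # The smallest total gap is closest to the ideal, so it ranks first.
    order = sorted(range(m), key=lambda i: gaps[i])
    return gaps, order


# Weights for "Medical Relation Extraction" and "Clinical Concept Extraction"
# as reported in the abstract (they sum to 0.999 as published).
weights = [0.504, 0.495]

# HYPOTHETICAL scores for three placeholder models; these are NOT the paper's data.
scores = [
    [0.80, 0.70],  # placeholder model 0
    [0.90, 0.85],  # placeholder model 1
    [0.60, 0.95],  # placeholder model 2
]

gaps, order = mairca_rank(scores, weights, benefit=[True, True])
for rank, idx in enumerate(order, start=1):
    print(f"rank {rank}: placeholder model {idx} (total gap {gaps[idx]:.3f})")
```

In this sketch a model that is best on every criterion would have a total gap of zero, which is why alternatives are ordered by ascending gap.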

