
Large language models such as ChatGPT and Google Bard could assist heart teams in decision-making, but physicians should be aware of the current limitations of the technology.
This is the conclusion of a preliminary proof-of-concept trial that examined whether three artificial intelligence (AI) models—ChatGPT v3.5, ChatGPT v4.0 and Google Bard—could accurately assign patients to surgical or interventional procedures. Google Bard has since been replaced by a later product, Gemini, whilst newer versions of ChatGPT are also now available.
Findings of the study have been published in a research letter in JACC: Cardiovascular Interventions, with authors Edward Itelman (Rabin Medical Center, Petah Tikva, Israel) and colleagues writing that the introduction of generative AI as an assistant in the decision-making process for whether patients are more suited to a percutaneous or surgical procedure “offers a novel paradigm that harmonises vast data points and clinical expertise to enhance the precision and personalisation of care”.
To test this theory, Itelman and colleagues fed 40 cases, written by expert interventional cardiologists in both coronary and structural interventions, into each of the AI models, instructing them to consider themselves “as part of a multidisciplinary discussion about the best strategy for this specific patient”. Questions were constructed according to current European Society of Cardiology (ESC) guidelines so as to include characteristics that would sway the decision clearly towards one approach or the other, with variables including age, surgical risk, the feasibility of the transfemoral approach, and concomitant valvular or coronary disease.
Of the cases selected, 20 involved coronary disease, with 10 leaning towards percutaneous coronary intervention (PCI) and 10 towards surgical revascularisation, whilst of the 20 structural cases, 10 leaned towards transcatheter aortic valve implantation (TAVI) and 10 towards surgery.
Itelman et al report that in the structural cases, both ChatGPT models accurately decided between TAVI and surgical aortic valve replacement (SAVR) in 100% of cases, whilst Google Bard correctly assigned therapy in 70% of cases overall (90% of the TAVI-leaning cases and 50% of the SAVR-leaning cases).
In the coronary cases, ChatGPT v3.5 and Google Bard both achieved 70% accuracy, whilst ChatGPT v4.0 achieved 100% accuracy.
The models were then asked to explain their decision-making process, with Itelman et al acknowledging that “parts of the explanation were very logical and up to date with current medical guidelines”. In some instances, however, the models provided relatively outdated references to justify their decisions, despite newer evidence being available.
“The use of an outdated reference is expected to be improved with newer models acquiring better citing capabilities and access to more recent knowledge,” the study’s authors note.
They also found that the different models appeared to exhibit different biases. Google Bard, for example, tended to mistakenly assign patients to percutaneous intervention and, when asked to explain, cited young age, low surgical risk and bicuspid valve as factors favouring TAVI. ChatGPT v3.5, meanwhile, tended to do the opposite, with most of its mistakes stemming from incorrectly labelling chest deformation as a feature favouring surgery.
Users were also left unaware of the sources of information the models were drawing on, Itelman and colleagues note, pointing out that ESC and American Heart Association (AHA) guidelines can differ significantly on certain parameters.
“Assisting role”
“We believe these tools can potentially serve an assisting role in the decision-making process of our clinical practice,” the authors write, noting, however, that they are currently in no position to replace human physicians. They add that clinicians should be aware of the models’ capabilities and limitations, as patients may turn to them for a second opinion when facing different therapeutic options.
“AI tools already have a vast use in almost every medical profession, and with proper regulation and supervision, large language models might be able to assist human physicians in a wide range of activities in cardiovascular medicine,” they conclude.
“It is important to say that this paper examines the landscape when it was written,” Itelman tells Cardiovascular News. “Technology will advance, but we believe the path forward should involve tech companies, medical establishments, and regulators working together to ensure validated and safe models that can be incorporated into actual clinical care.”