Biomedical visualization specialists haven't come to terms with how or whether to use generative AI tools when creating images for health and science applications. But there's an urgent need to develop guidelines and best practices because incorrect illustrations of anatomy and related subject matter could cause harm in clinical settings or as online misinformation.
Researchers from the University of Bergen in Norway, the University of Toronto in Canada, and Harvard University in the US make that point in a paper titled, "'It looks sexy but it's wrong.' Tensions in creativity and accuracy using GenAI for biomedical visualization," scheduled to be presented at the IEEE VIS 2025 conference in November.
In their paper, authors Roxanne Ziman, Shehryar Saharan, Gaël McGill, and Laura Garrison present various illustrations created by OpenAI's GPT-4o or DALL-E 3 alongside versions created by visualization experts.
Top row: incorrect GPT-4o or DALL-E 3 images; bottom row: images created by BioMedVis illustrators (screenshot from the paper)
Some of the examples cited diverge from reality in subtle ways. Others, like "the infamously well-endowed rat" in a now-retracted article published in Frontiers in Cell and Developmental Biology, would be difficult to mistake for anything but fantasy.
Either way, imagery created by generative AI may look nice but isn't necessarily accurate, the authors say.
"In light of GPT-4o Image Generation’s public release at the time of this writing, visuals produced by GenAI often look polished and professional enough to be mistaken for reliable sources of information," the authors state in their paper.
"This illusion of accuracy can lead people to make important decisions based on fundamentally flawed representations, from a patient without such knowledge or training inundated with seemingly accurate AI-generated 'slop,' to an experienced clinician who makes consequential decisions about human life based on visuals or code generated by a model that cannot guarantee 100 percent accuracy."
Show me a pancreas, and MidJourney is like, here is your pile of alien eggs!
Co-author Ziman, a PhD fellow in visualization research at the University of Bergen, told The Register in an email, "While I’ve not yet come across real-world examples where AI-generated images have directly resulted in harmful health-related outcomes, one interview participant shared with us this case involving an AI-based risk-scoring system to detect fraud that wrongfully accused (primarily foreign) parents of childcare benefits fraud in the Netherlands.
"With AI-generated images, the more pervasive issue is the use of inaccurate imagery in medical and health-related publications, and scientific research publications broadly. While the potential harm isn’t immediately apparent, the increased use of inaccurate images like this, and problems like reinforcing stereotypes in healthcare, to communicate health and medical information is troubling."
Ziman said that the larger problem, echoed in a series of interviews discussed in the paper, is the way inaccurate imagery affects how the public sees scientific research. She pointed to the "well-endowed rat" and how it was featured on The Late Show with Stephen Colbert.
"Satirical criticism by such public figures (that people may tend to trust more than ‘legitimate’ news sources) can throw into question the legitimacy of the scientific research community at large, and the public can come to distrust (even more) or not take seriously what they hear coming out of the scientific research community," said Ziman.
"Think of the consequences then for public health communications as during COVID, vaccine campaigns, etc. And bad actors now have greater ease of quickly creating and sharing misleading but convincing-looking imagery."
Ziman said while AI-generated medical images often get shared in the biomedical visualization (BioMedVis) community for a laugh and criticism, practitioners have yet to figure out how to mitigate the risks.
Toward that end, the authors surveyed 17 BioMedVis professionals to assess how they view generative AI tools and use them in their work. The respondents, referred to by pseudonyms in the paper, reported a wide range of views about generative AI. The authors grouped them into five personas: Enthusiastic Adopters, Curious Adapters, Curious Optimists, Cautious Optimists, and Skeptical Avoiders.
Some of the respondents appreciated the abstract and otherworldly aesthetics of images generated by AI models, saying the images helped advance conversations with clients. Others, roughly half, were critical of the GenAI style, agreeing with "Frank," who said the generic look of those images is boring.
Irrelevant or hallucinated references remain a problem, as do invented new terms, such as the 'green glowing protein.'
Respondents also sometimes used text-to-text models for captions and descriptive assistance, though not always with satisfying results. As the paper observes, "Irrelevant or hallucinated references remain a problem, as do invented new terms, such as the 'green glowing protein.'"
Some of the survey respondents see generative AI as useful for handling rote coding tasks, like generating boilerplate code or cleaning data. Others, however, feel they've already invested time learning to code and prefer to use those skills rather than delegate them.
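To picture the sort of rote task respondents meant, here's a minimal sketch of the kind of boilerplate data cleanup a model might be asked to draft. The dataset, column names, and cleanup rules are hypothetical, not drawn from the paper, and any generated version would still need review by someone who knows the data.

```python
import pandas as pd

# Hypothetical measurement log with the usual rough edges:
# stray whitespace, a placeholder for missing values, a duplicate row.
df = pd.DataFrame({
    "specimen": [" rat_01", "rat_02 ", "rat_02 ", None],
    "mass_g": ["310", "n/a", "n/a", "295"],
})

df["specimen"] = df["specimen"].str.strip()                  # trim stray whitespace
df["mass_g"] = pd.to_numeric(df["mass_g"], errors="coerce")  # "n/a" becomes NaN
df = df.drop_duplicates().dropna(subset=["specimen"])        # drop repeats and unlabeled rows

print(df)
```

Tedious but mechanical transformations like these are exactly what some respondents were happy to delegate, while others preferred to exercise the coding skills they had already built.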
The researchers also note a contradictory attitude among respondents, who "express grave concerns about intellectual property violations that are, for the moment, baked into public GenAI tools" when using them for commercial purposes, while also largely accepting use of generative AI on a personal basis.
Even though 13 of the 17 surveyed already incorporate GenAI into their production workflows to some degree, BioMedVis developers and designers still prioritize accuracy in their images, and "GenAI in its current state is unable to achieve this benchmark," the authors observe.
They point to remarks attributed to "Arthur": "While it’s still scraping the digital world for references it can use to generate art, it’s not yet able to know the difference between the sciatic nerve and the ulnar nerve. It’s just, you know, wires."
They also quote "Ursula" about GenAI's inability to produce accurate anatomy: “Show me a pancreas, and MidJourney is like, here is your pile of alien eggs!”
The paper says that while erroneous AI output may be obvious in many cases, BioMedVis folk expect such errors to become harder to detect as the technology improves and people become accustomed to trusting these systems.
The researchers also raise more general concerns about the black-box nature of machine learning and the difficulty of addressing bias.
The paper explains, "Inaccurate or unreliable outputs, whether the anatomical visuals …or blocks of code, can mislead and diffuse responsibility. Participants questioned who should be held accountable in instances where GenAI is used and lines of accountability blur." Black-box models prevent that sort of accountability. As survey respondent "Kim" said, "There should be someone who can explain the results. It is about trust, and [...] about competence."
As a community, we should feel comfortable sharing our thoughts, questions, and concerns about these tools
Co-author Shehryar Saharan from the University of Toronto told The Register in an email that he hopes this research encourages people to think critically about how generative AI fits into the work and values of BioMedVis professionals.
"These tools are becoming a bigger part of our field, and it's important that we don’t just use them, but critically reflect on what they mean for how we work and why we do what we do," said Saharan.
"As a community, we should feel comfortable sharing our thoughts, questions, and concerns about these tools. Without open conversation and a willingness to reflect, we risk falling behind or using these technologies in ways that don’t align with what we actually care about. It’s about making space to think and reflect before we move forward." ®