MedGemma: Google's most capable open models for health AI development


Healthcare is increasingly embracing AI to improve workflow management, patient communication, and diagnostic and treatment support. It’s critical that these AI-based systems are not only high-performing, but also efficient and privacy-preserving. It’s with these considerations in mind that we built and recently released Health AI Developer Foundations (HAI-DEF). HAI-DEF is a collection of lightweight open models designed to offer developers robust starting points for their own health research and application development. Because HAI-DEF models are open, developers retain full control over privacy, infrastructure, and modifications to the models. In May of this year, we expanded the HAI-DEF collection with MedGemma, a collection of generative models based on Gemma 3 that are designed to accelerate healthcare and life sciences AI development.

Today, we’re proud to announce two new models in this collection. The first is MedGemma 27B Multimodal, which complements the previously released 4B Multimodal and 27B text-only models by adding support for complex multimodal and longitudinal electronic health record interpretation. The second new model is MedSigLIP, a lightweight image and text encoder for classification, search, and related tasks. MedSigLIP is based on the same image encoder that powers the 4B and 27B MedGemma models.

MedGemma and MedSigLIP are strong starting points for medical research and product development. MedGemma is useful for medical text or imaging tasks that require generating free text, like report generation or visual question answering. MedSigLIP is recommended for imaging tasks that involve structured outputs like classification or retrieval. All of the above models can be run on a single GPU, and MedGemma 4B and MedSigLIP can even be adapted to run on mobile hardware.
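
As a rough illustration of how lightweight this can be, the sketch below loads MedGemma 4B through the Hugging Face Transformers image-text-to-text pipeline and asks a question about a chest X-ray. The model ID and the image path are assumptions made for illustration; check the model cards in the HAI-DEF Hugging Face collection for the exact identifiers, usage terms, and recommended prompt format.

```python
# Minimal sketch: visual question answering with MedGemma 4B via Transformers.
# The model ID ("google/medgemma-4b-it") and the image file are assumptions;
# consult the official model card for the exact identifier and prompt format.
import torch
from PIL import Image
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # fits on a single GPU, per the post above
)

image = Image.open("chest_xray.png")  # hypothetical local image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe the key findings in this chest X-ray."},
        ],
    }
]

result = pipe(text=messages, max_new_tokens=200)
# With chat-style input, generated_text holds the full conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```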

Full details of MedGemma and MedSigLIP development and evaluation can be found in the MedGemma technical report.

MedGemma: A multimodal generative model for health

The MedGemma collection includes variants in 4B and 27B sizes, both of which now accept image and text inputs and produce text outputs.

  • MedGemma 4B Multimodal: MedGemma 4B scores 64.4% on MedQA, which ranks it among the best very small (<8B) open models. In an unblinded study, 81% of MedGemma 4B–generated chest X-ray reports were judged by a US board-certified radiologist to be of sufficient accuracy to result in similar patient management compared to the original radiologist reports. It additionally achieves performance on medical image classification tasks that is competitive with task-specific state-of-the-art models.
  • MedGemma 27B Text and MedGemma 27B Multimodal: Based on internal and published evaluations, the MedGemma 27B models are among the best performing small open models (<50B) on the MedQA medical knowledge and reasoning benchmark; the text variant scores 87.7%, which is within 3 points of DeepSeek R1, a leading open model, but at approximately one tenth the inference cost. The MedGemma 27B models are competitive with larger models across a variety of benchmarks, including retrieval and interpretation of electronic health record data.

We developed these models by training a medically optimized image encoder (independently released as MedSigLIP, described below), followed by training the corresponding 4B and 27B versions of the Gemma 3 model on medical data. We took care to retain the general (non-medical) capabilities of Gemma throughout this process. This allows MedGemma to perform well on tasks that mix medical and non-medical information, while preserving instruction following and capabilities in non-English languages.

A key aspect of these models is their adaptability. For instance, after fine-tuning, MedGemma 4B is able to achieve state-of-the-art performance on chest X-ray report generation, with a RadGraph F1 score of 30.3. The ease with which developers can improve performance on their target applications highlights the value of MedGemma as a starting point for building healthcare AI.
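
As a concrete illustration of that adaptability, here is a minimal sketch of attaching LoRA adapters to MedGemma 4B with the peft library. The model ID, target modules, and hyperparameters are illustrative assumptions, not the configuration behind the report-generation result above; the official fine-tuning notebooks cover the recommended setup.

```python
# Minimal LoRA setup sketch for parameter-efficient fine-tuning (illustrative values).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it",  # assumed model ID; see the official model card
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                      # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, train on your own image-report pairs (for example with TRL's SFTTrainer),
# then merge or serve the adapters alongside the base model.
```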

MedSigLIP: A specialized image encoder for healthcare

MedSigLIP is a lightweight image encoder of only 400M parameters that uses the Sigmoid loss for Language Image Pre-training (SigLIP) architecture. MedSigLIP was adapted from SigLIP via tuning with diverse medical imaging data, including chest X-rays, histopathology patches, dermatology images, and fundus images, allowing the model to learn nuanced features specific to these modalities. Importantly, we also took care to ensure that MedSigLIP retains strong performance on the natural images on which the original SigLIP model was trained, maintaining its versatility.

MedSigLIP is designed to bridge the gap between medical images and medical text by encoding them into a common embedding space. MedSigLIP achieves similar or improved classification performance compared to task-specific vision embedding models while being far more versatile across medical imaging domains.

MedSigLIP is ideal for:

  • Traditional image classification: Build performant models to classify medical images.
  • Zero-shot image classification: Classify images without specific training examples by comparing image embeddings to the embeddings of textual class labels (see the sketch after this list).
  • Semantic image retrieval: Find visually or semantically similar images from large medical image databases.
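
Below is a minimal sketch of the zero-shot classification pattern using MedSigLIP through Transformers. The model ID, the image, and the label prompts are illustrative assumptions; see the MedSigLIP model card and notebooks for recommended usage.

```python
# Minimal zero-shot classification sketch with MedSigLIP (assumed model ID).
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "google/medsiglip-448"  # assumed; check the HAI-DEF Hugging Face collection
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("skin_lesion.png")  # hypothetical dermatology image
labels = [
    "a dermatology photo of a benign nevus",
    "a dermatology photo of melanoma",
]

inputs = processor(
    text=labels, images=image, padding="max_length", return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores image-text pairs with a sigmoid over similarity logits
# rather than a softmax across classes.
probs = torch.sigmoid(outputs.logits_per_image)[0]
for label, p in zip(labels, probs):
    print(f"{p:.3f}  {label}")
```

The same forward pass also exposes the image and text embeddings (outputs.image_embeds and outputs.text_embeds), which can be indexed for semantic image retrieval or used as features for a traditional classifier.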

The power of open models

Because the MedGemma collection is open, the models can be downloaded, built upon, and fine-tuned to support developers’ specific needs. Particularly in the medical space, this open approach offers several distinct advantages over API-based models:

  • Flexibility and privacy: Models can be run on proprietary hardware in the developer’s preferred environment, including on Google Cloud Platform or locally, which can address privacy concerns or institutional policies.
  • Customization for high performance: Models can be fine-tuned and modified to achieve optimal performance on target tasks and datasets.
  • Reproducibility and stability: Because the models are distributed as snapshots, their parameters are frozen and, unlike an API, will not change unexpectedly over time. This stability is particularly crucial for medical applications, where consistency and reproducibility are paramount.

To ensure broad accessibility and ease of use, our Hugging Face collection offers MedSigLIP and MedGemma in the popular Hugging Face safetensors format.

What developers are building with MedGemma & MedSigLIP

Researchers and developers have been exploring the MedGemma models for their use cases and have found the models adept at solving some crucial problems. Developers at DeepHealth in Massachusetts, USA, have been exploring MedSigLIP to improve their chest X-ray triaging and nodule detection. Researchers at Chang Gung Memorial Hospital in Taiwan noted that MedGemma works well with traditional Chinese-language medical literature and can respond well to medical staff questions. Developers at Tap Health in Gurgaon, India, remarked on MedGemma’s superior medical grounding, noting its reliability on tasks that require sensitivity to clinical context, such as summarizing progress notes or suggesting guideline-aligned nudges.

We’re excited to continue to learn about these and other use cases from developers as they create the next generation of Health AI tools with MedGemma and MedSigLIP.

Get started and explore

To help developers get started, we’ve provided detailed notebooks on GitHub for MedGemma and MedSigLIP that demonstrate how to create instances of MedSigLIP and MedGemma for both inference and fine-tuning on Hugging Face. When developers are ready to scale, MedGemma and MedSigLIP can be seamlessly deployed in Vertex AI as dedicated endpoints, and we provide examples in GitHub of how to run inference on these endpoints. We’ve also added a new demo to our HAI-DEF Hugging Face demo collection that shows how MedGemma can be built into an application to streamline pre-visit information gathering ahead of a patient appointment.
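
Once a model has been deployed to a Vertex AI endpoint, calling it looks roughly like the sketch below. The project, region, endpoint ID, and instance payload schema are placeholders and assumptions that depend on how the endpoint was deployed, so follow the GitHub examples for the exact request format.

```python
# Rough sketch of querying a deployed MedGemma endpoint on Vertex AI.
# Project, location, endpoint ID, and the instance schema are placeholders;
# the actual payload depends on the serving container chosen at deployment time.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-gcp-project/locations/us-central1/endpoints/1234567890"
)

response = endpoint.predict(
    instances=[
        {
            "prompt": "Summarize the key findings in this discharge note: ...",
            "max_tokens": 256,
        }
    ]
)
print(response.predictions[0])
```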

Refer to the following summary to understand which model from the MedGemma family is ideal for your use case:

  • MedGemma 4B Multimodal: generative medical image and text tasks, such as visual question answering and chest X-ray report generation, on modest hardware (it can even be adapted to run on mobile devices).
  • MedGemma 27B Text: medical text tasks that benefit from stronger knowledge and reasoning, such as medical question answering.
  • MedGemma 27B Multimodal: complex multimodal tasks and longitudinal electronic health record interpretation.
  • MedSigLIP: imaging tasks with structured outputs, such as classification and semantic retrieval.

Please visit the HAI-DEF site for these resources and to learn more about the MedGemma collection and other Health AI Developer Foundations models. The HAI-DEF forum is available for questions or feedback.

Note on training datasets

Models were trained on a mix of public and private de-identified datasets. Google and its partners utilize datasets that have been rigorously anonymized or de-identified to ensure the protection of individual research participants and patient privacy.

Disclaimer

MedGemma and MedSigLIP are intended to be used as a starting point that enables efficient development of downstream healthcare applications involving medical text and images. MedGemma and MedSigLIP are not intended to be used without appropriate validation, adaptation, and/or meaningful modification by developers for their specific use case. The outputs generated by these models are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications. Performance benchmarks highlight baseline capabilities on relevant benchmarks, but even for image and text domains that constitute a substantial portion of training data, inaccurate model output is possible. All model outputs should be considered preliminary and require independent verification, clinical correlation, and further investigation through established research and development methodologies.
