Google DeepMind has debuted AlphaEarth Foundations, an AI model that treats Earth like a living dataset, tracking crop cycles, coastlines, urban expansion, melting ice, and much, much more. AlphaEarth weaves together disparate data streams, from satellite imagery and sensor data to geotagged Wikipedia entries, into a unified digital representation that scientists can probe to uncover patterns unfolding worldwide.
AlphaEarth produces a 64-dimensional “embedding” for every 10-by-10-meter cell of the planet annually from 2017 to 2024, capturing both the raw imagery and the relationships present in the underlying data. An embedding is a dense numeric summary of a place’s key features, which makes locations directly comparable. According to DeepMind, these summaries need 16 times less storage than those produced by other AI systems the team tested, while preserving fine spatial and temporal detail. Altogether, the system produces more than 1.4 trillion embeddings per year.
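As a rough sanity check on the 1.4 trillion figure, the sketch below counts how many 10-by-10-meter cells it takes to tile Earth's land surface once. It assumes about 149 million square kilometers of land, an approximation chosen for illustration; the dataset's actual footprint (coastlines, polar regions, and so on) may differ.

```python
# Back-of-envelope count of embeddings per year, under stated assumptions.
LAND_AREA_KM2 = 149e6  # assumption: approximate land area of Earth
CELL_SIDE_M = 10       # from the article: 10-by-10-meter cells
EMBEDDING_DIMS = 64    # from the article: 64-dimensional embeddings

def cells_per_year(area_km2: float, cell_side_m: float) -> float:
    """Number of square cells needed to tile an area once (one embedding per cell per year)."""
    area_m2 = area_km2 * 1e6  # 1 km^2 = 1,000,000 m^2
    return area_m2 / (cell_side_m ** 2)

cells = cells_per_year(LAND_AREA_KM2, CELL_SIDE_M)
values = cells * EMBEDDING_DIMS  # each cell carries 64 numbers per year

print(f"cells per year: {cells:.2e}")
print(f"embedding values per year: {values:.2e}")
```

The result, roughly 1.5 trillion cells, is in the same range as the reported figure, and each of those cells contributes 64 numbers per year.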
Detailed snapshots of year-round surface conditions will prove valuable in a wide range of fields, including planetary analysis, urban planning, ecosystem tracking, wildlife conservation, and wildfire risk management.
Digital Embeddings of Earth
A key challenge in building the model was handling the messy sprawl of geospatial data itself. Traditional satellites capture large volumes of information-rich images and measurements that can be difficult to connect and efficiently analyze.
The AlphaEarth Foundations team told IEEE Spectrum that one limitation in Earth observation is the inherent irregularity and sparsity of the data. Unlike a continuous video feed, satellite data is a collection of intermittent snapshots with frequent gaps caused by factors like persistent cloud cover.
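To make the sparsity problem concrete, here is a toy single-pixel time series with invented values: a satellite revisits every 12 days, but clouds obscure half of the passes. Averaging every pass mixes bright cloud tops into the surface signal, while keeping only the clear views leaves long gaps between observations. This only illustrates the problem the team describes; it is not how AlphaEarth itself handles gaps.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Acquisition:
    day_of_year: int    # when the satellite passed over
    reflectance: float  # hypothetical single-band value
    cloudy: bool        # True if clouds obscured the surface

def clear_observations(acqs):
    """Keep only acquisitions where the surface was visible, ordered in time."""
    return sorted((a for a in acqs if not a.cloudy), key=lambda a: a.day_of_year)

def longest_gap_days(acqs):
    """Longest stretch, in days, between consecutive clear observations."""
    days = [a.day_of_year for a in clear_observations(acqs)]
    if len(days) < 2:
        return None
    return max(later - earlier for earlier, later in zip(days, days[1:]))

# Hypothetical 12-day revisit over one pixel; clouds block half of the passes.
series = [
    Acquisition(5, 0.20, False),
    Acquisition(17, 0.90, True),
    Acquisition(29, 0.80, True),
    Acquisition(41, 0.30, False),
    Acquisition(53, 0.85, True),
    Acquisition(65, 0.25, False),
]

naive = mean(a.reflectance for a in series)                      # cloud tops contaminate this
clear = mean(a.reflectance for a in clear_observations(series))  # surface-only average
print(f"naive mean: {naive:.2f}, clear-sky mean: {clear:.2f}")
print(f"longest gap between clear views: {longest_gap_days(series)} days")
```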
To ensure consistent performance, the model needed a wide net of training data: a global sample of more than 5 million locations drawn from the Google Earth Engine public data catalog, spanning optical imagery, radar, climate models, topographic maps, lidar, gravitational field strength, and surface temperature measurements. To enrich the dataset, the team also incorporated Wikipedia articles on landmarks and other features.
That diversity makes the model’s representations more detailed, but still broad enough to be relevant across different regions and scientific tasks. In Ecuador, for example, embeddings enable analysts to see through persistent cloud cover, revealing agricultural plots in various development stages.
“Given we were aiming to integrate this data into a unified digital representation to provide scientists with a more complete and consistent picture of our planet’s evolution, we had to grapple with petabytes of multi-source, multi-resolution imagery and other geospatial datasets,” says Chris Brown, a senior research engineer at Google DeepMind.
The team first had to get data pipelines and modeling infrastructure to a point where working at petabyte scale was feasible. “We prioritized respecting the nuances of geospatial data, such as projections, unique sensor properties, and sensor acquisition strategies, while ensuring the model and its outputs were robust and generally useful for a wide variety of applications,” says Brown.
AlphaEarth Foundations consistently outperformed other featurization approaches. Credit: Christopher F. Brown, Michal R. Kazmierski, Valeria J. Pasquarella, et al.
The team stresses that AlphaEarth isn’t a generative model but a self-supervised framework designed to provide compact summaries of patterns in existing data. To mitigate training bias, they used stratified sampling, which divides candidate locations into groups and samples from each, so that diverse geographies and ecosystems are represented among the millions of training sites.
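Stratified sampling guarantees rare groups a share of the training set instead of letting the most common landscapes crowd them out. The sketch below assumes biome labels as the strata and an equal quota per stratum; the article does not say how AlphaEarth's strata were defined or sized, and the site list is hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(locations, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` locations from each stratum so rare strata are not crowded out."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for loc in locations:
        groups[stratum_of(loc)].append(loc)
    sample = []
    for stratum in sorted(groups):  # sorted for a reproducible ordering
        members = groups[stratum]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical candidate sites: cropland dominates, tundra is rare.
sites = ([(i, "cropland") for i in range(900)]
         + [(i, "wetland") for i in range(900, 990)]
         + [(i, "tundra") for i in range(990, 1000)])

picked = stratified_sample(sites, stratum_of=lambda site: site[1], per_stratum=10)
print(Counter(biome for _, biome in picked))
```

By contrast, a simple random sample of 30 from these 1,000 sites would be expected to contain about 27 cropland sites and fewer than one tundra site.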
According to Emily Schechter, a senior product manager at Google Earth Engine, the team benchmarked AlphaEarth against both traditional approaches and other AI mapping systems across multiple time periods and tasks, such as estimating ground surface properties and tracking changes in how land is being used over time. The results, Schechter says, showed AlphaEarth consistently outperformed alternatives, even in situations where labeled data was scarce.
In a paper posted in late July, Google DeepMind reported that AlphaEarth had a 23.9 percent lower error rate on average than competing approaches. The researchers noted that the identity of the next-best baseline varies by dataset and task, suggesting that no earlier approach had performed well across the board. AlphaEarth, by contrast, showed consistent gains, even in historically difficult mapping scenarios.
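A figure like "23.9 percent lower error" is a relative reduction: (baseline error - model error) / baseline error. Because the next-best baseline changes from task to task, one reasonable way to compute it is against whichever baseline did best on each task, then average across tasks. The numbers below are invented for illustration and are not results from the paper, whose exact metric and averaging may differ.

```python
from statistics import mean

def relative_reduction(model_error, baseline_error):
    """Fraction by which model_error undercuts baseline_error (0.25 means 25 percent lower)."""
    return (baseline_error - model_error) / baseline_error

# Hypothetical per-task errors, invented for illustration (NOT figures from the paper).
results = {
    "task_a": {"model": 0.15, "baselines": {"method_x": 0.20, "method_y": 0.25}},
    "task_b": {"model": 0.32, "baselines": {"method_x": 0.50, "method_y": 0.40}},
}

reductions = {}
for task, r in results.items():
    # The strongest competitor differs per task, so pick it separately each time.
    best_name, best_error = min(r["baselines"].items(), key=lambda kv: kv[1])
    reductions[task] = relative_reduction(r["model"], best_error)
    print(f"{task}: next-best is {best_name}, error {reductions[task]:.0%} lower")

print(f"average reduction: {mean(reductions.values()):.1%}")
```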
AlphaEarth is also efficient with labeled data. Using embedding vectors pulled from Earth Engine for a labeled set of sites, the team classified 87 crop and land-cover types with only about 150 examples per class, a task that usually demands thousands of labels. In other tasks DeepMind explored, AlphaEarth detailed intricate Antarctic terrain despite irregular satellite coverage and spotted subtle shifts in Canadian farmland that are not visible in standard imagery.
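Few-shot classification on top of embeddings can be as simple as averaging the labeled vectors for each class and assigning a new point to the most similar average. The sketch below uses that nearest-centroid approach with cosine similarity as a stand-in; the article does not say which classifier DeepMind used, the 64-dimensional vectors are synthetic rather than real AlphaEarth embeddings, and the class names are hypothetical.

```python
import math
import random

DIMS = 64  # matches the article's embedding size

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_centroids(examples):
    """examples: label -> list of vectors. Returns label -> unit-length mean vector."""
    centroids = {}
    for label, vecs in examples.items():
        mean_vec = [sum(col) / len(vecs) for col in zip(*vecs)]
        centroids[label] = normalize(mean_vec)
    return centroids

def predict(centroids, v):
    """Assign v to the class whose centroid is most similar (highest cosine)."""
    return max(centroids, key=lambda label: dot(centroids[label], v))

def synthetic_class(rng, n, noise=0.05):
    """Synthetic stand-in data: noisy copies of one random prototype direction."""
    proto = normalize([rng.gauss(0, 1) for _ in range(DIMS)])
    return [normalize([p + rng.gauss(0, noise) for p in proto]) for _ in range(n)]

rng = random.Random(42)
labels = ["cropland", "forest", "water"]  # hypothetical classes
train, test = {}, []
for label in labels:
    vecs = synthetic_class(rng, 200)
    train[label] = vecs[:150]  # about 150 labeled examples per class
    test.extend((label, v) for v in vecs[150:])

centroids = fit_centroids(train)
accuracy = sum(predict(centroids, v) == label for label, v in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

A classifier this simple works only when the embeddings already separate the classes well, which is why label efficiency is a useful test of embedding quality.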
“To the best of my knowledge, this is the largest scale effort of its kind to date in terms of training data, model context size, and integrated modalities,” Brown says. “There’s so much potential for this technology to be applied, in different ways and across different use cases. [...] We’ll continue working with our partners to find ways to make this most useful for people.”
Unified Model for Earth Science
While AlphaEarth Foundations shares some similarities with digital twins (virtual replicas of real-world environments), it functions more as a foundation than as a full twin. By turning Earth’s raw data into a flexible public format, it lets a range of specialized models and analyses plug in on top without rebuilding the data pipeline each time.
The satellite embedding dataset is available through the Earth Engine Data Catalog, which is free for non-commercial use. Over the past year, Google DeepMind has tested the embeddings with more than 50 organizations worldwide, and several universities and the Food and Agriculture Organization of the United Nations are already using them.
AlphaEarth’s embedding fields provide dozens of different ways to understand parcels of the Earth’s surface. Credit: Google DeepMind; Google Earth Engine
Schechter also pointed to examples such as the Brazilian non-profit MapBiomas, which is now mapping environmental changes in the Amazon rainforest, and the Global Ecosystems Atlas, a program classifying unmapped ecosystems into shrublands, deserts, wetlands, and other categories.
Beyond research, integration into CARTO, a widely used spatial analytics platform, puts AlphaEarth Foundations into the hands of insurers, telecommunications firms, and other users, who can load the embeddings into their existing workflows to run risk models, such as finding ZIP codes whose environmental profiles resemble those of wildfire-prone areas, without API requests or extra storage.
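The ZIP-code example boils down to a similarity search: summarize known wildfire-prone areas as a reference embedding, then rank other areas by how closely their embeddings match it. This plain-Python sketch uses cosine similarity against a mean reference vector. It does not use CARTO's or Earth Engine's APIs, the three-dimensional toy vectors stand in for 64-dimensional embeddings, and the area IDs are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def rank_by_similarity(reference_vectors, candidates, top_k):
    """Return the top_k candidate IDs most similar to the averaged reference profile."""
    ref = mean_vector(reference_vectors)
    scored = [(cosine(ref, vec), key) for key, vec in candidates.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:top_k]]

# Toy embeddings of known wildfire-prone areas (hypothetical values).
fire_prone = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

# Toy embeddings for candidate areas, keyed by hypothetical IDs.
candidates = {
    "zip_A": [0.88, 0.12, 0.02],
    "zip_B": [0.10, 0.90, 0.30],
    "zip_C": [0.70, 0.30, 0.10],
}

for area, score in rank_by_similarity(fire_prone, candidates, top_k=2):
    print(f"{area}: similarity {score:.3f}")
```

Averaging the reference vectors is the simplest choice; comparing each candidate against every reference area and keeping its best match would instead favor places that resemble any single fire-prone area rather than their average profile.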