Paper | Toto Model Card | BOOM Dataset Card | Blogpost
Toto is a foundation model for multivariate time series forecasting with a focus on observability metrics. This model leverages innovative architectural designs to efficiently handle the high-dimensional, complex time series that are characteristic of observability data.
This repository also hosts the code for evaluating time series models on BOOM (Benchmark of Observability Metrics), a large-scale forecasting dataset composed of real-world observability data.
- Zero-Shot Forecasting: Perform forecasting without fine-tuning on your specific time series
- State-of-the-Art Performance: Achieves top scores in benchmarks covering diverse time series forecasting tasks. This includes the established multi-domain benchmark GIFT-Eval, as well as our own observability-focused benchmark BOOM.
- Multi-Variate Support: Efficiently process multiple variables using Proportional Factorized Space-Time Attention
- Probabilistic Predictions: Generate both point forecasts and uncertainty estimates using a Student-T mixture model
- High-Dimensional Support: Handle time series with a large number of variables efficiently
- Decoder-Only Architecture: Support for variable prediction horizons and context lengths
- Pre-trained on Massive Data: Trained on over 2 trillion time series data points, the largest pretraining dataset for any open-weights time series foundation model to date.
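To make the "Probabilistic Predictions" item above more concrete, here is a minimal, standard-library sketch of how point forecasts and uncertainty estimates can be read off a Student-T mixture by sampling. It is not Toto's implementation: the helper names and the mixture parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_student_t(rng, df, loc, scale):
    """Draw one sample from a location-scale Student-T distribution."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square variable with `df` degrees of freedom is Gamma(df / 2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return loc + scale * z / math.sqrt(chi2 / df)


def sample_mixture(rng, weights, components, n_samples):
    """Pick a component according to its weight, then sample from it."""
    picks = rng.choices(components, weights=weights, k=n_samples)
    return [sample_student_t(rng, df, loc, scale) for df, loc, scale in picks]


def quantile(sorted_values, q):
    """Nearest-rank empirical quantile of already-sorted values."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


rng = random.Random(0)
# Hypothetical mixture for a single future time step: (df, loc, scale) per component.
components = [(5.0, 10.0, 1.0), (3.0, 14.0, 2.0)]
weights = [0.7, 0.3]

samples = sorted(sample_mixture(rng, weights, components, 5000))
point_forecast = quantile(samples, 0.5)  # median used as the point forecast
lower, upper = quantile(samples, 0.1), quantile(samples, 0.9)  # 80% interval
print(f"median={point_forecast:.2f}, 80% interval=({lower:.2f}, {upper:.2f})")
```

Heavy-tailed components like these let the predictive distribution place some mass on rare spikes, which is one plausible reason to prefer a Student-T mixture over a single Gaussian for observability metrics.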
Toto-Open, the open-weights release of Toto, is available on Hugging Face. Currently available checkpoints:
Toto-Open-Base-1.0 | 151M | The initial open release of Toto. Achieves state-of-the-art performance on both general-purpose and observability-focused benchmarking tasks, as described in our paper. |
For optimal inference speed, it's recommended to install xformers and flash-attention as well.
Here's a simple example to get you started with forecasting:
For a comprehensive guide on using Toto for time series forecasting, check out our tutorial notebooks:
- Basic Inference Tutorial: Learn how to load the model and make forecasts
Toto was trained on a massive and diverse mixture of time series datasets:
The largest portion of pretraining data comes from a dataset of approximately 1 trillion time series points collected from Datadog metrics. These metrics are generated from Datadog's monitoring of internal systems, and do not include any customer data. They cover a diverse array of software stacks and types of services, and span a wide variety of domains within observability, including application performance, infrastructure, networking, security, databases, and more.
To improve the performance of Toto on general-purpose time series forecasting across many domains, we include publicly available datasets:
- GiftEval Pretrain
- Chronos pretraining data (Note: only a subset of this dataset was used to avoid leakage with the GiftEval benchmark)
To improve robustness, approximately one third of the pretraining data mix consists of synthetically generated time series.
Toto has been rigorously evaluated on multiple benchmarks, including both general-purpose datasets and observability-focused datasets like BOOM. Below, we provide instructions for reproducing our evaluation results.
To reproduce our results on the LSF datasets, follow these steps:
The LSF evaluation requires three datasets: ETT, Electricity, and Weather. You can download them from the Time-Series-Library repository. Follow the instructions in the repository to obtain the following already pre-processed datasets:
- ETT (Electricity Transformer Temperature): Includes four subsets: ETTh1, ETTh2, ETTm1, and ETTm2.
- Electricity
- Weather
After downloading, ensure the datasets are placed in the data/lsf_datasets/ directory within the repository, with the following structure:
Once the datasets are set up, you can run the LSF evaluation script as follows to reproduce our results:
To see all available options for the evaluation script, you can use the --help flag:
The script evaluates Toto's performance using Mean Absolute Error (MAE) and Mean Squared Error (MSE) across the specified datasets, context lengths, and prediction lengths. It displays a detailed table of results for each prediction length, along with a summary table that averages the results across prediction lengths for each dataset.
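As a rough illustration of the metrics described above, the following sketch computes MAE and MSE for each prediction length and then averages them per dataset, mirroring the summary table. The function names, the dataset name `toy`, and the numbers are hypothetical; this is not the evaluation script itself.

```python
from statistics import fmean


def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length sequences."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mse(y_true, y_pred):
    """Mean Squared Error between two equal-length sequences."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def summarize(results):
    """Average (mae, mse) across prediction lengths for each dataset.

    `results` maps (dataset, prediction_length) -> (mae, mse).
    """
    by_dataset = {}
    for (dataset, _pred_len), scores in results.items():
        by_dataset.setdefault(dataset, []).append(scores)
    return {
        name: (fmean([s[0] for s in scores]), fmean([s[1] for s in scores]))
        for name, scores in by_dataset.items()
    }


# Toy ground truth and forecast, scored at two hypothetical prediction lengths.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.5]

results = {
    ("toy", n): (mae(y_true[:n], y_pred[:n]), mse(y_true[:n], y_pred[:n]))
    for n in (2, 4)
}
summary = summarize(results)
print(results)
print(summary)
```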
To reproduce the results presented in the paper, use the default arguments while setting --eval-stride 1 and specifying all datasets with --datasets ETTh1 ETTh2 ETTm1 ETTm2 weather electricity.
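The sketch below shows what an evaluation stride of 1 means under the usual rolling-window convention, assuming `--eval-stride` sets the step between consecutive evaluation windows. That reading is an assumption, and the helper `rolling_windows` is hypothetical rather than part of the repository.

```python
def rolling_windows(series_length, context_length, prediction_length, stride):
    """Yield (context_start, forecast_start, forecast_end) index triples.

    Each window uses `context_length` points as input and scores the next
    `prediction_length` points; consecutive windows start `stride` steps apart.
    """
    last_start = series_length - context_length - prediction_length
    for start in range(0, last_start + 1, stride):
        forecast_start = start + context_length
        yield start, forecast_start, forecast_start + prediction_length


# With stride 1, every admissible starting position is evaluated.
windows = list(rolling_windows(series_length=20, context_length=8,
                               prediction_length=4, stride=1))
print(len(windows), windows[0], windows[-1])

# A larger stride evaluates fewer, more widely spaced windows.
sparse = list(rolling_windows(series_length=20, context_length=8,
                              prediction_length=4, stride=4))
print(len(sparse))
```

Under this convention, a stride of 1 gives the densest possible evaluation, at the cost of more forward passes than a larger stride.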
To reproduce our results on the GIFT-Eval benchmark, we provide a dedicated notebook:
- GIFT-Eval Evaluation Notebook: Step-by-step instructions for running Toto on the GIFT-Eval benchmark and reproducing the reported results.
For evaluating Toto on the BOOM (Benchmark of Observability Metrics) dataset, refer to:
- BOOM Evaluation Notebook: Example workflow for running Toto on the BOOM dataset.
- BOOM README: Detailed instructions and scripts for benchmarking on BOOM.
These resources provide all necessary steps to run and reproduce BOOM evaluation results with Toto.
- Python 3.10+
- PyTorch 2.5+
- CUDA-capable device (Ampere generation or newer recommended for optimal performance)
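A quick way to check these requirements before installing anything else is sketched below. The function `check_environment` is a hypothetical helper, not part of this repository; it relies on the fact that Ampere GPUs report CUDA compute capability 8.x, and it treats a missing PyTorch install as a failed check rather than an error.

```python
import sys


def check_environment(min_python=(3, 10), min_torch=(2, 5)):
    """Report whether the Python, PyTorch, and GPU requirements appear to be met."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so none of the remaining checks can pass.
        report.update(torch_ok=False, cuda_available=False, ampere_or_newer=False)
        return report

    # torch.__version__ looks like "2.5.1+cu121"; compare major and minor only.
    major, minor = torch.__version__.split("+")[0].split(".")[:2]
    report["torch_ok"] = (int(major), int(minor)) >= min_torch
    report["cuda_available"] = torch.cuda.is_available()
    # Compute capability major version 8 corresponds to Ampere; newer GPUs report higher.
    report["ampere_or_newer"] = (
        report["cuda_available"] and torch.cuda.get_device_capability(0)[0] >= 8
    )
    return report


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {ok}")
```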
BOOM (Benchmark of Observability Metrics) is a large-scale, real-world time series dataset designed for evaluating models on forecasting tasks in complex observability environments. Composed of real-world metrics data collected from Datadog, a leading observability platform, the benchmark captures the irregularity, structural complexity, and heavy-tailed statistics typical of production observability data. Unlike synthetic or curated benchmarks, BOOM reflects the full diversity and unpredictability of operational signals observed in distributed systems, covering infrastructure, networking, databases, security, and application-level metrics.
Note: the metrics comprising BOOM were generated from internal monitoring of pre-production environments, and do not include any customer data.
For more information on the dataset, including details on its preparation and statistical properties, see the dataset card on Hugging Face.
For example evaluations of different time series models on the BOOM dataset, see the boom folder in this repository.
If you use Toto in your research, please cite our work:
Unless explicitly stated otherwise, all files in this repository are licensed under the Apache-2.0 License. See the LICENSE file for details.
This product includes software developed at Datadog (https://www.datadoghq.com/) Copyright 2025 Datadog, Inc.
We welcome contributions! Please check out our contributing guidelines to get started.