TimesFM 2.5 Review: Google's Time-Series Foundation Model

TL;DR

TimesFM 2.5 is Google Research’s open-weights, decoder-only foundation model for time-series forecasting: the time-series equivalent of “ship one pretrained model, prompt it on any series, get a usable zero-shot forecast.” Version 2.5 (the current release) has 200M parameters, accepts up to 16k time points of context, and ranks #1 by MASE on the GIFT-Eval forecasting benchmark.

Key facts:

25.4K GitHub stars, +4.4K this week — currently top-trending on GitHub
Decoder-only transformer, pretrained on 100B real-world time points (Google Trends + Wikipedia pageviews, plus synthetic series)
TimesFM 2.5 vs 2.0: 200M params (down from 500M), 16k context (up from 2,048), continuous quantile head, no more frequency indicator, new forecasting flags
Apache-2.0 license for the open release; checkpoints on Hugging Face under google/timesfm-2.5-200m-pytorch
First-party Google deployments: BigQuery ML (AI.FORECAST), Connected Sheets, Vertex AI Model Garden — same model, three surface areas
Agent-ready: ships an AGENTS.md and a SKILL.md so Claude Code / Cursor / OpenClaw / Codex can call it as a tool
Zero-shot performance matches or beats supervised baselines on the Monash, Darts, and Informer ETT benchmarks
XReg covariate support for adding exogenous features (price, weather, promotions, holidays)
LoRA fine-tuning example via HuggingFace Transformers + PEFT, plus a Flax backend for faster inference
TimesFM-ICF (in-context fine-tuning) variant uses few-shot examples in the prompt — same idea as LLM in-context learning, applied to time series

If you’ve been hand-rolling Prophet, ARIMA, or per-series LightGBM models for every new forecasting problem, TimesFM is what stops that loop.

What “foundation model for time-series” actually means

The pitch is simple: one pretrained model, prompted with any historical series, returns a forecast — no per-series training, no manual feature engineering, no hyperparameter sweep.

This is the same pattern LLMs use, but adapted to numeric sequences instead of text tokens. Inputs are batched arrays of past observations; outputs are point forecasts (mean) plus optional 10th-to-90th-percentile quantiles when you enable the quantile head.

The architecture choices are deliberate:

Decoder-only transformer: mirrors GPT-style next-token prediction, but the “tokens” are patches of numeric values
Patching: chunks contiguous time-series values into fixed-size patches before tokenization, drastically reducing sequence length vs. one-value-per-token
Patch length asymmetry: output patches are longer than input patches, so a long horizon needs fewer autoregressive steps and accumulates less error
Autoregressive decoding: future patches are generated one at a time, each conditioned on the history and the patches already produced
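The patching and decoding steps can be sketched in a few lines of NumPy. This is an illustration, not the library's code: the patch lengths (32 in, 128 out) are the values reported in the original TimesFM paper and are assumed here, and to_patches and decode_steps are hypothetical helper names.

```python
import math
import numpy as np

INPUT_PATCH_LEN = 32    # assumption: value reported in the original TimesFM paper
OUTPUT_PATCH_LEN = 128  # assumption: value reported in the original TimesFM paper


def to_patches(series, patch_len=INPUT_PATCH_LEN):
    """Left-pad a 1-D series to a multiple of patch_len and split it into patches."""
    series = np.asarray(series, dtype=np.float32)
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.zeros(pad, dtype=np.float32), series])
    # True marks padded positions that a model should ignore.
    padding_mask = np.arange(len(padded)) < pad
    return padded.reshape(-1, patch_len), padding_mask.reshape(-1, patch_len)


def decode_steps(horizon, output_patch_len=OUTPUT_PATCH_LEN):
    """Number of autoregressive steps needed to cover the horizon."""
    return math.ceil(horizon / output_patch_len)


patches, padding_mask = to_patches(np.arange(100))
print(patches.shape)      # (4, 32): 100 values become 4 tokens instead of 100
print(decode_steps(256))  # 2
print(decode_steps(256, output_patch_len=INPUT_PATCH_LEN))  # 8
```

With these assumed lengths, a 256-step horizon takes 2 decoding steps instead of 8, which is the practical payoff of emitting patches that are longer than the ones consumed.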

The Google Research paper (arXiv:2310.10688, ICML 2024) shows that after pretraining on 100B real-world time points, the model generalizes to forecasting tasks it has never seen. That’s the foundation-model claim, and the GIFT-Eval results back it up.

What’s new in TimesFM 2.5

Most of the public coverage is still about 1.0 and 2.0. The current shipping model is 2.5 and the differences matter:

Feature	TimesFM 2.0	TimesFM 2.5
Parameters	500M	200M (smaller)
Max context	2,048	16,384
Quantile head	base only	continuous quantile, 1k horizon (optional 30M head)
Frequency indicator	required	removed
Backends	PyTorch	PyTorch + Flax
Covariate support	XReg	XReg (re-added Oct 2025)

The 200M-param model is smaller than 2.0’s 500M and still tops the benchmark. The 16k context is the bigger practical win — you can feed in years of hourly data or weeks of minute-level data and let the model decide what’s relevant.
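To put numbers on the 16k window, here is the arithmetic for hourly and minute-level data, plus a small helper that keeps only the most recent points. latest_window is a hypothetical name used for illustration; the library may already truncate long inputs on its own.

```python
import numpy as np

MAX_CONTEXT = 16_384  # TimesFM 2.5 maximum context length, in time points

hourly_days = MAX_CONTEXT / 24           # about 682.7 days, roughly 1.9 years
minutely_days = MAX_CONTEXT / (24 * 60)  # about 11.4 days, roughly 1.6 weeks


def latest_window(series, max_context=MAX_CONTEXT):
    """Keep only the most recent max_context observations."""
    return np.asarray(series)[-max_context:]


window = latest_window(np.arange(20_000))
print(round(hourly_days, 1), round(minutely_days, 1))  # 682.7 11.4
print(window.shape, window[0])                          # (16384,) 3616
```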

Removing the frequency indicator means you don’t have to pre-classify whether your series is hourly, daily, weekly, etc. The model infers it from the patch dynamics. One fewer footgun.

Install and a first forecast

The whole API is one import. Install:

# Quote the extras so shells such as zsh do not expand the brackets
# PyTorch
pip install "timesfm[torch]"
# Or Flax (faster inference)
pip install "timesfm[flax]"
# Add XReg if you need covariate support
pip install "timesfm[xreg]"

Or install from source with uv:

git clone https://github.com/google-research/timesfm.git
cd timesfm
uv venv && source .venv/bin/activate
uv pip install -e ".[torch]"

Then a complete zero-shot forecast in about 30 lines:

import torch
import numpy as np
import timesfm

torch.set_float32_matmul_precision("high")

model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(
    "google/timesfm-2.5-200m-pytorch"
)

model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=256,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        force_flip_invariance=True,
        infer_is_positive=True,
        fix_quantile_crossing=True,
    )
)

point_forecast, quantile_forecast = model.forecast(
    horizon=12,
    inputs=[
        np.linspace(0, 1, 100),          # linear ramp
        np.sin(np.linspace(0, 20, 67)),  # sine wave
    ],
)

print(point_forecast.shape)     # (2, 12)
print(quantile_forecast.shape)  # (2, 12, 10): mean + 10th–90th percentiles

No training, no per-series fitting. Pass an array of past values, get a forecast.
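To make the output layout concrete, the sketch below slices an array shaped like quantile_forecast. It assumes the layout given in the shape comment above (index 0 is the mean, indices 1 to 9 are the 10th to 90th percentiles) and uses random stand-in data rather than a real model call.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in with the same shape as quantile_forecast: (series, horizon, 10).
base = rng.normal(size=(2, 12, 1))
offsets = np.linspace(-1.0, 1.0, 9)  # nine increasing quantile offsets
quantile_forecast = np.concatenate([base, base + offsets], axis=-1)

mean = quantile_forecast[..., 0]       # assumed: index 0 holds the mean
lower_p10 = quantile_forecast[..., 1]  # assumed: index 1 holds the 10th percentile
median = quantile_forecast[..., 5]     # assumed: index 5 holds the 50th percentile
upper_p90 = quantile_forecast[..., 9]  # assumed: index 9 holds the 90th percentile

interval_width = upper_p90 - lower_p10  # width of the 80% prediction interval
print(quantile_forecast.shape)          # (2, 12, 10)
print(interval_width.shape)             # (2, 12)
```

The p10 to p90 band is the usual thing to plot around the point forecast and to feed into risk-aware decisions such as safety stock.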

The new forecasting flags in 2.5 — force_flip_invariance, infer_is_positive, fix_quantile_crossing — are quality-of-life toggles that handle common forecast pathologies (negative predictions on strictly-positive series, crossed quantile lines, etc.) without monkey-patching the output.
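As an illustration of what these three toggles guard against, here is one simple way to get each effect by hand. This is a sketch of the general ideas and is not claimed to be how TimesFM implements the flags; all function names are hypothetical.

```python
import numpy as np


def fix_crossing(quantiles):
    """Sort along the quantile axis so lower quantiles never exceed higher ones."""
    return np.sort(quantiles, axis=-1)


def clip_if_nonnegative(history, forecast):
    """Clip the forecast at zero when the history never goes below zero."""
    if np.min(history) >= 0:
        return np.maximum(forecast, 0.0)
    return forecast


def flip_invariant(forecast_fn, history):
    """Average f(x) and -f(-x) so that negating the input negates the forecast."""
    return 0.5 * (forecast_fn(history) - forecast_fn(-history))


history = np.array([4.0, 2.0, 0.0, 1.0])
raw_quantiles = np.array([[1.0, 3.0, 2.0], [0.5, -0.2, 0.1]])  # crossed and negative
cleaned = clip_if_nonnegative(history, fix_crossing(raw_quantiles))
print(cleaned)  # rows become [1, 2, 3] and [0, 0.1, 0.5]

# Hypothetical toy forecaster that is deliberately not sign-symmetric.
toy_model = lambda x: x[-2:] ** 2 + x[-2:]
symmetric = flip_invariant(toy_model, history)
print(symmetric)  # [0. 1.]
```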

Adding covariates with XReg

Many real forecasts depend on more than just history — pricing decisions, weather, promotions, holiday calendars. TimesFM 2.5 supports these as exogenous regressors via the XReg API.

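# Install with: pip install "timesfm[xreg]"
# Sketch only: the keyword names follow the forecast_with_covariates signature
# from earlier TimesFM releases and are assumed to carry over to 2.5. Check
# timesfm-forecasting/examples for the exact API of your installed version.
# Every argument is batched (one entry per series). Dynamic covariates span the
# history plus the forecast window, so the 14-step horizon is implied by the
# 14 extra future values in each covariate.
forecast = model.forecast_with_covariates(
    inputs=[demand_series],
    static_categorical_covariates={"store_id": [42], "region": ["NW"]},
    dynamic_numerical_covariates={"price": [price_history_and_future_plan]},
    dynamic_categorical_covariates={
        "is_promo": [promo_indicator],
        "is_holiday": [holiday_indicator],
    },
)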

This brings TimesFM into Prophet/DeepAR/N-HiTS territory for retail-demand and operations forecasting, without losing the zero-shot/few-shot superpower of the base model.
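One common way to combine covariates with a pretrained forecaster is to fit a small linear model on the covariates and let the forecaster handle what the regression leaves unexplained. The sketch below shows that idea on synthetic data. It is not claimed to be how XReg is implemented, and naive_forecast is a hypothetical stand-in for the zero-shot model so the example runs without TimesFM installed.

```python
import numpy as np


def ridge_fit(X, y, ridge=1e-3):
    """Closed-form ridge regression with an intercept column appended."""
    X1 = np.column_stack([X, np.ones(len(X))])
    A = X1.T @ X1 + ridge * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ y)


def ridge_predict(X, coef):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def naive_forecast(residuals, horizon):
    """Hypothetical stand-in for the pretrained model: repeat the recent mean."""
    return np.full(horizon, residuals[-7:].mean())


rng = np.random.default_rng(0)
n_hist, horizon = 90, 14

# Synthetic covariates, known for the history and planned for the future.
price = rng.uniform(8, 12, size=n_hist + horizon)
promo = rng.integers(0, 2, size=n_hist + horizon).astype(float)
X = np.column_stack([price, promo])
demand = 200 - 10 * price + 30 * promo + rng.normal(0, 1, size=n_hist + horizon)

# 1) Explain demand with covariates, 2) forecast the residuals, 3) add them back.
coef = ridge_fit(X[:n_hist], demand[:n_hist])
residuals = demand[:n_hist] - ridge_predict(X[:n_hist], coef)
forecast = ridge_predict(X[n_hist:], coef) + naive_forecast(residuals, horizon)

mae = np.mean(np.abs(forecast - demand[n_hist:]))
print(forecast.shape)  # (14,)
```

The split matters because the future price plan and promo calendar carry information that no amount of history contains, while the residual forecaster still captures trend and seasonality the covariates miss.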

Benchmarks: GIFT-Eval and beyond

This is where TimesFM 2.5 earns its spot above the noise.

GIFT-Eval (the comprehensive 2024-2025 time-series forecasting benchmark; arXiv:2410.10393): TimesFM 2.5 ranks #1 by MASE, ahead of Chronos, Moirai, MOMENT, TimeMixer, and the supervised baselines.
Monash benchmark (long-standing zero-shot evaluation suite): TimesFM zero-shot performance matches or surpasses supervised baselines trained per-dataset.
Darts benchmark: zero-shot TimesFM beats most classical and many deep-learning baselines.
Informer ETT (the electricity transformer temperature long-horizon dataset): TimesFM is competitive with the dedicated Informer / Autoformer architectures designed specifically for it.

The picture: a single pretrained checkpoint generalizes broadly enough to dominate benchmarks that have historically required bespoke models per dataset.
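Since the rankings above are stated in MASE, it helps to see what the metric measures: forecast error scaled by the error of a naive forecast on the history, so values below 1 beat the naive baseline. The following is a minimal sketch of the standard definition; benchmark suites may aggregate or normalize scores across datasets differently.

```python
import numpy as np


def mase(history, actual, forecast, season=1):
    """Mean absolute scaled error against an in-sample (seasonal) naive forecast."""
    history = np.asarray(history, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Average error of predicting each historical value with the one `season` steps earlier.
    scale = np.mean(np.abs(history[season:] - history[:-season]))
    return np.mean(np.abs(actual - forecast)) / scale


history = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
actual = np.array([13.0, 15.0])
forecast = np.array([14.0, 14.0])

score = mase(history, actual, forecast)
print(score)  # 0.625
```

Here the naive in-sample error is 1.6 and the forecast error is 1.0, giving 0.625.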

TimesFM-ICF: in-context fine-tuning

TimesFM-ICF is the more recent twist. It applies the LLM in-context learning paradigm to forecasting: you can include a few example series in the input, and the model treats them as demonstrations for the prediction task.

The mechanism is a special separator token between examples and the target series, plus training on prompts that look like [example_1] [SEP] [example_2] [SEP] [target_history] -> [target_future].

In practice this lets you “teach” the model about a new domain (e.g. a specific SKU’s seasonal pattern) without ever updating weights. The few-shot examples can come from related series, prior years, or sibling SKUs.
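The prompt layout can be sketched as plain array concatenation. This only illustrates the ordering of examples, separators, and target history. In the real model the separator is a learned token rather than a value in the series, so the NaN marker and the build_icf_prompt helper here are hypothetical.

```python
import numpy as np

SEP = np.nan  # hypothetical separator marker, standing in for the learned token


def build_icf_prompt(examples, target_history):
    """Lay out [example_1] [SEP] [example_2] [SEP] ... [target_history]."""
    parts = []
    for example in examples:
        parts.append(np.asarray(example, dtype=float))
        parts.append(np.array([SEP]))
    parts.append(np.asarray(target_history, dtype=float))
    return np.concatenate(parts)


sibling_sku = [5.0, 7.0, 9.0, 7.0]  # e.g. a related product
last_year = [4.0, 6.0, 8.0]         # e.g. the same product a year earlier
target = [6.0, 8.0]

prompt = build_icf_prompt([sibling_sku, last_year], target)
print(len(prompt))                  # 11
print(int(np.isnan(prompt).sum()))  # 2
```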

Where you can actually run it

This is where TimesFM differs from most research-only foundation models — Google ships it into production surfaces:

BigQuery ML: SELECT * FROM AI.FORECAST(TABLE ..., data_col => ..., timestamp_col => ...) directly in SQL, enterprise-scale, fully managed
Connected Sheets: point-and-click forecasting on your daily spreadsheets
Vertex AI Model Garden: Dockerized endpoint, useful for agent-style tool calling
Hugging Face: google/timesfm-2.5-200m-pytorch for self-hosted inference
Local PyTorch or Flax: runs on CPU, GPU, TPU, or Apple Silicon

The BigQuery integration in particular collapses a lot of forecasting infrastructure work. If your data already lives in BigQuery, you can forecast it without moving it.
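For the BigQuery route, a query can be assembled and submitted from Python. The sketch below only builds the SQL string. The table and column names are hypothetical, and the argument names (data_col, timestamp_col, horizon, confidence_level) are my reading of the AI.FORECAST reference, so verify them against the current BigQuery documentation before relying on them.

```python
def build_ai_forecast_query(table, data_col, timestamp_col,
                            horizon=30, confidence_level=0.8):
    """Return a BigQuery SQL string that calls AI.FORECAST on a table.

    Identifiers are interpolated directly, so pass only trusted values.
    """
    return (
        "SELECT *\n"
        "FROM AI.FORECAST(\n"
        f"  TABLE `{table}`,\n"
        f"  data_col => '{data_col}',\n"
        f"  timestamp_col => '{timestamp_col}',\n"
        f"  horizon => {int(horizon)},\n"
        f"  confidence_level => {float(confidence_level)}\n"
        ")"
    )


# Hypothetical table and columns, for illustration only.
query = build_ai_forecast_query("my_project.sales.daily_demand", "units_sold", "day")
print(query)

# To run it (needs Google Cloud credentials and the google-cloud-bigquery package):
# from google.cloud import bigquery
# rows = bigquery.Client().query(query).result()
```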

Agent integration: AGENTS.md and SKILL.md

TimesFM 2.5 ships an AGENTS.md and a SKILL.md (in timesfm-forecasting/) so AI coding agents can use it as a tool. That means Claude Code, Cursor, OpenClaw, or Codex can be prompted with “forecast next month’s daily active users for this CSV” and they’ll pull TimesFM in, write the loader, run the forecast, and return the chart.

This is a small thing but it matters — the foundation-model wave is converging with the agent-skills wave, and Google shipping a first-class skill alongside the checkpoint is a strong signal that this is how forecasting will be invoked from now on.
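The loader an agent writes for a request like that is usually only a few lines. Here is a sketch using the standard library's csv module; the column name and sample data are hypothetical, and the final forecast call is left as a comment because it needs a compiled model like the one in the first-forecast example above.

```python
import csv
import io

import numpy as np


def load_series(csv_file, value_col):
    """Read one numeric column from a CSV file object into a float array."""
    reader = csv.DictReader(csv_file)
    return np.array([float(row[value_col]) for row in reader], dtype=np.float32)


# Hypothetical CSV standing in for a daily-active-users export.
sample = io.StringIO("date,dau\n2026-01-01,120\n2026-01-02,135\n2026-01-03,128\n")
history = load_series(sample, "dau")
print(history)  # [120. 135. 128.]

# With a compiled TimesFM model, the agent would then call:
# point_forecast, quantile_forecast = model.forecast(horizon=30, inputs=[history])
```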

Community reaction and trajectory

The repo is on the GitHub Trending front page this week (+4.4K stars in 7 days, 25.4K total). The notable signals:

The 2.5 release in Sept 2025 was followed by steady iteration — Flax backend, XReg, LoRA fine-tuning example, unit tests, agent skill — all checked off in public
The HuggingFace collection has all checkpoint versions co-existing (1.0, 2.0, 2.5) so you don’t have to rewrite for the latest model
Several recent third-party guides (explainx.ai, Pebblous, AIToolly) frame TimesFM as the time-series equivalent of “ship the pretrained model, fine-tune sparingly”
Production users on retail-demand and predictive-maintenance use cases report 30-50% reductions in pipeline-maintenance overhead vs. per-series classical models

The most common feedback is positive surprise that the 200M model beats 2.0’s 500M — smaller and better is rare in foundation-model releases.

Honest limitations

It’s not officially supported. The README says “this open version is not an officially supported Google product.” Production-critical workloads should use BigQuery ML, Vertex, or sign a contract — the open repo is research-grade.
Forecasting only, no anomaly detection or imputation built in. If you need anomaly detection in the same pipeline, you’ll pair it with Chronos, MOMENT, or a separate anomaly model.
Context must be contiguous (no missing values in history) for older checkpoints. 2.5 is more forgiving but you still want to pre-impute large gaps.
Quantile head is optional and adds ~30M params; some early demos skipped it and people complained the point-forecast-only setup was useless for risk-aware decisions. Always enable the continuous quantile head for production use.
Compute matters at 16k context. Apple Silicon and CPU are fine for short series; long-context, high-throughput inference benefits significantly from GPU or the Flax backend.
License nuance: open weights are Apache-2.0, but Google’s first-party deployments (BigQuery, Sheets, Vertex) are commercial products with their own pricing.
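For the contiguity limitation above, pre-imputing gaps can be as simple as linear interpolation. This is a minimal sketch assuming missing values are encoded as NaN; fill_gaps is a hypothetical helper, and long outages or strongly seasonal data may call for a seasonal fill instead.

```python
import numpy as np


def fill_gaps(series):
    """Linearly interpolate NaN gaps; edge gaps take the nearest known value."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(len(series))
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])


filled = fill_gaps([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
print(filled)  # [1. 2. 3. 4. 5. 6.]
```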

Frequently Asked Questions

How is TimesFM different from Chronos, Moirai, or MOMENT?

All four are time-series foundation models, but the architectures and trade-offs differ. Chronos (Amazon) scales and quantizes values into a fixed vocabulary of discrete tokens and uses a T5 backbone; Moirai (Salesforce) is encoder-only and multi-scale; MOMENT (CMU) is encoder-only and strong on anomaly detection and imputation; TimesFM is decoder-only, matching the LLM paradigm, with 16k context and explicit BigQuery / Sheets / Vertex production deployments. On GIFT-Eval, TimesFM 2.5 currently ranks first by MASE. Pick TimesFM if you want forecasting + production-grade Google integration; pick MOMENT if you also need anomaly detection in the same model.

Can TimesFM use covariates like price and weather?

Yes, via XReg (pip install timesfm[xreg]). XReg lets you pass exogenous regressors — static (store ID, region) and dynamic (price, promotions, holidays, weather) — alongside the historical series. This brings TimesFM into the same use-case territory as Prophet, DeepAR, or N-HiTS while keeping the zero-shot foundation-model base.

Do I need to fine-tune TimesFM for my own data?

Usually no. The zero-shot performance is strong enough on most benchmarks to skip fine-tuning entirely. If you do want to specialize, the repo includes a LoRA fine-tuning example using HuggingFace Transformers + PEFT in timesfm-forecasting/examples/finetuning/. For domain-specific patterns without weight updates, try TimesFM-ICF and pass a few example series as in-context demonstrations.
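For readers new to LoRA, the core idea fits in a few lines: freeze the pretrained weight matrix and train only a low-rank update added on top of it. The NumPy sketch below shows the forward pass and the parameter savings for one layer. The dimensions are arbitrary illustrative choices, and this is not the repo's fine-tuning script.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 64, 4, 8  # illustrative sizes only

W = rng.normal(size=(d_out, d_in))             # frozen pretrained weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable, small random init
B = np.zeros((d_out, rank))                    # trainable, zero init


def lora_forward(x):
    """Frozen layer output plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.normal(size=(2, d_in))
same_at_init = np.allclose(lora_forward(x), x @ W.T)  # B is zero, so nothing changes yet
trainable = A.size + B.size  # 512 parameters are updated
frozen = W.size              # 4096 parameters stay fixed

print(same_at_init, trainable, frozen)  # True 512 4096
```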

What’s the largest forecast horizon TimesFM can handle?

With the optional continuous quantile head, TimesFM 2.5 supports forecast horizons up to 1,000 steps. The 16k figure is the maximum context length (how much history you can pass in), not the horizon. In practice, longer horizons have wider uncertainty, so always use the quantile head, not just the point forecast, when forecasting far into the future.

How does TimesFM run inside an AI agent like Claude Code or OpenClaw?

The repo ships an AGENTS.md and a SKILL.md in the timesfm-forecasting/ directory. Drop them into your agent’s skills directory and the agent automatically knows how to invoke TimesFM for forecasting tasks. From a prompt like “forecast next month’s daily users from this CSV”, the agent will install TimesFM, load the model, run the forecast, and chart the result — no manual orchestration.

Is the BigQuery ML version the same model as the open release?

It’s the same architecture and checkpoint family, exposed through BigQuery’s AI.FORECAST SQL function. BigQuery ML lets you forecast at scale on data that already lives in BigQuery without moving it; the open Hugging Face checkpoint lets you self-host and customize. Same model behavior, different deployment surface.

Verdict

TimesFM 2.5 is the rare research release that has hit “production foundation model” status — it’s smaller than its predecessor, ranks #1 on the leading benchmark, ships into three Google first-party surfaces, and now includes an agent skill so coding assistants can drive it.

If you’re maintaining a forecasting pipeline with bespoke Prophet / ARIMA / LightGBM models per series, TimesFM is the migration path that lets you delete most of that code. If you’re starting a new forecasting use case in 2026, TimesFM is the default starting point — fall back to specialized models only when the benchmarks tell you to.
