Google's 200M-parameter time-series foundation model with 16k context

https://github.com/google-research/timesfm

GitHub - google-research/timesfm: TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.

I somehow find the concept of a general time series model strange. How can the same model predict egg prices in Italy, and global inflation in a reliable way?

And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…

What is not generally understood is that these models don’t predict egg prices or inflation in Italy.

They decompose a time series into trend, seasonality and residuals. That's what they are actually modelling.
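For concreteness, here's a minimal sketch of that decomposition: trend via a centered moving average, seasonality via per-phase means of the detrended series, residual as whatever is left. Stdlib only, odd periods only; the function name and toy series are mine, and real tools (e.g. statsmodels' seasonal_decompose) handle the details more carefully.

```python
import math
from statistics import mean

def decompose(series, period):
    """Naive additive decomposition: series = trend + seasonal + residual."""
    assert period % 2 == 1, "this sketch supports odd periods only"
    n, half = len(series), period // 2

    # Trend: centered moving average; undefined at the edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(series[t - half : t + half + 1])

    # Seasonal: average detrended value at each phase, centered to sum ~0.
    phases = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            phases[t % period].append(series[t] - trend[t])
    seasonal = [mean(p) for p in phases]
    offset = mean(seasonal)
    seasonal = [s - offset for s in seasonal]

    # Residual: whatever trend + seasonality don't explain.
    resid = [series[t] - trend[t] - seasonal[t % period]
             if trend[t] is not None else None
             for t in range(n)]
    return trend, seasonal, resid

# Toy series: linear trend plus a weekly cycle.
series = [0.1 * t + math.sin(2 * math.pi * t / 7) for t in range(70)]
trend, seasonal, resid = decompose(series, period=7)
```

On this toy input the residuals come out near zero, since the series is exactly trend plus seasonality; on real data they carry everything the decomposition can't explain.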

They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern.

Wars in the Middle East seem to have increasingly regular patterns tied to stock market opening hours, unfortunately.

I totally agree with the sentiment, but from what I can tell, they tend to happen immediately before or after markets open and close. Essentially screwing, to the maximum extent, absolutely everyone who isn't in the clique out of participating in the trade.

FWIW, the only surefire way to win the trade is to buy time and assume both gross incompetence and negligence when it comes to action. The only caveat is that if the markets tank enough, this administration will signal capitulation beforehand, e.g. Trump mildly capitulating on tariffs last April after the markets proceeded to relentlessly defecate themselves.

0-DTE options are typically, and for good reason, stupid gambles. But right now they can't even be considered gambling, because there's zero chance of winning. Not just bad odds, but no odds. Again, just signaling how truly malicious this admin is and its disdain for anyone and everyone not close to them.

I mean it's super obvious, it's directly tied to Scrubs' popularity.

New season of Scrubs = new war in the Middle East.

Wow, I didn't know. Thank you! Such a great show.

That's what traditional time-series modelling does. This is a foundation model, which means it's just a neural network trained on lots of time series. (So maybe OP's question still stands? But it's the same question as "how can LLMs be good at so many different kinds of conversations?")

Do these models predict on just a single time series then?

It is far more useful for prediction to look for correlations between time series. This is far more complex than looking for correlations in general, because most time series trend up or down and therefore correlate with each other by default.

What makes these models different from models used for e.g. audio?

Or other low-dimensional time domain signals?

> They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern.

well...

I would say:

- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors

- memorization: some patterns are recurrent across many domains, such as power laws

- multitask: exploit cross-domain connections such as weather vs electricity

My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.

As they say in appendix 8:

> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple time series patterns:

> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.

> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.

> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.
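Under some assumptions about the details the quote leaves out (noise distributions, how coefficients are normalized), those recipes can be sketched roughly like this. The function names and parameter choices are mine, not the paper's.

```python
import math
import random

rng = random.Random(0)

def piecewise_linear(n):
    """Trend with 2-8 linear pieces, slope re-drawn at random knots."""
    k = rng.randint(2, 8)
    breaks = set(rng.sample(range(1, n), k - 1))
    slope, y, out = rng.uniform(-1, 1), 0.0, []
    for t in range(n):
        if t in breaks:
            slope = rng.uniform(-1, 1)  # new slope at each knot
        y += slope
        out.append(y)
    return out

def arma(n, burn=50):
    """ARMA(p, q) with 1 <= p, q <= 8; AR coefficients crudely scaled for stability."""
    p, q = rng.randint(1, 8), rng.randint(1, 8)
    phi = [rng.uniform(-1, 1) for _ in range(p)]
    theta = [rng.uniform(-1, 1) for _ in range(q)]
    s = sum(abs(c) for c in phi) or 1.0
    phi = [0.9 * c / s for c in phi]  # keep sum of |phi| below 1
    eps = [rng.gauss(0, 1) for _ in range(n + burn)]
    x = [0.0] * (n + burn)
    for t in range(max(p, q), n + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q))
        x[t] = ar + ma + eps[t]
    return x[burn:]  # drop burn-in so the start-up transient is gone

def seasonal_wave(n, max_period):
    """Sine wave with a random period in [4, max_period] and random phase."""
    period = rng.randint(4, max_period)
    phase = rng.uniform(0, period)
    return [math.sin(2 * math.pi * (t + phase) / period) for t in range(n)]
```

The paper's actual generator presumably also mixes these components and adds noise; the point here is just how simple the building blocks are.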

If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.

And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.

> How can the same model predict egg prices in Italy, and global inflation in a reliable way?

How can the same lossy compression algorithm (eg JPG) compress pictures of everything in a reliable way?

It can't compress pictures of everything in a reliable way.

Text and anything with lots of high-frequency components looks terrible.

Reliably terrible.

It still doesn't do well on text. And we have newer formats and ideas that would also deal with that. (To be really dead simple: have a minimal container format that decides between PNG or JPEG, and use PNG for text.)

However: white noise is where it really struggles. But real pictures of the real world don't look like white noise. Even though in some sense white noise is the most common type of picture a priori.

Similar for real world time series: reality mostly doesn't look like white noise.

White noise is random, so it's incompressible by definition. By JPG or by any other method no matter how clever.

I have a very peculiar coin. With 1% probability it turns up heads and with 99% probability it turns up tails.

A string of flips is random, but it's very compressible.
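Back-of-the-envelope: the Shannon entropy of that coin tells you exactly how compressible the flip sequence is. With P(heads) = 0.01 it's roughly 0.08 bits per flip, so an ideal coder stores about 12 flips per bit:

```python
import math

def entropy_bits(p):
    """Shannon entropy of a Bernoulli(p) source, in bits per symbol."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

h = entropy_bits(0.01)
print(f"{h:.4f} bits/flip -> about {1 / h:.0f} flips per stored bit")
```

A fair coin, by contrast, hits the maximum of exactly 1 bit per flip, which is the sense in which "random" data is incompressible: it's uniform randomness, not randomness per se, that can't be compressed.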

In any case, my point was that reality ain't uniformly random. And not only that: pretty much anything you can point your camera at shares enough similarity in distribution that we effectively have universal compression algorithms for real-world data.

It’s not really predicting “egg prices” or “inflation” — it’s mostly fitting patterns that happen to show up in those series.

The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.

People ask how one model can understand everything, but that assumes there’s any understanding involved at all.

At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?