How to store/query forecast timeseries i.e. two timeseries keys pair

Hi, I find this issue on github, how to store/query forecast timeseries i.e. two timeseries keys pair · Issue #4314 · influxdata/influxdb · GitHub which was not an issue but a good question…
Here is an exact copy of the question as the author never posted here on googlegroup as suggested (or I didn’t find it)

hi there, is influxdb suitable for the following type of data? if so, could you please give me some hints on how shall I store them in order to query them efficiently? thanks

forecast_date, value_date, value
2015-01-01 23:00:00, 2015-01-01 00:00:00, 100.00
2015-01-01 23:00:00, 2015-01-01 12:00:00, 110.00
2015-01-02 00:00:00, 2015-01-02 12:00:00, 120.00
2015-01-02 00:00:00, 2015-01-03 00:00:00, 130.00

e.g. I want to query the latest forecast for each value_date point i.e.

2015-01-01 00:00:00, 100.00
2015-01-02 12:00:00, 120.00
2015-01-03 00:00:00, 130.00

Here’s an interresting answer of Jack Zampolin posted on googlegroups

I would suggest using a forecast_id as a tag to differentiate the forecasts from one another. Then store the value_date as the timestamp. This is what I do when using holt_winters. In the future can you ask questions like this over on community.influxdata.com? We have moved the Q&A over there.

@Got Does that work for your usecase? Glad I was able to help!

I will implement it in the next few days. I’ll use a forecast_date as a tag (since there is no id on my side)

I’ll keep you informed but I think the idea is great.

Okay, I just implemented this architecture, this seems great but there is only one little limitation : tags are stored like strings, so I didn’t find any easy way to get all data for the last forecast_id.

Have you any suggestion?

@Got Can’t you query by forecast_id? WHERE forecast_id = 'foo'?

Yes it work,

In my case, forecast_id is not an id, but the time of the forecast (so the tag name is forecast_time). :slight_smile:

Initially, I was plannig to make request like select max(forecast_time) but this won’t work with a tag.

If I’m not wrong, This mean I have to store a database of distinct forecast_id / forecast_time elsewhere…

@Got Another option would be to store the forecasted data in its own measurement or database. That way the forecasted data has the exact same structure as the actual data. In that case you could get the most recent forecast with a LAST query,

SELECT last(value) as value, value_date, forecast_date FROM "forecasted_data"

Using a time value for a tag will not work very well as it means that you have unbounded cardinalitly for the tag which will eventually means that you have too many tags values for the database to handle well.

1 Like