I have Telegraf pulling in forecast data from OpenWeatherMap’s API, and this works well enough. The forecast data is stored in lines with a forecast tag, which can be something like 9h, 12h, 48h, etc. This means that for any given time there can be multiple values. For example, for temperature at 2021-01-22 16:10:00 GMT+1, the value tagged 9h might be 6.59 while the value tagged 12h might be 6.71 for the same time/date. The 9h forecast should be more accurate than the 12h forecast, so I’d like to use that value.
My problem is how to filter the results so that only one value is returned per date/time, and the one returned has the ‘lowest’ forecast tag (e.g. 9h, not 12h).
My current query looks like this:
from(bucket: "OpenWeatherForecast")
|> range(start: now(), stop: 12h)
|> filter(fn: (r) => r._measurement == "weather")
|> filter(fn: (r) => r._field == "temperature")
|> map(fn: (r) => ({ _time: r._time, _value: r._value }))
|> aggregateWindow(every: 10m, fn: last, createEmpty: false)
|> yield(name: "last")
This returns results, but it picks the last entry in each window rather than the ‘correct’ one, which would be the most recent and therefore most accurate forecast. Using mean instead of last gives me the mean of all the possible values for a given time, which is still not what I want.
Here is another example where you can see clearly there are many results for a single time/date:
Query that produced that example:
from(bucket: "OpenWeatherForecast")
|> range(start: now(), stop: 3h)
|> filter(fn: (r) => r["_measurement"] == "weather")
|> filter(fn: (r) => r["_field"] == "temperature")
|> map(fn: (r) => ({ _time: r._time, _value: r._value, forecast: r.forecast }))
|> sort(columns: ["forecast"], desc: false)
|> aggregateWindow(every: 10m, fn: last, createEmpty: true)
|> yield(name: "last")
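One possible direction, as an untested sketch rather than a confirmed solution: because forecast is a tag, its values are strings, so sorting on it compares text and "12h" sorts before "9h". The sketch below assumes every forecast value is a whole number followed by "h" (as in 9h, 12h, 48h), converts it into an integer column, groups rows by _time, and keeps the row with the smallest number. The column name lead is made up for illustration and is not part of the stored data.

```flux
import "strings"

from(bucket: "OpenWeatherForecast")
  |> range(start: now(), stop: 12h)
  |> filter(fn: (r) => r._measurement == "weather" and r._field == "temperature")
  // Assumption: forecast always looks like "<integer>h", e.g. "9h" or "12h".
  // "lead" is a hypothetical helper column holding the numeric part.
  |> map(fn: (r) => ({r with lead: int(v: strings.trimSuffix(v: r.forecast, suffix: "h"))}))
  // One table per timestamp, so the selector below runs once per date/time.
  |> group(columns: ["_time"])
  // min() is a selector: it keeps the whole row with the smallest lead value.
  |> min(column: "lead")
  // Merge back into a single table and restore chronological order.
  |> group()
  |> sort(columns: ["_time"])
  |> yield(name: "shortest_lead")
```

If the forecast tag can hold values that do not match that pattern, the int() conversion would fail, so those rows would need to be filtered out first.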
Any advice would be greatly appreciated!