How to filter by 'lowest' tag value for same date/time?

I have Telegraf pulling in forecast data from OpenWeatherMap’s API, and this works well enough. The forecast data is stored in lines with a forecast tag, which can be something like 9h, 12h, 48h etc. This means that for any given time there can be multiple values. For example, for temperature at 2021-01-22 16:10:00 GMT+1, the value tagged 9h might be 6.59 while the value tagged 12h might be 6.71 for the same date/time. The 9h forecast should be more accurate than the 12h forecast, so I’d like to use that value.

My problem is how to filter the results so that only one is returned per date/time, and the one returned has the ‘lowest’ forecast tag (e.g. 9h, not 12h).

My current query looks like this:

from(bucket: "OpenWeatherForecast")
  |> range(start: now(), stop: 12h)
  |> filter(fn: (r) => r._measurement == "weather")
  |> filter(fn: (r) => r._field == "temperature")
  |> map(fn: (r) => ({ _time: r._time, _value: r._value }))
  |> aggregateWindow(every: 10m, fn: last, createEmpty: false)
  |> yield(name: "last")

This gets me results, but it chooses the last entry in each window rather than the ‘correct’ one, which would be the most recently issued and therefore most accurate forecast. Using mean instead of last gets me the mean of all the possible values for a given time, which is still not what I want.

Here is another example where you can see clearly there are many results for a single time/date:

Query that produced that example:

from(bucket: "OpenWeatherForecast")
  |> range(start: now(), stop: 3h)
  |> filter(fn: (r) => r["_measurement"] == "weather")
  |> filter(fn: (r) => r["_field"] == "temperature")
  |> map(fn: (r) => ({ _time: r._time, _value: r._value, forecast: r.forecast }))
  |> sort(columns: ["forecast"], desc: false)
  |> aggregateWindow(every: 10m, fn: last, createEmpty: true)
  |> yield(name: "last")

Any advice would be greatly appreciated!

Hello @tanc,
Have you tried using the top() function?

I believe it’s what you’re looking for? Please let me know if I’m mistaken.

@tanc This is possible, but it takes some type-conversion and grouping trickery. The first thing you’re going to need to do is add the _time column to the group key and remove the forecast column from it. This makes sure each table holds rows for a single time value, but includes all the different forecast tag values for that time.

data
  |> group(columns: ["_value", "forecast"], mode: "except")
//...

You’ll then need to convert the forecast string into a duration, then into an integer (the duration in nanoseconds) so it can be compared numerically. Compared as strings, "12h" would sort before "9h". Then you’ll be able to find the minimum value. Once you have the minimum value, you probably want to convert the integer back into a duration, then a string.

data
  |> group(columns: ["_value", "forecast"], mode: "except")
  |> map(fn: (r) => ({ r with forecast: int(v: duration(v: r.forecast))}))
  |> min(column: "forecast")
  |> map(fn: (r) => ({ r with forecast: string(v: duration(v: r.forecast)) }))

That should give you the value associated with the lowest forecast tag.
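
For reference, here is a rough sketch of how those steps might fit into the original query. It is untested against real data. The final group() and sort() calls and the yield name are my own assumptions, added so the one-row-per-time tables merge back into a single series for graphing. The aggregateWindow() call and the map() that dropped columns are left out so the forecast tag survives long enough to be compared.

```flux
from(bucket: "OpenWeatherForecast")
  |> range(start: now(), stop: 12h)
  |> filter(fn: (r) => r._measurement == "weather")
  |> filter(fn: (r) => r._field == "temperature")
  // Group by every column except _value and forecast, so _time joins the
  // group key and each table holds all forecasts for one timestamp.
  |> group(columns: ["_value", "forecast"], mode: "except")
  // "9h" -> duration -> integer nanoseconds, so the tags compare numerically.
  |> map(fn: (r) => ({ r with forecast: int(v: duration(v: r.forecast)) }))
  // Keep only the row with the smallest forecast horizon in each table.
  |> min(column: "forecast")
  // Convert the integer back into a readable tag such as "9h".
  |> map(fn: (r) => ({ r with forecast: string(v: duration(v: r.forecast)) }))
  // Assumption: take _time back out of the group key so the rows form one
  // series per remaining tag set, then order them by time.
  |> group(columns: ["_time", "_value", "forecast"], mode: "except")
  |> sort(columns: ["_time"])
  |> yield(name: "nearest_forecast")
```

If the measurement carries other tags (a city tag, for example), they stay in the group key throughout, so each tag combination should get its own series.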

Thank you so much @scott and @Anaisdg! 👍