Hi,
we sample our batch scheduler hosts about 2-3 times per minute and record the number of running/idle/held jobs (3 fields) per user+id/scheduler (2 tags) via telegraf into influxdb.
For a grafana dashboard (e.g. BlendStat looks suitable), I would like to calculate
(a) the total number of jobs, i.e. running + idle + held jobs
(b) group these by ‘user+id’ and ‘scheduler’ tags
© and use only the last value recorded for these combos, provided they appeared within the last 5 minutes.
From all these, I would like to select the three results with the largest total number of jobs and display them next to each other on the dash board,e.g. like this:
+-------------+ +-------------+ +-------------+
| user+id | | user+id | | user+id |
| scheduler | | scheduler | | scheduler |
| running | | running | | running |
| idle | | idle | | idle |
| held | | held | | held |
+-------------+ +-------------+ +-------------+
(or if possible, even fancier with the time series graph for the past hour as a background).
However, I am still struggling to get my head around InfluxDB, e.g. I can use show tag values to gather the available ‘user+id’ and ‘scheduler’, but so far I was not able to use those to come up with a suitable query.
sample data looks like (condortag is user+id from above)
> select * from condor limit 2
name: condor
time condortag hold host idle run
---- --------- ---- ---- ---- ---
1580224190000000000 user1@test.test4 1 condor3 5 159
1580224190000000000 user2@test.test2 0 condor3 1140 335
Hopefully, I was able to get the base information over to you, is it possible to achieve this?
Cheers
Carsten