I have a simple Telegraf config file that collects only the disk space used by InfluxDB shards (the data comes from InfluxDB's own /debug/vars endpoint, as recommended in the docs):
[agent]
  hostname = "docker-debian"
  flush_interval = "10s"
  interval = "10s"

# GET THE DISK USAGE OF THE INFLUXDB DATABASES, FROM INFLUXDB'S OWN INTERNAL MONITORING SYSTEM
[[inputs.influxdb]]
  urls = ["http://influxdb_container:8086/debug/vars"]
  namepass = ["influxdb_shard"]
  fieldpass = ["diskBytes"]
  taginclude = ["database", "path"]

# SEND ALL THE DATA TO THE REMOTE INFLUXDB INSTANCE
[[outputs.influxdb]]
  database = "telegraf"
  urls = ["http://influxdb_container:8086"]
When I look at the data arriving in the database, I see a bunch of values: one for each path, for each database.
Instead of sending every individual path to InfluxDB, I want to sum over all the paths first and then send only one total per database. Can this be done with Telegraf aggregators/processors?
Sorry if this is a dumb question, but why do you need to aggregate the values before they are stored? Wouldn’t it be easier to just aggregate the output in your queries?
As the database size grows, the number of path directories increases. In my setup there were something like 20 paths, so every time Telegraf collected data I got 20 data points (and that is per database). The diskBytes data was actually by far the most space-consuming in my setup.
I am not interested in the individual breakdown of how much disk space is occupied in each path - I am only interested in the total for each database. So it does not make sense for me to store all this unnecessary data.
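To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It assumes the figures mentioned above (about 20 paths, a 10s collection interval); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def points_per_day(series_count, interval_seconds):
    """Points written per day when every series reports once per interval."""
    return series_count * (SECONDS_PER_DAY // interval_seconds)


# Assumed figures from this thread: ~20 paths per database, collected every 10s.
per_path = points_per_day(series_count=20, interval_seconds=10)
per_database = points_per_day(series_count=1, interval_seconds=10)

print(per_path)      # 172800 points/day when every path is stored
print(per_database)  # 8640 points/day when only the total is stored
```

So storing only the per-database total would cut the number of stored points by the number of paths, whatever that happens to be in a given setup.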
It’s possible to do that. In fact I was doing it like that for a long time. But you need to do a subquery:
SELECT mean("diskBytes_summed") FROM (
SELECT sum("diskBytes") AS "diskBytes_summed" FROM "influxdb_shard"
WHERE "database"='telegraf' AND $timeFilter GROUP BY time(5s) fill(null)
)
GROUP BY time(10m)
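The outer GROUP BY interval must be longer than the inner one (in my case I chose 10m and 5s), and the inner interval must be shorter than your collection interval (mine was 10s). Otherwise the query does not aggregate correctly: for example, an inner bucket wider than the collection interval holds several samples per path, so the sum over all paths double-counts them and gives the wrong result.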
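The effect of the inner interval can be reproduced outside the database. The following is a minimal Python sketch, not InfluxDB's actual implementation: it mimics "sum per inner bucket, then mean per outer bucket" on made-up data (3 hypothetical paths of 100, 200 and 300 bytes, collected every 10s with perfectly aligned timestamps).

```python
from collections import defaultdict


def bucket_start(timestamp, width):
    """Start of the fixed-width time bucket that contains timestamp."""
    return timestamp - (timestamp % width)


def sum_then_mean(points, inner, outer):
    """Sum values per inner bucket, then average those sums per outer bucket.

    points: iterable of (timestamp_seconds, disk_bytes), one per path per collection.
    Empty inner buckets are simply absent, similar to fill(null) being ignored by mean().
    """
    inner_sums = defaultdict(int)
    for timestamp, value in points:
        inner_sums[bucket_start(timestamp, inner)] += value

    outer_groups = defaultdict(list)
    for timestamp, total in inner_sums.items():
        outer_groups[bucket_start(timestamp, outer)].append(total)

    return {t: sum(v) / len(v) for t, v in sorted(outer_groups.items())}


# Made-up data: 3 paths, one sample per path every 10s for 10 minutes.
path_sizes = [100, 200, 300]  # true total is 600 bytes
points = [(t, size) for t in range(0, 600, 10) for size in path_sizes]

print(sum_then_mean(points, inner=5, outer=600))   # {0: 600.0}  correct total
print(sum_then_mean(points, inner=20, outer=600))  # {0: 1200.0} two collections summed together
```

With a 5s inner bucket each bucket holds at most one collection, so the sum is the true total. With a 20s inner bucket two collections land in the same bucket and every path is counted twice.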
Doing the aggregation in the query is cumbersome, non-intuitive, and far worse in terms of performance. My graphs were very slow to load when showing the last 6 months of data.
This works nicely, and gives the total of the data directory summed over all paths.
(This is not quite the same as the influxdb_shard measurement, which I think also takes the WAL directory into account. But monitoring the data directory is enough, since this is where the majority of the disk space is used.)