Monitoring InfluxDB with Telegraf always times out

telegraf

I use the Telegraf influxdb input plugin to collect InfluxDB runtime statistics, which means it requests the /debug/vars endpoint to get its data.

I have more than 5000 shards, so this endpoint returns a response of about 15 MB and takes more than 20 seconds to complete. I think that is too long. And when I check the returned data, there is a lot of information that I don’t care about, for example:

"shard:/data1/influxdb/data/system_200174/rp_system_200174/32640:32640": {"name":"shard","tags":{"database":"system_200174","engine":"tsm1","id":"32640","path":"/data1/influxdb/data/system_200174/rp_system_200174/32640","retentionPolicy":"rp_system_200174","walPath":"/data1/influxdb/wal/system_200174/rp_system_200174/32640"},"values":{"diskBytes":19375,"fieldsCreate":0,"seriesCreate":18,"writeBytes":0,"writePointsDropped":0,"writePointsErr":0,"writePointsOk":0,"writeReq":0,"writeReqErr":0,"writeReqOk":0}},
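To see how much of the response the per-shard entries account for, a saved copy of the /debug/vars output can be grouped by each entry's "name" field. This is only a rough sketch: the function name summarize_by_name is hypothetical, and it assumes every stat entry is a JSON object with a "name" field like the shard entry above.

```python
import json
from collections import defaultdict


def summarize_by_name(payload):
    """Map each stat name to (entry_count, approx_json_bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for key, value in payload.items():
        # Stat entries are objects carrying a "name" field (e.g. "shard").
        # Anything else is grouped under its own top-level key.
        if isinstance(value, dict) and isinstance(value.get("name"), str):
            name = value["name"]
        else:
            name = key
        totals[name][0] += 1
        # Size of the entry when re-serialized; an approximation of its
        # share of the original response body.
        totals[name][1] += len(json.dumps({key: value}))
    return {name: (count, size) for name, (count, size) in totals.items()}
```

Calling it on the parsed response (for example the result of json.load on a file saved with curl) and sorting by the byte count should show whether shard, tsm1_engine and tsm1_cache dominate the 15 MB.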

Would it be possible to filter out this detailed shard information, for example with curl http://localhost:8086/debug/vars?data=system,cmdline,runtime,queryExecutor,database,write,subscriber,cq,httpd, so that shard, tsm1_engine and tsm1_cache are dropped?
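For comparison, the same filtering can be done on the client side. The sketch below drops the entries whose "name" is shard, tsm1_engine or tsm1_cache after the response has been fetched. The helper names drop_stats and fetch_filtered, the URL and the 60-second timeout are assumptions for illustration. Note that this only reduces what is passed on afterwards; InfluxDB still has to build and send the full response.

```python
import json
import urllib.request

# Stat names to discard, taken from the question above.
DROP_NAMES = frozenset({"shard", "tsm1_engine", "tsm1_cache"})


def drop_stats(payload, drop_names=DROP_NAMES):
    """Return a copy of a parsed /debug/vars payload without the unwanted stats."""
    kept = {}
    for key, value in payload.items():
        name = value.get("name") if isinstance(value, dict) else None
        if isinstance(name, str) and name in drop_names:
            continue  # skip per-shard and TSM detail entries
        kept[key] = value
    return kept


def fetch_filtered(url="http://localhost:8086/debug/vars", timeout=60):
    """Fetch the endpoint (assumed URL and timeout) and filter the result."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return drop_stats(json.load(resp))
```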

Otherwise, what is the suggested collection interval? Will such a slow, large response impact the InfluxDB instance?