We run Telegraf 1.4.2 on our hosts (about 350) and a single InfluxDB 1.3.7 instance on a separate server that receives the Telegraf metrics.
Currently we send about 200 data points per minute per host, each with at most 4 tags.
The hosts send via UDP.
The tag values are no longer than 12 characters, the keys no longer than 8.
The series count in the InfluxDB database is currently at 38,200.
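For context on what adding tags does to that series count: each new tag multiplies the series cardinality by its number of distinct values per measurement. A rough sketch of the arithmetic, where the distinct-value counts for the four new tags are purely hypothetical placeholders (substitute the real ones):

```python
# Series-cardinality arithmetic. The distinct-value counts for the
# four new tags are hypothetical -- substitute your real numbers.
series_now = 38200
hosts = 350
series_per_host = series_now / hosts        # ~109 series per host today

new_tag_values = [3, 5, 2, 4]               # hypothetical distinct values per new tag
multiplier = 1
for n in new_tag_values:
    multiplier *= n                         # 3 * 5 * 2 * 4 = 120

projected = series_now * multiplier
print(round(series_per_host, 1), multiplier, projected)  # -> 109.1 120 4584000
```

Even modest per-tag value counts can inflate cardinality by two orders of magnitude, which raises indexing and write cost on the InfluxDB side.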
Hardware of the InfluxDB server:
UDP configuration from the InfluxDB config:

```toml
enabled = true
bind-address = ":8089"
database = "telegraf"
# retention-policy = ""

# These next lines control how batching works. You should have this enabled
# otherwise you could get dropped metrics or poor performance. Batching
# will buffer points in memory if you have many coming in.

# Flush if this many points get buffered
batch-size = 5000

# Number of batches that may be pending in memory
batch-pending = 10

# Will flush at least this often even if we haven't hit buffer limit
batch-timeout = "1s"

# UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
read-buffer = 0
```
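One detail worth knowing: tcpdump captures packets at the link layer, before the UDP socket receive buffer, so packets dropped because that buffer is full still show up in a capture. If socket-level drops are suspected, the read buffer can be raised; a sketch with a hypothetical 8 MB value, assuming the OS maximum (`net.core.rmem_max` on Linux) is raised first, since the listener fails to start if `read-buffer` exceeds it:

```toml
# Hypothetical 8 MB UDP read buffer. On Linux, first run e.g.:
#   sysctl -w net.core.rmem_max=8388608
# otherwise the UDP listener will fail to start.
read-buffer = 8388608
```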
With this setup everything works fine: all points arrive at the right intervals and with the right values.
If we add 4 more tags to each data point, we observe the phenomenon that only about a quarter of the data points are written into InfluxDB. We monitored the traffic with tcpdump on both the hosts and the InfluxDB server and can confirm that the points are correctly sent by Telegraf and received at the InfluxDB port, but not written into the database.
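It may help to quantify how much those extra tags inflate each line-protocol point. With the key/value lengths stated above, a worst-case sketch:

```python
# Worst-case line-protocol growth per point when adding tags.
# From the setup above: tag keys <= 8 chars, tag values <= 12 chars.
max_key = 8
max_value = 12
bytes_per_tag = 1 + max_key + 1 + max_value  # ",key=value" -> comma + key + '=' + value
extra_tags = 4
extra_bytes = extra_tags * bytes_per_tag
print(bytes_per_tag, extra_bytes)  # -> 22 88
```

Up to ~88 extra bytes per point means larger datagrams and more parsing work per point on the receiving side, roughly doubling the tag-related payload.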
We also tested sending via HTTP/TCP, same result.
It feels a bit like a queuing issue. The phenomenon does not occur when we send the extended tag set from just a couple of hosts, but as soon as all 350 hosts send it, the number of written points drops.
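A back-of-the-envelope calculation based on the numbers above fits the queuing suspicion: if writes slow down (larger points, more series), the pending batches fill and further incoming points are dropped, which matches "received but not written". The stall-time figure here is only illustrative:

```python
# Back-of-the-envelope for the batching settings above.
hosts = 350
points_per_min = 200                        # per host
pps = hosts * points_per_min / 60           # points/sec arriving at influxd

batch_size = 5000                           # from the UDP config
batch_pending = 10
buffer_points = batch_size * batch_pending  # max points buffered in memory

seconds_of_headroom = buffer_points / pps   # how long writes may stall before drops
print(round(pps), buffer_points, round(seconds_of_headroom, 1))  # -> 1167 50000 42.9
```

So the in-memory buffer covers roughly 43 seconds of sustained traffic; if the write path falls behind for longer than that, points are silently discarded, and raising `batch-pending` (or `batch-size`) would extend that headroom.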
We need the tag information so we can group by them in our queries. Every numeric value is written into fields.
Has anyone seen a similar problem? How could we adjust the InfluxDB config?