Hi,
We are working with Telegraf to collect data from a JSON file through the "tail" input plugin and write it to InfluxDB. Each point has 10 tags and 350 fields, and 80 points are appended to the JSON file every second. Everything runs on a single server.
Our Telegraf configuration is:
- interval = 5s
- flush_interval = 5s
- InfluxDB timeout = 5s
- time format is unix_ns
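For reference, this maps to a telegraf.conf roughly like the sketch below; the file path, JSON time key, and database name are placeholders rather than our actual values:

```toml
# Sketch of the setup described above; paths and key names are placeholders.
[agent]
  interval = "5s"
  flush_interval = "5s"

[[inputs.tail]]
  files = ["/var/log/data.json"]   # placeholder path
  watch_method = "inotify"
  data_format = "json"
  json_time_key = "timestamp"      # placeholder key name
  json_time_format = "unix_ns"

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "metrics"             # placeholder database name
  timeout = "5s"
```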
We are facing some data loss: out of roughly 3 lakh (300,000) records, about 1,000 are lost.
I did not find any drops on the InfluxDB side (checking the _internal database).
I am not able to figure out the reason for the loss. The data load is only moderate according to the InfluxDB hardware sizing guidelines.
We cannot afford to lose data. Please help us find the reason for the data loss. Is Telegraf the culprit?
I would start by turning on the internal plugin and then checking the internal_write measurement for the metrics_dropped field. It is tagged per output plugin; does it remain at zero?
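Enabling it is a one-line addition to telegraf.conf (collect_memstats is optional):

```toml
# Collect Telegraf's own runtime metrics, including the internal_write
# measurement, which reports metrics_dropped per output plugin.
[[inputs.internal]]
  collect_memstats = true
```

You can then check it with a query like `SELECT max("metrics_dropped") FROM "internal_write" GROUP BY "output"` against the database Telegraf writes to.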
Right now we are neither rotating nor truncating the file. We will run the test for 2-3 hours and check the data. We are still in the initial phases.
Is there any chance that Telegraf misses data during collection? Can the input plugin hang while the output plugin is dispatching data?
We are using "inotify" as the watch method for the tail plugin.
We are still not able to find the reason for the drops. Is there a specific Telegraf configuration recommended for this kind of load?
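For example, we are wondering whether the agent buffer settings need tuning for our rate. A sketch of what we have in mind is below; the values are guesses on our part, not something we have validated:

```toml
[agent]
  # At 80 points/s with a 5 s interval, ~400 metrics arrive per flush.
  # Telegraf's defaults are metric_batch_size = 1000 and
  # metric_buffer_limit = 10000; raising the buffer limit gives more
  # headroom if a flush is slow or times out, so metrics are retried
  # instead of dropped.
  metric_batch_size = 1000
  metric_buffer_limit = 50000
```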