Do batch import timeouts result in dropped data?

I have a Telegraf node that ingests a newline-delimited JSON file and emits to InfluxDB.

I have the following configuration for inputs.tail:

[[inputs.tail]]
  files = ["./*.txt"]
  from_beginning = true
  json_time_key = "time"
  json_time_format = "2006-01-02T15:04:05Z07:00"
  json_name_key = "foo"
  tag_keys = ["foo","bar","stooge"]
  data_format = "json"

This input configuration works as expected. Telegraf ingests a 2GB file of ~3M JSON data points without issue.

I then decided to get fancy and emit to a separate DB per stooge.

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "MOE"
  [outputs.influxdb.tagpass]
    stooge = ["MOE"]

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "LARRY"
  [outputs.influxdb.tagpass]
    stooge = ["LARRY"]

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "CURLEY"
  [outputs.influxdb.tagpass]
    stooge = ["CURLEY"]

This also works, but I see some errors:

2019-04-24T18:52:34Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-04-24T18:52:36Z E! [outputs.influxdb] when writing to [http://127.0.0.1:8086]: Post http://127.0.0.1:8086/write?db=MOE: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-04-24T18:52:39Z E! [outputs.influxdb] when writing to [http://127.0.0.1:8086]: Post http://127.0.0.1:8086/write?db=LARRY: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-04-24T18:52:39Z D! [outputs.influxdb] buffer fullness: 10000 / 10000 metrics. 
2019-04-24T18:52:39Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-04-24T18:55:31Z W! [agent] output "influxdb" did not complete within its flush interval

I have two questions:

  • Are these timeouts of any concern? Or will Telegraf retry emitting the data until it receives a successful WRITE ACK from InfluxDB?
  • Should I tune the parameters, e.g. decrease the batch size, increase the buffer size, and increase the timeouts (see the sketch below), or is this a bad idea?
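For reference, these are the knobs I have in mind (the values here are placeholders, not settings I have tested):

[agent]
  ## Send smaller batches per write request (default is 1000)
  metric_batch_size = 500
  ## Cache more failed-write metrics per output (default is 10000)
  metric_buffer_limit = 100000

[[outputs.influxdb]]
  ## Allow each write request more time than the 5s default
  timeout = "30s"
  ## ...plus the urls/database/tagpass settings shown above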

Hi, to your first question: unfortunately yes, those errors mean metrics are being dropped. Telegraf will retry failed writes, but it only caches up to metric_buffer_limit metrics per output while doing so, and the buffer fullness: 10000 / 10000 line in your log shows that cache is already full, so the oldest metrics are being discarded. The buffer should be increased:

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000
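
As a rough starting point, something along these lines (the numbers are illustrative, not tuned for your workload; the buffer is held in memory, so size it to what the host can spare, and timeout is set per output):

[agent]
  ## Cache more failed-write metrics per output before the oldest are dropped
  metric_buffer_limit = 100000

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "MOE"
  ## Give each write request more time than the 5s default
  timeout = "30s"
  [outputs.influxdb.tagpass]
    stooge = ["MOE"]

You would repeat the timeout on the LARRY and CURLEY outputs as well.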