Do batch import timeouts result in dropped data?

I have a Telegraf node that ingests a newline-delimited JSON file and emits to InfluxDB.

I have the following configuration for inputs.tail:

[[inputs.tail]]
  files = ["./*.txt"]
  from_beginning = true
  json_time_key = "time"
  json_time_format = "2006-01-02T15:04:05Z07:00"
  json_name_key = "foo"
  tag_keys = ["foo","bar","stooge"]
  data_format = "json"

This input configuration works as expected. Telegraf ingests a 2GB file of ~3M JSON data points without issue.

I then decided to get fancy and emit to a separate DB per stooge.

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "MOE"
  [outputs.influxdb.tagpass]
    stooge = ["MOE"]

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "LARRY"
  [outputs.influxdb.tagpass]
    stooge = ["LARRY"]

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "CURLEY"
  [outputs.influxdb.tagpass]
    stooge = ["CURLEY"]

This also works, but I see some errors:

2019-04-24T18:52:34Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-04-24T18:52:36Z E! [outputs.influxdb] when writing to [http://127.0.0.1:8086]: Post http://127.0.0.1:8086/write?db=MOE: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-04-24T18:52:39Z E! [outputs.influxdb] when writing to [http://127.0.0.1:8086]: Post http://127.0.0.1:8086/write?db=LARRY: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-04-24T18:52:39Z D! [outputs.influxdb] buffer fullness: 10000 / 10000 metrics. 
2019-04-24T18:52:39Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-04-24T18:55:31Z W! [agent] output "influxdb" did not complete within its flush interval

I have two questions:

  • Are these timeouts of any concern? Or will Telegraf retry emitting the data until it receives a successful WRITE ACK from InfluxDB?
  • Should I tune the parameters, e.g. decrease the batch size, increase the buffer size, and increase the timeouts (see the sketch below), or is this a bad idea?
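For reference, these are the knobs I have in mind (the values here are placeholders, not settings I have tested):

[agent]
  ## Send smaller batches per write request (default is 1000)
  metric_batch_size = 500
  ## Cache more failed-write metrics per output (default is 10000)
  metric_buffer_limit = 100000

[[outputs.influxdb]]
  ## Allow each write request more time than the 5s default
  timeout = "30s"
  ## ...plus the urls/database/tagpass settings shown above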

Hi, to your first question: unfortunately yes, those errors mean metrics are being dropped. Telegraf will retry failed writes, but it only caches up to metric_buffer_limit metrics per output while doing so, and the buffer fullness: 10000 / 10000 line in your log shows that cache is already full, so the oldest metrics are being discarded. The buffer should be increased:

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000
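
As a rough starting point, something along these lines (the numbers are illustrative, not tuned for your workload; the buffer is held in memory, so size it to what the host can spare, and timeout is set per output):

[agent]
  ## Cache more failed-write metrics per output before the oldest are dropped
  metric_buffer_limit = 100000

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "MOE"
  ## Give each write request more time than the 5s default
  timeout = "30s"
  [outputs.influxdb.tagpass]
    stooge = ["MOE"]

You would repeat the timeout on the LARRY and CURLEY outputs as well.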