Timeout in database inputs

Hi
First of all, my goal:
To measure latency for multiple services running on different ports on a server.

I have set up InfluxDB and Grafana to monitor the latency of some services.
This already works fine with inputs from other Telegraf agents.

In this case I am having trouble with some missing data points. I had a look in the database and I see timeouts where I would expect data.

This only happens when I add more services to measure, either in separate files or as additional inputs.
The log file is not giving me a lot to work with, other than this, e.g.:

2018-06-27T10:02:44Z D! Output [influxdb] buffer fullness: 20 / 10000 metrics.
2018-06-27T10:02:44Z D! Output [influxdb] buffer fullness: 20 / 10000 metrics.
2018-06-27T10:02:45Z D! Output [influxdb] wrote batch of 20 metrics in 27.392525m
2018-06-27T10:02:45Z D! Output [influxdb] wrote batch of 20 metrics in 26.96671ms
2018-06-27T10:02:47Z D! Output [influxdb] buffer fullness: 29 / 10000 metrics.
2018-06-27T10:02:47Z D! Output [influxdb] buffer fullness: 29 / 10000 metrics.
2018-06-27T10:02:47Z D! Output [influxdb] wrote batch of 29 metrics in 39.53634ms
2018-06-27T10:02:47Z D! Output [influxdb] wrote batch of 29 metrics in 39.476422m
2018-06-27T10:02:49Z D! Output [influxdb] buffer fullness: 22 / 10000 metrics.
2018-06-27T10:02:49Z D! Output [influxdb] buffer fullness: 22 / 10000 metrics.
2018-06-27T10:02:49Z D! Output [influxdb] wrote batch of 22 metrics in 12.949624m
2018-06-27T10:02:49Z D! Output [influxdb] wrote batch of 22 metrics in 12.613302m
2018-06-27T10:02:50Z D! Output [influxdb] buffer fullness: 15 / 10000 metrics.
2018-06-27T10:02:50Z D! Output [influxdb] buffer fullness: 15 / 10000 metrics.
2018-06-27T10:02:50Z D! Output [influxdb] wrote batch of 15 metrics in 28.296731m
2018-06-27T10:02:50Z D! Output [influxdb] wrote batch of 15 metrics in 28.397066m
2018-06-27T10:02:52Z D! Output [influxdb] buffer fullness: 17 / 10000 metrics.
2018-06-27T10:02:52Z D! Output [influxdb] buffer fullness: 17 / 10000 metrics.
2018-06-27T10:02:52Z D! Output [influxdb] wrote batch of 17 metrics in 29.561376m
2018-06-27T10:02:52Z D! Output [influxdb] wrote batch of 17 metrics in 30.117717m
2018-06-27T10:02:54Z D! Output [influxdb] buffer fullness: 29 / 10000 metrics.
2018-06-27T10:02:54Z D! Output [influxdb] buffer fullness: 29 / 10000 metrics.
2018-06-27T10:02:54Z D! Output [influxdb] wrote batch of 29 metrics in 9.646194ms
2018-06-27T10:02:54Z D! Output [influxdb] wrote batch of 29 metrics in 9.777617ms

I am using the following config for this:
[[inputs.net_response]]
protocol = "tcp"
address = "hostname:30039"

But see this in the database:
1530093119000000000 30039 tcp timeout 1 timeout hostname
1530093120000000000 30048 tcp 0.008535873 success 0 success hostname
1530093121000000000 30039 tcp 0.008421727 success 0 success hostname
1530093122000000000 30048 tcp 0.008362516 success 0 success hostname
1530093123000000000 30039 tcp 0.00874807 success 0 success hostname
1530093124000000000 30048 tcp 0.008439371 success 0 success hostname
1530093125000000000 30039 tcp 0.008446039 success 0 success hostname
1530093126000000000 30039 tcp 0.008537049 success 0 success hostname
1530093128000000000 30048 tcp timeout 1 timeout hostname
1530093129000000000 30048 tcp 0.008358559 success 0 success hostname
1530093130000000000 30039 tcp timeout 1 timeout hostname
1530093130000000000 30039 tcp 0.008589568 success 0 success hostname
1530093131000000000 30048 tcp timeout 1 timeout hostname
1530093132000000000 30048 tcp 0.008545831 success 0 success hostname

I am not really sure what the reason is, as the server is not busy at all.
Can you advise where to look to solve this?
Adding the full config for completeness:

[global_tags]
[agent]
interval = "2s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "1s"
flush_interval = "2s"
flush_jitter = "1s"
precision = ""
debug = true
quiet = false
logfile = "/var/log/telegraf/30039.log"
hostname = "game30039"
omit_hostname = true

[[outputs.influxdb]]
urls = ["http://influxhost:8086"]
database = "mydata"

[[inputs.net_response]]
protocol = "tcp"
address = "hostname:30039"

[[inputs.net_response]]
protocol = "tcp"
address = "hostname:30048"

#[[inputs.net_response]]
# protocol = "tcp"
# address = "hostname:30051"

#[[inputs.net_response]]
# protocol = "tcp"
# address = "hostname:30054"

#[[inputs.net_response]]
# protocol = "tcp"
# address = "hostname:30057"

I would be grateful if someone had even part of an answer :)
Christopher

This is not a timeout in the database inputs; it is a timeout from the service you are monitoring.

The data you shared is the result of the net_response plugin.

1530093119000000000 30039 tcp timeout 1 timeout hostname
1530093120000000000 30048 tcp 0.008535873 success 0 success hostname

The first data point is a timeout, which means that Telegraf could not reach the service within the configured timeout. It would make sense to get these kinds of timeouts when you bring up a new service, since it might not be immediately available to serve traffic.

The second data point is a success, which means that Telegraf was able to complete its net_response check.

In both cases, data was successfully written to InfluxDB without issue, and there are no errors in the Telegraf logs you shared.

The default timeout on net_response is one second. Try setting it a bit higher and see if the timeout still occurs:

[[inputs.net_response]]
  protocol = "tcp"
  address = "hostname:30039"
  timeout = "5s"
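
If you want to check whether the service itself is slow to accept connections, independent of Telegraf, you could run a similar TCP connect check by hand. The following is only a rough sketch in Python, not how Telegraf implements the plugin: `tcp_check` is a hypothetical helper name, and the result labels are my own choice, picked to resemble the `success` and `timeout` values in your data.

```python
import socket
import time


def tcp_check(host, port, timeout=1.0):
    """Attempt one TCP connect and report how it went.

    Returns a (result, seconds) tuple. `seconds` is the connect time on
    success and None otherwise. The result labels are illustrative only.
    """
    start = time.monotonic()
    try:
        # create_connection raises socket.timeout if the connect does not
        # complete within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "success", time.monotonic() - start
    except socket.timeout:
        return "timeout", None
    except OSError:
        # Refused connections, DNS failures, unreachable hosts, and so on.
        return "connection_failed", None
```

Calling something like `tcp_check("hostname", 30039, timeout=1.0)` in a loop from the Telegraf host, and again with `timeout=5.0`, should show whether the connects are failing outright or just occasionally taking longer than one second.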