Telegraf logs errors while metrics are successfully written in InfluxDB

Telegraf log:

2019-11-17T17:00:10Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2019-11-17T17:00:15Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=opench: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-11-17T17:00:15Z D! [outputs.influxdb] Buffer fullness: 3 / 10000 metrics
2019-11-17T17:00:15Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-11-17T17:00:20Z D! [outputs.file] Wrote batch of 2 metrics in 138.445µs
2019-11-17T17:00:20Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2019-11-17T17:00:25Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=opench: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-11-17T17:00:25Z D! [outputs.influxdb] Buffer fullness: 3 / 10000 metrics
2019-11-17T17:00:25Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-11-17T17:00:30Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2019-11-17T17:00:35Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=opench: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
...
2019-11-17T17:19:05Z D! [outputs.influxdb] Buffer fullness: 6 / 10000 metrics
2019-11-17T17:19:05Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-11-17T17:19:10Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2019-11-17T17:19:15Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: Post http://influxdb:8086/write?db=opench: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
...

Telegraf and InfluxDB are started with this docker-compose file:

version: '3'

services:

  influxdb:
    image: influxdb:latest
    container_name: opench-influxdb
    restart: always
    volumes:
      - ./influxdb/influxdb.conf:/etc/influxdb/influxdb.conf:ro
      - opench-influxdb-volume:/var/lib/influxdb
    networks:
      - opench

  telegraf:
    image: telegraf:latest
    container_name: opench-telegraf
    restart: always
    links:
      - influxdb
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
    networks:
      - opench

networks:
  opench:
    driver: bridge

volumes:
  opench-influxdb-volume:
    external: true

Metrics are written to InfluxDB - I can successfully query them, and I can ping in both directions between the telegraf and influxdb containers. But Telegraf times out when posting the metrics.

pi@raspberrypi:~/apps/opench/docker $ docker exec -it opench-telegraf /bin/bash
root@317aa174c835:/# telegraf --version
Telegraf 1.12.4 (git: HEAD 9a4f08e9)

root@317aa174c835:/# curl -sl -I http://influxdb:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: dd4e9e42-096e-11ea-8346-0242ac140002
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.7.8
X-Request-Id: dd4e9e42-096e-11ea-8346-0242ac140002
Date: Sun, 17 Nov 2019 19:17:15 GMT

root@317aa174c835:/# ping influxdb
PING influxdb (172.20.0.2) 56(84) bytes of data.
64 bytes from opench-influxdb.docker_opench (172.20.0.2): icmp_seq=1 ttl=64 time=0.312 ms
64 bytes from opench-influxdb.docker_opench (172.20.0.2): icmp_seq=2 ttl=64 time=0.223 ms
64 bytes from opench-influxdb.docker_opench (172.20.0.2): icmp_seq=3 ttl=64 time=0.228 ms
64 bytes from opench-influxdb.docker_opench (172.20.0.2): icmp_seq=4 ttl=64 time=0.227 ms

Strange, what could it be?

Hello @Yuriy_Olkhovyy,
Thanks for your question. So far I’ve only found this issue related to yours. Are you using 1.7.5?
Alternatively, what does your telegraf config look like? Can you share it for debugging?
You might want to set timeout = "60s", or some value longer than the default or whatever you currently have.
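For reference, the timeout lives in the InfluxDB output section of telegraf.conf. A minimal sketch, assuming the URL and database from your log and otherwise default settings:

[[outputs.influxdb]]
  ## URL and database taken from your log output
  urls = ["http://influxdb:8086"]
  database = "opench"
  ## HTTP request timeout; the default is 5s, try something much larger
  timeout = "60s"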

Telegraf and InfluxDB versions:

root@244eceb833d5:/# telegraf --version
Telegraf 1.12.4 (git: HEAD 9a4f08e9)
    ...
root@3e97f8b185d1:/# influx --version
InfluxDB shell version: 1.7.8

I increased the timeout to 60s in Telegraf, and the error changed to the following (metrics are still successfully written to InfluxDB):

2019-11-23T16:02:27Z E! [outputs.influxdb] when writing to [http://influxdb:8086]: 500 Internal Server Error: timeout
2019-11-23T16:02:27Z D! [outputs.influxdb] Buffer fullness: 3 / 10000 metrics
2019-11-23T16:02:27Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-11-23T16:03:09Z D! [outputs.influxdb] Wrote batch of 3 metrics in 7.00731664s
2019-11-23T16:03:09Z D! [outputs.influxdb] Buffer fullness: 0 / 10000 metrics

Writing a batch of 3 metrics in 7 seconds sounds a bit long. And the timestamps of the log entries differ by 42 seconds.

Attached telegraf.conf here

Hello @Yuriy_Olkhovyy,
I don’t think you successfully managed to share the telegraf config. Could you please try again? Wait, you’re successfully writing points despite the 500? That’s odd. And you can ping the containers? What do the influxdb logs look like?
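To take Telegraf out of the equation, you could also try posting a point by hand from inside the telegraf container against the 1.x write API, something along these lines (the measurement name is just a placeholder):

curl -i -XPOST 'http://influxdb:8086/write?db=opench' --data-binary 'debug_test value=1'

If that also hangs or returns a 500, the problem is on the InfluxDB side rather than in Telegraf.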

telegraf.conf.tar.gz (46.8 KB)

Sorry for the delay, I just uploaded my telegraf.conf.

@Yuriy_Olkhovyy, Perhaps this is a storage system issue? Slow SD card tying up IO operations on the InfluxDB side? Can you look at the logs from the DB? Maybe the DB container is in a restart loop?
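If it helps, a few commands to check that from the Pi host (container name taken from your compose file):

docker logs --tail 100 opench-influxdb
docker ps -a --filter name=opench-influxdb
docker inspect -f '{{.RestartCount}} {{.State.Status}}' opench-influxdb

A non-zero restart count or a status other than "running" would point to the container flapping rather than just being slow.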

Thanks for the config. It looks good to me.

storage system issue

You are probably right - the setup is quite resource-limited. It’s a Raspberry Pi 4 with 4GB RAM, the whole system on a 4GB SD card, and a 120GB HDD for the data. The important thing to mention is that the SD card was about 70% full at the beginning, but by the end it was 100% full (logs?) and the system stopped doing any work. It boots, the containers with the Influx stack start, but no work is observed anymore. There is still plenty of space on the 120GB disk, though.
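In case the container logs really are what filled the card, I am also considering capping Docker's json-file logging in the compose file; a sketch of what I mean (the sizes are just a guess):

  telegraf:
    # ...existing telegraf settings as above...
    logging:
      driver: json-file
      options:
        max-size: "10m"   # cap each log file at 10 MB
        max-file: "3"     # keep at most 3 rotated files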

I will replace the SD card with a bigger one and will let you know the result. This will take some time - I expect I’ll have time to do that only after Christmas.

Thanks a lot for your comments.