Telegraf in Docker freezes on "Loading Config" - but ONLY when influxdb_v2 "token" parameter is specified

I’m trying to run telegraf in a Docker container, where Docker is running inside a systemd-nspawn “jail” on the TrueNAS platform.

I’m running into a bizarre behaviour where telegraf freezes on startup with the “Loading Config” message in the logs - but only if the “token” parameter of the outputs.influxdb_v2 plugin is specified.

Docker config:

version: "3.8"
services:
  telegraf:
    image: telegraf:latest
    container_name: telegraf_test
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /hostconfig/telegraf.conf.condensed:/etc/telegraf/telegraf.conf:ro
    ports:
      - 8125:8125
    restart: unless-stopped

Telegraf config for testing:

[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  quiet = false
  logtarget = "stderr"
  hostname = "ceres"
  omit_hostname = false

[[outputs.influxdb_v2]]
  urls = ["http://172.16.1.97:8086"]
  token = "asdfadfasdfasdf_mytoken=="
  organization = "myorg"
  bucket = "mybucket"

[[inputs.mem]]

When I try to run this config, the logs get as far as this line and then nothing else appears:

2024-05-12T17:23:37Z I! Loading config: /etc/telegraf/telegraf.conf

After a LOT of random troubleshooting, I found that when I comment out the “token” line of the influxdb_v2 config, the logs start moving again, although Telegraf then predictably fails to authenticate with InfluxDB.

2024-05-12T17:30:40Z I! Loading config: /etc/telegraf/telegraf.conf
2024-05-12T17:30:40Z I! Starting Telegraf 1.30.2 brought to you by InfluxData the makers of InfluxDB
2024-05-12T17:30:40Z I! Available plugins: 233 inputs, 9 aggregators, 31 processors, 24 parsers, 60 outputs, 6 secret-stores
2024-05-12T17:30:40Z I! Loaded inputs: mem
2024-05-12T17:30:40Z I! Loaded aggregators: 
2024-05-12T17:30:40Z I! Loaded processors: 
2024-05-12T17:30:40Z I! Loaded secretstores: 
2024-05-12T17:30:40Z I! Loaded outputs: influxdb_v2
2024-05-12T17:30:40Z I! Tags enabled: host=ceres
2024-05-12T17:30:40Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ceres", Flush Interval:10s
2024-05-12T17:30:40Z D! [agent] Initializing plugins
2024-05-12T17:30:40Z D! [agent] Connecting outputs
2024-05-12T17:30:40Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-12T17:30:40Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-12T17:30:40Z D! [agent] Starting service inputs
2024-05-12T17:30:50Z E! [outputs.influxdb_v2] When writing to [http://172.16.1.97:8086]: failed to write metric to truenas (401 Unauthorized): unauthorized: unauthorized access
2024-05-12T17:30:50Z D! [outputs.influxdb_v2] Buffer fullness: 1 / 10000 metrics
2024-05-12T17:30:50Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s)

I am running InfluxDB v2.7.6 and pulling telegraf:latest from Docker Hub.

Any ideas? I feel like I’ve searched to the bottom of the internet on this one and can’t find a similar issue anywhere. Thank you!

Edit: Just for fun, I tried creating an all-access token in influx and giving it to this telegraf instance; no change in behaviour described above.

One more datapoint: I am migrating this instance of telegraf from an Ubuntu VM running Docker to a systemd-nspawn “jail” running Docker. The config file I started with, and have pared down for testing, works on the Ubuntu VM but not inside this jail.

The first thing that comes to mind is that one of the libraries we use requires lockable memory, so if you are running in a jail, you need to make sure that allow.mlock = 1; is in your config.
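If it helps to confirm whether memory locking is available inside the jail, here is a minimal sketch in Python that tries to mlock a small buffer and reports the RLIMIT_MEMLOCK limits. The function name check_mlock and the 4096-byte size are assumptions for illustration, and this is not how Telegraf itself performs the check. It would need to run from a Python-capable container in the same jail, since the telegraf image may not ship Python. Treat the result as a rough indicator only: a small mlock can succeed within the resource limit even when larger or privileged locking would not.

```python
import ctypes
import errno
import os
import resource


def check_mlock(size=4096):
    """Try to lock a small buffer in RAM; return (ok, detail)."""
    # Limits on lockable memory for this process (bytes, or -1 for unlimited).
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    limits = f"RLIMIT_MEMLOCK soft={soft} hard={hard}"

    # Load the C library symbols already linked into the interpreter.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.restype = ctypes.c_int

    buf = ctypes.create_string_buffer(size)
    addr = ctypes.addressof(buf)

    if libc.mlock(addr, size) != 0:
        err = ctypes.get_errno()
        name = errno.errorcode.get(err, str(err))
        return False, f"mlock failed: {name} ({os.strerror(err)}); {limits}"

    # Release the lock again so the check leaves nothing behind.
    libc.munlock(addr, size)
    return True, f"mlock ok; {limits}"


if __name__ == "__main__":
    ok, detail = check_mlock()
    print("lockable memory available" if ok else "lockable memory NOT available")
    print(detail)
```

A failure with EPERM or ENOMEM here would point at the jail or container restricting memory locking rather than at the Telegraf config.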

That’s it! Thank you so much. I added --capability=CAP_IPC_LOCK to my systemd_nspawn_user_args and immediately got things up and running. Still fiddling with the finer configuration and host pass-through setup, but this was the big obstacle.

Linking my Jailmaker Github thread to leave others a trail of breadcrumbs…
