Telegraf in Docker: writing bulk MQTT messages (6000 Hz) into InfluxDB

Hi @Jay_Clifford,
When Telegraf collects MQTT data at a rate of about 6000 messages per second, its connection to the broker drops abnormally :cold_face: :weary:
Please help me :pray:

2021-12-20T08:18:45Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting


![image|690x95](upload://7ffSPHXE6aVGOcEheZ1w9gfIPR.png)

#################################################################################

################  telegraf.conf ####################

#################################################################################

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  ## interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  ## round_interval = false

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Log at debug level.
  debug = true
  ## Log only error level messages.
  ## quiet = true

Hi @loneWolf666,

First, can we get your full configuration? The config you left in the 3rd comment is only the agent settings. It would be good to see what other configuration settings you are using, especially for the inputs and outputs.

Second, what type of system is hosting these services? Is this a Raspberry Pi? A laptop? How much memory and CPU does it have?

You have shown one error and one warning from Telegraf. First the error from the mqtt_consumer:

2021-12-20T08:18:45Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting

This is reporting that the MQTT client Telegraf uses did not get a response from your MQTT server in the allowed time. If you continue reading the logs, you will see that the MQTT server disconnects and then Telegraf immediately tries to reconnect once again.

I am assuming you are hosting the MQTT server, Telegraf, and InfluxDB all on the same system, given the container status screenshot. Given the container load for emqx-2.0, there is probably contention for resources on your system. As a result, these disconnections and reconnects will continue to happen on such a heavily loaded system.

The warning you showed was from influxdb_v2:

did not complete within its flush interval

Assuming the flush interval is what the agent config shows, 10s, this means that the write to InfluxDB took longer than 10 seconds. Looking at your short log, I see some writes taking many seconds. Similar to the above issue with the MQTT server, this is again probably load-related if all three services are hosted on the same system.

Hope that helps!

Thanks!

Thank you for your support, jpowers.
I am using a server with 64 GB of memory and 40 CPUs.
I also suspect that the problem is with the emqx node.

But I still want to ask: is the flush interval the interval at which Telegraf writes to InfluxDB? Or is it a timeout, designed to keep data from sitting in Telegraf's memory for a long time, that flushes the buffered data when the batch size limit has not been reached? :pray: :pray: :pray:

The flush interval is how often Telegraf will attempt to write to the specified outputs. In this case, yes, it will attempt to write to InfluxDB every 10 seconds.
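
To put rough numbers on this, here is a small Python sketch using the values from the agent config posted earlier in the thread. It assumes one metric per MQTT message and that the output writes its batches one after another; neither is confirmed in this thread, so treat the results as estimates only.

```python
# Values taken from the thread. One metric per MQTT message is an assumption.
rate_per_s = 6000            # message rate reported by the poster
flush_interval_s = 10        # flush_interval = "10s"
metric_batch_size = 1000     # metric_batch_size = 1000
metric_buffer_limit = 10000  # metric_buffer_limit = 10000

# Metrics that arrive during one flush interval.
arrivals_per_flush = rate_per_s * flush_interval_s

# Batches per second needed to keep up with the input rate.
batches_per_s = rate_per_s / metric_batch_size

# Average time available per batch write if batches are written sequentially.
write_budget_ms = 1000 / batches_per_s

# Time until the buffer limit is reached if writes stall completely.
seconds_to_fill = metric_buffer_limit / rate_per_s

print(f"arrivals per flush interval: {arrivals_per_flush}")
print(f"batches needed per second:   {batches_per_s:.1f}")
print(f"write budget per batch:      {write_budget_ms:.1f} ms")
print(f"buffer fills after a stall:  {seconds_to_fill:.2f} s")
```

Under these assumptions, far more metrics arrive in one flush interval than the buffer limit allows, so the output has to keep writing batches continuously rather than once every 10 seconds.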

################  ERROR ###########
2022-01-11T01:39:04Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value
2022-01-11T01:39:04Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value
2022-01-11T01:39:06Z D! [outputs.influxdb_v2] Wrote batch of 2000 metrics in 70.25789ms
2022-01-11T01:39:06Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting
2022-01-11T01:39:06Z D! [inputs.mqtt_consumer] Disconnected [tcp://172.29.60.10:1883]
2022-01-11T01:39:06Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:07Z D! [outputs.influxdb_v2] Wrote batch of 2000 metrics in 247.183766ms
2022-01-11T01:39:07Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:10Z D! [inputs.mqtt_consumer] Connecting [tcp://172.29.60.10:1883]
2022-01-11T01:39:10Z D! [inputs.mqtt_consumer] Connecting [tcp://172.29.60.10:1883]
2022-01-11T01:39:10Z I! [inputs.mqtt_consumer] Connected [tcp://172.29.60.10:1883]
2022-01-11T01:39:10Z I! [inputs.mqtt_consumer] Connected [tcp://172.29.60.10:1883]
2022-01-11T01:39:10Z D! [outputs.influxdb_v2] Wrote batch of 2000 metrics in 48.037592ms
2022-01-11T01:39:10Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:20Z D! [outputs.influxdb_v2] Wrote batch of 2000 metrics in 2.198730375s
2022-01-11T01:39:20Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:20Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting
2022-01-11T01:39:20Z D! [inputs.mqtt_consumer] Disconnected [tcp://172.29.60.10:1883]
2022-01-11T01:39:30Z D! [inputs.mqtt_consumer] Connecting [tcp://172.29.60.10:1883]
2022-01-11T01:39:30Z I! [inputs.mqtt_consumer] Connected [tcp://172.29.60.10:1883]
2022-01-11T01:39:32Z D! [outputs.influxdb_v2] Wrote batch of 2000 metrics in 52.027944ms
2022-01-11T01:39:32Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:33Z D! [outputs.influxdb_v2] Wrote batch of 111 metrics in 33.689145ms
2022-01-11T01:39:33Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:33Z D! [outputs.influxdb_v2] Wrote batch of 2000 metrics in 108.839962ms
2022-01-11T01:39:33Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value
2022-01-11T01:39:33Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 30000 metrics
2022-01-11T01:39:33Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value
2022-01-11T01:39:33Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value
2022-01-11T01:39:33Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value
2022-01-11T01:39:33Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value

system_resource_art.conf.txt (1.6 KB)
telegraf.conf.txt (2.2 KB)
Hi @jpowers,
These are my configuration files. I changed the flush interval, but the connection still drops. The new error I found today is:

2022-01-11T01:39:33Z E! [inputs.mqtt_consumer] Error in plugin: invalid character 'N' looking for beginning of value

:pray: :pray: :pray: :pray:

Since you are using the JSON data_format, this is most likely an invalid JSON message that Telegraf cannot parse. I would look at what messages are coming out of your MQTT server and see if and why they are invalid.
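
One way to check this offline is to run captured payloads through a strict JSON parser. The Python sketch below is only illustrative: the sample payloads are hypothetical, and a bare `NaN` is just one possible reason for a value starting with 'N' (the actual messages are not shown in this thread). Python's `json` module accepts `NaN` by default, so the sketch rejects it explicitly through `parse_constant`.

```python
import json


def _reject_constant(name):
    # json.loads calls this for the non-standard tokens NaN, Infinity, -Infinity.
    raise ValueError(f"non-standard JSON constant: {name}")


def check_payload(payload):
    """Return None if payload is strict JSON, otherwise a short error description."""
    try:
        text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
        json.loads(text, parse_constant=_reject_constant)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return str(exc)
    return None


# Hypothetical sample payloads; real ones would be captured from the broker.
samples = [
    b'{"sensor": "a1", "value": 1.5}',
    b'{"sensor": "a1", "value": NaN}',
    b"None",
]

for raw in samples:
    problem = check_payload(raw)
    print(raw, "->", "ok" if problem is None else problem)
```

Any payload reported as a problem here would be worth tracing back to the publisher that produced it.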

Hi @jpowers,
I increased the flush interval to 30s, but the disconnection problem is still not solved :pray: :pray: :pray: