Yes, there are ways to isolate inputs. The most robust approach is to run multiple Telegraf instances. This is a common practice for handling different types of metrics or for containing performance problems.

Running each input in a separate Telegraf instance should help fix the issue. This pattern appears in the documentation and is considered a best practice for certain scenarios; see this example: telegraf/plugins/inputs/vsphere/README.md at master · influxdata/telegraf · GitHub
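A minimal sketch of one such instance, assuming your endpoints are polled with the inputs.http plugin and written to InfluxDB v2. The file name, URL, organization, bucket and token variable are hypothetical placeholders. Each file is started by its own telegraf process, so a slow or failing endpoint in one file should not stall collection in another.

```toml
# telegraf-api-a.conf (hypothetical file name)
# Start this process with: telegraf --config telegraf-api-a.conf
# A second file, e.g. telegraf-api-b.conf, would be identical except for its urls.
[agent]
  interval = "5m"
  collection_jitter = "30s"

[[inputs.http]]
  # Hypothetical endpoint; replace with the subset of APIs this instance owns
  urls = ["https://api-a.example.com/metrics"]
  timeout = "60s"
  data_format = "json"

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  # Read from an environment variable rather than storing the token in the file
  token = "${INFLUX_TOKEN}"
  organization = "my-org"
  bucket = "telegraf"
```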

Several optimizations can help:

- Adjust timeouts: your current timeout is 60s, which may not be long enough for some API calls.
- Collection jitter: adding collection jitter spreads out when each input starts collecting, which helps distribute the load: collection_jitter = "30s"
- Increase the interval: if 5m isn't long enough for your API calls to complete, consider increasing it.
- Split configurations: as mentioned in a community post, splitting configurations across multiple Telegraf instances is a common solution: Multiple Agents SNMP
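The first three adjustments can be sketched in a single fragment. This again assumes the slow endpoints use inputs.http; the URL and the values 10m and 120s are illustrative, not recommendations. The general idea is to keep each request's timeout comfortably below the plugin's interval, so a hung request is abandoned before the next collection is due.

```toml
[agent]
  # Default collection interval for every input
  interval = "5m"
  # Each input sleeps a random time up to this value before collecting,
  # so inputs don't all call their APIs at the same moment
  collection_jitter = "30s"

[[inputs.http]]
  # Per-plugin override: poll this slow endpoint less often than the agent default
  interval = "10m"
  # Give each HTTP request more time than the current 60s, but well under the interval
  timeout = "120s"
  # Hypothetical endpoint
  urls = ["https://slow-api.example.com/metrics"]
  data_format = "json"
```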

When a plugin's collection takes longer than its interval, it can disrupt metric collection and result in missed samples, which appears to be what you're experiencing. A related problem, where one plugin affects the whole agent, was reported with the inputs.http plugin: cookie authentication failures would crash the entire agent. See: Inputs.http cookie failure

Telegraf does support parallel execution through multiple plugin instances, but there are limitations. Each plugin instance runs independently, yet they all share the same process resources (CPU, memory, network), so a crash or resource exhaustion in one plugin can affect the others.
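Within a single process, the parallel-instances option looks roughly like this: two independent instances of the same input, each with its own schedule. The aliases, URLs and values are hypothetical. Each instance collects on its own timer, so the slow endpoint should not delay the fast one, but both still share one process as described above.

```toml
[[inputs.http]]
  # alias labels this instance in Telegraf's logs and internal metrics
  alias = "fast_api"
  interval = "1m"
  urls = ["https://fast-api.example.com/metrics"]
  timeout = "30s"
  data_format = "json"

[[inputs.http]]
  alias = "slow_api"
  interval = "10m"
  urls = ["https://slow-api.example.com/metrics"]
  # Still kept below this instance's 10m interval
  timeout = "5m"
  data_format = "json"
```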

So, to summarize, try the following:

- Split your configuration into separate Telegraf instances, each handling a subset of your API endpoints.
- Adjust timeouts and intervals based on the actual response times of your endpoints.
- Consider using collection jitter to distribute the load.
- Monitor each instance's performance using Telegraf's internal metrics (the inputs.internal plugin).
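For the monitoring step, the inputs.internal plugin reports Telegraf's own statistics, including the internal_gather measurement with a gather_time_ns field per input, which you can compare against each plugin's interval. A minimal fragment to add to every instance; the telegraf_instance tag name and value are hypothetical, and if a file already has a [global_tags] table, merge the tag into it rather than adding a second one.

```toml
[global_tags]
  # Hypothetical tag so metrics from each telegraf process on the same host can be told apart
  telegraf_instance = "api-a"

[[inputs.internal]]
  # Also report Go runtime memory statistics for this telegraf process
  collect_memstats = true
```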