Hi everyone,
I’m using the tail input plugin with the grok parser to parse application logs at scale. I’ve confirmed that when log generation
exceeds the parsing throughput, Telegraf continues to parse but falls behind — meaning metrics are produced with increasing delay.
My question is: is there an internal metric or built-in mechanism to detect this kind of parsing lag?
Here’s what I’ve already looked into:
- inputs.internal: I checked metrics like gather_errors, gather_timeouts, gather_time_ns, buffer_size, and metrics_dropped. These are
useful for detecting collection failures or buffer overflow, but they don’t seem to indicate whether the tail plugin is falling behind
the log file writes. Even when parsing is delayed, there are no errors or timeouts; it’s just silently lagging.
- Tail plugin source code: I found that the plugin tracks file offsets internally (t.offsets), but this value is not exposed as a
metric. If it were, comparing the current read offset against the file size would be a straightforward way to measure the gap.
- Community & GitHub Issues: I searched through existing discussions and issues but couldn’t find an established solution for this
specific problem.
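
To make the "read offset vs. file size" idea concrete, here is a rough sketch of the comparison I have in mind, done from outside Telegraf. It is not a Telegraf feature: it assumes Linux, where /proc/<pid>/fdinfo/<fd> reports the kernel file position ("pos:") of each open descriptor. The function names are hypothetical, and the kernel position only shows how far the process has read, which may run ahead of what has actually been parsed, so I would treat the result as an approximation.

```python
import os
from typing import Optional


def parse_fdinfo_pos(fdinfo_text: str) -> int:
    """Extract the 'pos:' value from the text of /proc/<pid>/fdinfo/<fd>."""
    for line in fdinfo_text.splitlines():
        if line.startswith("pos:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'pos:' field found")


def bytes_behind(file_size: int, read_offset: int) -> int:
    """Unread bytes, clamped at 0 in case the file was truncated or rotated."""
    return max(0, file_size - read_offset)


def tail_lag(pid: int, log_path: str) -> Optional[int]:
    """Linux-only assumption: locate the descriptor that process `pid` holds
    on `log_path` and return how many bytes it still has to read.
    Returns None if the process has no descriptor open on that file."""
    fd_dir = f"/proc/{pid}/fd"
    target = os.path.realpath(log_path)
    for fd in os.listdir(fd_dir):
        try:
            # Each entry in /proc/<pid>/fd is a symlink to the open file.
            if os.path.realpath(os.path.join(fd_dir, fd)) != target:
                continue
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                pos = parse_fdinfo_pos(f.read())
        except OSError:
            # Descriptor closed in the meantime, or not readable by this user.
            continue
        return bytes_behind(os.path.getsize(target), pos)
    return None
```

Something like tail_lag(telegraf_pid, "/var/log/app.log") (the path is just an example) could then be run periodically, for instance from a script behind the exec input, but a native metric in the tail plugin would obviously be cleaner.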
I understand I might be missing something obvious — if so, I apologize for the redundant question!
If there’s no built-in way, I’d appreciate any suggestions on how to approach this. For example:
- Is there a way to expose the tail plugin’s read offset as a metric?
- Has anyone implemented a custom solution for monitoring this kind of lag?
- Would a feature request for a “bytes behind” or “read offset” metric be welcome?
Any guidance would be greatly appreciated. Thanks!
