I configured telegraf to collect system metrics and flush them to three different outputs: elasticsearch, influxdb1, and influxdb2.
If the elasticsearch output has a problem, such as the host being down or unreachable, I get an error in the telegraf logs and none of the other outputs (influxdb1, influxdb2) receive data until I fix the elasticsearch issue. Is this the expected behaviour of telegraf?
I would like to know whether there is a way to configure telegraf so that it keeps flushing the collected metrics to the remaining outputs even when one of the configured outputs has issues. Is that possible?
In general, this is the default behavior of telegraf, and there are a number of reasons for it. The scenario I like to give: suppose your config file had the wrong URL or password for an output, and telegraf ignored that and kept running. How would you catch it? If the telegraf service carried on as if nothing had happened, many users might not notice, let telegraf go days or months without sending metrics, and then be surprised to find they had lost a large amount of data.
There are exceptions to this, and we are happy to look at PRs or changes that let specific plugins add an option to ignore failures, but there is no global option to allow these types of failures.
As such, if this is something you are interested in seeing for elasticsearch, then I would recommend filing a feature request.
Thank you for providing such detailed information. This is really helpful, and I understand why ignoring errors is not a good idea; I have run into similar problems before.
But I owe you an explanation of why such an ignore feature would be helpful. Say we want to push the same metrics to more than one database and one of the database targets is unreachable. If telegraf refuses to send metrics to any of the databases because one host is unreachable, we lose the metrics in every database target.
If telegraf ignored the unreachable target and sent the metrics to the outputs that are reachable, we would at least have the metrics in the other available databases. I believe users would still find out about the failed output through the telegraf logs and could correct it.
This is not specific to the elasticsearch output: I believe it would be a nice feature for any output plugin to have an ignore flag in its own configuration block (not necessarily a global configuration option), so that telegraf continues pushing to the other available databases.
Of course, I will be happy to file a feature request for this ignore option in the output block of the telegraf configuration file.
"I believe users would still find out about the failed output through the telegraf logs and could correct it."
While I would also hope users notice, it is far better to be explicit that something is wrong, with a clear call to action, than to hope they notice.
"I will be happy to file a feature request for this ignore option in the output block of the telegraf configuration file."
We are happy to look into this on a case-by-case basis, meaning a feature request for a specific plugin is something we could consider. A feature request to add this globally is not something we are entertaining.