How to manage auto scaling groups?


I am looking into deploying the full TICK platform, mostly to compare it with a Prometheus-based solution.

Something I found really easy to configure with Prometheus was setting up auto-discovery on our Google Cloud project. I didn’t find very explicit documentation on how to do this with Kapacitor; I only found this page: Scraping and discovery | Kapacitor 1.5 Documentation

The idea is that it should not alert when instances are deleted from an autoscaling group, since those instances are gone for good and will simply stop pushing new metrics. I don’t know if this is possible. How would you do this?


So I’ve been taking a deeper look at the page.
If I understand correctly, in this case the data is not pushed to InfluxDB as usual; instead, Kapacitor pulls the data from the discovered nodes, and only from Prometheus-compatible exporters?

So I suppose that, to be able to reuse the dashboards and everything else, I would have to use this output on the Telegraf installed on the instances in my autoscaling group?

Am I understanding this correctly? And what would happen when a node is destroyed by the autoscaler?
Is there an example TICKscript for this kind of application?


I’ll leave the answer to your Kapacitor-specific questions to someone who is more familiar with that version of Prometheus metrics collection and auto-discovery. From another angle, though, have you looked into using Telegraf for this use case instead of Kapacitor altogether?

Telegraf has a plugin for this and can be run as a DaemonSet in a pod. While it doesn’t technically “auto-discover” (yet; that’s scheduled for the next feature release of Telegraf, I believe), it can run as part of every pod and be configured to pull metrics from each service/container in that pod. Run that way, it effectively acts as auto-discovery. The benefits are several:

  1. This is a push-based solution so you can feasibly gather the metrics faster.

  2. It applies to more than just Prometheus metrics. If you are running other services in these pods, Telegraf can pull those metrics the same way.

  3. Telegraf is a framework for best practices with regard to InfluxDB. It has built-in features to optimize the way the data is pushed to the database such as sorting tags, retries, configurable polling intervals and flush intervals, aggregations, etc.
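
To make this a bit more concrete, here is a minimal sketch of what such a Telegraf configuration could look like, assuming the plugin in question is the `prometheus` input and that metrics are written with the `influxdb` output. The scrape URL, the InfluxDB hostname, the database name, and the interval values are all placeholders I made up for illustration, so adjust them to your environment.

```toml
# Sketch of a telegraf.conf. All hostnames, ports, and values below are
# placeholders, not recommendations.
[agent]
  interval = "10s"        # how often the inputs are polled
  flush_interval = "10s"  # how often buffered metrics are written to the outputs

# Pull Prometheus-format metrics from a service running next to Telegraf.
[[inputs.prometheus]]
  urls = ["http://localhost:9100/metrics"]

# Push everything that was collected to InfluxDB.
[[outputs.influxdb]]
  urls = ["http://influxdb.example.internal:8086"]
  database = "telegraf"
```

Because each Telegraf instance only scrapes what runs alongside it and pushes the result out, a destroyed instance just stops sending data; nothing on the central side has to be told which targets exist.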