Kapacitor - first impression

kapacitor
#1

Kapacitor has a very powerful engine for analyzing and alerting on time-series data. I was impressed with the ease of setup and, after an initial struggle, was able to create some basic alerts.

The struggle stemmed from the complicated documentation, which lists the available nodes alphabetically and does not provide enough description or examples. The chaining methods that show up on every page complicate the documentation even further.

Chronograf helped me create some basic alerts, but it hid the power of Kapacitor under the hood. The TICKscript editor was not very helpful either: it lacked IntelliSense and syntax highlighting.

I think most of the power comes from alerting on a stream of incoming data. However, these tasks are hard to set up:

  1. There is no compatibility with batch-based alerting. We want to alert in real time, but compare against a baseline from the database. Right now, we would have to implement a side process to query and cache historic data. (Not to mention that UDF functionality depends on Unix-specific sockets.)

  2. There is no way to test a stream-based task except by waiting for an alert to be triggered. It would be very, very, very helpful if there were an option to replay historic data as if it were arriving in real time and generate a log of the alerts that would be produced. My understanding is that the filtering from the TICKscript needs to be translated into InfluxQL to fetch the data, ordered by time alone, or is there anything else I am missing?
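To make the request in item 2 concrete, here is a minimal sketch of the idea in Python. It is not Kapacitor code: the `replay` function, the `Point` type, and the threshold condition are hypothetical names used only for illustration. It feeds historic points to an alert condition in time order and collects the alerts that would have fired.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical point type for this sketch: (timestamp in seconds, value).
Point = Tuple[int, float]


def replay(points: Iterable[Point], is_alert: Callable[[float], bool]) -> List[str]:
    """Feed points to the alert condition in time order and return an alert log."""
    log = []
    for ts, value in sorted(points, key=lambda p: p[0]):
        if is_alert(value):
            log.append(f"{ts} ALERT value={value}")
    return log


# Historic data, deliberately out of order, as it might come back from a query.
history = [(30, 95.0), (10, 42.0), (20, 91.5)]

# Assumed alert condition for the example: value above 90.
alert_log = replay(history, lambda v: v > 90.0)
print("\n".join(alert_log))
```

The point of the sketch is only that the replay needs the data ordered by time and the same condition the live task would apply.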

After setting up Kapacitor and creating a subscription, I was shocked that Kapacitor started receiving ALL data from InfluxDB. My expectation was that it would subscribe only to the data it needs. Kapacitor already has an implementation for filtering incoming data, so why not reuse this code in InfluxDB?

As long as Kapacitor lacks scalability, parsing all the data coming from an InfluxDB cluster on a single machine may be problematic. I heard that the recommendation is to switch to batch processing, which is not an option, as we have users who are used to getting alerts in real time.

I also thought that the reason Kapacitor gets its data through InfluxDB was that InfluxDB would serve as a message queue, accumulating incoming data in the WAL until it is processed by Kapacitor, but I found this not to be the case. It also appears that undelivered data doesn’t get resent. Since we already have Kafka in front of our InfluxDB clusters (to queue, handle multiple senders, and multiplex), we are thinking about sending from it directly to Kapacitor. Although Kapacitor has to have a subscription, it could be set up on a dummy instance of InfluxDB.

Are there plans to implement the above functionality? Otherwise I may have to implement it myself, but would prefer to contribute in that case. Any guidance would be appreciated.

I would really like to see Kapacitor suit our use case. InfluxData has a great future. It’s just that we always want that future to be available right now. :thinking:

Thank you!

#2

Thanks for the thoughtful feedback!

I agree that the docs on chaining methods are confusing. We have been making iterative progress on improving the docs, and this feedback is helpful.

It seems that the batch vs stream question is the biggest concern. A few thoughts on that:

  • In most cases the latency difference between stream and batch is negligible, meaning that you can use a batch task and it will be almost as fast as the stream task. What specific use cases do you have where the small latency difference is an important concern?

  • As for Kapacitor subscribing to the entire stream of data, there are several ways to manage this. You are right that batch tends to be more efficient, since it only queries the data that it needs, whereas stream gets the entire firehose of data. You can configure Kapacitor to subscribe only to certain databases if there are entire databases that you do not need to consume in Kapacitor.

  • Since you are already using Kafka in front of InfluxDB, I would recommend that you have Kafka send the data to Kapacitor as well and avoid the double hop through InfluxDB. Kapacitor does not have to have subscriptions; they can be turned off with the disable-subscriptions configuration option.
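As a rough illustration of the last point, here is a Python sketch of the piece that would sit between a Kafka consumer and Kapacitor. It assumes Kapacitor's HTTP write endpoint at `/kapacitor/v1/write` (which accepts InfluxDB line protocol) and the default port 9092; the database name, retention policy, and helper names (`to_line`, `write_url`, `send`) are hypothetical. The Kafka consumer itself is left out, and escaping is simplified (float fields only, no backslash handling), so please check the API docs before relying on it.

```python
import urllib.request
from urllib.parse import urlencode


def escape(text: str) -> str:
    # Line protocol requires escaping commas, equals signs and spaces
    # in tag keys, tag values and field keys.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one point as line protocol (float fields only in this sketch)."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape(k)}={float(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {ts_ns}"


def write_url(base: str, db: str, rp: str) -> str:
    # Assumed endpoint path; db and rp must match what the task's stream expects.
    return f"{base}/kapacitor/v1/write?" + urlencode({"db": db, "rp": rp})


def send(url: str, lines: list) -> int:
    """POST a batch of lines to Kapacitor and return the HTTP status code."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Build (but do not send) one point, as a Kafka consumer loop might do per message.
line = to_line("cpu", {"host": "web 01"}, {"usage": 87.5}, 1500000000000000000)
url = write_url("http://localhost:9092", "telegraf", "autogen")
print(line)
print(url)
```

In a real consumer loop you would batch several lines per `send` call rather than posting one point at a time.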

Finally, on testing stream tasks:

Good news here: there is a way to replay historical data for stream tasks, so long as that data is saved in InfluxDB. Both static datasets for testing purposes and live data for more ad hoc analysis can be replayed. See this guide for creating a stream and replaying it against a task. Also see the usage text for kapacitor replay-live query.

Again, thanks for the thorough feedback!