@HansKe I think there may be a bit of confusion going on here. I’ll do my best to clear it up.
Kapacitor has a DSL called TICKscript that is used to define tasks. Underneath each task is a concept called a pipeline, which is made up of individual pieces (nodes) connected together. A node can take in (want) either a batch or a stream, and similarly it can output (provide) either a batch or a stream. A batch is just one or more points grouped together. What a node wants and provides varies by use case:

- want a batch, provide a stream (computing the average or maximum of a batch, for example)
- want a batch, provide a batch (finding outliers in a batch of data)
- want a stream, provide a batch (grouping similar points together)
- want a stream, provide a stream (applying a mathematical function, like the logarithm, to a value in each point)
All of the nodes in Kapacitor follow these want/provide semantics. There are times when this gets a bit blurred (see the InfluxQL nodes, some of which accept both batches and streams).
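To make this concrete, here is a minimal sketch of a pipeline that chains these semantics together (the measurement and field names, `cpu` and `usage_user`, are just placeholders):

```
stream
    // from() provides a stream of points
    |from()
        .measurement('cpu')
    // window() wants a stream and provides batches
    |window()
        .period(10s)
        .every(10s)
    // mean() wants a batch and provides a stream (one averaged point per batch)
    |mean('usage_user')
        .as('usage_user')
```

Each `|` hands data from one node to the next, so each node's "provide" type has to match the next node's "want" type.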
As I’m sure you’re aware, there are two types of tasks in Kapacitor: batch tasks and stream tasks. The two types correspond to the two initial nodes a pipeline can start with: one that queries InfluxDB (batch) and one that subscribes to all writes to InfluxDB (stream).
So the larger question is: why do we need both? The answer really comes down to how much memory you have available. Consider the following example, where every day we want to take an average over the last week’s worth of data.
As a batch task:
```
batch
    // query the last week of data once per day
    |query('SELECT mean(usage_user) as usage_user FROM cpu')
        .period(7d)
        .every(1d)
```
In this case, the data is stored in InfluxDB, and each day Kapacitor queries the last week of data and yields the result to the Kapacitor pipeline. This way the week’s worth of data can sit on disk in InfluxDB rather than in memory (as we’ll see in the next example).
As a stream task:
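The equivalent stream task would look something like this (a sketch, reusing the measurement and field names from the batch example above):

```
stream
    |from()
        .measurement('cpu')
    // buffer a full week of points in Kapacitor's own memory
    |window()
        .period(7d)
        .every(1d)
    |mean('usage_user')
        .as('usage_user')
```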
Here we keep the entire week’s worth of data in memory. Since users are often writing thousands of individual points per second, over the course of a week this adds up to billions of points sitting idly in memory most of the time.
The question you might then have is: why would I ever use a stream task? The answer is that for small time windows this isn’t much of an issue, and by using a stream instead of a batch, you’ve lowered the query load on InfluxDB.