Help understanding BarrierNode

The following is the example from the BarrierNode documentation with one slight change: the final chaining method is count() instead of top().

stream
  |from()
    .measurement('cpu')
  |barrier()
    .idle(5s)
  |window()
    .period(10s)
    .every(5s)
  |count('value')

I wrote a TICKScript that includes functionality nearly identical to the above. In my use case, new data was arriving roughly every 15 seconds, while idle() and every() were each set to 15 seconds.
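For reference, here is a sketch of what our production script looked like. Only the 15s values match our real script; the measurement name and the 30s period are illustrative placeholders:

stream
  |from()
    .measurement('cpu')
  |barrier()
    .idle(15s)      // equal to the window's every(), the problematic setup
  |window()
    .period(30s)    // placeholder; keeps the 2:1 ratio of the docs example
    .every(15s)
  |count('value')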

After going into production, the script would every so often produce an odd result: count() would return zero even though data was, in fact, present for the configured window. By process of elimination we traced the problem to the TICKScript.

On a hunch, I modified the barrier so that its idle() time was a bit longer than the window's every() time instead of equal to it. Upon making that change, the problem stopped; I've seen no further incidents for days.
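Concretely, the only change was to the barrier node. The 20s below is a hypothetical value; anything comfortably longer than every(15s) would illustrate the same fix:

  |barrier()
    .idle(20s)    // hypothetical; a bit longer than the window's every(15s)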

I'd like to understand the interaction of the BarrierNode and the WindowNode in this example. My hypothesis is that random variation in the arrival of new data points, combined with the identical timing of the two nodes, created a situation where it appeared that no new data was present. That is, I think each window emission was repeatedly just missing the arrival of new data while the BarrierNode was repeatedly filling the window with empty results; every so often this happened enough times in a row that the entire window appeared empty.

Is my hypothesis correct? Was the timing just on the edge, producing an unstable window of results?

Update

I've pretty much satisfied myself that the timing of the barrier and the timing of our incoming data were indeed interacting in a way that infrequently caused this odd issue. In short, the lesson learned is that a BarrierNode's idle() time needs to be longer than the actual interval between incoming data points. In our case, slight delays beyond the ideal interval between data points were leading to our issue. Lengthening the BarrierNode's idle() fixed the problem.
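To codify the lesson, here is a sketch of the corrected script with the timing assumptions pulled out as vars. The 20s value is again hypothetical, and the 30s period remains a placeholder:

var emitEvery = 15s   // how often the window emits
var idleTime = 20s    // must exceed the real gap between points, with headroom for jitter

stream
  |from()
    .measurement('cpu')
  |barrier()
    .idle(idleTime)
  |window()
    .period(30s)      // placeholder, as above
    .every(emitEvery)
  |count('value')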

That said, I'd still like to understand the interaction of the BarrierNode and the WindowNode. Since data was, in fact, coming in all along, I would expect the WindowNode to see the real data that had been stored even though the barriers had been triggered. That is, even in our oddball case where multiple successive barriers fired over several minutes, I would expect the window to include both the real data that had come in and the empty points injected by the barriers. How is it that the barriers were starving the window of real data even though real data had been arriving all along?