Hey there! I’m using Kapacitor to perform some operations on two time series, and I want to combine them into one new series.
First I took a look at this tutorial: "Calculate rates across joined series + backfill" (Kapacitor 1.5 Documentation).
After using it to create a simple summed series, I read the documentation and tried to understand the different nodes and the possible ways to do this. After a while I got confused about the behavior of the nodes and the role the timestamps play. I hope someone can help me understand this.
Background: I have two series, each emitting a new point roughly every 15 minutes. So two points that should be considered the same are at most 7.5 minutes apart.
I got it to work with the example using a window node like this:
...
    |window()
        .period(15m)
        .every(1m)
        .align()
    |last('myfield')
series1
    |join(series2)
        .as('s1', 's2')
...
As I understand it, the window is aligned to fixed boundaries because of the .align() property, so it spans e.g. 10:15-10:30. The window covers 15 minutes and is emitted every minute. After that, last() selects the last point in this 15-minute window. If I understand the join node correctly, it merges two incoming points when they are considered the same by timestamp.
Since this already works, I’m wondering how the join node knows when to consider the points the same. In my example the points are generally about 3 seconds apart, but it could just as well be 3 minutes. I suspect that the last() node might be altering the timestamps. Or is it the window that changes the timestamps, so that both points have the same timestamp when they arrive at the join node? And if so, how do the two streams synchronize on that?
Because of this confusion, I tried to build a joined stream the way I thought made sense based on the documentation.
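To make the first setup concrete, here is a minimal sketch of how the whole pipeline could look. The measurement names 'm1' and 'm2', the .as('value') renaming, and the final eval() step are placeholders assumed for illustration; they are not part of the snippet above.

```
// Sketch only: 'm1', 'm2', 'value' and 'sum' are hypothetical names.
var series1 = stream
    |from()
        .measurement('m1')
    |window()
        .period(15m)
        .every(1m)
        .align()
    // Keep only the last point of each emitted window.
    |last('myfield')
        .as('value')

var series2 = stream
    |from()
        .measurement('m2')
    |window()
        .period(15m)
        .every(1m)
        .align()
    |last('myfield')
        .as('value')

series1
    |join(series2)
        .as('s1', 's2')
    // Joined fields are prefixed with the names given in .as().
    |eval(lambda: "s1.value" + "s2.value")
        .as('sum')
```

Both branches use identical window settings here, which is the situation the question is about: whether that alone is what makes the two points line up at the join.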
...
    |from()
...
stream1
    |join(stream2)
        .tolerance(8m)
...
So in this approach I just used two plain streams and joined them with a tolerance. Since the maximum distance between two matching points is 7.5 minutes (no point should be computed if one source provides no point), I thought this would be equivalent, so that within every 8 minutes a point in the new series is computed from the incoming points of the two original series. Unfortunately, this works for several minutes, hours, or even days but eventually stops emitting points, while the first approach has been working for several weeks now.
I’m asking myself why the first approach works when, based on the documentation, no timestamp should be modified and hence the join node should never consider two points the same, while the second approach does not always work even though the tolerance() property seems to be exactly the right tool for this job.
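For comparison, a minimal sketch of the second approach written out in full. The measurement names 'm1' and 'm2', the .as('s1', 's2') prefixes, and the eval() step are assumptions added for illustration, since the snippet above elides them.

```
// Sketch only: 'm1', 'm2', 's1', 's2' and 'sum' are hypothetical names.
var stream1 = stream
    |from()
        .measurement('m1')

var stream2 = stream
    |from()
        .measurement('m2')

stream1
    |join(stream2)
        .as('s1', 's2')
        // Points up to 8 minutes apart should count as equal in time.
        .tolerance(8m)
    |eval(lambda: "s1.myfield" + "s2.myfield")
        .as('sum')
```

Unlike the first setup, there is no window or last() node in front of the join, so the points arrive with their original timestamps and only the tolerance decides whether they match.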