I am new to InfluxDB and am trying to use Kapacitor for some ETL.
This is what I have in InfluxDB. The data represents the 99th percentile for latency for two http endpoints (services A and B) at 1 minute intervals.
The time field is an InfluxDB timestamp but I have used the notation H{X}M{Y} to represent hour X, minute Y.
time service p99
H1M1 A 289
H1M2 B 177
H2M1 A 134
H2M2 A 78
H2M3 B 123
What I want to do is transform this data to calculate an overall service level percentage. For this example, I will assume the service level percentage is required each hour.
Lets say the service level objectives for each service are as follows. i.e. p99 for service A must be less than 200 and for service B p99 must be less than 150.
service value
A 200
B 150
Step 1: Determine if the service meets the service level objective. (I am using 0 to represent a failure below)
time service p99 ok
H1M1 A 289 0
H1M2 B 107 1
H2M1 A 134 1
H2M2 A 78 1
H2M3 B 163 0
Step 2: Calculate the % of minutes for each hour that meet the service level objective.
time percentage
H1 50
H2 66.67
I want to produce these numbers every hour so running in batch every hour to calculate the previous hour is fine.
I have tried writing the transformation in Kapacitor but had some difficulty working out how to store the service levels and then also deriving the final percentage by hour.
@davoad There are a few ways to go about this I’ll walk through two.
The simplest way you be to create a template task and set the service level as a var in the task. Then run a separate task for each service.
Another way would be to generate a matching stream of the service levels for each stream an join them. This approach is more flexible but requires that you constantly send the service levels to Kapacitor.
Let’s look at the first method:
var level float
var service string
var data = batch
|query('SELECT p99 FROM ... WHERE service = \'' + service + '\'' )
.every(1h)
.period(1h)
var count = data
|count('p99')
.as('value')
var up = data
|where(lambda: "p99" < level)
|count('p99')
.as('value')
up
|join(count)
.as('up','count')
|eval(lambda: float("up.value") / float("count.value"))
.as('up_percentage')
|log()
The other option requires that you write to a database every so often what the current service level is and join on that data to compare dynamically to the level.
A task for that would look something like this.
var levels = batch
|query('SELECT level as value FROM ...' )
.every(1h)
.period(1h)
.align()
.groupBy('service')
var p99 = batch
|query('SELECT p99 as value FROM ...' )
.every(1h)
.period(1h)
.align()
.groupBy('service')
var data = levels
|join(p99)
.as('levels', 'p99')
var count = data
|count('p99.value')
.as('value')
var up = data
|where(lambda: "p99.value" < "levels.value")
|count('p99.value')
.as('value')
up
|join(count)
.as('up','count')
|eval(lambda: float("up.value") / float("count.value"))
.as('up_percentage')
|log()
Either approach should work, if you have more than dozens of services it is probably better to use the later approach as it means less tasks to manage.
The other thing I have been struggling with is ensuring there is no data loss. If my Kapacitor instance goes down, is there any way to resume this task from a last known timestamp so that there is no data loss?