Kapacitor ETL help please

kapacitor
#1

Hi,

I am new to InfluxDB and am trying to use Kapacitor for some ETL.

This is what I have in InfluxDB. The data represents the 99th percentile for latency for two http endpoints (services A and B) at 1 minute intervals.

The time field is an InfluxDB timestamp but I have used the notation H{X}M{Y} to represent hour X, minute Y.

time    service    p99
H1M1    A          289
H1M2    B          177
H2M1    A          134
H2M2    A          78
H2M3    B          123

What I want to do is transform this data to calculate an overall service level percentage. For this example, I will assume the service level percentage is required each hour.

Lets say the service level objectives for each service are as follows. i.e. p99 for service A must be less than 200 and for service B p99 must be less than 150.

service    value
A          200
B          150

Step 1: Determine if the service meets the service level objective. (I am using 0 to represent a failure below)

time    service    p99    ok
H1M1    A          289    0
H1M2    B          107    1
H2M1    A          134    1
H2M2    A          78     1
H2M3    B          163    0

Step 2: Calculate the % of minutes for each hour that meet the service level objective.

time    percentage
H1      50
H2      66.67

I want to produce these numbers every hour so running in batch every hour to calculate the previous hour is fine.

I have tried writing the transformation in Kapacitor but had some difficulty working out how to store the service levels and then also deriving the final percentage by hour.

Any guidance greatly appreciated.

#2

@davoad There are a few ways to go about this I’ll walk through two.

The simplest way you be to create a template task and set the service level as a var in the task. Then run a separate task for each service.

Another way would be to generate a matching stream of the service levels for each stream an join them. This approach is more flexible but requires that you constantly send the service levels to Kapacitor.

Let’s look at the first method:

var level float
var service string

var data = batch
 |query('SELECT p99 FROM ... WHERE service = \'' + service + '\'' )
    .every(1h)
    .period(1h)

var count = data
  |count('p99')
     .as('value')

var up = data
  |where(lambda: "p99" < level)
  |count('p99')
     .as('value')


up
  |join(count)
     .as('up','count')
  |eval(lambda: float("up.value") /  float("count.value"))
     .as('up_percentage')
  |log()

Using the above TICkscript you can define a template and then instance that template for each service.
See https://docs.influxdata.com/kapacitor/v1.2/examples/template_tasks/ for basics of using templates

The other option requires that you write to a database every so often what the current service level is and join on that data to compare dynamically to the level.

A task for that would look something like this.

var levels = batch
 |query('SELECT level as value FROM ...' )
    .every(1h)
    .period(1h)
    .align()
    .groupBy('service')

var p99 = batch
 |query('SELECT p99 as value FROM ...' )
    .every(1h)
    .period(1h)
    .align()
    .groupBy('service')

var data = levels
  |join(p99)
   .as('levels', 'p99')


var count = data
  |count('p99.value')
     .as('value')

var up = data
  |where(lambda: "p99.value" < "levels.value")
  |count('p99.value')
     .as('value')
  

up
  |join(count)
     .as('up','count')
  |eval(lambda: float("up.value") /  float("count.value"))
     .as('up_percentage')
  |log()

Either approach should work, if you have more than dozens of services it is probably better to use the later approach as it means less tasks to manage.

#3

Thanks very much. That helps a lot.

The other thing I have been struggling with is ensuring there is no data loss. If my Kapacitor instance goes down, is there any way to resume this task from a last known timestamp so that there is no data loss?

#4

Not dynamically, but if you use the replay-live features of Kapacitor it is easy to instruct a task to rerun on past data.

For example if the above task fails at 1 PM and the current time is 4 PM then tell Kapacitor to replay the data live for the past 5 hours to be safe

kapacitor replay-live batch -task TASK_ID -past 5h -rec-time

This can also be performed via the API so you could easily script it for your needs.