I’ve got a more or less constant stream of data coming into a measurement (“loadbalancing_member_events”) that I’m picking and choosing from, grouping by a couple of dimensions, averaging across a particular dimension, and storing back into Influx as a new measurement. I’m frequently getting gaps in the output, up to a dozen times an hour. Here’s the relevant part of the TICKscript:
    var flaps = stream
        |from()
            .database('environment')
            .retentionPolicy('autogen')
            .measurement('loadbalancing_member_events')
            // use only down events (going down then up is a full flap; don't count halves)
            .where(lambda: "event" == 'readiness change' AND "transition" == 'down')
        |delete()
            .field('text')
            .tag('transition')
        |window()
            .period(5m)
            .every(10s)
        |groupBy(['pool', 'reporting_lb'])
        |count('event')
        |groupBy(['pool'])
        |mean('count')

    // flaps|httpOut('flaps')

    flaps
        |influxDBOut()
            .create()
            .database('environment')
            .retentionPolicy('autogen')
            .measurement('loadbalancing_flap_count_mean_across_lbs')
And here’s a visualization showing the gaps:
And here’s using the CLT, looking at a particular pool:
    > select * from loadbalancing_flap_count_mean_across_lbs where pool = '/Common/pool_one' and time >= '2017-11-08 08:19:30' and time < '2017-11-08 08:20:30'
    name: loadbalancing_flap_count_mean_across_lbs
    time                 mean pool
    ----                 ---- ----
    2017-11-08T08:19:39Z 18.8 /Common/pool_one
    2017-11-08T08:19:49Z 18.6 /Common/pool_one
    2017-11-08T08:19:59Z 16.8 /Common/pool_one
    2017-11-08T08:20:10Z 16   /Common/pool_one
    2017-11-08T08:20:20Z 14.8 /Common/pool_one
Oh, wait… Now I see it. Ha.
Looks like there’s some kind of slowly accreting lag going on that just pushed us over a threshold. Viewing without -precision rfc3339 on the CLT, I can see the points are spaced roughly as closely as always; the timestamp has just ticked over into the next second (ms -> s), and so the visualization is acting wonky.
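Here’s a rough sketch of what I think is happening, in Python with made-up numbers. The 20 ms of extra lag per emit, the starting fraction of .950 s, and the idea that the visualization groups points into 10s buckets are all assumptions for illustration, not measured values.

```python
from datetime import datetime, timedelta, timezone

EVERY_S = 10   # matches .every(10s) in the TICKscript
LAG_MS = 20    # hypothetical extra lag added on each emit

def emit_times(start, n):
    """Return n emit timestamps, each EVERY_S seconds plus LAG_MS after the previous one."""
    step = timedelta(seconds=EVERY_S, milliseconds=LAG_MS)
    return [start + i * step for i in range(n)]

def bucket(ts):
    """Truncate a timestamp down to the start of its EVERY_S-second bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % EVERY_S, tz=timezone.utc)

# Hypothetical first point, already 950 ms into its second.
start = datetime(2017, 11, 8, 8, 19, 39, 950000, tzinfo=timezone.utc)
points = emit_times(start, 5)

# What a whole-second display shows: the fractional part is dropped.
whole_seconds = [p.strftime("%H:%M:%S") for p in points]

# Which 10s buckets between the first and last point received no data.
seen = {bucket(p) for p in points}
missing = []
b = min(seen)
while b <= max(seen):
    if b not in seen:
        missing.append(b)
    b += timedelta(seconds=EVERY_S)

print(whole_seconds)
print([m.strftime("%H:%M:%S") for m in missing])
```

With these made-up values the points are always 10.02 s apart, but the whole-second view reads :39, :49, :59, :10, :20, and the 08:20:00 bucket ends up with no point in it, which would show up as a gap.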
Would align() solve this?