Replay recording shows data 7x instead of 1x? / Too many subscriptions

kapacitor
#1

Hi folks.

Btw: I’d happily drop $100 apiece if you published courses on:
- A deep dive into TICK scripts, with plenty of examples, including how to debug them
- The basic math / stats (and other prerequisite knowledge) needed to use any sort of alerting tool, especially the various math functions that are useful for understanding alerting in general

After spending roughly 40-60 hours digging into InfluxDB and Kapacitor I have it mostly working, but a few things behave differently than I expect. I’m here to banish my ignorance.

I have confusion / questions around these 4 points:

  1. In the context of a TICK script, the difference between cumulativeSum and sum is confusing to me. It appears that cumulativeSum adds up all the values within a given window (period). Does sum keep adding up values until the warning level is reset? I haven’t been able to figure out how and why sum operates the way it does. I’m trying to alert on the number of box crashes during a window of time.

  2. For a batch script, can I create “buckets” of, say, 30min, query back 2 hours, and alert if any of the buckets exceeds some count x? Or is a batch only allowed to be 1 bucket, meaning I need to use streaming?

  3. Is there any way at all to shift the Time field in the message by timezone?

  4. [the main question in this post] I record a stream for 60s. Then I manually insert the data points 1,4,3,3,25 into InfluxDB using the CLI. I replay this, both with default options and with the -real-clock flag. The logs show 7 entries for each of these values, and cumulativeSum then produces a running total across each of those duplicated points (7x), which is not what I expect.

I’m using the dockerized TICK stack:
Kapacitor: 1.3.3
InfluxDB: 1.3.5
Chronograf: 1.3.8
Telegraf: 1.3.0

My tick script:

var message = '7) [{{.Level}}] {{ index .Fields "cumSum" }} Restarts for {{ index .Tags "pool" }} during the last hour. Generated by "{{.ID}}" rule @ {{.Time}}'
var db = 'primary'
var rp = 'autogen'
var measurement = 'restarts'
var groupBy = ['pool']
var whereFilter = lambda: ("pool" == 'p2pnodeweb')
var period = 30s
var every = 10s

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |log()
        .prefix('r7-0')
        .level('DEBUG')
     |window()
        .period(period)
        .every(every)
    // |log('/tmp/from.log')
    // |cumulativeSum('value')
    |log()
        .prefix('r7-1')
        .level('DEBUG')
    |cumulativeSum('value')
        .as('cumSum')
    |log()
        .prefix('r7-2')
        .level('DEBUG')

var trigger = data
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

var trigger2 = data
    |alert()
        .crit(lambda: "cumSum" > 22)
        .critReset(lambda: "cumSum" < 22)
        .warn(lambda: "cumSum" > 0)
//        .stateChangesOnly(60m) //fire on state changes only, but still fire every 60min
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .messageField(messageField)
        .durationField(durationField)
        .log('/tmp/alerts.log')
        .slack()
        .channel('t-platform-bot')

I then:

  1. Create a recording with: kapacitor record stream -task restart_alert -duration 60s
  2. Use the influx CLI to insert these values: 1,4,3,3,25
    INSERT restarts,pool=appName value=1
    INSERT restarts,pool=appName value=4.0
    […]
  3. Replay with: kapacitor replay -recording 0686e220-8ac2-4b9d-a919-0c0d98e2a82c -task restart_alert
    I also tried with -real-clock.

This results in 2 alerts being sent to Slack, both showing 23 as the cumSum. I’m guessing the alert fires as soon as cumSum crosses the 22 threshold, but I can’t work out what combination of values or windows produced that amount. This becomes even more obvious when I make the window (period) larger.

I read the design doc on GitHub, and understand that c in the logs below stands for collected and e stands for emitted, but I don’t know what exactly that means. Is that data points being passed through? Chunks of data?

My biggest confusion is on these 2 points: 1) Why do the logs show 7 entries for each data point? 2) When I don’t use a period(), a log entry is shown after the 7 entries for each point, which just adds to my confusion.

Any help would be greatly appreciated!
Thanks,
Jamis

kapacitor_1      | [task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f] 2017/11/30 00:06:41 I! opened
kapacitor_1      | [httpd] 127.0.0.1 - - [30/Nov/2017:00:06:41 +0000] "POST /kapacitor/v1/replays HTTP/1.1" 201 262 "-" "KapacitorClient" 57e804db-d562-11e7-807a-000000000000 23077
kapacitor_1      | [task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f] 2017/11/30 00:06:41 D! Starting task: restart_alert7
kapacitor_1      | [task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f] 2017/11/30 00:06:41 I! Started task: restart_alert7
kapacitor_1      | [task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f] 2017/11/30 00:06:41 D! digraph restart_alert7 {
kapacitor_1      | stream0 -> from1;
kapacitor_1      | from1 -> log2;
kapacitor_1      | log2 -> window3;
kapacitor_1      | window3 -> log4;
kapacitor_1      | log4 -> cumulativeSum5;
kapacitor_1      | cumulativeSum5 -> log6;
kapacitor_1      | log6 -> influxdb_out7;
kapacitor_1      | log6 -> alert8;
kapacitor_1      | }
influxdb_1       | [I] 2017-11-30T00:06:41Z CREATE DATABASE chronograf WITH NAME autogen service=query
kapacitor_1      | [edge:task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f|19c2f552-d101-46dc-87f2-ca0c729820d6->stream] 2017/11/30 00:06:41 D! closing c: 35 e: 34
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":1},"Time":"2017-11-30T00:06:41.662492048Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":4},"Time":"2017-11-30T00:06:50.716692931Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:57.798601155Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":3},"Time":"2017-11-30T00:06:58.318695866Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log2] 2017/11/30 00:06:41 D! r7-0 {"Name":"restarts","Database":"primary","RetentionPolicy":"autogen","Group":"pool=p2pnodeweb","Dimensions":{"ByName":false,"TagNames":["pool"]},"Tags":{"pool":"p2pnodeweb"},"Fields":{"value":25},"Time":"2017-11-30T00:07:08.197481013Z"}
kapacitor_1      |
influxdb_1       | [httpd] 172.23.0.6 - - [30/Nov/2017:00:06:41 +0000] "POST /query?db=&q=CREATE+DATABASE+chronograf+WITH+NAME+autogen HTTP/1.1" 200 62 "-" "KapacitorInfluxDBClient" 57ee3196-d562-11e7-8018-000000000000 6206
kapacitor_1      | [edge:task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f|write_points->stream] 2017/11/30 00:06:41 D! closing c: 0 e: 0
kapacitor_1      | [edge:restart_alert7|stream->stream0] 2017/11/30 00:06:41 D! closing c: 35 e: 35
kapacitor_1      | [restart_alert7:log4] 2017/11/30 00:06:41 D! r7-1 {"name":"restarts","tmax":"2017-11-30T00:06:51.662492048Z","group":"pool=p2pnodeweb","tags":{"pool":"p2pnodeweb"},"points":[{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}}]}
kapacitor_1      |
kapacitor_1      | [edge:restart_alert7|stream0->from1] 2017/11/30 00:06:41 D! closing c: 35 e: 35
kapacitor_1      | [edge:restart_alert7|from1->log2] 2017/11/30 00:06:41 D! closing c: 35 e: 35
kapacitor_1      | [edge:restart_alert7|log2->window3] 2017/11/30 00:06:41 D! closing c: 35 e: 35
kapacitor_1      | [edge:restart_alert7|window3->log4] 2017/11/30 00:06:41 D! closing c: 2 e: 2
kapacitor_1      | [restart_alert7:log4] 2017/11/30 00:06:41 D! r7-1 {"name":"restarts","tmax":"2017-11-30T00:07:07.798601155Z","group":"pool=p2pnodeweb","tags":{"pool":"p2pnodeweb"},"points":[{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"value":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"value":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":
3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"value":3},"tags":{"pool":"p2pnodeweb"}}]}
kapacitor_1      |
kapacitor_1      | [edge:restart_alert7|log4->cumulativeSum5] 2017/11/30 00:06:41 D! closing c: 2 e: 0
kapacitor_1      | [edge:restart_alert7|cumulativeSum5->log6] 2017/11/30 00:06:41 D! closing c: 2 e: 0
kapacitor_1      | [restart_alert7:log6] 2017/11/30 00:06:41 D! r7-2 {"name":"restarts","tmax":"2017-11-30T00:06:51.662492048Z","group":"pool=p2pnodeweb","tags":{"pool":"p2pnodeweb"},"points":[{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":2},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":5},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":6},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":7},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":11},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":15},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":19},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":23},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":27},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":31},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":35},"tags":{"pool":"p2pnodeweb"}}]}
kapacitor_1      |
kapacitor_1      | [restart_alert7:log6] 2017/11/30 00:06:41 D! r7-2 {"name":"restarts","tmax":"2017-11-30T00:07:07.798601155Z","group":"pool=p2pnodeweb","tags":{"pool":"p2pnodeweb"},"points":[{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":1},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":2},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":3},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":4},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":5},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":6},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:41.662492048Z","fields":{"cumSum":7},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":11},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":15},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":19},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":23},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":27},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":31},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:50.716692931Z","fields":{"cumSum":35},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"cumSum":38},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"cumSum":41},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"cumSum":44},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"cumSum":47},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"cumSum":50},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57
.798601155Z","fields":{"cumSum":53},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:57.798601155Z","fields":{"cumSum":56},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":59},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":62},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":65},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":68},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":71},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":74},"tags":{"pool":"p2pnodeweb"}},{"time":"2017-11-30T00:06:58.318695866Z","fields":{"cumSum":77},"tags":{"pool":"p2pnodeweb"}}]}
kapacitor_1      |
kapacitor_1      | [edge:restart_alert7|log6->alert8] 2017/11/30 00:06:41 D! closing c: 2 e: 0
kapacitor_1      | [edge:restart_alert7|log6->influxdb_out7] 2017/11/30 00:06:41 D! closing c: 2 e: 0
kapacitor_1      | [restart_alert7:alert8] 2017/11/30 00:06:41 D! CRITICAL alert triggered id:7) Restarts > 20:pool=p2pnodeweb msg:7) [CRITICAL] 23 Restarts for p2pnodeweb during the last hour. Generated by "7) Restarts > 20:pool=p2pnodeweb" rule @ 2017-11-30 00:06:50.716692931 +0000 UTC data:&{restarts map[pool:p2pnodeweb] [time cumSum] [[2017-11-30 00:06:41.662492048 +0000 UTC 1] [2017-11-30 00:06:41.662492048 +0000 UTC 2] [2017-11-30 00:06:41.662492048 +0000 UTC 3] [2017-11-30 00:06:41.662492048 +0000 UTC 4] [2017-11-30 00:06:41.662492048 +0000 UTC 5] [2017-11-30 00:06:41.662492048 +0000 UTC 6] [2017-11-30 00:06:41.662492048 +0000 UTC 7] [2017-11-30 00:06:50.716692931 +0000 UTC 11] [2017-11-30 00:06:50.716692931 +0000 UTC 15] [2017-11-30 00:06:50.716692931 +0000 UTC 19] [2017-11-30 00:06:50.716692931 +0000 UTC 23] [2017-11-30 00:06:50.716692931 +0000 UTC 27] [2017-11-30 00:06:50.716692931 +0000 UTC 31] [2017-11-30 00:06:50.716692931 +0000 UTC 35]]}
kapacitor_1      | [restart_alert7:alert8] 2017/11/30 00:06:41 D! CRITICAL alert triggered id:7) Restarts > 20:pool=p2pnodeweb msg:7) [CRITICAL] 23 Restarts for p2pnodeweb during the last hour. Generated by "7) Restarts > 20:pool=p2pnodeweb" rule @ 2017-11-30 00:06:50.716692931 +0000 UTC data:&{restarts map[pool:p2pnodeweb] [time cumSum] [[2017-11-30 00:06:41.662492048 +0000 UTC 1] [2017-11-30 00:06:41.662492048 +0000 UTC 2] [2017-11-30 00:06:41.662492048 +0000 UTC 3] [2017-11-30 00:06:41.662492048 +0000 UTC 4] [2017-11-30 00:06:41.662492048 +0000 UTC 5] [2017-11-30 00:06:41.662492048 +0000 UTC 6] [2017-11-30 00:06:41.662492048 +0000 UTC 7] [2017-11-30 00:06:50.716692931 +0000 UTC 11] [2017-11-30 00:06:50.716692931 +0000 UTC 15] [2017-11-30 00:06:50.716692931 +0000 UTC 19] [2017-11-30 00:06:50.716692931 +0000 UTC 23] [2017-11-30 00:06:50.716692931 +0000 UTC 27] [2017-11-30 00:06:50.716692931 +0000 UTC 31] [2017-11-30 00:06:50.716692931 +0000 UTC 35] [2017-11-30 00:06:57.798601155 +0000 UTC 38] [2017-11-30 00:06:57.798601155 +0000 UTC 41] [2017-11-30 00:06:57.798601155 +0000 UTC 44] [2017-11-30 00:06:57.798601155 +0000 UTC 47] [2017-11-30 00:06:57.798601155 +0000 UTC 50] [2017-11-30 00:06:57.798601155 +0000 UTC 53] [2017-11-30 00:06:57.798601155 +0000 UTC 56] [2017-11-30 00:06:58.318695866 +0000 UTC 59] [2017-11-30 00:06:58.318695866 +0000 UTC 62] [2017-11-30 00:06:58.318695866 +0000 UTC 65] [2017-11-30 00:06:58.318695866 +0000 UTC 68] [2017-11-30 00:06:58.318695866 +0000 UTC 71] [2017-11-30 00:06:58.318695866 +0000 UTC 74] [2017-11-30 00:06:58.318695866 +0000 UTC 77]]}
kapacitor_1      | [httpd] 127.0.0.1 - - [30/Nov/2017:00:06:42 +0000] "GET /kapacitor/v1/replays/40b8a4fc-4014-4daf-8bc1-a23926edea5f HTTP/1.1" 202 262 "-" "KapacitorClient" 58389323-d562-11e7-807b-000000000000 400
kapacitor_1      | [httpd] 127.0.0.1 - - [30/Nov/2017:00:06:42 +0000] "GET /kapacitor/v1/replays/40b8a4fc-4014-4daf-8bc1-a23926edea5f HTTP/1.1" 202 262 "-" "KapacitorClient" 588515a1-d562-11e7-807c-000000000000 816
influxdb_1       | [httpd] 172.23.0.6 - - [30/Nov/2017:00:06:42 +0000] "POST /write?consistency=&db=chronograf&precision=ns&rp=autogen HTTP/1.1" 204 0 "-" "KapacitorInfluxDBClient" 5890863e-d562-11e7-8019-000000000000 1452
kapacitor_1      | [task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f] 2017/11/30 00:06:42 I! Stopped task: restart_alert7
kapacitor_1      | [task_master:40b8a4fc-4014-4daf-8bc1-a23926edea5f] 2017/11/30 00:06:42 I! closed

#2

Jamis,

I have a wild guess as to why you are seeing multiple entries for the data. Can you run SHOW SUBSCRIPTIONS on the InfluxDB instance via the influx CLI? My guess is that you somehow have multiple subscriptions all pointing at the same host. When InfluxDB gets a write, it sends that point to each subscription, and since every destination happens to be the same Kapacitor host, Kapacitor gets 7 points instead of one.

I could be way off here, and if I am we can dig further.
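
If duplicate subscriptions do turn up, cleanup from the influx CLI looks roughly like this (the subscription name below is a placeholder; use the exact names, databases, and retention policies that SHOW SUBSCRIPTIONS reports):

```sql
-- list every subscription and its destination
SHOW SUBSCRIPTIONS

-- drop a stale/duplicate subscription by name, database, and retention policy
DROP SUBSCRIPTION "kapacitor-00000000-0000-0000-0000-000000000000" ON "primary"."autogen"
```

Kapacitor should recreate the one subscription it needs the next time it starts.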

As for your other questions:

1. I think the difference between cumulativeSum and sum is easier to see if you do not window the data. cumulativeSum of an unwindowed stream will just keep increasing the sum for each new point it sees and never reset. sum of an unwindowed stream will sum points that have the same timestamp. On windowed data, cumulativeSum outputs the running sum for each point in the window and resets for each window, while sum outputs a single value for the entire window. Hope that helps.
2. Yes, use .period(2h) and groupBy(time(30m)).
3. No. That is a good feature request, please submit a GitHub issue.
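
To make point 1 concrete: with the points 1, 4, 3, 3, 25 in a single window, sum emits one point with the total 36, while cumulativeSum emits the running values 1, 5, 8, 11, 36. A minimal sketch of the windowed-sum variant (measurement copied from the script above; window sizes and threshold are placeholders):

```
stream
    |from()
        .measurement('restarts')
    |window()
        .period(30m)
        .every(30m)
    // one point per window, holding the total for that window
    |sum('value')
        .as('total')
    |alert()
        .crit(lambda: "total" > 22)
```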
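
Point 2 as a batch task might look roughly like this (a sketch, not tested; the database, retention policy, and threshold are assumptions): query back 2 hours every 30 minutes, bucket by time(30m), and alert on any bucket whose sum crosses the limit:

```
batch
    |query('''
        SELECT sum("value") FROM "primary"."autogen"."restarts"
    ''')
        .period(2h)
        .every(30m)
        .groupBy(time(30m), 'pool')
    |alert()
        .crit(lambda: "sum" > 22)
```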

Hope this helps.
Nathaniel

#3

Ah, you are absolutely right.

> show subscriptions
name: chronograf
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
autogen          kapacitor-6097cf1c-34c0-429d-8589-e684223d0e3d ANY  [http://kapacitor:9092]
autogen          kapacitor-b4fabfdd-409f-4590-be64-6af8112ade16 ANY  [http://kapacitor:9092]
autogen          kapacitor-808e1944-8458-4d85-bc8a-b168e769d6dc ANY  [http://kapacitor:9092]
autogen          kapacitor-84e88b1e-8409-4883-b787-7fb94a50b408 ANY  [http://kapacitor:9092]
autogen          kapacitor-d9fffdce-ab5b-4f13-9065-58966af1684d ANY  [http://kapacitor:9092]
autogen          kapacitor-9ebcd853-6088-4b6d-8336-78e68a09084a ANY  [http://kapacitor:9092]
autogen          kapacitor-1331a135-4fe6-43c6-bcc4-abc59c85bfd2 ANY  [http://kapacitor:9092]

name: _internal
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
monitor          kapacitor-6097cf1c-34c0-429d-8589-e684223d0e3d ANY  [http://kapacitor:9092]
monitor          kapacitor-b4fabfdd-409f-4590-be64-6af8112ade16 ANY  [http://kapacitor:9092]
monitor          kapacitor-808e1944-8458-4d85-bc8a-b168e769d6dc ANY  [http://kapacitor:9092]
monitor          kapacitor-84e88b1e-8409-4883-b787-7fb94a50b408 ANY  [http://kapacitor:9092]
monitor          kapacitor-d9fffdce-ab5b-4f13-9065-58966af1684d ANY  [http://kapacitor:9092]
monitor          kapacitor-9ebcd853-6088-4b6d-8336-78e68a09084a ANY  [http://kapacitor:9092]
monitor          kapacitor-1331a135-4fe6-43c6-bcc4-abc59c85bfd2 ANY  [http://kapacitor:9092]

name: primary
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
autogen          kapacitor-6097cf1c-34c0-429d-8589-e684223d0e3d ANY  [http://kapacitor:9092]
autogen          kapacitor-b4fabfdd-409f-4590-be64-6af8112ade16 ANY  [http://kapacitor:9092]
autogen          kapacitor-808e1944-8458-4d85-bc8a-b168e769d6dc ANY  [http://kapacitor:9092]
autogen          kapacitor-84e88b1e-8409-4883-b787-7fb94a50b408 ANY  [http://kapacitor:9092]
autogen          kapacitor-d9fffdce-ab5b-4f13-9065-58966af1684d ANY  [http://kapacitor:9092]
autogen          kapacitor-9ebcd853-6088-4b6d-8336-78e68a09084a ANY  [http://kapacitor:9092]
autogen          kapacitor-1331a135-4fe6-43c6-bcc4-abc59c85bfd2 ANY  [http://kapacitor:9092]

name: telegraf
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
autogen          kapacitor-808e1944-8458-4d85-bc8a-b168e769d6dc ANY  [http://kapacitor:9092]
autogen          kapacitor-84e88b1e-8409-4883-b787-7fb94a50b408 ANY  [http://kapacitor:9092]
autogen          kapacitor-d9fffdce-ab5b-4f13-9065-58966af1684d ANY  [http://kapacitor:9092]
autogen          kapacitor-9ebcd853-6088-4b6d-8336-78e68a09084a ANY  [http://kapacitor:9092]
autogen          kapacitor-1331a135-4fe6-43c6-bcc4-abc59c85bfd2 ANY  [http://kapacitor:9092]

Some more logs:

influxdb_1       | [I] 2017-11-30T19:01:40Z reading file /var/lib/influxdb/wal/telegraf/autogen/9/_00006.wal, size 855337 engine=tsm1 service=cacheloader
influxdb_1       | [I] 2017-11-30T19:01:40Z reading file /var/lib/influxdb/wal/telegraf/autogen/9/_00007.wal, size 18817 engine=tsm1 service=cacheloader
influxdb_1       | [I] 2017-11-30T19:01:40Z reading file /var/lib/influxdb/wal/telegraf/autogen/9/_00008.wal, size 223499 engine=tsm1 service=cacheloader
influxdb_1       | [I] 2017-11-30T19:01:40Z reading file /var/lib/influxdb/wal/telegraf/autogen/9/_00009.wal, size 0 engine=tsm1 service=cacheloader
influxdb_1       | [I] 2017-11-30T19:01:40Z /var/lib/influxdb/data/telegraf/autogen/9 opened in 528.946796ms service=store
influxdb_1       | [I] 2017-11-30T19:01:40Z opened service service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z Starting monitor system service=monitor
influxdb_1       | [I] 2017-11-30T19:01:40Z 'build' registered for diagnostics monitoring service=monitor
influxdb_1       | [I] 2017-11-30T19:01:40Z 'runtime' registered for diagnostics monitoring service=monitor
influxdb_1       | [I] 2017-11-30T19:01:40Z 'network' registered for diagnostics monitoring service=monitor
influxdb_1       | [I] 2017-11-30T19:01:40Z 'system' registered for diagnostics monitoring service=monitor
influxdb_1       | [I] 2017-11-30T19:01:40Z Starting precreation service with check interval of 10m0s, advance period of 30m0s service=shard-precreation
influxdb_1       | [I] 2017-11-30T19:01:40Z Starting snapshot service service=snapshot
influxdb_1       | [I] 2017-11-30T19:01:40Z Starting continuous query service service=continuous_querier
influxdb_1       | [I] 2017-11-30T19:01:40Z Starting HTTP service service=httpd
influxdb_1       | [I] 2017-11-30T19:01:40Z Authentication enabled:false service=httpd
influxdb_1       | [I] 2017-11-30T19:01:40Z Listening on HTTP:[::]:8086 service=httpd
influxdb_1       | [I] 2017-11-30T19:01:40Z Starting retention policy enforcement service with check interval of 30m0s service=retention
influxdb_1       | [I] 2017-11-30T19:01:40Z Listening for signals
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z Storing statistics in database '_internal' retention policy 'monitor', at interval 10s service=monitor
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for chronograf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for _internal monitor service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for primary autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for telegraf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for telegraf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for telegraf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for telegraf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z added new subscription for telegraf autogen service=subscriber
influxdb_1       | [I] 2017-11-30T19:01:40Z Sending usage statistics to usage.influxdata.com
influxdb_1       | [I] 2017-11-30T19:01:40Z CREATE DATABASE telegraf service=query
#4

I think I know why this may be happening:

  1. I currently don’t have a conf file for InfluxDB (I do have one for Kapacitor)

  2. For persistence, I’ve mounted these volumes:

    • ./shared_data/influxdb:/var/lib/influxdb
    • ./shared_data/kapacitor:/var/lib/kapacitor

When I remove the shared_data folder, I get these subscriptions instead of what we see above:

> show subscriptions
name: telegraf
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
autogen          kapacitor-1331a135-4fe6-43c6-bcc4-abc59c85bfd2 ANY  [http://kapacitor:9092]

name: _internal
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
monitor          kapacitor-1331a135-4fe6-43c6-bcc4-abc59c85bfd2 ANY  [http://kapacitor:9092]

I’m guessing that every time I fire up docker-compose for the stack, it adds another subscription (maybe maxing out at 7?).

Questions:

  1. What do you suggest for persisting the Influx data between deploys? Should I make a backup / restore each time? Or is there an easier way? (update: I checked the docker readme for influx, and it suggests exactly what I’m doing)
  2. Does influx automatically add subscriptions for kapacitor or is that something I should be doing manually?

Currently I just create a DB once and use the volume mounts to persist the data, but that seems to be incorrect.

#5

Hm, no matter what I try now, I can’t get it to create several subscriptions. I’ll keep an eye on that, and check for multiple subscriptions.

I suppose I could write a startup script that removes the old subscriptions and then just adds the most recent.
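For that cleanup script, InfluxQL does have a DROP SUBSCRIPTION statement I could run from the influx CLI (the subscription name below is just one example taken from my show subscriptions output above):

```
> DROP SUBSCRIPTION "kapacitor-6097cf1c-34c0-429d-8589-e684223d0e3d" ON "telegraf"."autogen"
```

As far as I can tell it has to be run once per stale subscription name, per database and retention policy.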

#6

Update:
Here’s something interesting:

When I run docker-compose up --build (which results in the same container ID every time), subscriptions are not increased.

When I run docker-compose up --build -d (which results in a new container ID every time), subscriptions increase by 1 every time.

As best I can tell, when a subscription gets added, Kapacitor first POSTs a SHOW SUBSCRIPTIONS query to InfluxDB, then immediately CREATEs another subscription. I wonder how it determines whether it should create one…

influxdb_1       | [httpd] 172.23.0.4 - - [30/Nov/2017:23:41:26 +0000] "POST /query?db=&q=SHOW+SUBSCRIPTIONS HTTP/1.1" 200 223 "-" "KapacitorInfluxDBClient" faf3f580-d627-11e7-8005-000000000000 364
kapacitor_1      | [srv] 2017/11/30 23:41:26 D! opened service: *telegram.Service
influxdb_1       | [I] 2017-11-30T23:41:26Z CREATE SUBSCRIPTION "kapacitor-cd4da5f1-41fb-4c94-a1d4-4e1f6bdfde82" ON telegraf.autogen DESTINATIONS ANY 'http://kapacitor:9092' service=query
#7

After reading https://github.com/influxdata/kapacitor/issues/969 it’s clear to me that I need to unsubscribe before shutting down the influxdb container. Is this supposed to be automatic?

Is there a blessed way to shut down the influxdb container? Stop the influxd service maybe?

Or is there a drop all subscriptions command I can run on the influx cli?

#8

Final update
@nathaniel Thank you for all your help. Answering my questions has helped me immensely, and pointing me to subscriptions allowed me to find the workaround listed below.

I’d love to understand why using docker-compose down causes subscription duplication, but the following works well enough for me.

I’ve discovered that it matters how I stop the container.

When I use the first set below, subscriptions don’t increase with each docker-compose up command.

These commands work fine:
docker-compose stop
docker stop [containerID]

This command does not work (duplicates subscriptions):
docker-compose down

All of these commands seem to shut down gracefully:

[I] 2017-12-01T03:13:13Z Signal received, initializing clean shutdown...
[I] 2017-12-01T03:13:13Z Waiting for clean shutdown...
[I] 2017-12-01T03:13:13Z shutting down monitor system service=monitor
[I] 2017-12-01T03:13:13Z terminating storage of statistics service=monitor
[I] 2017-12-01T03:13:13Z Precreation service terminating service=shard-precreation
[I] 2017-12-01T03:13:13Z snapshot listener closed service=snapshot
[I] 2017-12-01T03:13:13Z continuous query service terminating service=continuous_querier
[I] 2017-12-01T03:13:13Z retention policy enforcement terminating service=retention
[I] 2017-12-01T03:13:13Z closed service service=subscriber
[I] 2017-12-01T03:13:13Z server shutdown completed

And they all include these starting logs

[I] 2017-12-01T03:45:15Z added new subscription for primary autogen service=subscriber
[I] 2017-12-01T03:45:15Z Listening for signals
[I] 2017-12-01T03:45:15Z added new subscription for telegraf autogen service=subscriber
[I] 2017-12-01T03:45:15Z Sending usage statistics to usage.influxdata.com
[I] 2017-12-01T03:45:15Z added new subscription for _internal monitor service=subscriber

But only the down command makes these logs show up on startup:

[I] 2017-12-01T03:39:05Z SHOW SUBSCRIPTIONS service=query
[httpd] 172.23.0.6 - - [01/Dec/2017:03:39:05 +0000] "POST /query?db=&q=SHOW+SUBSCRIPTIONS HTTP/1.1" 200 232 "-" "KapacitorInfluxDBClient" 2e5a7bf7-d649-11e7-8009-000000000000 549
[I] 2017-12-01T03:39:05Z CREATE SUBSCRIPTION "kapacitor-5664a81d-24cc-46ab-a802-e2ebe7674e21" ON primary.autogen DESTINATIONS ANY 'http://kapacitor:9092' service=query
[I] 2017-12-01T03:39:05Z added new subscription for primary autogen service=subscriber
#9

Kapacitor stores a unique ID in the /var/lib/kapacitor/ directory. On startup, if the ID file does not exist, Kapacitor creates a new one. It uses that ID to determine whether existing subscriptions belong to it: if it finds subscriptions with its ID, it assumes they are its own and does not create more; if it doesn’t, it creates new subscriptions with its ID.

In short, if that ID file is persisted across restarts of the container, no duplicate subscriptions should arise. But if for whatever reason the file is lost, then each time Kapacitor starts it gets a new ID and creates new subscriptions.

#10

Solved! Thank you. That was the piece that I was missing.

Because I generated a config file with the kapacitord config command, the data dir defaulted to
data_dir = "/root/.kapacitor"

I was persisting the wrong folder.
- ./shared_data/kapacitor:/var/lib/kapacitor

Essentially I was persisting the InfluxDB data (including the subscriptions) but not the Kapacitor data, so Kapacitor kept creating new subscriptions.
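For anyone else who hits this: the fix is simply making the mounted path match whatever data_dir is set in the Kapacitor config. A compose excerpt of what that looks like for my generated config (paths are from my setup; yours may differ):

```
# docker-compose.yml (excerpt)
kapacitor:
  volumes:
    # must match data_dir in kapacitor.conf (mine defaulted to /root/.kapacitor)
    - ./shared_data/kapacitor:/root/.kapacitor
```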

Thank you for all your help. If you’re in San Jose near the airport ever, let me buy you lunch.