Kapacitor not creating subscriptions after InfluxDB upgrade

influxdb
kapacitor

#1

We are currently upgrading Influx throughout our environments and have run into an issue that I have not been able to pin down as either a bug or a configuration problem.

We have been running Influx 1.5 and Kapacitor 1.4.0 for 7-8 months and have recently started upgrading our Influx instances to 1.7.1. The overall upgrade process was smooth, with no configuration changes necessary. However, we have noticed that Kapacitor is no longer creating subscriptions against the new version.

Although there is little in the Kapacitor changelog that seems tied to this issue, I've also tested with Kapacitor 1.5.2, with the same result.

Below are some snippets of the logs related to this issue. Some DNS names and identifying information have been changed to protect the innocent.

Kapacitor startup:

[Kapacitor ASCII-art startup banner]
2018/12/20 16:21:00 Using configuration at: /etc/kapacitor/kapacitor.conf
ts=2018-12-20T16:21:00.839Z lvl=info msg="kapacitor starting" service=run version=1.5.2 branch=HEAD commit=3086452d00830e01d932838d8c6d1df818648ad3
ts=2018-12-20T16:21:00.839Z lvl=info msg="go version" service=run version=go1.11.2
ts=2018-12-20T16:21:00.839Z lvl=info msg="listing Kapacitor hostname" source=srv hostname=infra-general-kapacitor.infra.svc.cluster.local
ts=2018-12-20T16:21:00.839Z lvl=info msg="listing ClusterID and ServerID" source=srv cluster_id=b4def909-757d-441a-a567-6ec547dd8051 server_id=be82bca3-a9fd-4817-a882-961f62d6d3d3
ts=2018-12-20T16:21:00.839Z lvl=info msg="opened task master" service=kapacitor task_master=main
ts=2018-12-20T16:21:00.839Z lvl=info msg="using InsecureSkipVerify when connecting to InfluxDB; this is insecure" service=influxdb cluster=infra-influx-all urls_0=http://valid-influxdb-url:8086/
ts=2018-12-20T16:21:00.839Z lvl=debug msg="opening service" source=srv service=*storage.Service

InfluxDB logs during Kapacitor startup and life cycle:

Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "GET /ping HTTP/1.1" 204 0 "-" "KapacitorInfluxDBClient" 6fb215a1-046f-11e9-966a-0aff01b2e450 49
Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "POST /query?db=&q=SHOW+DATABASES HTTP/1.1" 200 151 "-" "KapacitorInfluxDBClient" 6fb22c4f-046f-11e9-966b-0aff01b2e450 509
Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "POST /query?db=&q=SHOW+RETENTION+POLICIES+ON+kubernetes HTTP/1.1" 200 147 "-" "KapacitorInfluxDBClient" 6fb27a8e-046f-11e9-966d-0aff01b2e450 247
Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "POST /query?db=&q=SHOW+RETENTION+POLICIES+ON+system HTTP/1.1" 200 149 "-" "KapacitorInfluxDBClient" 6fb2f56b-046f-11e9-9671-0aff01b2e450 240
Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "POST /query?db=&q=SHOW+RETENTION+POLICIES+ON+_internal HTTP/1.1" 200 153 "-" "KapacitorInfluxDBClient" 6fb310cf-046f-11e9-9672-0aff01b2e450 259
Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "POST /query?db=&q=SHOW+RETENTION+POLICIES+ON+events HTTP/1.1" 200 149 "-" "KapacitorInfluxDBClient" 6fb32f21-046f-11e9-9673-0aff01b2e450 231
Dec 20 15:53:47 some-valid-hostname influxd[7936]: [httpd] 10.1.10.207 - - [20/Dec/2018:15:53:47 +0000] "POST /query?db=&q=SHOW+SUBSCRIPTIONS HTTP/1.1" 200 57 "-" "KapacitorInfluxDBClient" 6fb34c4c-046f-11e9-9674-0aff01b2e450 278

From this we can see that Kapacitor is able to communicate with Influx and even retrieves the list of subscriptions. From that point on, however, Kapacitor never issues a request to create the subscriptions.
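To make that reading of the access log less manual, here is a rough Python sketch that pulls the InfluxQL statements out of httpd log lines and checks whether any of them is a CREATE SUBSCRIPTION. The function names and the regular expression are my own assumptions for illustration, not anything shipped with Influx or Kapacitor; the sample lines are trimmed copies of the requests in the log above.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the request part of an access-log line,
# e.g. POST /query?db=&q=SHOW+DATABASES HTTP/1.1
REQUEST_RE = re.compile(r"(GET|POST) (\S+) HTTP/")


def extract_queries(log_lines):
    """Return the InfluxQL statements found in the q= parameter of each request."""
    queries = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # parse_qs decodes '+' to spaces and skips empty values such as db=
        params = parse_qs(urlsplit(match.group(2)).query)
        queries.extend(params.get("q", []))
    return queries


def has_create_subscription(queries):
    """True if any logged statement is a CREATE SUBSCRIPTION."""
    return any(q.upper().startswith("CREATE SUBSCRIPTION") for q in queries)


# Request portions taken from the log excerpt in this post.
SAMPLE_LINES = [
    '"GET /ping HTTP/1.1" 204 0',
    '"POST /query?db=&q=SHOW+DATABASES HTTP/1.1" 200 151',
    '"POST /query?db=&q=SHOW+RETENTION+POLICIES+ON+kubernetes HTTP/1.1" 200 147',
    '"POST /query?db=&q=SHOW+SUBSCRIPTIONS HTTP/1.1" 200 57',
]

if __name__ == "__main__":
    found = extract_queries(SAMPLE_LINES)
    for statement in found:
        print(statement)
    print("CREATE SUBSCRIPTION seen:", has_create_subscription(found))
```

Running the same functions over the full influxd log should show whether Kapacitor ever gets as far as sending a CREATE SUBSCRIPTION statement.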

Looking at the code path in Kapacitor, I believe the SHOW SUBSCRIPTIONS call comes from here: https://github.com/influxdata/kapacitor/blob/v1.5.2/services/influxdb/service.go#L883. From that point on, however, the code that performs subscription-related operations offers little in the way of debug output, so short of adding a ton of logging and running a custom build, I'm at a loss as to what's going on.

Has anyone else stumbled onto this?


#2

I went the route of peppering logging throughout the entire subscription code and ended up identifying why the subscriptions were not being created: the retention policies they were meant to subscribe to were not present in Influx, so the subscriptions were never created.

None of the custom retention policies were present (only autogen remained), and we are not sure how they were removed. The current working assumption is that something in the upgrade caused them to be deleted, but we haven't been able to validate that yet.
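For anyone who wants to check for the same condition without patching Kapacitor, here is a minimal Python sketch that compares the retention policies you expect against what SHOW RETENTION POLICIES returns over the InfluxDB 1.x HTTP /query endpoint. The function names and the `thirty_days` policy are hypothetical, and the sketch assumes an unauthenticated instance; add credentials if yours requires them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_retention_policies(base_url, database):
    """Run SHOW RETENTION POLICIES against /query and return the decoded JSON.

    Assumes no authentication is required (an assumption for this sketch).
    """
    query = urlencode({"q": f'SHOW RETENTION POLICIES ON "{database}"'})
    with urlopen(f"{base_url.rstrip('/')}/query?{query}") as resp:
        return json.load(resp)


def policy_names(response):
    """Collect retention policy names from a 1.x /query JSON response."""
    names = set()
    for result in response.get("results", []):
        for series in result.get("series", []):
            name_idx = series["columns"].index("name")
            names.update(row[name_idx] for row in series.get("values", []))
    return names


def missing_policies(expected, response):
    """Return the expected policy names that the server did not report, sorted."""
    return sorted(set(expected) - policy_names(response))


# Shape of a response where only autogen exists; "thirty_days" below is a
# hypothetical custom policy used purely for illustration.
SAMPLE_RESPONSE = {
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "columns": ["name", "duration", "shardGroupDuration", "replicaN", "default"],
                    "values": [["autogen", "0s", "168h0m0s", 1, True]],
                }
            ],
        }
    ]
}

if __name__ == "__main__":
    print(missing_policies(["autogen", "thirty_days"], SAMPLE_RESPONSE))
```

Against a live instance you would pass the result of `fetch_retention_policies("http://valid-influxdb-url:8086/", "kubernetes")` in place of the sample response. Any name it reports as missing is a retention policy that subscriptions cannot be created for.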