SIGSEGV: segmentation violation caused by TICKscript?


Is it possible for a TICKscript to crash kapacitor? That seems to be what’s happening, but I can’t confirm that and don’t even know where to look.

What I know is that we’ve had kapacitor running for quite a while without much use. I created a template task and used it to make tasks. Now Kapacitor crashes with the following after running for about a minute:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xa0 pc=0xa07575]

goroutine 3078 [running]:
github.com/influxdata/kapacitor.(*InfluxQLNode).runStreamInfluxQL(0xc4428bcc00, 0xc43bd1edc0, 0xc420020600)
        /home/ec2-user/go/src/ +0xe05
github.com/influxdata/kapacitor.(*InfluxQLNode).runInfluxQLs(0xc4428bcc00, 0x0, 0x0, 0x0, 0xc434550f78, 0xc43bd1edc0)
        /home/ec2-user/go/src/ +0x115
github.com/influxdata/kapacitor.(*InfluxQLNode).(, 0x0, 0x0, 0xc434550fa0, 0x0)
        /home/ec2-user/go/src/ +0x48
github.com/influxdata/kapacitor.(*node).start.func1(0xc4428bcc00, 0x0, 0x0, 0x0)
        /home/ec2-user/go/src/ +0x8e
created by github.com/influxdata/kapacitor.(*node).start
        /home/ec2-user/go/src/ +0x5d

If I delete my tasks, kapacitor doesn’t crash right away, so I’m thinking there’s something in the TICKscript that’s ticking off Kapacitor.
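One thing I plan to try next, instead of deleting tasks outright: disable them one at a time with the stock kapacitor CLI and bisect which one triggers the crash (the task ID below is just a placeholder):

```shell
# Show every defined task and whether it's enabled
kapacitor list tasks

# Disable a suspect task without deleting it
kapacitor disable my_suspect_task

# Turn it back on once it's been ruled out
kapacitor enable my_suspect_task
```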

Here’s the template task:

// API call that this task monitors and triggers on
var targetApi string

// Application this is grouped with
var application string

// Number of errors that triggers an alert
var errorAbsolute int

// Percentage of errors that triggers an alert
var errorPercentage int

// Minimum number of errors before triggering an alert
var minErrorCount int

// Minimum number of requests before triggering an alert
var minRequestCount int

// Any relevant notes about this api; used if there are known issues
var note string

// response time on which to trigger an alert
var responseTime int

// length of sliding window of time
var period = 5m

// frequency it is checked
var every = 1m

var data = stream
        |from()
        |where(lambda: "api" == targetApi)
        |log().prefix('initial stream: ').level('DEBUG')

var totalRecords = data
        |log().prefix('totalRecords: ').level('DEBUG')
        |window()
                .period(period)
                .every(every)
        |count('error')

var totalErrors = data
        |where(lambda: "error" == 'true')
        |log().prefix('totalErrors (pre-count): ').level('DEBUG')
        |window()
                .period(period)
                .every(every)
        |count('error')
        |log().prefix('totalErrors (post-count): ').level('DEBUG')

totalRecords
        |log().prefix('pre-join: ').level('DEBUG')
        |join(totalErrors)
                .as('totals', 'totalErrors')
        |eval(lambda: float("totalErrors.count") / float("totals.count" + 1))
                .as('rate')
        |log().prefix('pre-alert: ').level('DEBUG')
        |alert()
                .id('{{ index .Tags "api"}}')
                .message('Count: {{ index .Fields "totals.count" }}; Errors: {{ index .Fields "totalErrors.count" }} Rate: {{ index .Fields "rate" }}  maxDuration: {{ index .Fields "executionTime" }}')
                //.crit(lambda: TRUE)
                .crit(lambda: "rate" > errorPercentage OR "totals.count" > errorAbsolute OR max('duration') > responseTime)
                // Whenever we get an alert write it to a file.
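In case it matters, this is roughly how the template gets instantiated — a vars file per task plus the standard define commands. All the IDs, file names, database name, and example values below are placeholders, not my real ones:

```shell
# Define the template from the script above
kapacitor define-template api_errors -tick api_errors.tick

# Per-task variable values; each type must match the template's var declarations
cat > checkout_api.json <<'EOF'
{
  "targetApi":       {"type": "string", "value": "/api/checkout"},
  "application":     {"type": "string", "value": "storefront"},
  "errorAbsolute":   {"type": "int",    "value": 50},
  "errorPercentage": {"type": "int",    "value": 5},
  "minErrorCount":   {"type": "int",    "value": 10},
  "minRequestCount": {"type": "int",    "value": 100},
  "note":            {"type": "string", "value": "none"},
  "responseTime":    {"type": "int",    "value": 2000}
}
EOF

# Create a concrete task from the template
kapacitor define checkout_api_alert \
    -template api_errors \
    -vars checkout_api.json \
    -dbrp telegraf.autogen
```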

So generally, can anyone spot anything that might cause a SIGSEGV? Or shed some light on what I might be doing wrong?

I have some specific questions, too:

In the stack trace, does this goroutine number:

goroutine 3078 [running]:

tell me anything useful about which task might be at fault (if it’s a task)?

This line seems to lay the blame on an InfluxQLNode:

github.com/influxdata/kapacitor.(*InfluxQLNode).runStreamInfluxQL(0xc4428bcc00, 0xc43bd1edc0, 0xc420020600)

Is there a way to tell which InfluxQLNode? And do the numbers after runStreamInfluxQL point to anything useful in tracking down this bug?

Any help would be appreciated. Still a newbie…