Telegraf Kafka input -- making no progress on Kafka lag

We’re using Kafka to get messages into Telegraf. However, the consumers on several of our nodes appear to be hanging on particular partitions.

Group       Topic     Pid  Offset   logSize  Lag      Owner
kafka-aggr  telegraf  0    2178614  2178748  134      prd-xm-telegraf-dataproc-v2-metrics-server1
kafka-aggr  telegraf  1    866      1846971  1846105  prd-xm-telegraf-dataproc-v2-metrics-server2
kafka-aggr  telegraf  2    1999872  2004413  4541     prd-xm-telegraf-dataproc-v2-metrics-server3
kafka-aggr  telegraf  3    2032     1793542  1791510  prd-xm-telegraf-dataproc-v2-metrics-server4
kafka-aggr  telegraf  4    1552     1563455  1561903  prd-xm-telegraf-dataproc-v2-metrics-server5
kafka-aggr  telegraf  5    3038     1846319  1843281  prd-xm-telegraf-dataproc-v2-metrics-server6
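
For reference, the Lag column above is just logSize minus Offset. Here is a minimal Python sketch, with the rows copied from that output, that recomputes the lag and flags partitions whose consumer has read almost nothing. The 1% cut-off and the function names are assumptions made for illustration, not anything Telegraf or Kafka provides:

```python
# Rows copied from the offset-checker output above: (partition, offset, log_size).
ROWS = [
    (0, 2178614, 2178748),
    (1, 866, 1846971),
    (2, 1999872, 2004413),
    (3, 2032, 1793542),
    (4, 1552, 1563455),
    (5, 3038, 1846319),
]

# Assumed cut-off, chosen only for illustration: a partition whose consumer
# has read under 1% of the log is treated as stalled.
STALLED_FRACTION = 0.01


def lag(offset, log_size):
    """Messages still unread: log size minus the committed offset."""
    return log_size - offset


def stalled_partitions(rows, threshold=STALLED_FRACTION):
    """Return the partition ids whose consumed fraction is below threshold."""
    return [pid for pid, offset, log_size in rows if offset / log_size < threshold]


if __name__ == "__main__":
    for pid, offset, log_size in ROWS:
        print(pid, lag(offset, log_size), f"{offset / log_size:.2%}")
    print("stalled:", stalled_partitions(ROWS))
```

A single snapshot only shows how far behind each consumer is. Comparing the Offset column across two snapshots taken a few minutes apart is the more reliable way to tell a stalled consumer from a slow one.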

The Telegraf consumers working on partitions 1, 3, 4, and 5 appear to have stopped consuming data, while the other two seem to be making progress fine. The Telegraf version is:

Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff)

Anyone seen this?

In the log, I’ve got debugging turned on, and can see it processing metrics, probably from a different input (local CPU or something). Here’s a snip of the log, and a stack trace:

2017-03-24T23:26:09Z D! Output [influxdb] wrote batch of 34 metrics in 19.721271ms
2017-03-24T23:26:33Z D! Output [influxdb] buffer fullness: 34 / 100000 metrics. 
2017-03-24T23:26:33Z D! Output [influxdb] wrote batch of 34 metrics in 4.731444ms
2017-03-24T23:26:48Z D! Output [influxdb] buffer fullness: 51 / 100000 metrics. 
2017-03-24T23:26:48Z D! Output [influxdb] wrote batch of 51 metrics in 5.684483ms
SIGQUIT: quit
PC=0x464071 m=0

goroutine 0 [idle]:
runtime.futex(0x19a49d0, 0x0, 0x0, 0x0, 0x0, 0x7ed048, 0x0, 0x0, 0x7ffd0b56e910, 0x412ff2, ...)
	/usr/local/go/src/runtime/sys_linux_amd64.s:387 +0x21
runtime.futexsleep(0x19a49d0, 0x0, 0xffffffffffffffff)
	/usr/local/go/src/runtime/os_linux.go:45 +0x62
runtime.notesleep(0x19a49d0)
	/usr/local/go/src/runtime/lock_futex.go:145 +0x82
runtime.stopm()
	/usr/local/go/src/runtime/proc.go:1594 +0xad
runtime.exitsyscall0(0xc420c21d40)
	/usr/local/go/src/runtime/proc.go:2638 +0x128
runtime.mcall(0x7ffd0b56e9c0)
	/usr/local/go/src/runtime/asm_amd64.s:240 +0x5b

goroutine 1 [semacquire, 18 minutes]:
sync.runtime_Semacquire(0xc4204d628c)
	/usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc4204d6280)
	/usr/local/go/src/sync/waitgroup.go:131 +0x97
github.com/influxdata/telegraf/agent.(*Agent).Run(0xc420038070, 0xc42012ce40, 0x0, 0x0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:383 +0x4de
main.reloadLoop(0xc42012c660, 0x0, 0x0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:284 +0xcec
main.main()
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:342 +0x85

goroutine 17 [syscall, 18 minutes, locked to thread]:
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 5 [syscall, 18 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 46 [semacquire, 18 minutes]:
sync.runtime_Semacquire(0xc42020304c)
	/usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc420203040)
	/usr/local/go/src/sync/waitgroup.go:131 +0x97
github.com/samuel/go-zookeeper/zk.(*Conn).loop(0xc420146460)
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:335 +0x4cb
github.com/samuel/go-zookeeper/zk.Connect.func1(0xc420146460)
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:205 +0x2f
created by github.com/samuel/go-zookeeper/zk.Connect
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:209 +0x64a

goroutine 54 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).flusher(0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0x0, 0x0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:298 +0x20e
github.com/influxdata/telegraf/agent.(*Agent).Run.func1(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:353 +0x77
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:357 +0x320

goroutine 43 [select, 18 minutes]:
main.reloadLoop.func2(0xc42012cea0, 0xc42012ce40, 0xc42018b500, 0xc42012c660)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:252 +0x264
created by main.reloadLoop
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:266 +0x98d

goroutine 55 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb580, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb580, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 42 [select, 18 minutes, locked to thread]:
runtime.gopark(0x11575f0, 0x0, 0x107ae04, 0x6, 0x18, 0x2)
	/usr/local/go/src/runtime/proc.go:259 +0x13a
runtime.selectgoImpl(0xc4202bef20, 0x0, 0x18)
	/usr/local/go/src/runtime/select.go:423 +0x1235
runtime.selectgo(0xc4202bef20)
	/usr/local/go/src/runtime/select.go:238 +0x1c
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal1_unix.go:304 +0x2f3
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 51 [IO wait]:
net.runtime_pollWait(0x7f52ef902660, 0x72, 0x4)
	/usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc4204cc840, 0x72, 0xc420045bc0, 0xc420012078)
	/usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc4204cc840, 0x18ee740, 0xc420012078)
	/usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).Read(0xc4204cc7e0, 0xc42084c000, 0x4, 0x180000, 0x0, 0x18ee740, 0xc420012078)
	/usr/local/go/src/net/fd_unix.go:243 +0x1a1
net.(*conn).Read(0xc4202b6008, 0xc42084c000, 0x4, 0x180000, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:173 +0x70
io.ReadAtLeast(0x18eb6c0, 0xc4202b6008, 0xc42084c000, 0x4, 0x180000, 0x4, 0x1042fc0, 0x0, 0x18eb6c0)
	/usr/local/go/src/io/io.go:307 +0xa4
io.ReadFull(0x18eb6c0, 0xc4202b6008, 0xc42084c000, 0x4, 0x180000, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/io.go:325 +0x58
github.com/samuel/go-zookeeper/zk.(*Conn).recvLoop(0xc420146460, 0x18fede0, 0xc4202b6008, 0x5d, 0xe)
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:575 +0x173
github.com/samuel/go-zookeeper/zk.(*Conn).loop.func2(0xc420146460, 0xc42007c300, 0xc420203040)
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:325 +0x41
created by github.com/samuel/go-zookeeper/zk.(*Conn).loop
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:332 +0x4ac

goroutine 50 [select]:
github.com/samuel/go-zookeeper/zk.(*Conn).sendLoop(0xc420146460, 0x18fede0, 0xc4202b6008, 0xc42007c300, 0x0, 0x0)
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:511 +0xb62
github.com/samuel/go-zookeeper/zk.(*Conn).loop.func1(0xc420146460, 0xc42007c300, 0xc420203040)
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:317 +0x4b
created by github.com/samuel/go-zookeeper/zk.(*Conn).loop
	/home/ubuntu/telegraf-build/src/github.com/samuel/go-zookeeper/zk/conn.go:321 +0x45c

goroutine 731 [semacquire, 15 minutes]:
sync.runtime_Semacquire(0xc420efd27c)
	/usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*WaitGroup).Wait(0xc420efd270)
	/usr/local/go/src/sync/waitgroup.go:131 +0x97
github.com/wvanbergen/kafka/consumergroup.(*ConsumerGroup).topicConsumer(0xc420148510, 0xc420113451, 0x8, 0xc42007cc00, 0xc42007cc60, 0xc420233740)
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/consumer_group.go:348 +0x53d
created by github.com/wvanbergen/kafka/consumergroup.(*ConsumerGroup).topicListConsumer
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/consumer_group.go:274 +0x25a

goroutine 52 [chan receive]:
github.com/Shopify/sarama.(*Broker).responseReceiver(0xc420456070)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/broker.go:410 +0x103
github.com/Shopify/sarama.(*Broker).(github.com/Shopify/sarama.responseReceiver)-fm()
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/broker.go:93 +0x2a
github.com/Shopify/sarama.withRecover(0xc420113350)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/utils.go:46 +0x43
created by github.com/Shopify/sarama.(*Broker).Open.func1
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/broker.go:93 +0x5d9

goroutine 68 [select, 8 minutes]:
github.com/Shopify/sarama.(*client).backgroundMetadataUpdater(0xc420144200)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/client.go:560 +0x2e6
github.com/Shopify/sarama.(*client).(github.com/Shopify/sarama.backgroundMetadataUpdater)-fm()
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/client.go:149 +0x2a
github.com/Shopify/sarama.withRecover(0xc420117890)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/utils.go:46 +0x43
created by github.com/Shopify/sarama.NewClient
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/client.go:149 +0x5f4

goroutine 69 [select]:
github.com/wvanbergen/kafka/consumergroup.(*zookeeperOffsetManager).offsetCommitter(0xc42005a910)
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/offset_manager.go:199 +0x1fb
created by github.com/wvanbergen/kafka/consumergroup.NewZookeeperOffsetManager
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/offset_manager.go:106 +0x201

goroutine 70 [select, 15 minutes]:
github.com/wvanbergen/kafka/consumergroup.(*ConsumerGroup).topicListConsumer(0xc420148510, 0xc4201135b0, 0x1, 0x1)
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/consumer_group.go:277 +0x5e3
created by github.com/wvanbergen/kafka/consumergroup.JoinConsumerGroup
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/consumer_group.go:179 +0x65f

goroutine 71 [select, 18 minutes]:
github.com/influxdata/telegraf/plugins/inputs/kafka_consumer.(*Kafka).receiver(0xc420164000)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/kafka_consumer/kafka_consumer.go:127 +0x67f
created by github.com/influxdata/telegraf/plugins/inputs/kafka_consumer.(*Kafka).Start
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/kafka_consumer/kafka_consumer.go:117 +0x1d0

goroutine 746 [chan receive, 15 minutes]:
github.com/Shopify/sarama.(*partitionConsumer).responseFeeder(0xc420ca7200)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:416 +0x74
github.com/Shopify/sarama.(*partitionConsumer).(github.com/Shopify/sarama.responseFeeder)-fm()
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:156 +0x2a
github.com/Shopify/sarama.withRecover(0xc420203850)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/utils.go:46 +0x43
created by github.com/Shopify/sarama.(*consumer).ConsumePartition
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:156 +0x391

goroutine 56 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb600, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb600, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 57 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb680, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb680, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 58 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb700, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb700, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 59 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb7c0, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb7c0, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 60 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb900, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb900, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 61 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eb980, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eb980, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 62 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201eba40, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201eba40, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 63 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201ebac0, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201ebac0, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 64 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201ebb40, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201ebb40, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 65 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc4201ebbc0, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc4201ebbc0, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 82 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc42011e000, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x37e
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc42011e000, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 83 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc42011e0c0, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc42011e0c0, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 84 [select]:
github.com/influxdata/telegraf/internal.RandomSleep(0x37e11d600, 0xc42012ce40)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/internal.go:235 +0x15c
github.com/influxdata/telegraf/agent.(*Agent).gatherer(0xc420038070, 0xc42012ce40, 0xc42011e6c0, 0x4a817c800, 0xc42012d0e0)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:121 +0x22b
github.com/influxdata/telegraf/agent.(*Agent).Run.func3(0xc4204d6280, 0xc420038070, 0xc42012ce40, 0xc42012d0e0, 0xc42011e6c0, 0x4a817c800)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:379 +0x7f
created by github.com/influxdata/telegraf/agent.(*Agent).Run
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:380 +0x49b

goroutine 73 [select]:
github.com/influxdata/telegraf/agent.(*Agent).flusher.func1(0xc420202250, 0xc42012ce40, 0xc42007df20, 0xc420038070)
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:257 +0x2fa
created by github.com/influxdata/telegraf/agent.(*Agent).flusher
	/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:286 +0xee

goroutine 199 [select]:
net/http.(*persistConn).writeLoop(0xc420476c00)
	/usr/local/go/src/net/http/transport.go:1646 +0x3bd
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1063 +0x50e

goroutine 198 [IO wait]:
net.runtime_pollWait(0x7f52ef9025a0, 0x72, 0x7)
	/usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc4201de4c0, 0x72, 0xc4204e79d0, 0xc420012078)
	/usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc4201de4c0, 0x18ee740, 0xc420012078)
	/usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).Read(0xc4201de460, 0xc420300000, 0x1000, 0x1000, 0x0, 0x18ee740, 0xc420012078)
	/usr/local/go/src/net/fd_unix.go:243 +0x1a1
net.(*conn).Read(0xc42083c038, 0xc420300000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:173 +0x70
net/http.(*persistConn).Read(0xc420476c00, 0xc420300000, 0x1000, 0x1000, 0x30, 0xc4204e7b58, 0x43fcfc)
	/usr/local/go/src/net/http/transport.go:1261 +0x154
bufio.(*Reader).fill(0xc420b5cea0)
	/usr/local/go/src/bufio/bufio.go:97 +0x10c
bufio.(*Reader).Peek(0xc420b5cea0, 0x1, 0x0, 0x1, 0x0, 0xc42100b500, 0x0)
	/usr/local/go/src/bufio/bufio.go:129 +0x62
net/http.(*persistConn).readLoop(0xc420476c00)
	/usr/local/go/src/net/http/transport.go:1418 +0x1a1
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1062 +0x4e9

goroutine 4123 [select]:
github.com/Shopify/sarama.(*brokerConsumer).subscriptionManager(0xc42081e5a0)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:557 +0x525
github.com/Shopify/sarama.(*brokerConsumer).(github.com/Shopify/sarama.subscriptionManager)-fm()
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:530 +0x2a
github.com/Shopify/sarama.withRecover(0xc420113870)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/utils.go:46 +0x43
created by github.com/Shopify/sarama.(*consumer).newBrokerConsumer
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:530 +0x1fb

goroutine 4124 [runnable]:
github.com/Shopify/sarama.(*brokerConsumer).abort(0xc42081e5a0, 0x18ecbc0, 0xc4204d67a0)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:671 +0x167
github.com/Shopify/sarama.(*brokerConsumer).subscriptionConsumer(0xc42081e5a0)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:594 +0x39f
github.com/Shopify/sarama.(*brokerConsumer).(github.com/Shopify/sarama.subscriptionConsumer)-fm()
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:531 +0x2a
github.com/Shopify/sarama.withRecover(0xc420113880)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/utils.go:46 +0x43
created by github.com/Shopify/sarama.(*consumer).newBrokerConsumer
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:531 +0x253

goroutine 745 [select]:
github.com/Shopify/sarama.(*partitionConsumer).dispatcher(0xc420ca7200)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:306 +0x344
github.com/Shopify/sarama.(*partitionConsumer).(github.com/Shopify/sarama.dispatcher)-fm()
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:155 +0x2a
github.com/Shopify/sarama.withRecover(0xc420203840)
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/utils.go:46 +0x43
created by github.com/Shopify/sarama.(*consumer).ConsumePartition
	/home/ubuntu/telegraf-build/src/github.com/Shopify/sarama/consumer.go:155 +0x335

goroutine 732 [select, 15 minutes]:
github.com/wvanbergen/kafka/consumergroup.(*ConsumerGroup).partitionConsumer(0xc420148510, 0xc420113451, 0x8, 0x5, 0xc42007cc00, 0xc42007cc60, 0xc420efd270, 0xc420233740)
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/consumer_group.go:456 +0xf2c
created by github.com/wvanbergen/kafka/consumergroup.(*ConsumerGroup).topicConsumer
	/home/ubuntu/telegraf-build/src/github.com/wvanbergen/kafka/consumergroup/consumer_group.go:345 +0x512

rax    0xca
rbx    0x19a3610
rcx    0xffffffffffffffff
rdx    0x0
rdi    0x19a49d0
rsi    0x0
rbp    0x7ffd0b56e8e0
rsp    0x7ffd0b56e898
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x0
r13    0xc420c21520
r14    0x444ff0
r15    0x1318190
rip    0x464071
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

This looks like a bug. Can you open an issue on the Telegraf GitHub page (influxdata/telegraf)?

Hi Daniel,

There’s one more thing I’d like to investigate first. I believe our app is sending some huge (up to 1MB), malformed metrics, and I’d like to get rid of those before submitting a bug report. Reproducing this problem is proving to be very tough.