InfluxDB out of memory periodically (v1.3.1)

Our InfluxDB instance’s memory usage changes as below:

Every few hours it is killed by oom-killer and auto-restarts. We are using tsi1 after upgrading to v1.3.1.

Can anyone help with this case?

Logs from /var/log/messages:
Aug 3 06:25:03 10 influxd: [I] 2017-08-02T22:25:03Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002883-000000001.tsm (#0) engine=tsm1
Aug 3 06:25:03 10 influxd: [I] 2017-08-02T22:25:03Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002884-000000001.tsm (#1) engine=tsm1
Aug 3 06:25:03 10 influxd: [I] 2017-08-02T22:25:03Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002885-000000001.tsm (#2) engine=tsm1
Aug 3 06:25:04 10 influxd: [I] 2017-08-02T22:25:04Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:25:04 10 influxd: [I] 2017-08-02T22:25:04Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002880-000000002.tsm (#0) engine=tsm1
Aug 3 06:25:04 10 influxd: [I] 2017-08-02T22:25:04Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002882-000000002.tsm (#1) engine=tsm1
Aug 3 06:25:17 10 influxd: [I] 2017-08-02T22:25:17Z compacted level 3 group (0) into /data1/influxdb/data/hadoop/7days/474/000002874-000000004.tsm.tmp (#0) engine=tsm1
Aug 3 06:25:17 10 influxd: [I] 2017-08-02T22:25:17Z compacted level 3 4 files into 1 files in 2m15.546821968s engine=tsm1
Aug 3 06:25:18 10 influxd: [I] 2017-08-02T22:25:18Z beginning full compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:25:18 10 influxd: [I] 2017-08-02T22:25:18Z compacting full group (0) /data1/influxdb/data/hadoop/7days/474/000002852-000000005.tsm (#0) engine=tsm1
Aug 3 06:25:18 10 influxd: [I] 2017-08-02T22:25:18Z compacting full group (0) /data1/influxdb/data/hadoop/7days/474/000002874-000000004.tsm (#1) engine=tsm1
Aug 3 06:25:27 10 influxd: [I] 2017-08-02T22:25:27Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 26.454486968s engine=tsm1
Aug 3 06:25:51 10 influxd: [I] 2017-08-02T22:25:51Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 18.07838669s engine=tsm1
Aug 3 06:26:14 10 influxd: [I] 2017-08-02T22:26:14Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 23.367757463s engine=tsm1
Aug 3 06:26:18 10 influxd: [I] 2017-08-02T22:26:18Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002885-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:26:18 10 influxd: [I] 2017-08-02T22:26:18Z compacted level 1 3 files into 1 files in 1m14.490195001s engine=tsm1
Aug 3 06:26:18 10 influxd: [I] 2017-08-02T22:26:18Z beginning level 1 compaction of group 0, 3 TSM files engine=tsm1
Aug 3 06:26:18 10 influxd: [I] 2017-08-02T22:26:18Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002886-000000001.tsm (#0) engine=tsm1
Aug 3 06:26:18 10 influxd: [I] 2017-08-02T22:26:18Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002887-000000001.tsm (#1) engine=tsm1
Aug 3 06:26:18 10 influxd: [I] 2017-08-02T22:26:18Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002888-000000001.tsm (#2) engine=tsm1
Aug 3 06:26:27 10 influxd: [I] 2017-08-02T22:26:27Z compacted level 2 group (0) into /data1/influxdb/data/hadoop/7days/474/000002882-000000003.tsm.tmp (#0) engine=tsm1
Aug 3 06:26:27 10 influxd: [I] 2017-08-02T22:26:27Z compacted level 2 2 files into 1 files in 1m23.34308003s engine=tsm1
Aug 3 06:26:35 10 influxd: [I] 2017-08-02T22:26:35Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 20.827151852s engine=tsm1
Aug 3 06:27:05 10 influxd: [I] 2017-08-02T22:27:05Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 27.097564255s engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002888-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z compacted level 1 3 files into 1 files in 1m8.899733032s engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002889-000000001.tsm (#0) engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002890-000000001.tsm (#1) engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002885-000000002.tsm (#0) engine=tsm1
Aug 3 06:27:27 10 influxd: [I] 2017-08-02T22:27:27Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002888-000000002.tsm (#1) engine=tsm1
Aug 3 06:27:31 10 influxd: [I] 2017-08-02T22:27:31Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 25.824509577s engine=tsm1
Aug 3 06:28:00 10 influxd: [I] 2017-08-02T22:28:00Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 22.91661226s engine=tsm1
Aug 3 06:28:16 10 influxd: [I] 2017-08-02T22:28:16Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002890-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:28:16 10 influxd: [I] 2017-08-02T22:28:16Z compacted level 1 2 files into 1 files in 49.627325179s engine=tsm1
Aug 3 06:28:16 10 influxd: [I] 2017-08-02T22:28:16Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:28:16 10 influxd: [I] 2017-08-02T22:28:16Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002891-000000001.tsm (#0) engine=tsm1
Aug 3 06:28:16 10 influxd: [I] 2017-08-02T22:28:16Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002892-000000001.tsm (#1) engine=tsm1
Aug 3 06:28:24 10 influxd: [I] 2017-08-02T22:28:24Z compacted full group (0) into /data1/influxdb/data/hadoop/7days/474/000002874-000000005.tsm.tmp (#0) engine=tsm1
Aug 3 06:28:24 10 influxd: [I] 2017-08-02T22:28:24Z compacted full 2 files into 1 files in 3m5.972340392s engine=tsm1
Aug 3 06:28:28 10 influxd: [I] 2017-08-02T22:28:28Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 27.689197983s engine=tsm1
Aug 3 06:28:53 10 influxd: [I] 2017-08-02T22:28:53Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 21.83700428s engine=tsm1
Aug 3 06:28:54 10 influxd: [I] 2017-08-02T22:28:54Z compacted level 2 group (0) into /data1/influxdb/data/hadoop/7days/474/000002888-000000003.tsm.tmp (#0) engine=tsm1
Aug 3 06:28:54 10 influxd: [I] 2017-08-02T22:28:54Z compacted level 2 2 files into 1 files in 1m27.530262442s engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002892-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z compacted level 1 2 files into 1 files in 53.619477231s engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002890-000000002.tsm (#0) engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002892-000000002.tsm (#1) engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002893-000000001.tsm (#0) engine=tsm1
Aug 3 06:29:10 10 influxd: [I] 2017-08-02T22:29:10Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002894-000000001.tsm (#1) engine=tsm1
Aug 3 06:29:18 10 influxd: [I] 2017-08-02T22:29:18Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 24.839677641s engine=tsm1
Aug 3 06:29:38 10 influxd: [I] 2017-08-02T22:29:38Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 20.75602859s engine=tsm1
Aug 3 06:29:58 10 influxd: [I] 2017-08-02T22:29:58Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 17.045523299s engine=tsm1
Aug 3 06:30:11 10 influxd: [I] 2017-08-02T22:30:11Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002894-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:30:11 10 influxd: [I] 2017-08-02T22:30:11Z compacted level 1 2 files into 1 files in 1m1.429370658s engine=tsm1
Aug 3 06:30:11 10 influxd: [I] 2017-08-02T22:30:11Z beginning level 1 compaction of group 0, 3 TSM files engine=tsm1
Aug 3 06:30:11 10 influxd: [I] 2017-08-02T22:30:11Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002895-000000001.tsm (#0) engine=tsm1
Aug 3 06:30:11 10 influxd: [I] 2017-08-02T22:30:11Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002896-000000001.tsm (#1) engine=tsm1
Aug 3 06:30:11 10 influxd: [I] 2017-08-02T22:30:11Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002897-000000001.tsm (#2) engine=tsm1
Aug 3 06:30:22 10 influxd: [I] 2017-08-02T22:30:22Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 23.633717764s engine=tsm1
Aug 3 06:30:26 10 influxd: [I] 2017-08-02T22:30:26Z compacted level 2 group (0) into /data1/influxdb/data/hadoop/7days/474/000002892-000000003.tsm.tmp (#0) engine=tsm1
Aug 3 06:30:26 10 influxd: [I] 2017-08-02T22:30:26Z compacted level 2 2 files into 1 files in 1m16.183772865s engine=tsm1
Aug 3 06:30:27 10 influxd: [I] 2017-08-02T22:30:27Z beginning level 3 compaction of group 0, 4 TSM files engine=tsm1
Aug 3 06:30:27 10 influxd: [I] 2017-08-02T22:30:27Z compacting level 3 group (0) /data1/influxdb/data/hadoop/7days/474/000002878-000000003.tsm (#0) engine=tsm1
Aug 3 06:30:27 10 influxd: [I] 2017-08-02T22:30:27Z compacting level 3 group (0) /data1/influxdb/data/hadoop/7days/474/000002882-000000003.tsm (#1) engine=tsm1
Aug 3 06:30:27 10 influxd: [I] 2017-08-02T22:30:27Z compacting level 3 group (0) /data1/influxdb/data/hadoop/7days/474/000002888-000000003.tsm (#2) engine=tsm1
Aug 3 06:30:27 10 influxd: [I] 2017-08-02T22:30:27Z compacting level 3 group (0) /data1/influxdb/data/hadoop/7days/474/000002892-000000003.tsm (#3) engine=tsm1
Aug 3 06:30:47 10 influxd: [I] 2017-08-02T22:30:47Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 18.526651204s engine=tsm1
Aug 3 06:31:07 10 influxd: [E] 2017-08-02T22:31:07Z log file compacted index=tsi token=73034e id=9 elapsed=1.647750896s bytes=13269360 kb_per_sec=7864
Aug 3 06:31:07 10 influxd: [I] 2017-08-02T22:31:07Z performing full compaction index=tsi token=8108e1 src=9,6 dst=/data1/influxdb/data/hadoop_usage/7days/476/index/L2-00000011.tsi
Aug 3 06:31:11 10 influxd: [I] 2017-08-02T22:31:11Z full compaction complete index=tsi token=8108e1 path=/data1/influxdb/data/hadoop_usage/7days/476/index/L2-00000011.tsi elapsed=3.795352537s bytes=22800753 kb_per_sec=5866
Aug 3 06:31:11 10 influxd: [I] 2017-08-02T22:31:11Z removing index file index=tsi token=8108e1 path=/data1/influxdb/data/hadoop_usage/7days/476/index/L1-00000009.tsi
Aug 3 06:31:11 10 influxd: [I] 2017-08-02T22:31:11Z removing index file index=tsi token=8108e1 path=/data1/influxdb/data/hadoop_usage/7days/476/index/L1-00000006.tsi
Aug 3 06:31:12 10 influxd: [I] 2017-08-02T22:31:12Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 24.992214016s engine=tsm1
Aug 3 06:31:22 10 influxd: [I] 2017-08-02T22:31:22Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002897-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:31:22 10 influxd: [I] 2017-08-02T22:31:22Z compacted level 1 3 files into 1 files in 1m10.805258402s engine=tsm1
Aug 3 06:31:22 10 influxd: [I] 2017-08-02T22:31:22Z beginning level 1 compaction of group 0, 3 TSM files engine=tsm1
Aug 3 06:31:22 10 influxd: [I] 2017-08-02T22:31:22Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002898-000000001.tsm (#0) engine=tsm1
Aug 3 06:31:22 10 influxd: [I] 2017-08-02T22:31:22Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002899-000000001.tsm (#1) engine=tsm1
Aug 3 06:31:22 10 influxd: [I] 2017-08-02T22:31:22Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002900-000000001.tsm (#2) engine=tsm1
Aug 3 06:31:23 10 influxd: [I] 2017-08-02T22:31:23Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:31:23 10 influxd: [I] 2017-08-02T22:31:23Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002894-000000002.tsm (#0) engine=tsm1
Aug 3 06:31:23 10 influxd: [I] 2017-08-02T22:31:23Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002897-000000002.tsm (#1) engine=tsm1
Aug 3 06:31:35 10 influxd: [I] 2017-08-02T22:31:35Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 22.663638409s engine=tsm1
Aug 3 06:32:42 10 influxd: [I] 2017-08-02T22:32:42Z failed to store statistics: timeout service=monitor
Aug 3 06:33:49 10 influxd: [I] 2017-08-02T22:33:49Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 2m11.552437857s engine=tsm1
Aug 3 06:35:31 10 influxd: [I] 2017-08-02T22:35:29Z failed to store statistics: timeout service=monitor
Aug 3 06:36:42 10 influxd: [I] 2017-08-02T22:36:42Z failed to store statistics: timeout service=monitor
Aug 3 06:37:23 10 influxd: [I] 2017-08-02T22:37:23Z retention policy shard deletion check commencing service=retention
Aug 3 06:39:26 10 influxd: [I] 2017-08-02T22:39:26Z failed to store statistics: timeout service=monitor
Aug 3 06:40:06 10 influxd: [I] 2017-08-02T22:40:06Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002900-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:40:06 10 influxd: [I] 2017-08-02T22:40:06Z compacted level 1 3 files into 1 files in 8m43.771512292s engine=tsm1
Aug 3 06:40:06 10 influxd: [I] 2017-08-02T22:40:06Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:40:06 10 influxd: [I] 2017-08-02T22:40:06Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002901-000000001.tsm (#0) engine=tsm1
Aug 3 06:40:06 10 influxd: [I] 2017-08-02T22:40:06Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002902-000000001.tsm (#1) engine=tsm1
Aug 3 06:40:08 10 influxd: [I] 2017-08-02T22:40:08Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 6m17.816718703s engine=tsm1
Aug 3 06:40:15 10 influxd: [I] 2017-08-02T22:40:15Z compacted level 3 group (0) into /data1/influxdb/data/hadoop/7days/474/000002892-000000004.tsm.tmp (#0) engine=tsm1
Aug 3 06:40:15 10 influxd: [I] 2017-08-02T22:40:15Z compacted level 3 4 files into 1 files in 9m48.499340287s engine=tsm1
Aug 3 06:40:27 10 influxd: [I] 2017-08-02T22:40:27Z compacted level 2 group (0) into /data1/influxdb/data/hadoop/7days/474/000002897-000000003.tsm.tmp (#0) engine=tsm1
Aug 3 06:40:27 10 influxd: [I] 2017-08-02T22:40:27Z compacted level 2 2 files into 1 files in 9m3.631680101s engine=tsm1
Aug 3 06:40:27 10 influxd: [I] 2017-08-02T22:40:27Z beginning full compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:40:27 10 influxd: [I] 2017-08-02T22:40:27Z compacting full group (0) /data1/influxdb/data/hadoop/7days/474/000002874-000000005.tsm (#0) engine=tsm1
Aug 3 06:40:27 10 influxd: [I] 2017-08-02T22:40:27Z compacting full group (0) /data1/influxdb/data/hadoop/7days/474/000002892-000000004.tsm (#1) engine=tsm1
Aug 3 06:41:30 10 influxd: [I] 2017-08-02T22:41:30Z failed to store statistics: timeout service=monitor
Aug 3 06:43:25 10 influxd: [I] 2017-08-02T22:43:25Z failed to store statistics: timeout service=monitor
Aug 3 06:44:05 10 influxd: [I] 2017-08-02T22:44:05Z failed to store statistics: timeout service=monitor
Aug 3 06:44:41 10 influxd: [I] 2017-08-02T22:44:41Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002902-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:44:41 10 influxd: [I] 2017-08-02T22:44:41Z compacted level 1 2 files into 1 files in 4m34.999695843s engine=tsm1
Aug 3 06:44:42 10 influxd: [I] 2017-08-02T22:44:42Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:44:42 10 influxd: [I] 2017-08-02T22:44:42Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002903-000000001.tsm (#0) engine=tsm1
Aug 3 06:44:42 10 influxd: [I] 2017-08-02T22:44:42Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002904-000000001.tsm (#1) engine=tsm1
Aug 3 06:44:42 10 influxd: [I] 2017-08-02T22:44:42Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:44:42 10 influxd: [I] 2017-08-02T22:44:42Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002900-000000002.tsm (#0) engine=tsm1
Aug 3 06:44:42 10 influxd: [I] 2017-08-02T22:44:42Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002902-000000002.tsm (#1) engine=tsm1
Aug 3 06:44:43 10 influxd: [I] 2017-08-02T22:44:43Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 4m35.595204528s engine=tsm1
Aug 3 06:45:43 10 influxd: [I] 2017-08-02T22:45:43Z failed to store statistics: timeout service=monitor
Aug 3 06:48:00 10 influxd: [I] 2017-08-02T22:48:00Z failed to store statistics: timeout service=monitor
Aug 3 06:48:41 10 influxd: [I] 2017-08-02T22:48:41Z failed to store statistics: timeout service=monitor
Aug 3 06:49:00 10 influxd: [I] 2017-08-02T22:49:00Z failed to store statistics: timeout service=monitor
Aug 3 06:49:11 10 influxd: [I] 2017-08-02T22:49:11Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 4m27.323984582s engine=tsm1
Aug 3 06:49:42 10 influxd: [I] 2017-08-02T22:49:42Z compacted level 2 group (0) into /data1/influxdb/data/hadoop/7days/474/000002902-000000003.tsm.tmp (#0) engine=tsm1
Aug 3 06:49:42 10 influxd: [I] 2017-08-02T22:49:42Z compacted level 2 2 files into 1 files in 5m0.130144027s engine=tsm1
Aug 3 06:51:25 10 influxd: [I] 2017-08-02T22:51:25Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002904-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:51:25 10 influxd: [I] 2017-08-02T22:51:25Z compacted level 1 2 files into 1 files in 6m42.758667654s engine=tsm1
Aug 3 06:51:25 10 influxd: [I] 2017-08-02T22:51:25Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:51:25 10 influxd: [I] 2017-08-02T22:51:25Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002905-000000001.tsm (#0) engine=tsm1
Aug 3 06:51:25 10 influxd: [I] 2017-08-02T22:51:25Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002906-000000001.tsm (#1) engine=tsm1
Aug 3 06:51:26 10 influxd: [I] 2017-08-02T22:51:26Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 2m15.434643895s engine=tsm1
Aug 3 06:51:57 10 influxd: [I] 2017-08-02T22:51:57Z compacted full group (0) into /data1/influxdb/data/hadoop/7days/474/000002892-000000005.tsm.tmp (#0) engine=tsm1
Aug 3 06:51:57 10 influxd: [I] 2017-08-02T22:51:57Z compacted full 2 files into 1 files in 11m30.395155191s engine=tsm1
Aug 3 06:52:25 10 influxd: [I] 2017-08-02T22:52:25Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 58.866994161s engine=tsm1
Aug 3 06:53:44 10 influxd: [I] 2017-08-02T22:53:44Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 06:53:46 10 influxd: [I] 2017-08-02T22:53:46Z failed to store statistics: timeout service=monitor
Aug 3 06:55:47 10 influxd: [I] 2017-08-02T22:55:47Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002906-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:55:47 10 influxd: [I] 2017-08-02T22:55:47Z compacted level 1 2 files into 1 files in 4m22.306089127s engine=tsm1
Aug 3 06:55:48 10 influxd: [I] 2017-08-02T22:55:48Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:55:48 10 influxd: [I] 2017-08-02T22:55:48Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002904-000000002.tsm (#0) engine=tsm1
Aug 3 06:55:48 10 influxd: [I] 2017-08-02T22:55:48Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002906-000000002.tsm (#1) engine=tsm1
Aug 3 06:55:52 10 influxd: [I] 2017-08-02T22:55:52Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:55:52 10 influxd: [I] 2017-08-02T22:55:52Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002907-000000001.tsm (#0) engine=tsm1
Aug 3 06:55:52 10 influxd: [I] 2017-08-02T22:55:52Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002908-000000001.tsm (#1) engine=tsm1
Aug 3 06:55:52 10 influxd: [I] 2017-08-02T22:55:52Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 3m27.561628062s engine=tsm1
Aug 3 06:57:42 10 influxd: [I] 2017-08-02T22:57:42Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 1m49.531337137s engine=tsm1
Aug 3 06:58:46 10 influxd: [I] 2017-08-02T22:58:46Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 1m3.704523534s engine=tsm1
Aug 3 06:59:16 10 influxd: [I] 2017-08-02T22:59:16Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002908-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 06:59:16 10 influxd: [I] 2017-08-02T22:59:16Z compacted level 1 2 files into 1 files in 3m24.177040376s engine=tsm1
Aug 3 06:59:16 10 influxd: [I] 2017-08-02T22:59:16Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 06:59:16 10 influxd: [I] 2017-08-02T22:59:16Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002909-000000001.tsm (#0) engine=tsm1
Aug 3 06:59:16 10 influxd: [I] 2017-08-02T22:59:16Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002910-000000001.tsm (#1) engine=tsm1
Aug 3 07:00:53 10 influxd: [I] 2017-08-02T23:00:53Z Snapshot for path /data1/influxdb/data/_internal/monitor/473 written in 19.331262821s engine=tsm1
Aug 3 07:00:53 10 influxd: [I] 2017-08-02T23:00:53Z beginning level 1 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 07:00:53 10 influxd: [I] 2017-08-02T23:00:53Z compacting level 1 group (0) /data1/influxdb/data/_internal/monitor/473/000000028-000000001.tsm (#0) engine=tsm1
Aug 3 07:00:53 10 influxd: [I] 2017-08-02T23:00:53Z compacting level 1 group (0) /data1/influxdb/data/_internal/monitor/473/000000029-000000001.tsm (#1) engine=tsm1
Aug 3 07:05:08 10 influxd: [I] 2017-08-02T23:05:08Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:06:04 10 influxd: [I] 2017-08-02T23:06:04Z compacted level 1 group (0) into /data1/influxdb/data/_internal/monitor/473/000000029-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 07:06:04 10 influxd: [I] 2017-08-02T23:06:04Z compacted level 1 2 files into 1 files in 5m11.222248893s engine=tsm1
Aug 3 07:07:22 10 influxd: [I] 2017-08-02T23:07:22Z retention policy shard deletion check commencing service=retention
Aug 3 07:07:33 10 influxd: [I] 2017-08-02T23:07:33Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 8m47.508388594s engine=tsm1
Aug 3 07:07:40 10 influxd: [I] 2017-08-02T23:07:40Z compacted level 2 group (0) into /data1/influxdb/data/hadoop/7days/474/000002906-000000003.tsm.tmp (#0) engine=tsm1
Aug 3 07:07:40 10 influxd: [I] 2017-08-02T23:07:40Z compacted level 2 2 files into 1 files in 11m52.052166769s engine=tsm1
Aug 3 07:08:54 10 influxd: [I] 2017-08-02T23:08:54Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 1m21.020987199s engine=tsm1
Aug 3 07:09:53 10 influxd: [I] 2017-08-02T23:09:53Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 59.266352434s engine=tsm1
Aug 3 07:10:57 10 influxd: [I] 2017-08-02T23:10:57Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:58 10 influxd: [I] 2017-08-02T23:10:58Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:58 10 influxd: [I] 2017-08-02T23:10:58Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:58 10 influxd: [I] 2017-08-02T23:10:58Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:58 10 influxd: [I] 2017-08-02T23:10:58Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:10:59 10 influxd: [I] 2017-08-02T23:10:59Z Post http://localhost:9092/write?consistency=&db=hadoop&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:11:09 10 influxd: [I] 2017-08-02T23:11:09Z Post http://localhost:9092/write?consistency=&db=hadoop_usage&precision=ns&rp=7days: net/http: request canceled (Client.Timeout exceeded while awaiting headers) service=subscriber
Aug 3 07:14:06 10 influxd: [I] 2017-08-02T23:14:06Z compacted level 1 group (0) into /data1/influxdb/data/hadoop/7days/474/000002910-000000002.tsm.tmp (#0) engine=tsm1
Aug 3 07:14:06 10 influxd: [I] 2017-08-02T23:14:06Z compacted level 1 2 files into 1 files in 14m50.331781547s engine=tsm1
Aug 3 07:14:06 10 influxd: [I] 2017-08-02T23:14:06Z beginning level 1 compaction of group 0, 3 TSM files engine=tsm1
Aug 3 07:14:06 10 influxd: [I] 2017-08-02T23:14:06Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002911-000000001.tsm (#0) engine=tsm1
Aug 3 07:14:06 10 influxd: [I] 2017-08-02T23:14:06Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002912-000000001.tsm (#1) engine=tsm1
Aug 3 07:14:06 10 influxd: [I] 2017-08-02T23:14:06Z compacting level 1 group (0) /data1/influxdb/data/hadoop/7days/474/000002913-000000001.tsm (#2) engine=tsm1
Aug 3 07:14:07 10 influxd: [I] 2017-08-02T23:14:07Z beginning level 2 compaction of group 0, 2 TSM files engine=tsm1
Aug 3 07:14:07 10 influxd: [I] 2017-08-02T23:14:07Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002908-000000002.tsm (#0) engine=tsm1
Aug 3 07:14:07 10 influxd: [I] 2017-08-02T23:14:07Z compacting level 2 group (0) /data1/influxdb/data/hadoop/7days/474/000002910-000000002.tsm (#1) engine=tsm1
Aug 3 07:14:12 10 influxd: [I] 2017-08-02T23:14:12Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 4m18.813129883s engine=tsm1
Aug 3 07:15:15 10 influxd: [I] 2017-08-02T23:15:15Z Snapshot for path /data1/influxdb/data/hadoop/7days/474 written in 1m2.228614842s engine=tsm1
Aug 3 07:15:28 10 kernel: influxd invoked oom-killer: gfp_mask=0x42d0, order=3, oom_score_adj=0
Aug 3 07:15:28 10 kernel: influxd cpuset=/ mems_allowed=0-1
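
For anyone reading the log above, the slowdown before the OOM kill is easier to see once the durations are pulled out as numbers (snapshot writes go from roughly 20 s to 8m47s). Here is a minimal Python sketch, assuming the line format shown in this log. The function names are made up for illustration, and durations that use units other than h/m/s (for example ms) are simply skipped.

```python
import re

# Go-style durations as they appear in the log, e.g. "2m15.546821968s" or "49.627325179s".
DURATION_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s")

# "Snapshot ... written in <dur>" and "compacted level N / full ... files in <dur>" lines.
EVENT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"(Snapshot|compacted (?:level \d+|full)).* in (\S+) engine=tsm1"
)


def parse_go_duration(text):
    """Convert a duration such as '8m47.5s' to seconds (h/m/s units only)."""
    match = DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def extract_durations(lines):
    """Return (timestamp, kind, seconds) for every snapshot/compaction summary line."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        timestamp, kind, duration = match.groups()
        try:
            events.append((timestamp, kind, parse_go_duration(duration)))
        except ValueError:
            continue  # sub-second units are not handled in this sketch
    return events
```

Feeding it the lines from /var/log/messages (e.g. `extract_durations(open("/var/log/messages"))`) and sorting or plotting the seconds by timestamp should show when the writes start to stall.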

I’m seeing similar behavior, and have been ever since I first installed it early this year (at version 1.1 or 1.2, I think).

In my case the numbers are much smaller: typical usage is around 4 GB, and it suddenly swings up to, e.g., 8 GB.

But the pattern is mostly the same: every x minutes, about as regularly as in that graph, RAM usage swings up by around 50-100% or more.

Additional information: in our case, we have some continuous queries running every hour and every 12 hours:
cq_1h CREATE CONTINUOUS QUERY cq_1h ON hadoop BEGIN SELECT mean(*) INTO hadoop."30days".:MEASUREMENT FROM hadoop."7days"./.*/ GROUP BY time(1h), * END
cq_12h CREATE CONTINUOUS QUERY cq_12h ON hadoop BEGIN SELECT mean(*) INTO hadoop.infinite.:MEASUREMENT FROM hadoop."30days"./.*/ GROUP BY time(12h), * END

This isn’t a new problem - here’s another post that’s much the same: High memory usage problem - #4 by coofercat

That post suggests using the latest versions, which can keep some index information on disk rather than in RAM (assuming you have SSDs, this might work quite well). I haven’t tried that yet, although it’s ‘in the plan’.

The other suggestion is to downsample data fairly aggressively. This means fewer points have to be held in RAM, which saves memory. It seems reasonable enough, although I’m not sure how it would cope with years of old data (even if downsampled). I had a look at this, but couldn’t really make it work without using Kapacitor (which I haven’t gotten around to installing yet).

I’m also in the process of migrating the data to the “tsi1” index option, which is meant to store the indexes on disk rather than in memory. So far, in my test environment with 250 million data points, I’ve seen negligible memory usage.

However, it is worth noting that this option only helps with data ingestion into the database, not with queried data. So if you have an expensive query, memory usage will still increase temporarily. A combination of retention policies and continuous queries can address that part better.

Just had yet another plunge in free RAM, accompanied by a surge in CPU and storage IOPS.
The CPU started climbing before it ran out of RAM.
As in the previous occurrences, there was a huge growth in allocated Go runtime memory.

Is there anything I can look at to investigate what might be causing this?
(A “number of series” hypothesis does not seem plausible to me: I’ve had no noticeable shift in series count, let alone a massive one, and I have fewer than 200K series in total anyway.)
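
One place to look for the Go runtime side of this is the server's own runtime statistics. The sketch below assumes InfluxDB 1.x serves Go's expvar output at `/debug/vars` on the HTTP port (localhost:8086 here is a guess at your setup) and that the JSON contains the standard `memstats` object with `runtime.MemStats` field names. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

MIB = 1024 * 1024


def summarize_memstats(debug_vars):
    """Pick a few heap figures out of a parsed /debug/vars document."""
    mem = debug_vars.get("memstats", {})
    return {
        "heap_alloc_mib": mem.get("HeapAlloc", 0) / MIB,        # live heap objects
        "heap_idle_mib": mem.get("HeapIdle", 0) / MIB,          # spans not in use
        "heap_released_mib": mem.get("HeapReleased", 0) / MIB,  # returned to the OS
        "sys_mib": mem.get("Sys", 0) / MIB,                     # total obtained from the OS
        "num_gc": mem.get("NumGC", 0),                          # completed GC cycles
    }


def fetch_debug_vars(base_url="http://localhost:8086"):
    """Fetch and parse /debug/vars; the URL is an assumption about the deployment."""
    with urlopen(base_url + "/debug/vars", timeout=10) as response:
        return json.load(response)


# Example (needs a running server):
#   print(summarize_memstats(fetch_debug_vars()))
```

Sampling this every minute or so and comparing `heap_alloc_mib` against `sys_mib` around a spike would at least show whether the growth is live heap or memory the runtime has not yet handed back.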

Do you log the queries? If your logs go to /var/log/messages, you should be able to find the query or CQ that triggered this usage, if it is something related to the queries.
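
To narrow that search, it can help to look only at what was logged in the minutes before the kill. This is a rough Python sketch, assuming the syslog layout from the first post. The function names, the 10-minute default and the `year` argument are illustrative choices, and the storage-engine lines are filtered out only to cut noise.

```python
from datetime import datetime, timedelta


def syslog_time(line, year=2017):
    """Parse the leading 'Aug 3 07:15:28' stamp; syslog omits the year, so one is supplied."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def lines_before_oom(lines, minutes=10, skip=("engine=tsm1", "index=tsi")):
    """Return non-engine log lines from the window leading up to the first oom-killer line."""
    lines = list(lines)
    oom_at = None
    for line in lines:
        if "invoked oom-killer" in line:
            oom_at = syslog_time(line)
            break
    if oom_at is None:
        return []
    start = oom_at - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        try:
            stamp = syslog_time(line)
        except ValueError:
            continue  # blank or non-syslog line
        if start <= stamp <= oom_at and not any(tag in line for tag in skip):
            selected.append(line)
    return selected
```

Whatever is left in that window (queries, CQ runs, subscriber timeouts) is the shortlist to check against the times of the memory spikes.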