Memory leak in InfluxDB 1.7.4?

OS: Debian 9.8 (stretch)
Version: InfluxDB 1.7.4

Since updating from 1.7.3 to 1.7.4, I’ve seen runaway memory consumption from the influxd process: it grows steadily until it has consumed all available memory, and only a restart frees it. This seems to be new behavior, but I don’t see any other threads discussing it. See the annotated screenshot from my system monitoring below. (Restarts free all of the locked memory; the chart doesn’t show this because of datapoint decimation.)

Data source is Icinga 2 performance data.

Any similar observations, or pointers on options to debug?



Anyone have thoughts here? I’ve taken to restarting influxdb hourly from cron, which is hardly a fix.
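For anyone who needs the same stopgap, the hourly restart is just a root cron entry along these lines (assuming the stock Debian systemd unit name, `influxdb`):

```
# /etc/cron.d/influxdb-restart -- stopgap only, not a fix
0 * * * * root /bin/systemctl restart influxdb
```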

I’d downgrade to 1.7.3, but the release notes indicate that version has a high likelihood of losing data. I’m not sure whether it’s safe, or even possible, to roll back to 1.7.2.

Sorry, no idea; as you said, there are no other threads discussing this so far.

Thanks Marc. Someone else commented and then deleted with a similar use case; I don’t know if they resolved their problem. I wonder if an Icinga2 update (concurrent with the InfluxDB update) is now submitting data in a way that is problematic, but I can’t come up with a likely scenario.

What index type are you using? Depending on the data structure, it could be using too much memory.

If you are using inmem for your index type, you should switch to TSI and check the results.
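For reference, the switch (as I understand the 1.7 docs; paths assume a stock Debian install) is a config change plus an offline index rebuild:

```
# /etc/influxdb/influxdb.conf
[data]
  index-version = "tsi1"

# Then, with influxd stopped, rebuild the index and fix ownership:
#   influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal
#   chown -R influxdb:influxdb /var/lib/influxdb
```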

I’ve been serially upgrading since 1.0.2 (or possibly earlier), so TSI did not yet exist when this instance was created. I’ll switch and see if that helps, but I think this is still indicative of a 1.7.4 bug, because my dataset is time-limited. Thanks for the suggestion, though – I’m sure it will help!

This did not resolve my issue, unfortunately.

It appears that both influxd and icinga2 are growing in size over time, so this may be an interaction between the versions of the two applications. I’ll continue trying to characterize it.

Update: Switching to TSI has made the memory leak rate worse. I now have to restart influx every 6 hours to avoid taking down the machine. Any sort of trace guidance would be helpful…
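One avenue I haven’t exhausted: influxd exposes Go’s pprof endpoints over HTTP (controlled by `pprof-enabled` in the config, on by default), so in principle a heap profile should show where the memory is going. A sketch, assuming the default port:

```
# Capture a heap profile from the running daemon
curl -s -o heap.pprof http://localhost:8086/debug/pprof/heap

# Summarize the top allocators (needs the Go toolchain installed)
go tool pprof -top heap.pprof
```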

Can you give us some metrics? Number of series, number of measurements, number of databases, cardinality, size of data stored, etc.

> show databases
name: databases

> show retention policies on _internal
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
monitor 168h0m0s 24h0m0s            1        true

> show series cardinality on _internal
cardinality estimation

> show retention policies on icinga2
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 672h0m0s 24h0m0s            1        true

> show series cardinality on icinga2
cardinality estimation

> show retention policies on vdo
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true

> show series cardinality on vdo
cardinality estimation

I’m not clear on how to get the number of measurements from InfluxDB; it doesn’t appear to be possible from the query language. Here’s the on-disk size:

# du -sk * /var/lib/influxdb/data/*
264996	icinga2
72712	_internal
239512	vdo

So, overall, nothing particularly large.
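As for the measurement count above: the 1.7 InfluxQL docs do seem to include cardinality statements for measurements as well as series, so something like this should work, if I’m reading them right:

```
> show measurement cardinality on icinga2
> show measurement exact cardinality on icinga2
```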

Are there other things running on this VM?

Also, what do the logs say? What is killing Influx? OOM Killer?

Nothing else is using an unusual amount of memory. The largest is Icinga2. Beyond that there are small things to support monitoring and visualization: mysql (for icinga’s config), apache, saslauthd, postfix, grafana.

The culprit is clearly influxd – it grows slowly from about 6% of memory to 60-70%. I haven’t let the OOM-killer get it yet because I get alerting on low memory, and I’ve added a cron job that just restarts influxd every 6 hours (which has solved the problem for very low values of “solved”).

Neither the influx nor the icinga logs say much of interest. Influx is doing detailed httpd logging at the moment – I’ll turn that off.

A few years ago I ran into a problem where a version of Icinga2 wouldn’t reconnect after losing its SSL connection to influxd; it would queue up data and eventually blow up. That’s not the case here – influxd is the big process, not icinga2.

Sigh. This may yet be Icinga2-related. They released 2.10.4 today with the changelog entry “Fix TLS connections in Influxdb/Elasticsearch features leaking file descriptors (#6989 #7018 ref/IP/12219)”. I’ll report back on whether this resolves the problem.