InfluxDB Python Query Memory Leak?

Hello,

I’m relatively new to InfluxDB and I’m running into some issues querying my database from Python. Some background on my software versions/OS:

InfluxDB 1.6.3
Python 2.7
python-influxdb 5.2.0
Ubuntu 18.04

The measurement I’m working with has ~300M points of robotics data (IMU, GPS, commands, etc.), with ~100 field values in each point. The data is sparse over time, in that we only get a new log maybe once a week, but it’s very dense within a log. There are around 3,000 series in the data, and using the default shard group duration I end up with about 70 shards, each containing ~4–5M points (~100k points/series). If something seems very wrong with this schema, please let me know how I might improve it, but on to the issue.
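
To make the shape concrete, here’s a hypothetical sketch of a single point in the form the Python client writes; the tag and field names are made up for illustration (the real points carry ~100 fields), but the structure is the same:

```python
# Hypothetical example point -- not my real schema. log_name is the tag I
# filter queries on; the real points have ~100 IMU/GPS/command field values
# rather than the handful shown here.
json_body = [
    {
        "measurement": "robot_telemetry",
        "tags": {
            "log_name": "log_2018_10_01",  # roughly one new log per week
        },
        "time": "2018-10-01T12:00:00.123456Z",
        "fields": {
            "gps_lat": 42.123456,
            "gps_lon": -71.123456,
            "imu_accel_x": 0.02,
            "cmd_velocity": 1.5,
            # ... plus ~95 more field values per point
        },
    },
]
```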

I query this measurement with the basic InfluxDBClient query call. For example, I might query for all the GPS data from a specific log name tag. That query might return ~100k points, and the resulting variables in Python take up maybe ~10 MB, but I’ve noticed that after the query call I’m left with hundreds of MB or even GBs of memory used by Python. I can check what variables are present with a whos command, and after systematically deleting every variable and imported module, I’m still left with a massive amount of memory in use until I stop the Python kernel and restart it. I’ve also tried garbage collection calls in Python, but these don’t help either.
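
For reference, here’s a minimal sketch of what I’m doing; the host, database, measurement, and tag values are placeholders:

```python
import gc
from influxdb import InfluxDBClient

# Placeholder connection details and names.
client = InfluxDBClient(host='localhost', port=8086, database='robot_logs')

# e.g. pull all the GPS data for one log.
result = client.query("SELECT * FROM gps WHERE log_name = 'log_2018_10_01'")
points = list(result.get_points())  # ~100k points, roughly 10 MB of objects

# ... work with the points ...

# Attempt to give the memory back afterwards.
del points
del result
gc.collect()  # the process still holds on to hundreds of MB or more
```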

I initially started with a smaller measurement, and queries were fast and didn’t take up too much RAM. When I increased the measurement from my test size up to the full size (~3M points to ~300M points), the RAM required to run even basic queries became huge. In fact, even basic queries over the full measurement that should return, say, 10M points end up exhausting RAM, spilling into swap, and then running out of swap as well (> 20 GB). I suspect this issue and the leftover memory in Python are related, and I strongly suspect it has to do with my schema design. So:

  1. Does this sound like an issue anyone else has seen?
  2. Does my schema design seem like a problem?
  3. Should I be able to query a 300M point measurement without running out of RAM?

Thanks,
-Austin

As an update, I ran some more experiments. It looks like the memory that Python appears to be using is just reserved. I see similar behavior with the influx command line client: when I make a series of large queries, the memory the influx client appears to use accumulates rapidly. It seems to happen, or is at least far more noticeable, with large queries that return GBs of points. So, follow-up questions:

  1. Is this expected behavior?
  2. Is there a way to free up that memory in Python? Closing the client doesn’t do the trick, but is there something else I could try? (See the sketch below for the kind of thing I have in mind.)
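
For example, I noticed the client’s query() call takes chunked and chunk_size arguments. A rough sketch of what I mean is below; the query string and chunk size are placeholders, and I haven’t confirmed whether this client version actually streams the chunks or still assembles the full result in memory, so this is an idea rather than a known fix:

```python
# Sketch only: ask the server for a chunked response. client is the same
# InfluxDBClient as in the sketch above; chunk_size and the query string are
# placeholders, and whether this actually reduces client-side memory use with
# python-influxdb 5.2.0 is exactly what I'm unsure about.
result = client.query(
    "SELECT * FROM gps WHERE log_name = 'log_2018_10_01'",
    chunked=True,
    chunk_size=10000,
)
for point in result.get_points():
    pass  # process one point at a time instead of building one big list
```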

As a further update, I attempted to restructure my shard group duration so all the data would live in a single shard (~300M points in all). This didn’t change the performance at all. Next I’m going to try reducing my shard group duration to get closer to the recommended ~100k points/shard.
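
For reference, this is roughly how I plan to change the shard group duration; the retention policy and database names are placeholders, and I’m assuming this client version exposes a shard_duration argument on alter_retention_policy (otherwise I’ll run the equivalent ALTER RETENTION POLICY ... SHARD DURATION statement from the influx CLI). As I understand it, the new duration only applies to shards created after the change, so existing data has to be re-written to land in the new shards.

```python
# Hedged sketch of the shard group duration change. "autogen" and "robot_logs"
# are placeholders, the duration is just an example value, and shard_duration
# is assumed to be supported by this client version (if not, the same thing
# can be done with ALTER RETENTION POLICY ... SHARD DURATION in the CLI).
# Note: this only affects shards created after the change; existing data has
# to be re-written to pick up the new duration.
client.alter_retention_policy(
    'autogen',
    database='robot_logs',
    shard_duration='1h',  # example value aimed at smaller shards
)
```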

Thanks,
-Austin