I’m relatively new to InfluxDB and I’m running into some issues querying my database from Python. Some background on my software versions:
influxdb v. 1.6.3
python-influxdb v. 5.2.0
The measurement I’m working with has ~300M points of robotics data (IMU, GPS, commands, etc., with ~100 field values in each point). The data is sparse over time, in that we only get a new log maybe once a week, but it’s very dense within a log. There are around 3000 series in the data and, using the default shard duration, I end up with about 70 shards, each containing ~4-5M points (~100k points per series overall). If something seems very wrong with this schema, please let me know how I might improve it. Now, on to the issue.
I query this measurement with the basic InfluxDBClient query call. For example, I might query for all the GPS data from a specific log name tag. That query returns maybe ~100k points, and the resulting variables in Python take up ~10 MB, but I’ve noticed that after a query call Python is left holding hundreds of MB, or even several GB, of memory. I can check which variables are present with a whos command, and even after systematically deleting every variable and imported module, the memory use stays massive until I stop the Python kernel and restart it. I’ve tried explicit garbage collection calls in Python as well, but these don’t help either.
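In case it helps anyone reproduce this, here is a minimal sketch of how the leftover memory could be measured with the standard library's tracemalloc module. The helper name measure_query_memory and the run_query callable are hypothetical names introduced for illustration, and the query string in the usage comment is only a placeholder. Note that tracemalloc only sees allocations made through Python's own allocators, so it is not the same number as the process memory reported by the OS.

```python
import gc
import tracemalloc


def measure_query_memory(run_query):
    """Run run_query() once and report Python-level memory use in bytes.

    run_query is a hypothetical zero-argument callable that performs the
    query and returns its result.
    """
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    result = run_query()
    held, peak = tracemalloc.get_traced_memory()

    # Drop the only reference to the result, then force a collection.
    del result
    gc.collect()
    leftover, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "held": held - baseline,          # still allocated while the result is alive
        "peak": peak - baseline,          # highest point reached during the query
        "leftover": leftover - baseline,  # still allocated after deleting the result
    }


# Usage with python-influxdb would look roughly like this (placeholder query):
#   stats = measure_query_memory(lambda: client.query("SELECT ..."))
```

If "leftover" comes back near zero while the OS still shows the process using a lot of memory, that would suggest the memory is not held by live Python objects but has simply not been returned to the OS, which would be consistent with deleting variables and calling the garbage collector not helping.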
I initially started with a smaller measurement, and queries were fast and didn’t take up too much RAM. Since increasing my measurement from the test size to the full size (~3M points to ~300M points), I’ve noticed that the RAM required to run even basic queries is huge. In fact, even basic queries over the full measurement that should return, say, 10M points exhaust RAM, spill into swap, and then exhaust swap too (> 20 GB). I suspect that this issue and the leftover memory in Python are related, and I strongly suspect that both have to do with my schema design. So:
- Does this sound like an issue anyone else has seen?
- Does my schema design seem like a problem?
- Should I be able to query a 300M point measurement without running out of RAM?
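For reference, here is a rough back-of-the-envelope estimate of what the 10M-point query mentioned above might need on the client side. It is only a sketch under stated assumptions: that the query selects all ~100 fields, that every field is a 64-bit float, and that each value ends up as a separate Python float object with an 8-byte reference to it. A query that selects only a few fields would need proportionally less.

```python
import sys

points = 10_000_000      # result size of the example query above
fields_per_point = 100   # approximate number of fields per point
bytes_per_float64 = 8    # assumption: every field is a 64-bit float

values = points * fields_per_point
raw_bytes = values * bytes_per_float64

# Assumption: each value is held as its own Python float object,
# plus one 8-byte pointer in the list that references it.
per_value_python = sys.getsizeof(1.0) + 8
python_bytes = values * per_value_python

print(f"values returned:         {values:,}")
print(f"raw float64 payload:     {raw_bytes / 1e9:.1f} GB")
print(f"as Python float objects: {python_bytes / 1e9:.1f} GB")
```

This ignores the JSON text of the response and any intermediate copies, so the real peak could be higher than either figure.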