Querying is slow and memory usage is high with TBs of data

I am putting a lot of data into a single InfluxDB instance; the data directory is about 20~30 TB. I need all the data, so none of it can be dropped.

There are about 0.7 million series. The InfluxDB version is 1.4.2, and the box has 128 GB RAM and 64 cores.

Recently, my InfluxDB crashed with a fatal error like "cannot allocate memory". When it restarted, it reported the same error along with others like "Failed to open shard: cannot allocate memory." I had to manually delete a database directory inside the data dir to get it started again.

Would reducing the number of series help with the memory issue?

Right now, fewer than 10 measurements hold most of the data, and queries are extremely slow. Would it help to split the data across about 2K measurements instead?
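As a side note, it may help to measure cardinality before restructuring anything. Assuming InfluxQL on 1.4+, something like this from the `influx` CLI gives rough numbers (`mydb` is a placeholder database name):

```sql
-- Estimate total series cardinality across the instance
SHOW SERIES CARDINALITY

-- Scope the estimate to one database
SHOW SERIES CARDINALITY ON mydb

-- Count measurements, to compare against the "2K measurements" idea
SHOW MEASUREMENT CARDINALITY ON mydb
```

If one measurement dominates the series count, that points at a tag with runaway values rather than the measurement layout itself.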

Good question. One place to start would be to look at the startup code to see what is going on. Cold shards may get unloaded and unmapped, but that doesn't help if everything is mapped on startup.

Moving to 1.5.x is also a good option, because it supports the new disk-based TSI (Time Series Index), which greatly reduces the memory needed for high-cardinality data compared with the default in-memory index.
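For reference, enabling TSI on 1.5.x involves a config change plus rebuilding the index for existing shards; a sketch, with the standard package-install paths assumed:

```toml
# influxdb.conf — switch from the in-memory index to the disk-based TSI index
[data]
  index-version = "tsi1"
```

After stopping the daemon, indexes for existing shards can be rebuilt with `influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal` (paths here are assumptions; adjust to your install, and run it as the influxdb user so file ownership stays correct), then restart the service. New shards pick up TSI automatically; old ones keep the in-memory index until rebuilt.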