Querying Rarely Sampled Data

I have data sampled twice per hour. Looking in my node0/dbs/testdb/[table name] folder, I see that one day has 48 Parquet files. When I run a SELECT * with a specific day in the WHERE clause, I get a response that the query would exceed the file limit. I increased the file limit to 1,000 Parquet files and I am still seeing this error. Could someone please explain what is happening?

Further, since I don’t have Enterprise, is there another way to compact the files? I have 48 Parquet files per day that each hold one data point.

You can raise the query-file-limit (for example, to 1,000; a sketch of the kind of query that trips it follows the list below), but this can cause:

  • Degraded query performance

  • Increased memory usage

  • Potential OOM kills or instability, especially when using object storage
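For concreteness, here is a minimal sketch of the kind of one-day query under discussion, using the influxdb3-python client. The host, token, database, and table names are placeholder assumptions, not values from this thread:

```python
# Minimal sketch using the influxdb3-python client (pip install influxdb3-python).
# Host, token, and table name below are placeholders, not values from this thread.
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="http://localhost:8181",
    token="my-token",
    database="testdb",
)

# Even with a narrow WHERE clause, the engine still has to consider every
# Parquet file that might hold rows in this range, which is what can trip
# the query-file-limit when the data is split across many tiny files.
table = client.query(
    """
    SELECT *
    FROM home
    WHERE time >= '2025-01-01T00:00:00Z'
      AND time <  '2025-01-02T00:00:00Z'
    """,
    language="sql",
)
print(table.to_pandas())
```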

InfluxDB 3 Core doesn’t have compaction, whereas InfluxDB 3 Enterprise does; Enterprise also doesn’t have any Parquet file limit.
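Without Enterprise’s compactor, one workaround is to do the coalescing yourself outside the database: read exported copies of the day’s small files, rewrite them as one larger Parquet file, and re-ingest the result. Below is a minimal pyarrow sketch with placeholder paths; it illustrates what compaction does, and is not safe to run against files the server owns in node0/dbs/:

```python
# Offline illustration of what compaction does: merge many single-point
# Parquet files into one larger file. Paths are placeholder assumptions.
# Do NOT rewrite files inside the live data directory -- work on exported
# copies and re-ingest the merged result instead.
import glob

import pyarrow.dataset as ds
import pyarrow.parquet as pq

small_files = sorted(glob.glob("export/2025-01-01/*.parquet"))

# Treat the 48 tiny files as one logical dataset and materialize it.
merged = ds.dataset(small_files, format="parquet").to_table()

# One file per day instead of one file per sample.
pq.write_table(merged, "compacted/2025-01-01.parquet")
```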

Thank you for the response! I increased the query file limit, but I can’t increase it high enough to get the performance I need. I don’t understand why querying one day, which has 48 Parquet files according to the file system (assuming I’m looking in the right spot), results in an error saying the number of files being queried is greater than 1,000. I thought the WHERE clause would give a near-instant lookup, since the file system has a folder per day/hour, so the engine wouldn’t have to scan the entire database to find one day. I think that is where the biggest disconnect is for me.

InfluxDB 3 Core doesn’t compact Parquet files into larger, indexed blocks, so the query planner must consider every file that might contain data for the requested time range. The WHERE clause filters rows, but the engine still has to open and inspect the metadata of every candidate file before it can prune anything, which is most likely what triggers the error, and why even a one-day query performs poorly when many small files are involved.
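The per-file overhead is easy to see with pyarrow alone: even deciding whether a file can be skipped means opening its footer and reading column statistics, once per file. This sketch (with placeholder paths) only illustrates that kind of work; it is not InfluxDB’s actual code path:

```python
# Sketch of the per-candidate-file work a query planner does: one footer
# read per Parquet file to get row counts and column min/max statistics.
# With thousands of tiny files, this bookkeeping dominates the query.
import glob

import pyarrow.parquet as pq

for path in sorted(glob.glob("export/2025-01-01/*.parquet")):
    meta = pq.read_metadata(path)  # opens and parses the file footer
    stats = meta.row_group(0).column(0).statistics
    print(path, meta.num_rows,
          stats.min if stats else None,
          stats.max if stats else None)
```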