Hi everyone, we're currently evaluating InfluxDB v3 OSS (self-hosted on a 64 GB machine) for our use case: storing millions of species detections based on audio/video data. We planned to create separate tables per project to keep data scoped to a project. Some tables store detections (10k/day/sensor), others serve as file-management tables (150/day/sensor) for objects in S3. Both kinds of tables use <15 text tags to allow filtering the data.

After inserting a few months of real-life data we ran into problems with how to configure the system, e.g. `Error while planning query: External error: Query would exceed file limit of …` and timeouts. How do you store millions of measurements and still support historic analysis of the data? Is it because we use S3 as the storage backend? And does adding nodes scale properly here?
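For context, this is roughly how we write a detection row with the influxdb3-python client; the host, token, database, table, tag, and field names below are placeholders, not our exact setup:

```python
from influxdb_client_3 import InfluxDBClient3, Point

# Placeholder connection details, not our real configuration.
client = InfluxDBClient3(
    host="http://localhost:8181",
    token="my-token",
    database="project_a",
)

# One detection row: a few text tags for filtering, plus fields
# for the detection itself and the S3 object it came from.
point = (
    Point("detections")
    .tag("sensor_id", "sensor-042")
    .tag("species", "parus_major")
    .tag("media_type", "audio")
    .field("confidence", 0.93)
    .field("s3_key", "recordings/2024/06/01/sensor-042.flac")
)
client.write(record=point)
```

The file-management tables follow the same pattern, just with S3 object metadata as the fields.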
It would be cool to get some info/hints about other on-premise installations of v3 Core handling 'bigger' amounts of data. Thanks…
@Thilo, this is a limitation of InfluxDB 3 Core (more info here). Core is optimized for real-time and recent data, not for historical data. Core doesn’t compact data over time, so the longer the period of time you query, the more data files the query engine has to read. This results in incredibly slow queries when querying large time ranges. To prevent these types of queries, Core limits the number of Parquet files a query can actually read.
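If you need to stay on Core for now, one way to work within that limit is to split a long historical query into smaller time windows, so each individual query reads fewer Parquet files. A rough sketch with the influxdb3-python client, where the one-day window and the table/column names are assumptions you'd tune to your data:

```python
from datetime import datetime, timedelta, timezone
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="http://localhost:8181",
    token="my-token",
    database="project_a",
)

def query_in_windows(start: datetime, end: datetime, window=timedelta(days=1)):
    """Run one SQL query per time window so each stays under the file limit."""
    results = []
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        sql = f"""
            SELECT *
            FROM detections
            WHERE time >= '{cursor.isoformat()}'
              AND time < '{upper.isoformat()}'
        """
        # query() returns a PyArrow table; collect them and concatenate later.
        results.append(client.query(query=sql, language="sql"))
        cursor = upper
    return results

tables = query_in_windows(
    start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end=datetime(2024, 1, 8, tzinfo=timezone.utc),
)
```

This trades one big query for many small ones, so it helps with the file limit but not with the total work the query engine has to do.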
To effectively query historical data, you'll need to use InfluxDB 3 Enterprise. InfluxDB 3 Enterprise compacts data over time, which keeps historical queries over large time ranges performant.
Thanks Scott, that makes things much clearer. I will check out the Enterprise version once we've defined our benchmark datasets for evaluating historical data querying. NRT is less important to us than efficient access to our >1 yr of data. We'll see how aggregating into larger chunks works out for that use case.
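In case it's useful to others, this is the shape of the aggregation we have in mind: downsampling raw detections into daily per-species counts with SQL's date_bin. Table and column names are again just placeholders:

```python
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="http://localhost:8181",
    token="my-token",
    database="project_a",
)

# Downsample raw detections into daily per-species counts.
# date_bin() groups rows into fixed 1-day buckets.
sql = """
    SELECT
        date_bin(INTERVAL '1 day', time) AS day,
        species,
        COUNT(*) AS detections
    FROM detections
    WHERE time >= now() - INTERVAL '30 days'
    GROUP BY 1, species
    ORDER BY 1
"""
daily = client.query(query=sql, language="sql")  # PyArrow table
print(daily.to_pandas())
```

Writing these daily rollups into their own table should keep the long-range queries small, regardless of how the raw data is compacted.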