Best Practices for Optimizing InfluxDB Performance with High-Volume Data

Hi everyone,

I’ve been working with InfluxDB for a while now, and I’m currently dealing with a high-volume data ingestion scenario. I’ve noticed that as the amount of data increases, performance tends to degrade, especially when running queries on large datasets.

I’m wondering if there are any best practices or tips for optimizing InfluxDB to handle this kind of load. Specifically, I’m looking for guidance on:

  1. Retention policies: How to set them up to efficiently manage large datasets.
  2. Shard management: Any tips for managing shards or tuning them to improve write/query performance?
  3. Query optimization: Best practices for structuring queries to avoid timeouts and performance hits.
  4. Indexing and compression: Are there any strategies for optimizing indexing or compressing the data?

I’ve already made some changes based on the documentation, but I’m interested in hearing about others’ experiences, particularly when dealing with large-scale environments. Any advice or real-world examples would be greatly appreciated!

Thanks in advance!

Best,
michael

Hello @michaelcarlos,
Welcome!
Have you considered InfluxDB 3 Core or Enterprise?

It supports unlimited cardinality and has last value caches and distinct value caches for increased query performance.
These topics might interest you:
Query the Latest Values in Under 10ms with the InfluxDB 3 Last Value Cache | InfluxData!

InfluxDB 3.0 optimizes query performance using time-based indexes to quickly locate and retrieve data without needing traditional, heavy indexing structures. Part Two: InfluxDB 3.0 Under the Hood

In InfluxDB 3.0, Apache Parquet's columnar storage and compression algorithms provide excellent compression without sacrificing read speed.

It also has an embedded Python processing engine for ETL, analytics, forecasting, anomaly detection, and similar tasks.

To better answer your questions, what version are you using?

I’ll try to answer anyway for v1 and v2 in the meantime:

  1. For sparse, historical data, lengthen your retention policy’s shard group duration to cover several years instead of using the default one-week duration. Having too many shards is inefficient for InfluxDB. You can modify this with the ALTER RETENTION POLICY query (see the sketch below). InfluxDB frequently asked questions

Balance time range and data precision. For example, if you query data stored every second over six months, you’d have approximately 15.5 million points per series, which can quickly become billions of points depending on cardinality. Balance time range and data precision
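
A minimal InfluxQL sketch of that change, assuming a database named `mydb` with the default `autogen` retention policy (adjust the names and durations to your setup):

```sql
-- Keep data indefinitely, but group it into one-year shards instead of the
-- default one-week shard groups, so sparse historical data creates far fewer shards.
ALTER RETENTION POLICY "autogen" ON "mydb" DURATION INF SHARD DURATION 52w
```

You can check the result afterwards with `SHOW RETENTION POLICIES ON "mydb"`.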

  2. InfluxDB writes time series data to un-compacted or “hot” shards. When a shard is no longer actively written to, InfluxDB compacts the shard data, resulting in a “cold” shard. When backfilling historical data, InfluxDB writes to older shards that must first be un-compacted, then re-compacted when the backfill is complete. This process affects performance. InfluxDB shards and shard groups

  3. Query optimization best practices for v2:

  • Start queries with pushdowns to reduce memory and compute requirements (see the sketch after this list). Optimize Flux queries

  • Avoid short window durations, which can create an excessive number of series.

  • Use “heavy” functions sparingly as they can significantly impact performance.

  • Use set() instead of map() when possible for better performance.

  • Create a task to downsample data for querying over large time periods. Balance time range and data precision
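
A minimal Flux sketch pulling a few of the bullets above together, assuming a raw bucket named `example-bucket` and a downsampled bucket named `example-bucket-downsampled` (both hypothetical names):

```flux
// Hypothetical downsampling task: runs hourly and reduces raw points to 1-minute means.
option task = {name: "downsample-cpu", every: 1h}

from(bucket: "example-bucket")
    |> range(start: -task.every)                   // pushdown: only read the last hour
    |> filter(fn: (r) => r._measurement == "cpu")  // pushdown: only the cpu measurement
    |> filter(fn: (r) => r._field == "usage_user") // pushdown: only the field we need
    |> aggregateWindow(every: 1m, fn: mean)        // downsample to 1-minute means
    |> set(key: "rollup", value: "1m")             // set() is cheaper than map() for a static label
    |> to(bucket: "example-bucket-downsampled")
```

Queries over long time ranges can then read from the downsampled bucket instead of the raw one, which keeps the points-per-series count manageable.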

  4. There are also decisions to make about tags versus fields, since tags are indexed and fields aren’t.
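
As a rough line protocol illustration (hypothetical `cpu` measurement): values you filter or group by go in tags, the actual numeric readings go in fields, and high-cardinality identifiers are best kept out of tags so the index stays small.

```
# Tags (indexed, good for filtering/grouping): host, region
# Fields (not indexed, hold the actual values): usage_user, usage_system
cpu,host=server01,region=us-west usage_user=12.5,usage_system=3.2 1700000000000000000
```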