I’m reaching out because we’re experiencing serious performance issues with our current setup. Most of the problems occur within a single tenant containing ~8,000 signals, with new values arriving every minute.
Schema:
org: environment
bucket: tenant
_measurement: tenant
_field:
_value: <float|bool|string>
signal:
InfluxDB instance:
2 CPUs, 4 GB memory
We migrated from Influx 1.8, where this was not a problem at all. With InfluxDB 2.x, we hit massive performance issues.
Querying a single signal over 1 h via:
from(bucket: bucket)
  |> range(start: <from>, stop: <to>)
  |> filter(fn: (r) => r._measurement == <tenant> and r._field == <signal>)
takes over 60 s.
We have already improved this by filtering on the signal tag instead:
from(bucket: bucket)
  |> range(start: <from>, stop: <to>)
  |> filter(fn: (r) => r._measurement == <tenant> and r.signal == <signal>)
which takes around 1 s.
Additionally, we have special signal handling: for every given signal & timestamp, we expect the last known value, which we call the “predecessor”. This means that whenever we query a time range
[from, to]
we also have to query the predecessor, i.e. the last value < from, which in InfluxDB 2.x amounts to querying
[0, from]
Simplified, this is the resulting query, whose performance is devastating:
from(bucket: bucket)
  |> range(start: 0, stop: <from>)
  |> filter(fn: (r) => r._measurement == <tenant> and r.signal == <signal>)
  |> last()
There is no other way to get this value: I have no idea whether the last value was written yesterday or a week ago. This is machine data; I have no control over it.
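One mitigation we could imagine (a sketch, not an officially prescribed pattern; bucket and task names below are assumptions): a Flux task that periodically mirrors each signal’s latest point into a small dedicated bucket, so that predecessor lookups for ranges ending near “now” never have to scan from time 0.

```
// Hypothetical task (all names are assumptions): every minute, copy the
// latest point of every series in the tenant measurement into a small
// "latest" bucket. Predecessor lookups for ranges ending near now can
// then read this bucket instead of scanning [0, from].
option task = {name: "mirror_latest", every: 1m}

from(bucket: "tenant")
    |> range(start: -10m)           // only recent data; cheap to scan
    |> filter(fn: (r) => r._measurement == "tenant")
    |> last()                       // one point per series (per signal tag)
    |> to(bucket: "tenant_latest")  // small bucket, short retention
```

This only helps when <from> is close to now; for historical ranges the full lookback is still required.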
Finally, I have to query around 80 signals this way. I pack them into union batches of at most 20 signals and send those union requests to my instance.
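Instead of one pipeline per signal unioned together, it might be worth trying a single query that filters several signal tag values in one predicate; to my understanding, OR-ed tag equality comparisons can be pushed down to the storage layer, unlike a separate branch per signal. A sketch, with placeholder signal names:

```
// Sketch (signal names are placeholders): fetch the predecessors for a
// whole batch of signals with one storage read instead of one union
// branch per signal.
from(bucket: "tenant")
    |> range(start: 0, stop: 2024-01-01T00:00:00Z)  // stop = <from>
    |> filter(fn: (r) =>
        r._measurement == "tenant" and
        (r.signal == "sig_a" or r.signal == "sig_b" or r.signal == "sig_c"))
    |> last()  // last() runs per series, i.e. per signal tag value
```

Whether this actually beats the union batches would need measuring on the real instance.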
We have other accounts with ~1,500 signals that work totally fine, without any noticeable issues; loading times for everything are ~1 s.
Questions:
- How can we query those “predecessors” efficiently?
- Is 8,000 signals a bad design? How can we improve it?
- None of this was a problem back on InfluxDB 1.8; the predecessor queries were pretty fast there.
- Any other suggestions on how to handle this?
- Reduce the request size? Send more requests? We already have downtimes on the instance; the queries seem to be too heavy for it to handle.
Thanks for any help & for any hints.
Best regards
Philipp
