We have an InfluxDB database (v1.8) which contains data back to 2009 and uses daily shards (so > 4600 shards in one retention period, series cardinality ~ 23,000).
The database takes quite long to start (> 3 minutes) and consumes a lot of memory (2GB at start, 28GB after some queries were made).
Now queries take very long, especially when using the “LAST” keyword. So my question is whether performance and resource consumption would be better if, e.g., monthly shards were used instead of daily ones.
Hello @NextGen,
It depends on your schema, the ingest rate, your expiration rate, and the range over which you’re querying data.
Have you seen the following?
Thank you Anaisdg!
I know this document. The main question is whether there is a general rule like “keep shard count low”, “startup time is linear in shard count” or similar. We have now run some tests, e.g. writing one point per day with a shard duration of one day, starting in Jan 2000, which creates 365*22 shards that each contain just 1 point. Starting the DB then takes several minutes although it is almost empty.
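For anyone who wants to reproduce that test, here is a minimal sketch of how such input could be generated. It only builds line protocol text (one point per day, 365*22 days from 1 Jan 2000 UTC) and does not talk to the server. The measurement name `shard_test` and the tag values `n1`/`t1` are made up for illustration; the tag keys `name` and `type` are the ones from our schema. The timestamps are epoch seconds, so the write would need second precision (e.g. `precision=s` on the 1.x `/write` endpoint), and the target retention policy is assumed to have a 1d shard duration.

```python
from datetime import datetime, timedelta, timezone


def daily_points(start, days, measurement="shard_test"):
    """Yield one line-protocol point per day, timestamps in epoch seconds.

    The measurement name and tag values are placeholders, not real data.
    """
    for i in range(days):
        ts = int((start + timedelta(days=i)).timestamp())
        # One integer field is enough; only the timestamp spread matters here.
        yield f"{measurement},name=n1,type=t1 value=1i {ts}"


start = datetime(2000, 1, 1, tzinfo=timezone.utc)
lines = list(daily_points(start, 365 * 22))

if __name__ == "__main__":
    # Print the payload so it can be redirected to a file and written later.
    print("\n".join(lines))
```

Each generated point falls on a different day, so with daily shards every point should end up in its own shard.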
To provide more information on the database:
schema: measurement has two tags (name as string and type as string) and 11 fields
no shard expiration (data is never deleted)
ingestion rate: 2.5 values / second (average over 10 years)
queries mainly cover periods reaching back up to 1 year, but queries over the whole time range (10 years) may also happen; we also saw that queries using the “ORDER BY DESC LIMIT x” phrase (not “LAST” as I wrote above) took very long, which was not expected.
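To put the numbers above in proportion, here is a rough back-of-the-envelope sketch. It takes the ~4600 days of data and the 2.5 values / second average from this thread and estimates, for a few candidate shard durations, how many shard groups would exist and how many values each would hold. The candidate durations are only examples, and the counts are approximate: real shard groups are aligned to fixed time boundaries, so the actual number can differ slightly. It also says nothing about startup time or memory, it only shows how small each daily shard is.

```python
import math

SPAN_DAYS = 4600          # approximate data span, from the shard count above
VALUES_PER_SECOND = 2.5   # average ingestion rate stated above
SECONDS_PER_DAY = 86400


def shard_estimate(shard_days):
    """Return (shard group count, values per shard group) for a duration in days."""
    groups = math.ceil(SPAN_DAYS / shard_days)
    values_per_group = VALUES_PER_SECOND * SECONDS_PER_DAY * shard_days
    return groups, values_per_group


if __name__ == "__main__":
    # Example durations only; none of these is a recommendation.
    for label, days in [("1d", 1), ("7d", 7), ("30d", 30), ("365d", 365)]:
        groups, values = shard_estimate(days)
        print(f"{label:>4}: {groups:>5} shard groups, ~{values:,.0f} values each")
```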
Hello @NextGen,
Startup time isn’t linear in shard count. No, you don’t have to keep shard count low, but if you have a lot of data and you’re able to expire it frequently and keep shard count low, that could help. If your shard count is too low and you’re frequently querying outside of that shard, then that will be slower than if the data you’re querying for is in the hot shard.
Have you enabled TSI?
That might be helpful. Unfortunately, you might just be running into cardinality issues. What hardware is InfluxDB running on? You might want to check: