Optimal shard duration

NextGen · October 18, 2021, 2:04pm

We have an InfluxDB database (v1.8) which contains data back to 2009 using daily shards (so > 4,600 shards in one retention period, series cardinality ~ 23,000).
The database takes quite long to start (> 3 minutes) and consumes a lot of memory (2GB at start, 28GB after some queries were made).
Queries now take very long, especially when using the “LAST” keyword. So my question is whether performance and resource consumption would be better if, e.g., monthly shards were used instead of daily ones.

Thanks in advance,
Ewald

Anaisdg · October 18, 2021, 10:21pm

Hello @NextGen,
It depends on your schema, the ingest rate, your expiration rate, and the range over which you’re querying data.
Have you seen the following?

NextGen · October 19, 2021, 9:01am

Thank you Anaisdg!
I know this document. The main question is whether there is a general rule like “keep shard count low”, “startup time is linear to shard count” or similar. We have now run some tests, e.g. writing one point per day with a shard duration of one day, starting in Jan 2000, which creates about 365*22 shards, each containing just 1 point. Startup of the DB then takes several minutes although it is almost empty.
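To put rough numbers on the shard counts mentioned in this thread, here is a minimal Python sketch. It only does date arithmetic and does not talk to InfluxDB. The function name `shard_group_count` and the exact start and end dates are assumptions for illustration, and the real count can differ slightly because shard groups are aligned to fixed time boundaries.

```python
import math
from datetime import datetime, timedelta


def shard_group_count(start, end, shard_duration):
    """Approximate number of shard groups needed to cover [start, end).

    Assumes at least one point is written in every shard-duration window,
    as in the one-point-per-day test described above.
    """
    return math.ceil((end - start) / shard_duration)


# Assumed dates: data from the start of 2009 up to the day of the first post.
start = datetime(2009, 1, 1)
end = datetime(2021, 10, 18)

for label, duration in [("1d", timedelta(days=1)),
                        ("7d", timedelta(days=7)),
                        ("30d", timedelta(days=30))]:
    print(label, shard_group_count(start, end, duration))
```

With daily shards this gives a count in the same range as the "> 4,600" figure from the first post, while a 30-day duration cuts the count by a factor of about 30.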

To provide more information on the database:

Schema: the measurement has two tags (name as string and type as string) and 11 fields
No shard expiration (data is never deleted)
Ingestion rate: 2.5 values / second (average over 10 years)
Queries: mainly for periods reaching back up to 1 year, but queries over the whole time range (10 years) may also happen. We also saw that queries using the “ORDER BY DESC LIMIT x” clause (not “LAST” as I wrote above) took very long, which was not expected.
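As a back-of-the-envelope check on how full each shard is at this ingest rate, a small Python sketch follows. The helper name `values_per_shard` is hypothetical, and the calculation assumes the average rate of 2.5 values per second holds uniformly over time.

```python
from datetime import timedelta


def values_per_shard(values_per_second, shard_duration):
    """Average number of values landing in one shard of the given duration."""
    return values_per_second * shard_duration.total_seconds()


rate = 2.5  # values per second, the average stated above

for label, duration in [("1d", timedelta(days=1)),
                        ("30d", timedelta(days=30))]:
    print(label, values_per_shard(rate, duration))
```

At this rate a daily shard holds on the order of a couple of hundred thousand values, so each shard is small relative to the total number of shards.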

Thanks again,
Ewald

Anaisdg · October 19, 2021, 2:58pm

Hello @NextGen,
Startup time isn’t linear to shard count. No, you don’t have to keep shard count low, but if you have a lot of data and you’re able to expire it frequently, keeping shard count low could help. If your shard count is too low and you’re frequently querying outside of the hot shard, then that will be slower than if the data you’re querying for is in the hot shard.

Have you enabled TSI?

That might be helpful. Unfortunately, you might just be running into cardinality issues. What hardware is InfluxDB running on? You might want to check:
