SHOW SERIES extremely slow and unacceptable on a bigger DB

Same behavior on both 1.3 and 1.4.

On 1.4 we can get a simple estimate of the DB's cardinality:

cardinality estimation
----------------------
1856641
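
For reference, this estimate presumably comes from the SHOW SERIES CARDINALITY statement that 1.4 added:

show series cardinality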

Running SHOW SERIES on this DB:

show series limit 1 offset 0

returns 1 series as expected, but only after almost 30 seconds!

real	0m29.573s
user	0m0.009s
sys	    0m0.007s

for

show series limit 100 offset 10

real	0m31.508s
user	0m0.000s
sys	    0m0.010s

for

show series limit 10000 offset 10

real	0m35.080s
user	0m0.148s
sys	    0m0.063s

Always more than 30 seconds.

We have another DB with more tags where it takes as long as 7 minutes to get 1 series from the schema, which is totally inefficient.

It makes no difference how big a limit we use; is it always building the whole series list on the backend?

Is there any way to improve metadata queries in InfluxDB, e.g. caching or some configuration improvements?
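
By improvements I mean things like narrowing SHOW SERIES to a single measurement (measurement and tag names below are made up):

show series from "cpu" where "host" = 'server01' limit 10

or switching to the disk-based TSI index in influxdb.conf (existing shards would also have to be converted, e.g. with the influx_inspect buildtsi tool in newer releases, while influxd is stopped):

[data]
  index-version = "tsi1"

but I don't know whether either of these actually helps SHOW SERIES at this cardinality.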

This is still a problem in the latest version 1.5.2.

My query “show series limit 1” ran for about 15 minutes and caused massive amounts of memory allocation (a few GB) on each server in the cluster sequentially.

The result was the first metric name (alphabetically), so I assume it does sort internally, and that sort is implemented utterly inefficiently. InfluxData needs to seriously invest in searches over distributed indices by implementing a merge sort on top of sorted streams pulled from each node, where each stream is produced on the fly without bulk memory allocation by hitting the appropriate persistent index.

The same problem exists for other types of metadata queries, such as finding metrics that start with a given prefix for a tag=value pair.
They run for about 1 hour, when users would expect milliseconds, because Grafana makes these requests for type-ahead. Unacceptable! (Interestingly, the queries do eventually succeed and the type-ahead suggestion appears, but I guess users would be a little impatient if finding/typing the measurement took a few days.)
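
For context, these type-ahead requests are ordinary metadata queries along these lines (the measurement prefix and tag names are made up):

show measurements with measurement =~ /^cpu/ where "region" = 'us-east'
show tag values from "cpu" with key = "host"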

I am not sure if this is only a problem in the clustered version, because I didn’t previously see it in a single-node dev instance. If that’s the case, why would we pay for clustering? We might as well scale back to a single VM or shard the data manually.

Any advice on how to overcome this issue would be greatly appreciated.
My best ideas right now are (1) to test the same setup on a single VM and see if clustering is the problem, (2) to limit the amount of historical data (on the single-node VM) to 1 day, and (3) to split the data across multiple retention policies (or databases), as sketched below.
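
For (3), I mean roughly something like this (database and retention policy names are made up):

create retention policy "rp_1d" on "metrics" duration 1d replication 1 default
create database "metrics_short" with duration 1d replication 1 name "rp_1d"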