Ultra-slow query execution and high CPU consumption

#1

Hello all,

I have a reproducible issue with queries where the tag is undefined on a highly ephemeral dataset. For example, the query
show tag values from measurement with key = "number" where someTag =~ /^()$/
is really slow and is killing the server. This query should return immediately, as there is no time series associated with it.
Also, I noticed that automatically killing a query after a configured timeout is ineffective.

I did not observe this behaviour with InfluxDB 1.2.1 using an inmem tag store. I have seen it with 1.3.2, and it is even worse on 1.3.5 with tsi1.

Has this been observed before? Happy to raise an issue for this.

Thanks, Flavio

#2

Hi Flavio,

The performance of SHOW TAG VALUES was significantly improved in 1.3.2: https://github.com/influxdata/influxdb/pull/8660

So I’m surprised to see you having trouble with it using version 1.3.5. Does the slowness you’re encountering occur only with the specific query in your post, or does it also occur when you’re executing a query that should return some results?

One thing I will say is that we can’t use our index to query tags when using a regex, so it’s always going to be slower than using the someTag = 'abc' or someTag != 'abc' operators.
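
To illustrate why a regex predicate tends to cost more than an equality predicate, here is a minimal Python sketch of a toy inverted index. It is only a model of the general idea and not InfluxDB's actual implementation; the index layout, the series keys, and the function names are all hypothetical.

```python
import re

# Toy inverted index: tag value -> set of series keys (hypothetical data).
index = {
    "abc": {"m,someTag=abc,host=a"},
    "abd": {"m,someTag=abd,host=b"},
    "xyz": {"m,someTag=xyz,host=c"},
}


def series_equal(index, value):
    """Equality: a single dictionary lookup, however many values exist."""
    return set(index.get(value, set()))


def series_regex(index, pattern):
    """Regex: every stored value has to be tested against the pattern."""
    compiled = re.compile(pattern)
    matched = set()
    for value, series in index.items():
        if compiled.search(value):
            matched |= series
    return matched
```

In this model the equality lookup touches one entry, while the regex scan grows with the number of distinct tag values, which is where a high-cardinality tag would hurt.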

Cheers,
Edd

#3

Hi Edd,
I guess it is really a special case that I somehow have to deal with: in certain cases I search the data with a tag for which I do not have a value, so it would look like where tag is null, expressed as tag =~ //. To work around the problem, I now replace the empty value with -, which solves the performance issue but has undesired side effects on the visualisation.
My understanding was that this query should evaluate very quickly, as there is never a series with a null tag.
Quite possibly I am going about the problem in the wrong way…

Thanks, Flavio

#4

Flavio,

Your statement

so it would look like where tag is null, expressed as tag =~ //

is not accurate. An empty regular expression actually matches every value. If you want to force a check for no value, you need to use tag = ''.

Here’s an example:

> show series
key
---
m,t=1,u=1
m,t=2

> show tag values from m with key=t where u =~ //
name: m
key value
--- -----
t   1
t   2

> show tag values from m with key=t where u = ''
name: m
key value
--- -----
t   2
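
The same distinction can be reproduced outside the database. The following is a minimal Python sketch that models the two series above as dictionaries; it assumes a missing tag is compared as the empty string, which is consistent with the output shown but is a simplification, and the helper names are hypothetical.

```python
import re

# The two series from the example above, as tag dictionaries.
series = [
    {"t": "1", "u": "1"},  # m,t=1,u=1
    {"t": "2"},            # m,t=2 (no u tag)
]


def tag_values(series, key, predicate):
    """Collect the distinct values of `key` over series accepted by `predicate`."""
    return sorted({s[key] for s in series if key in s and predicate(s)})


# Assumption: a missing tag is treated as the empty string.
def u_matches_empty_regex(s):
    # An empty pattern matches any string, including the empty one.
    return re.search("", s.get("u", "")) is not None


def u_equals_empty_string(s):
    # Only series without a value for u pass this check.
    return s.get("u", "") == ""


def u_matches_anchored_empty(s):
    # The anchored pattern ^()$ only matches the empty string.
    return re.search(r"^()$", s.get("u", "")) is not None
```

In this model, tag_values(series, "t", u_matches_empty_regex) yields both values of t, while the equality check and the anchored pattern each yield only the value from the series that has no u tag.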

You mentioned your data is “highly ephemeral”. Does that mean high cardinality? If so, the query will attempt to return every value that has ever existed for that tag.

#5

Hi Joe,
Thanks for your response. The problem I have is that some variables depend on the result of another query, and for those I sometimes do not get a result. Thus, I have to search for tag = '', which turns out to be ultra slow as well. I understand that the regular expression I used before, tag =~ //, prompts a search for all tags, but unfortunately both queries are slow and almost bring down the DB.

Do you have a suggestion for how I could overcome this problem?

Thanks, Flavio