Nested queries performing horribly - performance bug?

influxdb
#1

I’m seeing the performance of InfluxDB’s nested queries be extremely poor.

For example, this query, which returns 659 lines runs in 120~150 ms:
SELECT last("ServiceStatus") as ServiceStatus FROM "DefaultPaulo"."ServiceStats" WHERE "Environment" =~ /^PT02$/ AND Time > now() - 7m AND ServiceName != 'Asos.Service.Emailer' AND ServiceName != 'Asos.Stock.DespatchImportService' AND ServiceName != 'Asos.Stock.StockExportService' AND ServiceStartType <> 4 GROUP BY "ServiceName", "host"

While this simple SELECT FROM the same query with a trivial WHERE, runs in 4~5 seconds:
SELECT host, ServiceName, ServiceStatus FROM (
SELECT last("ServiceStatus") as ServiceStatus FROM "DefaultPaulo"."ServiceStats" WHERE "Environment" =~ /^PT02$/ AND Time > now() - 7m AND ServiceName != 'Asos.Service.Emailer' AND ServiceName != 'Asos.Stock.DespatchImportService' AND ServiceName != 'Asos.Stock.StockExportService' AND ServiceStartType <> 4 GROUP BY "ServiceName", "host"
) WHERE "ServiceStatus" <> 4


I’ve been observing this pattern in all nested queries I’ve tried so far.

Possibly related, a couple of hours ago I was trying a nested query and the InfluxDB server suddenly stopped working (O.S. included, so maybe unrelated?) - which was a real shame, since I checked yesterday and “uptime = 70d”.

#2

Query performance depends on a large number of factors such as how many series you are querying over, or how much data the query returns. Filters on fields (ServiceStartType <> 4) and regular expressions ("Environment" =~ /^PT02$/) are also two query features that are particularly non performant. However it does appear that you are seeing some unexpectedly slow query performance. Maybe @jonathan has a second to take a peek at this?

#3

Hi Jack,

Thanks for that.

I do know things like the Regex expression filters can naturally perform poorly due to the underlying complexity of regex searches. However, both of those things are performing just fine in the inner query - 120~150ms.

The nested query, though, is adding very minimal work on top of the inner query, not justifying making it 30 times slower… It’s applying 1 single WHERE filter against the only selected field. I can’t conceive any reason for that 2nd layer of querying to add such a performance hit: we’re talking of 660 lines of data, and the field “ServiceStatus” will only have some 2~5 different values, which is what’s making me guess this is a performance problem.

#4

@paulo This sounds like an issue to me. Can you open an issue on InfluxDB?

#5

@jackzampolin Done, “issue” created here:

1 Like