Passing large number of guids to a regex runs super slow

Alexander_Anguiano · June 9, 2017, 1:17pm

influx 1.2.2, grafana 4.x (any version 4 i tried )
I’ve had this problem for a while and have been trying to figure out a solution. I’m passing a large amount of guids to a regex expression, and it is taking a long time for the query to run 7 to 8 minutes. This works fine grafana 3.2 I can’t see the difference when grafana makes the call between the versions. Doing a “show queries” in the influx the query looks a lot like this 10 guid query excepted it has 3000 guids

(When i do a show queries from influx i see this query)

SELECT sum(success) / sum(total) * 100.000 AS success FROM rates WHERE id =~ /(3897f423-a4c1-4db2-b597-b2aee9485857|f3470a5f-20e6-496b-a490-de4541796748|2578a079-3dc6-448f-93e0-18915f429566|827f3a06-c9a4-4355-93a9-2202798441d5|fg013f23-62a8-4e10-94cd-f49ed3364b4d|530aed30-a58e-4b1f-bffa-7e06e885a55c|c908bad0-527b-483b-83f2-c18ee5179318|67b5cf45-b205-4295-820c-eba3515e713d|g096c2c0-b4c0-4db1-8304-dcea87c6ff99|25497141-5af2-4f0d-89b5-bb83b003a8f0)/ AND time > now() - 7d GROUP BY time(1d)

Two reasons I have to do the regex. 1) I need to be able to do grouping on the data after the collection so I can’t do it with tags. There is no in clause in influx SQL. Even if there were, that would not solve my problem because of the first reason.

Any idea how to figure out what’s wrong with this? Is the query OK?

Thanks

mark · June 9, 2017, 3:10pm

Are you saying id is a field, not a tag?

If id is a field, then influx can’t use the tag index to quickly look up data, and it will need to scan every point to compare the entire regex against the id field.

The first thing I’d try, if possible, would be to change the query from id =~ /(foo|bar|...)/ to (id = 'foo' OR id = 'bar' OR ...). It’s hard to say without benchmarking, but if you’re looking for exact values, direct string comparison may go faster than executing a regular expression against each point.

Alexander_Anguiano · June 9, 2017, 3:40pm

id is a tag.

I’m using grafana and the query is what grafana sends to influx. The OR thing would be difficult to do. In the git hup issues, doing the regex was a recomendation to work around the “in” clause missing in the influx SQL

here is the grafana query
SELECT sum(success) / sum(total) * 100.000 AS success FROM rates WHERE id =~ /$idFilter/ AND time > now() - 7d GROUP BY time(1d)

funny thing is, it works fine in grafana 3.2, it works really slow in grafana 4.x. with the same version of influx 1.2.2

is there debugging i can turn on?

Alexander_Anguiano · February 23, 2018, 5:09am

After many hours of trying to figure what is wrong with this, I finally figured out what is happening, in grafana 3.2 the list is sorted and in 4.6 the list of guids is unsorted.

I ran with 3.2 and 4.6 and found the query in each influx log. I cut out the queries and it ran this script

function timer()
{
if [[ $# -eq 0 ]]; then
echo $(date ‘+%s’)
else
local stime=$1
etime=$(date ‘+%s’)

if [[ -z "$stime" ]]; then stime=$etime; fi

dt=$((etime - stime))
ds=$((dt % 60))
dm=$(((dt / 60) % 60))
dh=$((dt / 3600))
printf '%d:%02d:%02d' $dh $dm $ds

fi
}

tmr=$(timer)
outfile=out.txt
sql=$(cat query.txt)
influx -database mydb -format csv -execute “$sql” > $outfile

the 3.2 completed in 2 seconds
the 4.6 completed in 2:08 (two minutes 8 seconds)

looking at the queries they appear to be identical.

As I look for differences I noticed that the 3.2 quids are sorted in ascending order. The 4.6 are not. That looks like the only difference. I wrote a tool that sorted the guid from the 4.6 query and re-submit it with the quids in sorted order. This new sorted list worked in 1 second.

This means that an unsorted list of guid in a regex run slow and a sorted list of guid runs fast.

Topic		Replies	Views
Convert from InfluxQL to FLUX query (Please Help) Fluxlang	9	801	December 17, 2021
Flux query filter execution time issue with regex expression InfluxDB 2 query , flux	0	450	October 14, 2022
Some "in" statement in stead of regex? InfluxDB 2 influxdb , grafana , flux	0	153	December 19, 2023
Performance in Influxdb 1.8 Store influxdb , performance	0	715	June 1, 2020
Struggling with query from 1.8 to v2 in Grafana InfluxDB 2 grafana	3	445	May 1, 2021

Passing large number of guids to a regex runs super slow

tmr=$(timer) outfile=out.txt sql=$(cat query.txt) influx -database mydb -format csv -execute “$sql” > $outfile

Related topics

tmr=$(timer)
outfile=out.txt
sql=$(cat query.txt)
influx -database mydb -format csv -execute “$sql” > $outfile