Hi,
We are working with some financial data and trying to run queries on it. Something we are noticing is that the queries don’t seem to scale linearly. Running a query over 3h of data takes about 800ms, but running it over a full day takes 40s. These queries are not doing any manipulation, so this seems very odd to me.
The data is organised as timestamp <2 x tags> <40 x fields>. I’m wondering if there are some standard things I can test to see why this is happening. Also, is there something comparable to EXPLAIN ANALYZE for Flux?
Thanks.
Hello @Mark_Best,
Do you know what your retention policy is? I’m wondering if some of the data is on a cold shard? Do you know what your shard duration is?
Hi, thank you for getting back to me. The retention policy is the default of infinite, but we are working with a static financial data set, so that makes sense to me. We tracked down the issue and it was related to using a join. One of the join columns was a field, so I’m assuming it was doing an O(N^2) lookup to match. We found a faster way using unions and sorting.
I do have a second question about indexes. I also changed from using TSM to TSI indexes. What I found after running
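For anyone hitting the same thing, here is a rough sketch of why the two approaches can scale so differently. It is plain Python rather than Flux, and it is not a claim about how Flux implements join internally; the function names, the `_time` key, and the `bid`/`ask` fields are made up for illustration. It only assumes the join behaved like a pairwise comparison of rows, which is my guess from the timings.

```python
from operator import itemgetter


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(N * M) comparisons."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def union_then_sort(left, right, key):
    """Concatenate both streams and sort once: O((N + M) log(N + M))."""
    return sorted(left + right, key=itemgetter(key))


# Hypothetical rows standing in for two streams of records.
left = [{"_time": 1, "bid": 10.0}, {"_time": 2, "bid": 10.5}]
right = [{"_time": 2, "ask": 10.7}, {"_time": 1, "ask": 10.2}]

joined = nested_loop_join(left, right, "_time")
interleaved = union_then_sort(left, right, "_time")
```

The join returns one merged row per matching pair, while the union keeps the rows separate and only interleaves them by time, so the two results are not identical in shape. Whether the union form is acceptable depends on what the downstream steps need.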
sudo influx_inspect buildtsi
was that queries initially took a long time to run and then became really fast. I am assuming that is because it was building an index? Is there somewhere to read about cold shards, indexing, etc. so I can work out how to better optimise the Flux queries?
Mark
Hello @Mark_Best,
Ah yes, joins would make sense. As far as resources go, you might find these useful:
Simplifying Retention Policies-Blog
Rebalancing InfluxDB Enterprise Clusters
TSI Details
InfluxDB FAQ
Please let me know if you find these helpful, or if I can find some other resources.
