Slow query with 22 million points

Mahmoud_Goda · July 6, 2019, 1:34pm

Hi,
Every day we receive 1 TB of text data.
I installed influxd on a machine with 240 GB RAM and 32 CPUs.
So far I have inserted only around 22 million points into one measurement, with one tag and 110 fields.
When I run a query (select id from ts limit 1), it takes more than 20 seconds, which is not good.
Can you please help me with what I should do to get good performance?
BR.

rawkode · July 7, 2019, 9:17am

Your query has no time boundary, which makes it an expensive operation.

Can you expand on your use case and what you’re trying to do, please?
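
As a rough illustration of what a time boundary could look like, here is a minimal Python sketch that builds a bounded InfluxQL query string. The field `id` and measurement `ts` come from the original query; the helper name `bounded_query`, the start time, and the one-hour window are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def bounded_query(field, measurement, start, window, limit=1):
    """Build an InfluxQL SELECT restricted to the range [start, start + window)."""
    start = start.astimezone(timezone.utc)
    end = start + window
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339 timestamp in UTC
    return (
        f'SELECT "{field}" FROM "{measurement}" '
        f"WHERE time >= '{start.strftime(fmt)}' AND time < '{end.strftime(fmt)}' "
        f"LIMIT {limit}"
    )


# Hypothetical window: one hour starting at midnight UTC on July 6, 2019.
start = datetime(2019, 7, 6, 0, 0, 0, tzinfo=timezone.utc)
query = bounded_query("id", "ts", start, timedelta(hours=1))
print(query)
```

With a `WHERE time ...` clause the engine only has to look at the data covering that range instead of the whole measurement.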

Mahmoud_Goda · July 7, 2019, 11:58am

Thanks for your reply.
I just noticed your technical recommendation that a value like a UUID should be a field rather than a tag. I changed it from a tag to a field, and I can see the query is faster.

However, I see that not all rows are inserted: influxd overwrites many rows, because the same time value is repeated across many rows.

So what should I do in this case? Should I add tags that are not needed in the WHERE or GROUP BY clauses, just to avoid the overwrites?

Can you please help?

MarcV · July 8, 2019, 9:42am

Hi @Mahmoud_Goda,

Yes, to avoid overwriting you must make the data points unique.
This can be done by adding the necessary tags or by using a finer timestamp precision, unless you already use nanosecond precision …
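
To make the overwrite behaviour concrete, here is a small pure-Python model of it (it does not talk to InfluxDB). It assumes a point is identified by measurement, tag set, and timestamp, so a second write with the same identity is merged into the first. The `host` and `dup` tags and the sample values are hypothetical.

```python
def write_points(points):
    """Model of point identity: (measurement, tag set, timestamp) is the key."""
    stored = {}
    for measurement, tags, fields, ts in points:
        key = (measurement, tuple(sorted(tags.items())), ts)
        # A later write with the same key merges into the existing point;
        # on a field name conflict the newer value wins.
        stored.setdefault(key, {}).update(fields)
    return stored


# Two rows in the same series with the same timestamp: only one point survives.
rows = [
    ("ts", {"host": "a"}, {"id": "u1", "value": 1.0}, 1562420040),
    ("ts", {"host": "a"}, {"id": "u2", "value": 2.0}, 1562420040),
]
stored = write_points(rows)
print(len(stored))  # 1

# The same rows with a distinguishing tag: both points are kept.
rows_tagged = [
    ("ts", {"host": "a", "dup": "0"}, {"id": "u1", "value": 1.0}, 1562420040),
    ("ts", {"host": "a", "dup": "1"}, {"id": "u2", "value": 2.0}, 1562420040),
]
stored_tagged = write_points(rows_tagged)
print(len(stored_tagged))  # 2
```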

best regards

Mahmoud_Goda · July 8, 2019, 10:59am

Hi @MarcV,
Changing the time precision is not an option for me: my data already has a time value, and that value can be repeated. So I have to add tags, but adding more and more tags creates a lot of series. I don't know what the best solution is here. I have only one table (measurement) with 110 columns (tags or fields).
It's very hard to check combinations of tags to find one that is unique. The easy solution is to use a single tag that holds the primary key of my table, but this would create one series per row, which is 3 billion rows.
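
One possible middle ground, sketched in Python below, is to number the rows that share a timestamp (0, 1, 2, ...) and use that number to tell them apart. It can go into a small tag, whose cardinality is then bounded by the largest number of rows sharing one timestamp rather than by the row count, or it can be added as a nanosecond offset. The offset variant assumes the source timestamps are whole seconds, which the thread does not state. The names `add_sequence`, `line_with_tag`, `line_with_offset`, and the `seq` tag are made up for this example, and the formatter only handles string and float fields without escaping.

```python
from collections import defaultdict

NS_PER_SECOND = 1_000_000_000


def add_sequence(rows):
    """Yield (timestamp, seq, fields); seq counts earlier rows with that timestamp."""
    seen = defaultdict(int)
    for ts, fields in rows:
        seq = seen[ts]
        seen[ts] += 1
        yield ts, seq, fields


def format_fields(fields):
    """Render fields in line protocol form: strings are quoted, floats are bare."""
    return ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}" for k, v in fields.items()
    )


def line_with_tag(measurement, ts, seq, fields):
    """Option A: keep the timestamp and add a low-cardinality 'seq' tag."""
    return f"{measurement},seq={seq} {format_fields(fields)} {ts * NS_PER_SECOND}"


def line_with_offset(measurement, ts, seq, fields):
    """Option B: no extra tag; shift duplicates by seq nanoseconds."""
    return f"{measurement} {format_fields(fields)} {ts * NS_PER_SECOND + seq}"


# Hypothetical rows: (timestamp in seconds, fields). The first two collide.
rows = [
    (1562420040, {"id": "u1", "value": 1.5}),
    (1562420040, {"id": "u2", "value": 2.5}),
    (1562420041, {"id": "u3", "value": 3.5}),
]

tag_lines = [line_with_tag("ts", ts, seq, f) for ts, seq, f in add_sequence(rows)]
offset_lines = [line_with_offset("ts", ts, seq, f) for ts, seq, f in add_sequence(rows)]
print("\n".join(tag_lines + offset_lines))
```

Option B creates no additional series but changes the stored timestamps slightly, so it only makes sense if sub-second accuracy does not matter for your queries.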

JeremySTX · July 10, 2019, 1:13am

Hi Mahmoud,

InfluxDB is a time-series database; within a series (one measurement plus one tag set), the timestamp is the primary key.
If you have multiple datapoints (“rows”) with the same timestamp, how do you tell them apart? To put the question another way, if someone were to ask, “What was the data at 13:04:02 yesterday?” what are you going to tell them?

Jeremy Begg
