Every day we receive 1 TB of text data.
I installed influxd on a machine with 240 GB of RAM and 32 CPUs.
So far I have inserted only around 22 million points into one measurement, with one tag and 110 fields.
When I run a query (SELECT id FROM ts LIMIT 1), it takes more than 20 seconds, which is too slow.
Can you please help me with what I should do to get good performance?
You’re not adding a time boundary to your query, which makes it an expensive operation.
Can you expand on your use case and what you’re trying to do, please?
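A minimal sketch of what a time boundary looks like, assuming InfluxDB 1.x and InfluxQL. The measurement ts and the field id come from the question; the helper name bounded_query and the one-hour window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

def bounded_query(measurement: str, field: str, start: datetime, end: datetime, limit: int = 1) -> str:
    """Build an InfluxQL SELECT limited to the window [start, end) (hypothetical helper)."""
    if start.tzinfo is None or end.tzinfo is None:
        # Naive datetimes would be ambiguous; require timezone-aware values.
        raise ValueError("start and end must be timezone-aware")
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339 in UTC, usable in InfluxQL time comparisons
    s = start.astimezone(timezone.utc).strftime(fmt)
    e = end.astimezone(timezone.utc).strftime(fmt)
    return (
        f'SELECT "{field}" FROM "{measurement}" '
        f"WHERE time >= '{s}' AND time < '{e}' LIMIT {limit}"
    )

# Example: only look at the last hour instead of the whole measurement.
now = datetime.now(timezone.utc)
print(bounded_query("ts", "id", now - timedelta(hours=1), now))
```

The general idea is that InfluxDB stores data in shards grouped by time range, so a narrow window lets it read only the shards that overlap that window rather than the whole measurement.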
Thanks for your reply.
I followed your recommendation that a UUID should be stored as a field rather than a tag. I made it a field instead of a tag, and the query is now faster.
However, not all rows were inserted: influxd overwrote many of them, because the same time value is repeated across many rows.
What should I do in this case? Should I add tags that are not needed in WHERE or GROUP BY clauses, just to avoid the overwrites?
Can you please help?
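To see why moving the UUID from a tag to a field caused overwrites, here is a toy in-memory model, not InfluxDB code. It assumes the documented InfluxDB rule that a point is identified by its measurement, tag set and timestamp, and that writing a point with the same identity merges its fields into the existing point, with new values replacing old ones. The names write_point, as_tag and as_field, and the sample rows, are hypothetical.

```python
from typing import Dict, Tuple

# Point identity: (measurement, sorted tag set, timestamp) -> field values.
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]

def write_point(store: Dict[PointKey, dict], measurement: str, tags: dict, fields: dict, ts: int) -> None:
    key = (measurement, tuple(sorted(tags.items())), ts)
    # Same identity: fields are merged and new values win, mirroring duplicate-point handling.
    store.setdefault(key, {}).update(fields)

# Three made-up rows that share one timestamp.
rows = [
    {"uuid": "a1", "time": 1000, "value": 1.0},
    {"uuid": "b2", "time": 1000, "value": 2.0},
    {"uuid": "c3", "time": 1000, "value": 3.0},
]

as_tag: Dict[PointKey, dict] = {}
as_field: Dict[PointKey, dict] = {}
for r in rows:
    write_point(as_tag, "ts", {"uuid": r["uuid"]}, {"value": r["value"]}, r["time"])
    write_point(as_field, "ts", {}, {"uuid": r["uuid"], "value": r["value"]}, r["time"])

print(len(as_tag))    # 3: each UUID tag makes a separate series
print(len(as_field))  # 1: same (measurement, tags, time), so later rows overwrite
```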
Hi @Mahmoud_Goda,
yes, to avoid overwriting you must make the data points unique.
You can do this by adding the necessary tags, or by changing the precision, unless you already use nanosecond precision …
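A minimal sketch of the precision route, assuming the source timestamps have whole-second resolution: write them at nanosecond precision and add 1 ns for each repeat of the same second. This corresponds to the "increment the timestamp by a nanosecond" workaround in the InfluxDB documentation on duplicate points. The helper name dedupe_timestamps is hypothetical, and the stored times will differ slightly from the originals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

NS_PER_S = 1_000_000_000

def dedupe_timestamps(seconds: Iterable[int]) -> List[int]:
    """Convert second timestamps to ns, adding +1 ns per repeat of the same second."""
    seen: Dict[int, int] = defaultdict(int)  # repeats seen so far for each second
    out = []
    for s in seconds:
        k = seen[s]
        if k >= NS_PER_S:
            # The offset would spill into the next second and could collide again.
            raise ValueError(f"too many duplicates for timestamp {s}")
        out.append(s * NS_PER_S + k)
        seen[s] += 1
    return out

print(dedupe_timestamps([1600000000, 1600000000, 1600000001, 1600000000]))
# [1600000000000000000, 1600000000000000001, 1600000001000000000, 1600000000000000002]
```

Because the original values are whole seconds, every offset stays below the next second, so nudged timestamps cannot collide with other original timestamps.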
Changing the time precision is not an option for me: the time values already exist in my data, and they can repeat. So I have to add tags, but adding more and more tags creates a lot of series, and I don't know what the best solution is here. I have only one table (measurement) with 110 columns (tags or fields).
It is very hard to find a combination of tags that yields a unique value. The easy solution is to use a single tag holding my table's primary key, but that would create one series per row, which is 3 billion series.
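The series-count arithmetic behind this concern can be made concrete: each distinct tag set in a measurement is one series, so a tag holding a unique primary key yields one series per row. The sketch below contrasts that with a hypothetical small counter tag named seq that numbers rows sharing a timestamp, one reading of the "add an arbitrary tag" approach for duplicate points. The helpers series_count and seq_tags and the sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

def series_count(tag_sets: List[Dict[str, str]]) -> int:
    """Number of distinct tag sets, i.e. series, in one measurement."""
    return len({tuple(sorted(t.items())) for t in tag_sets})

def seq_tags(times: List[int]) -> List[Dict[str, str]]:
    """Hypothetical 'seq' tag: 0, 1, 2, ... for rows that share a timestamp."""
    counter: Dict[int, int] = defaultdict(int)
    tags = []
    for t in times:
        tags.append({"seq": str(counter[t])})
        counter[t] += 1
    return tags

# Made-up sample: 6 rows, at most 3 of them share a timestamp.
times = [100, 100, 100, 200, 200, 300]
pk_tags = [{"pk": str(i)} for i in range(len(times))]

print(series_count(pk_tags))          # 6: one series per row
print(series_count(seq_tags(times)))  # 3: bounded by the max rows per timestamp
```

With such a counter, the number of series grows with the largest number of rows that share one timestamp rather than with the total row count; whether that bound is small in your data, and how to keep the counter consistent across write batches, would need checking.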
InfluxDB is a time-series database; the timestamp is the primary key.
If you have multiple datapoints (“rows”) with the same timestamp, how do you tell them apart? To put the question another way, if someone were to ask, “What was the data at 13:04:02 yesterday?” what are you going to tell them?