Data in a Measurement

nicfio · December 6, 2023, 5:36pm

Hello everyone,

we have set up a data collection pipeline that writes into an InfluxDB instance.
Each bucket corresponds to a single source with a predefined list of columns.
A measurement is a batch of data from that bucket: each measurement's name holds the start and end timestamps of the group of data uploaded to the database.
Do you see any performance downside in InfluxDB with this type of organization?

Our application always queries with a date interval, so before executing a query we can sort and filter the measurement names to find exactly which ones hold the data, and pass only the minimum set of measurements needed to the query call.
Does this give a performance gain when querying InfluxDB, or is it nonsense?
Would it be better, for example, to use a single measurement for each data source?

Another related question: if a row of data is contained in more than one measurement, does that count as a duplicate, or does InfluxDB keep a single value based on the data contained, or just on the timestamp?
Also, if a row is uploaded to a measurement where that timestamp already exists, the latest data is retained and the oldest deleted, right?

Thanks guys, any help, suggestion, or question is appreciated.

Nick

Anaisdg · December 7, 2023, 8:29pm

Hello @nicfio,
Welcome! I recommend reading the documentation on schema design best practices for InfluxDB v2.

A bucket will contain one or many measurements.
Measurements don't usually have a timestamp in their name, but every line will have a timestamp.
You can filter data with the range() function; that's the best way to filter by time.
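
As a rough sketch of what filtering by time with range() could look like from Python: the helper below only builds a Flux query string for one stable measurement name and a date interval. The bucket name "sensors", the measurement name "source_a", and the helper build_flux_query are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def build_flux_query(bucket: str, measurement: str,
                     start: datetime, stop: datetime) -> str:
    """Build a Flux query limited to [start, stop) for one measurement."""

    def rfc3339(ts: datetime) -> str:
        # Flux time literals are RFC3339; normalise to UTC first.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return (
        f'from(bucket: "{bucket}")\n'
        f'  |> range(start: {rfc3339(start)}, stop: {rfc3339(stop)})\n'
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")'
    )


# Hypothetical bucket and measurement names.
query = build_flux_query(
    "sensors",
    "source_a",
    datetime(2023, 12, 1, tzinfo=timezone.utc),
    datetime(2023, 12, 6, tzinfo=timezone.utc),
)
print(query)
```

With this layout the date interval lives in range() rather than in the measurement name, so the application does not need to work out which measurements to query beforehand.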

A row of data can't be in more than one measurement. Series are indexed in InfluxDB v2. A series is defined by the unique combination of measurement name, tag key-value pairs, and field key.
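
To make the duplicate question concrete, here is a toy in-memory model (not InfluxDB code) of how v2 identifies a point: by measurement, tag set, and timestamp. Writing the same point again merges the field sets, with the newer values winning, while the same row written under a different measurement name is simply a separate point. The class ToyStore and all names in it are hypothetical.

```python
from typing import Dict, Tuple

# A point is identified by (measurement, sorted tag set, timestamp).
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]


class ToyStore:
    """Toy model of point identity; it does not talk to InfluxDB."""

    def __init__(self) -> None:
        self.points: Dict[PointKey, Dict[str, float]] = {}

    def write(self, measurement: str, tags: Dict[str, str],
              fields: Dict[str, float], timestamp: int) -> None:
        key = (measurement, tuple(sorted(tags.items())), timestamp)
        # Same key: field sets are merged and new values replace old ones.
        self.points.setdefault(key, {}).update(fields)


store = ToyStore()
store.write("source_a", {"sensor": "s1"}, {"temp": 20.0, "hum": 40.0}, 1000)
# Same measurement, tags, and timestamp: temp is overwritten, hum is kept.
store.write("source_a", {"sensor": "s1"}, {"temp": 21.5}, 1000)
# Different measurement name: stored as a separate point, not deduplicated.
store.write("source_b", {"sensor": "s1"}, {"temp": 20.0}, 1000)

key_a = ("source_a", (("sensor", "s1"),), 1000)
print(len(store.points), store.points[key_a])
```

In other words, the same data uploaded under two measurement names is stored twice, which is one reason to prefer a stable measurement name per source.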

Designing your schema in v2 can be complicated. For this reason I recommend looking into InfluxDB v3. You don’t have to worry about cardinality or query performance.
See "InfluxDB 3.0 is up to 45x Faster for Recent Data Compared to InfluxDB Open Source" (InfluxData).

You can start with the free trial of InfluxDB v3 Cloud Serverless.

nicfio · December 12, 2023, 7:58am

Thank you @Anaisdg ,

your comments are very much appreciated.
Indeed, I have been trying InfluxDB v3 for a few weeks with the newer Python client and I am enjoying the enhancements, especially in query operations, which use the PyArrow Flight client.

Regarding the schema, I am using the start and end dates of a group of data as a sort of hash for the measurement name, and after some time in operation I am seeing downsides.
Overall, I think it is best to rethink the role of the measurement.

Thanks,
Nick
