Relate data for downsampling

According to the best practices (Data Layout and Schema Design Best Practices for InfluxDB | InfluxData), the proposed data model for InfluxDB says that it is better to create a bucket for usernames and plant_species… but then how do I link the user with the collected measurements for downsampling?

Can someone explain?

Hi @Daniel_Henao,
So personally I believe you have two options here:

  • Option 1: Store the username and plant species directly as tags. This does leave you open to runaway cardinality, depending on how many plants and how many users you have, though this can be managed by separating data into different buckets by region (US, EMEA, etc.).

  • Option 2: If you are using Flux, then the more viable option to consider is a hybrid architecture. I personally would store plant types and usernames within a SQL DB, plus a unique ID. I would then use the sql.from() function to combine that data with your time series data within InfluxDB. See: Query SQL data sources with InfluxDB | InfluxDB OSS 2.2 Documentation
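A minimal sketch of the hybrid idea in Option 2, written in plain Python rather than Flux purely to show the logic: the metadata lives in a SQL table keyed by a unique ID, each time series point carries only that ID, and the two are combined at query time. The table name `plants`, the `plant_id` key and all sample rows are hypothetical. An in-memory SQLite database stands in for the SQL store, and a list of dicts stands in for the result of an InfluxDB query.

```python
import sqlite3

# Hypothetical metadata table: one row per plant, keyed by a unique ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (plant_id TEXT PRIMARY KEY, username TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?, ?)",
    [("p1", "daniel", "fern"), ("p2", "jay", "cactus")],
)

# Stand-in for points returned by a time series query: only the ID travels with them.
measurements = [
    {"time": "2022-05-01T10:00:00Z", "plant_id": "p1", "humidity": 41.0},
    {"time": "2022-05-01T10:00:00Z", "plant_id": "p2", "humidity": 12.5},
    {"time": "2022-05-01T10:05:00Z", "plant_id": "p1", "humidity": 40.2},
]

def enrich(points, connection):
    """Attach username and species to each point by looking up its plant_id."""
    lookup = {
        plant_id: {"username": username, "species": species}
        for plant_id, username, species in connection.execute(
            "SELECT plant_id, username, species FROM plants"
        )
    }
    # Inner-join semantics: points whose ID has no metadata row are dropped.
    return [{**p, **lookup[p["plant_id"]]} for p in points if p["plant_id"] in lookup]

enriched = enrich(measurements, conn)
```

With this layout, renaming a user or correcting a species only touches the SQL row; the stored points keep the same ID.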

To answer your original question you would have to query each bucket and store the result within a variable. You would then need to perform a join on the data:

sensordata = from() |> yield()

I believe the above would then be joined on time, but this means the username and plant fields would need to be updated as well.

Hi @Jay_Clifford, first of all, thank you for answering me.

I really like your approach, but then I have more questions.

Regarding Option 2… I'm using Flux.
Let's say I store plant types and usernames in a SQL database… that's good… but in any case, I would need to relate them to the measurements through some field in InfluxDB, is that correct?

So for example, if a measurement is humidity, and that is well placed in InfluxDB, I would need a field for the unique ID that relates the username and the plant, am I wrong? Then how would that solve the problem, if querying on fields is not very performant?

How does this differ from runaway cardinality?

I'm facing a problem: I already have a model to work with, based on the best practices for InfluxDB:

Bucket for events (time events for running machine, or stopped machine)
Another bucket for counter measurements (bad products or good products)
Another bucket for sensor measurements.

The thing is that I have multiple machines… let's say A, B and C, and all of those machines have the same data described in the buckets above.

But I want those measurements to be classified by production number… which is a code that increases over time… so, for example, the production number is ABC1, and it happened in the range (1h - 2h). Within this range the measurements are being taken… so I need to link those measurements with the order… and the order changes over time.
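One way to picture this linking, as a rough sketch in plain Python rather than Flux: if the start time of each production order is known, the order active at any measurement time can be found with a sorted lookup. The assumption here is that each order runs until the next one starts. The order `ABC1` and its 1h start come from the example above; `ABC2`, `ABC3` and the sample points are hypothetical.

```python
from bisect import bisect_right

# Production orders as (start time in seconds, production number), sorted by start.
# Each order is assumed to stay active until the next one begins.
orders = [(3600, "ABC1"), (7200, "ABC2"), (10800, "ABC3")]
order_starts = [start for start, _ in orders]

def production_number(timestamp):
    """Return the order active at `timestamp`, or None if it precedes the first order."""
    index = bisect_right(order_starts, timestamp) - 1
    return orders[index][1] if index >= 0 else None

# Stand-in for sensor points: (time in seconds, value).
points = [(1800, 20.1), (3600, 20.4), (5400, 21.0), (7200, 21.3), (9000, 22.0)]

# Label every point with the order that was running when it was taken.
labelled = [(t, value, production_number(t)) for t, value in points]
```

The same lookup could be applied at write time (storing the production number with each point) or at query time; which is preferable depends on how many distinct production numbers accumulate.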