Performance optimization, redundancy recommendations?

Hi there,

I am logging cell voltages of battery modules. Each module consists of, let's say, 10 cells. Every module sends its data every 500 ms. I store the 10 voltages as fields and the module ID as a tag.

So far so good :wink: One interesting value to observe is the cell spread per module, i.e. the difference between the highest and the lowest cell voltage in a battery module. This difference is normally low but can spike for short periods in between. So ideally I would calculate the spread for every point in time and observe the maximum of these spread values in aggregate windows, so that I can analyze the development over longer periods (months and perhaps years).

But how do I do that as efficiently as possible?

Option 1: Calculate the spread for every point in time via Flux and aggregate the results as desired. Technically this works, but it crashes my server quite quickly :wink: It is very inefficient for longer observation periods and for a database filling up with lots of modules.
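Roughly, what I mean is something like this (bucket and measurement names are placeholders for my schema):

```
from(bucket: "raw_bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "modules")
    // one table per module per timestamp; spread() = max - min within each table
    |> group(columns: ["_time", "_start", "_stop", "id"])
    |> spread()
```

Because this creates one table per raw timestamp, the number of tables grows with the point count, which is presumably why it gets so expensive over long ranges.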

Option 2: Pre-calculate the spread and store it redundantly. Working on that data with aggregateWindow() and max() should then be fine.
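Once the spread is stored as its own field (the field name v_spread below is just an example), the long-term view should be a cheap aggregate, something like:

```
from(bucket: "raw_bucket")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "modules" and r._field == "v_spread")
    // daily maximum of the pre-computed spread, per module
    |> aggregateWindow(every: 1d, fn: max)
```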

How to calculate and store the redundant information?

Option 2a: The clients could calculate the spread themselves and add it to the data they upload to InfluxDB. I do not like that approach too much, as it adds redundant data to the communication between clients and server.

Option 2b: InfluxDB could do the calculation on receiving the data (similar to triggers in SQL databases). Is that possible?

Option 2c: InfluxDB could do the calculation periodically and asynchronously (e.g. every hour for the past hour) and add the redundant information to the raw data (or store it pre-aggregated in a dedicated "statistics" measurement). Can that be configured in InfluxDB?

I hope my problem is understandable…

What do you think would be a good approach? Are you aware of any alternative option, perhaps even one that solves this problem without storing redundant information?

Thanks in advance, any comment is appreciated!

Hello @sauvant,
For option 2 you can use the spread() function (see the Flux documentation) together with the aggregateWindow() function.
You could execute this task more frequently and store the spread in a new bucket to avoid crashes.

Option 2b: You can only execute this logic on a user-defined schedule; triggers aren't supported yet.

Option 2c is possible and recommended. You can create a downsampling task.
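As a sketch, such a task could look like the following (the bucket names, field name, tag key, and schedule are illustrative; adapt them to your schema):

```
// re-run every hour over the previous hour of raw data
option task = {name: "module_spread_5m_max", every: 1h, offset: 5m}

from(bucket: "raw_bucket")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "modules")
    // one table per module per timestamp, so spread() = max - min across the cells
    |> group(columns: ["_time", "_start", "_stop", "id"])
    |> spread()
    |> group(columns: ["id"])
    // keep the 5-minute maximum of the per-instant spread
    |> aggregateWindow(every: 5m, fn: max)
    |> set(key: "_measurement", value: "modules")
    |> set(key: "_field", value: "v_spread")
    |> to(bucket: "stat_bucket_spread_5m_max")
```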

I would go with 2c.

I don't know how you're collecting the data, but if you're using Telegraf you could try that as well:


Hi Anaisdg,

Very cool, thanks a lot!
Meanwhile I wrote a task that aggregates the spread data into 5-minute maximum values:

from(bucket: "raw_bucket")
    |> range(start: 2022-01-01T00:00:00Z, stop: 2022-01-01T01:00:00Z)
    |> filter(fn: (r) => r._measurement == "modules")
    |> filter(fn: (r) => r._field == "v0" or r._field == "v1" or r._field == "v2" or r._field == "v3" or r._field == "v4" or r._field == "v5" or r._field == "v6" or r._field == "v7" or r._field == "v8" or r._field == "v9" or r._field == "v10" or r._field == "v11")
    |> group(columns: ["_time", "_start", "_stop", "id"])
    |> spread()
    |> group(columns: ["id"])
    |> aggregateWindow(every: 5m, fn: max)
    |> set(key: "_measurement", value: "modules")
    |> set(key: "_field", value: "v_spread")
    |> to(bucket: "stat_bucket_spread_5m_max")

It works, but it does not seem to be very efficient: the task takes about 30 s to crunch 1 h of raw data in your cloud environment. Any idea what I could do to speed it up and consume fewer resources?

Best regards and thanks in advance,