General DB Design Question

Greetings! I’m looking for some general guidelines. I’ve gone through most of the docs and done plenty of research, but haven’t found what I’m after. In short, I’m sending data from about 10 devices at around 10 samples per second to a Pi 4 broker/client over MQTT. It’s being coded in Python and C, and the endpoint will be Grafana. Before writing to InfluxDB, some of the data will need to be rolled up/aggregated into new series (usage data), and some will need to be run through various functions (e.g. cost functions). My question is: in general, when does it make sense to do time-based aggregations prior to writing to InfluxDB (e.g. an hourly usage rollup), and when does it make sense to have InfluxDB do these aggregations through recurring tasks? Clearly, much or most of it could be done within InfluxDB after logging the raw data, but under what circumstances does it make more sense to do it before writing to InfluxDB? Appreciate any guidance.
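
To make the question concrete, here is roughly what the pre-aggregation path would look like on the Pi side. This is only a sketch, assuming paho-mqtt and the influxdb-client Python library; the topic, bucket, measurement, and field names are placeholders, and the "usage" calculation is just a mean over the window.

```python
# Sketch of the "aggregate before writing" approach: buffer raw MQTT samples
# on the Pi, then write one rolled-up point per device per window.
# Topic, bucket, token, and field names below are made-up placeholders.
import json
import time
from collections import defaultdict

import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

WINDOW_SECONDS = 3600            # hourly rollup window
buffers = defaultdict(list)      # device id -> raw samples for the current window
window_start = time.time()

influx = InfluxDBClient(url="http://localhost:8086", token="TOKEN", org="my-org")
write_api = influx.write_api(write_options=SYNCHRONOUS)


def flush_window():
    """Write one aggregated point per device, then reset the buffers."""
    global window_start
    for device_id, samples in buffers.items():
        usage = sum(samples) / len(samples)   # e.g. mean power over the window
        point = (
            Point("usage_hourly")
            .tag("device", device_id)
            .field("usage", usage)
        )
        write_api.write(bucket="usage", record=point)
    buffers.clear()
    window_start = time.time()


def on_message(client, userdata, msg):
    # Assumed payload shape: {"device": "d1", "value": 12.3}
    payload = json.loads(msg.payload)
    buffers[payload["device"]].append(payload["value"])
    if time.time() - window_start >= WINDOW_SECONDS:
        flush_window()


mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)
mqtt_client.subscribe("sensors/#")
mqtt_client.loop_forever()
```

The trade-off I’m weighing is that this keeps the write volume to InfluxDB tiny, but the raw 10 Hz samples never land in the database.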

Hello @WharfRat,
Welcome! Thanks for your question and research.
I think it largely depends on what your end goal is and how many devices are writing to InfluxDB. With 10 devices there isn’t really a single best practice; it mostly comes down to where you prefer to perform those aggregations. I’d suggest aggregating in InfluxDB because I find it easy, though that’s also what I’m most comfortable with. If you were writing from thousands of devices, you might consider pre-aggregating to reduce your InfluxDB storage footprint (depending, again, on how frequently you’re expiring data), or if you were concerned about the volume of data you’re writing to InfluxDB. For contrast, a rough sketch of the “write raw, aggregate in InfluxDB” path is below.

Excellent. Appreciate the guidance. Cheers.
