Recommended schema and method for lookups/"joins"?

Hi, I’m very new to time-series databases, so I’m hoping someone can help point me in the right direction.

I’m setting up Influx to monitor power usage in my house, using data reported by the utility power meters and collected by a Raspberry Pi. The data collection part is working great, but it needs a bit of massaging to really be usable for viewing/analysis. In particular:

  • The data reported by the RPi has nothing except the time of the reading, the ID of the meter (a key value), and the reading (a field value). I want to be able to query readings by site name (main house, secondary apartment, etc) and function (solar production, consumption, net).
  • The different meters report at unpredictable and inconsistent intervals, so I’ll have something like a report every minute for one of the production meters but only every 5 minutes for a consumption meter, except that sometimes I won’t receive a report for several intervals in a row. So each report comes in at a different time and I can’t, for example, pivot on the field name in order to get production and net for a single site, because the report times are misaligned. I can’t even come up with a good query range - if I want to see usage patterns over a given ten minutes, for example, one or more of the meters might not have sent even a single report in that time, but the last-known-reading still needs to be taken into account and displayed.

As I understand it, this probably means that I should be using the site name as a key, with each “function” as a field name whose value is the energy production/consumption. Or, I suppose, both site and function-type as keys, and a simple “usage” as field? I want to be able to easily relate production to consumption/net for a given site, for example, but I can also see wanting to sum raw production values over all sites, without taking consumption into account. Which schema makes more sense? I’m still a little fuzzy on the proper usage of fields vs keys.

The second part of my question is, how can/should I be doing the mapping of meter IDs to sites/functions? If this were a relational database I’d just create a lookup table with the mappings in it and include it as a JOIN when querying the data. There are few enough of them that any filter on, say, site name could be easily translated into a filter on meter IDs. I don’t imagine that’s the proper solution in Influx, though, since data stored in the database is inherently time-based.

The two solutions I’ve come up with are:

  1. In the Flux script, assign a variable to a dict that maps each meter ID to a record of site name and meter function. In the query, use map() with dict.get() to add the site/function keys to the raw meter records.
  2. In the Flux script, use array.from() to create a table with records consisting of meter ID, site name, and function. Use group() to repartition this into a stream of single-record tables (is this necessary/helpful?), and then join() on the meter ID to add the other fields into the query. Doing a group() on site×function at that point doesn’t (shouldn’t?) change the partitioning at all, but marking site/function as part of the group key seems to be necessary for the graph to show those values in the legend.

What’s the best way to do this? Also, is there any way I can share the mapping between Notebooks and Dashboards? the “custom variables” feature only seems to be accessible from the latter.

Thank you!

1 Like