I’m a total InfluxDB newbie and I would really appreciate a schema review from experts.
My (academic) project is based on Green Button data (energy consumption in kWh) stored in CSV files. We have one CSV file per site, each containing the following columns: UTC date and time, consumption value, and an ‘estimated’ flag. On the other hand, we have some metadata about the sites: industrial sector, latitude, longitude, and time zone. After reading a previous post, I decided to create a single measurement called ‘energy_usage’ containing points with the following format:
This schema seems to provide acceptable performance, but could we do better?
I had a look at the schema design recommendations provided by the (excellent) InfluxDB docs [1]. The very first recommendation tells us to limit the number of series. My schema generates 205 series [2] for the single measurement we have. Is that too many, or is it acceptable?
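As a rough way to reason about series count before loading anything, here is a minimal Python sketch. It assumes that a series is identified by the measurement name plus its complete tag set and that the site metadata are stored as tags. The `site_id` tag, the helper names and the sample values are all hypothetical and only serve as illustration.

```python
# Hypothetical site metadata used as tags; the values are made up.
sites = [
    {"site_id": "site_001", "industry": "food", "time_zone": "America/New_York"},
    {"site_id": "site_002", "industry": "food", "time_zone": "America/Chicago"},
    {"site_id": "site_003", "industry": "retail", "time_zone": "America/New_York"},
]


def series_key(measurement, tags):
    """Build a key from the measurement name and the sorted tag set.

    Assumption: two points belong to the same series when this key is equal.
    """
    return measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))


def count_series(measurement, tag_sets):
    """Count the distinct measurement + tag-set combinations."""
    return len({series_key(measurement, tags) for tags in tag_sets})


print(count_series("energy_usage", sites))
```

Under these assumptions, metadata that are fixed per site add one series per site rather than multiplying combinations, so the count grows with the number of sites and not with the number of tag keys.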
I’m a bit confused because I need, for instance, to be able to place ‘energy usage’ data on a map. With the schema you propose, I don’t see how I can do that due to the following InfluxDB limitation. To be clear, I don’t see how to execute an inner join on the “energy_usage” and “energy_usage_site” measurements.
The other option is to put everything in the same measurement but make industry, latitude, longitude, geohash and time_zone fields instead of tags, unless you’ve got some good reason to use them as tags.
That’s what I meant by “unless you’ve got some good reason to use them as tags”. Sorry if I was not explicit enough. My suggestion is to place the data you want to group by in tags and everything else in fields.
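A minimal Python sketch of that split, assuming you group by industry and site and keep everything else as fields. It builds a line protocol string by hand to make the tag/field distinction visible. The `site_id` tag, the field names such as `consumption_kwh`, the helper functions and the sample values are hypothetical, and the escaping is simplified.

```python
def escape_key(value):
    """Escape commas, equals signs and spaces, as line protocol expects
    for tag keys, tag values and field keys (simplified)."""
    text = str(value)
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Assemble: measurement,tag_set field_set timestamp."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Tags: only what you want to GROUP BY (assumed here: industry and site).
tags = {"industry": "food", "site_id": "site_001"}
# Fields: the measured value plus metadata you only need to read back.
fields = {
    "consumption_kwh": 12.5,
    "estimated": False,
    "latitude": 40.7,
    "longitude": -74.0,
    "time_zone": "America/New_York",
}

line = to_line("energy_usage", tags, fields, 1325376000000000000)
print(line)
```

With this layout, latitude and longitude come back with each point for the map, while only the tags contribute to the series count.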