Schema design: peer review request

nick.fr · October 9, 2018, 7:42pm

Hi all,

I’m a total influxdb newbie and I would really appreciate a schema review from experts.

My (academic) project is based on some Green Button data (energy consumption in kWh) stored in CSV files. We have one CVS file per site, each containing the following columns: date time utc, consumption value, ‘estimated’ flag. On the other end, we have some metadata concerning the sites: industrial sector, latitude, longitude, time zone. Reading a previous post I decided to create a single measurement called ‘energy_usage’ containing points with the following format :

energy_usage_point = {
    ‘measurement’: ‘energy_usage’,
    ‘time’: string ,
    ‘fields’: {
        ‘value’: float
    },
    ‘tags’: {
        ‘site’: int,
        ‘industry’: string,
        ‘latitude’: float,
        ‘longitude’: float,
        'geohash': string,
        ‘time_zone’: string
    }
}

This scheme seems to provide acceptable performances but could we do better?

I had a look to the schema design recommandations provided by the (excellent) influxdb doc [1]. The very first recommandation tells us to limit the number of series. My schema generates 205 series [2] for the single measurement we have. Is that to much or acceptable ?

Thanks for your help.
Nicolas.

[1] https://docs.influxdata.com/influxdb/v1.6/concepts/schema_and_data_layout 1

[2] influx -database ‘mydatabase’ -format ‘csv’ -execute ‘SHOW SERIES’ | grep -v “name,_id,host” | wc -l

samaust · October 9, 2018, 11:41pm

I’m assuming that:

You have one sensor per site.
The industry, latitude, longitude, geohash and time_zone rarely or never changer per site.

So I suggest to use two series

energy_usage_point = {
    'measurement': 'energy_usage',
    'time': string,
    'fields': {
        'value': float
    },
    'tags': {
        'site': int
    }
}

energy_usage_site = {
    'measurement': 'energy_usage_site',
    'time': string,
    'fields': {
        'industry': string,
        'latitude': float,
        'longitude': float,
        'geohash': string,
        'time_zone': string
    },
    'tags': {
        'site': int
    }
}

You’ll import all your energy measurements in energy_usage.

You’ll import the site information only when it changes, so maybe just once.

The site tag will allow to group by “site”.

When you want to get the current site metadata, you just do a query such as

SELECT LAST(*) FROM energy_usage_site

nick.fr · October 11, 2018, 6:10pm

Thanks Samust.

I’m a bit confused cause my need is, for instance, to be able to place ‘energy usage’ data on a map. With the schema you propose, I don’t see how I can do that due to the following influxdb limitation. Clearly speaking, I don’t see how to execute an inner join on “energy_usage” and “energy_usage_site” measurements.

samaust · October 12, 2018, 12:59am

The other option is to put everything in the same measurement but make industry, latitude, longitude, geohash and time_zone fields instead of tags, unless you’ve got some good reason to use them as tags.

nick.fr · October 12, 2018, 5:01pm

samaust,
industry, is one the entities I need to “GROUP BY”. Influxdb documentation says that such data should but a tag in this case. I’m a bit lost…

samaust · October 14, 2018, 3:07am

That’s what I meant by “unless you’ve got some good reason to use them as tags”. Sorry if I was not explicit enough. I meant to suggest to place the data you want to group by in tags and everything else in fields.

energy_usage_point = {
    'measurement': 'energy_usage',
    'time': string,
    'fields': {
        'value': float
        'latitude': float,
        'longitude': float,
        'geohash': string,
        'time_zone': string
    },
    'tags': {
        'site': int,
        'industry': string
    }
}

nick.fr · October 14, 2018, 7:07am

Ok. If we move geohash to the tags section then I think we have the ultimate schema.
Thanks a lot for your help.

Topic		Replies	Views
Ask for help about schema design InfluxDB 1 schema	4	1136	July 25, 2021
Recommended schema and method for lookups/"joins"? InfluxDB 2 influxdb , schema , flux	0	974	January 25, 2022
Schema design best practice, storing equipment measurements Store influxdb	0	687	September 25, 2018
Measurement / schema design Store influxdb , time-series , schema	1	1094	November 1, 2017
Schema Design for IoT Metrics Store influxdb , time-series , iot , schema	9	7506	February 12, 2024

Schema design: peer review request

Related topics