Need help deciding proper schema for data

I needed a little help deciding on the schema for my data. I feel like the current schema I am using is not the most efficient for my purposes and is causing a lot of problems when I attempt to query and visualize my data.

This is what I currently am using - These are all things related to my system information which I am storing in the DB.

single_alert = influxdb_client.Point(“alerts”)
.tag(“operator”, msg_dict[‘operator’])
.tag(“app_type”, msg_dict[‘app_type’])
.tag(“app_name”, msg_dict[“app_name”])
.tag(“app_host”, msg_dict[‘app_host’])
.tag(“datacenter”, msg_dict[‘datacenter’])
.tag(“alert_type”, msg_dict[‘alert_type’])
.field(“comment”, msg_dict[‘comment’])

I decided to have almost everything as a tag as I query on these values often.
The comment I made a field as I don’t often query it. Is this correct or are there better ways of doing so. I can provide more information on the type of queries I need to run if that helps.

But overall, I want to create a time series on this data where I want to show a count of the different alert_types for a operator over time. I am having trouble writing the query for this, and I am assuming this trouble is coming from the fact that my schema is wrong.

Hello @omkudalkar9,
Thanks for your question.

I encourage you to read the following section:

Or follow these guidelines:

Generally, you should abide by the following recommendations when designing your schema:

  • Keep bucket and measurement names short and simple.
  • Avoid encoding data in measurement names.
  • Encode meta data in tags.
  • Commonly queried metadata should be stored in tags for better query performance because tags are indexed while field values are not indexed.
  • Limit the number of series or try to reduce series cardinality and avoid runaway series cardinality. Make sure that your tag values aren’t unbounded.
  • Separate data into different buckets when you need to either:
    • assign different retention policies to that data.
    • or need to scope authentication token to that bucket.

I’m not familiar with your data enough to be able to say. Superficially it looks good to me right now. It appears that your tag values are mostly strings and that the number of values won’t grow indefinitely.

PS what type or app are you building? Do you feel like sharing some details about your project? I’m always curious to learn more about what community is doing with InfluxDB. Thanks!

I just saw your other question.

Based off of this, I would also think about tags as your metadata and fields as the actual values that you want to manipulate and transform.

So for example if I was measuring weather I might have the following schema:

measurement: weather
tags: location
fields: temp, humidity, pressure, sensor_id.

Here we made location a tag because we’re assuming that we’ll only be monitoring the same location (this tag won’t have infinite number of values). By contrast if we’re continually adding more sensors then we’ll keep that value as a field to reduce series cardinality. Otherwise our “actual” data is in fields and our metadata is in a tag.

I hope this helps!