I have 500K objects with 15 boolean properties that are measured once per day (with timestamp T00:00:00Z). As output, I want:

- a graph of the count of all true booleans, grouped by property, over time
- to query a single object/property on a specific date
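For concreteness, here is what those two queries might look like in InfluxQL under one hypothetical schema (a measurement `site_props` with a `site` tag and one boolean field per property — all names here are illustrative, not from my data):

```sql
-- Daily count of objects where one property is true (one query per property).
SELECT count("ssl_enabled") FROM "site_props"
WHERE "ssl_enabled" = true AND time >= now() - 30d
GROUP BY time(1d)

-- One object/property on a specific date.
SELECT "ssl_enabled" FROM "site_props"
WHERE "site" = 'www.example.com' AND time = '2017-06-01T00:00:00Z'
```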
I tried loading the object IDs as tags, but quickly ran out of memory.
I tried using the object IDs as measurement names, but apparently I cannot GROUP BY across multiple measurements.
Can I use InfluxDB for this data/queries?
Or should I wait for the new time series index in 1.3?
What is the best schema design here?
It seems there is a big optimization possible: the booleans don’t change much, so theoretically I would only have to store them when they change. Not sure how to implement this in InfluxDB though.
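A minimal sketch of that change-only idea, assuming the influxdb-python client and a caller-supplied daily snapshot of the form `{site: {property: bool}}` (InfluxDB has no built-in write-on-change, so the deduplication has to live in the application):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="sites")
last_written = {}  # last stored value per site/property, kept in app memory

def write_changed_props(timestamp, snapshot):
    """Write only the booleans that differ from what was last written."""
    points = []
    for site, props in snapshot.items():
        changed = {p: v for p, v in props.items()
                   if last_written.get(site, {}).get(p) != v}
        if not changed:
            continue  # nothing flipped for this site today
        points.append({
            "measurement": "site_props",  # hypothetical measurement name
            "tags": {"site": site},
            "fields": changed,            # sparse: only the flipped booleans
            "time": timestamp,            # e.g. "2017-06-01T00:00:00Z"
        })
        last_written.setdefault(site, {}).update(changed)
    if points:
        client.write_points(points, batch_size=5000)
```

The trade-off is that each series then has gaps, so time-bucketed queries need the last value carried forward (e.g. `fill(previous)` in a `GROUP BY time()` clause), which makes the counting query more involved.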
Hey @gwillem, I'm not quite sure what you mean by '500k objects'. Can you give an example of what your data looks like?
The primary thing re: cardinality is the number of unique combinations of measurement + tag set + field key. This is a very good rundown of how the count works, and provided you keep your cardinality as low as possible, you should be able to write/read all of your data without problems!
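As a toy illustration of that count: two points in one measurement, with two different `site` tag values and a single field key, make two distinct series (names are hypothetical):

```
site_props,site=a.example ssl_enabled=true 1496275200000000000
site_props,site=b.example ssl_enabled=true 1496275200000000000
```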
Also, we’ve had a few discussions on this here in the community so maybe some of this will help too?
Objects: I’m monitoring 500K websites for certain properties, such as “SSL enabled”.
I want to store these properties as tags, because I want to run queries on them for graphs ("how many sites have SSL installed over time"). That produces a theoretical series cardinality of 500K * 2^15 ≈ 16.4 billion. Hmm.
Perhaps I’m better off storing the aggregates separately (calculated per day) and storing the bools per site in a measurement based on the sitename? E.g. something like the sketch below:
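In line protocol that split could look roughly like this (measurement and field names, and the count value, are illustrative only):

```
# Pre-computed daily aggregate, one point per property per day:
daily_counts,property=ssl_enabled value=312456i 1496275200000000000

# Per-site booleans, in a measurement named after the site:
www.example.com ssl_enabled=true,spf_enabled=false 1496275200000000000
```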
I think you just need to restructure your data in such a way as to limit the overall series count, and of course testing is going to be essential.
`influx_inspect report -detailed /path/to/shard/num` is your friend here; it will give you good insight into the overall breakdown of your data.
I would write something as follows and see what it looks like. It's hard to fully calculate ahead of time, so it's worth trying it out first and iterating (premature optimization and all):
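As a sketch of such a restructuring (my names, and not necessarily the exact layout originally suggested): keep `site` as the only tag and the 15 booleans as fields, which puts the series-key count around 1 measurement × 500K tag sets × 15 field keys ≈ 7.5M, rather than 16.4B:

```
site_props,site=www.example.com ssl_enabled=true,spf_enabled=false,dkim_enabled=true 1496275200000000000
```

After loading a representative sample, the `influx_inspect report` above should show whether the series counts actually land near that estimate.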