Hi all, I’m new to working with InfluxDB, am trying to wrap my wetware around the storage model and best practices, and would love some advice.
I have the following scenario: I’m storing a stream of event objects, each of which has the same set of attributes. As a reduced example, let’s think of them as `{time, measure, location_group, location, val1, val2, val3}`.
These originate from live sources with ms accuracy (tagged at the source), as well as from batch imports from systems with much lower time granularity (minutes or tens of minutes; human survey info).
In an ideal world `location_group` and `location` would both be tags; however, `location` is a string with a potentially high number of variations, which makes this impossible without severely blowing out series cardinality. As a result, it is currently mapped to a field along with `val1`, `val2` and `val3`.
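For illustration, here is roughly how an event is written today. This is a minimal sketch using the influxdb-client Python library; the measurement name, connection details and example values are all placeholders:

```python
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# location_group is the only tag; the high-cardinality location string and the
# numeric values are stored as fields to keep series cardinality under control.
point = (
    Point("events")                               # the "measure"
    .tag("location_group", "river-north")
    .field("location", "Transect 14, left bank")  # string with many variations -> field
    .field("val1", 3.2)
    .field("val2", 17.0)
    .field("val3", 0.4)
    .time(1700000000123000000, WritePrecision.NS)
)
write_api.write(bucket="my-bucket", record=point)
```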
When querying with Flux it’s possible to pivot on `time` and `location_group` to restore the original events… unless there are two events with the same `location_group` at the same moment in time. This happens frequently with the low-granularity source (e.g. a reading at 09:30 am). When it does, events overwrite each other at storage time because they share the same timestamp, measurement and tag set.
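For reference, the reconstruction I’m doing looks roughly like this (a sketch via the Python client; the bucket name and time range are placeholders):

```python
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

flux = '''
from(bucket: "my-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "events")
  |> pivot(rowKey: ["_time", "location_group"], columnKey: ["_field"], valueColumn: "_value")
'''

# Each output row is one reconstructed event... unless two events in the same
# location_group share a timestamp, in which case only the last write survives.
tables = client.query_api().query(flux)
for table in tables:
    for record in table.records:
        print(record.values)
```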
To counteract this, one approach I’ve been experimenting with is to hash the location into 20 bits of noise that are appended to the timestamp prior to storage. This provides unique, deterministic time values for pivoting, at the expense of up to ~1.04 ms of error (which is fine for this use case).
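A minimal sketch of the idea (my own illustration; it assumes nanosecond write precision and that the source timestamps carry at most millisecond resolution, so the sub-millisecond bits are free to use):

```python
import hashlib

def jittered_time(ts_ns: int, location: str) -> int:
    """Fold a deterministic 20-bit hash of the location into the timestamp."""
    digest = hashlib.blake2b(location.encode("utf-8"), digest_size=8).digest()
    noise = int.from_bytes(digest, "big") & 0xFFFFF  # 20 bits: 0..1,048,575 ns (~1.05 ms)
    # Drop any sub-millisecond part of the original timestamp, then add the noise,
    # so identical (timestamp, location) pairs always map to the same stored time.
    return (ts_ns - ts_ns % 1_000_000) + noise
```

The same function would have to be applied on every ingest path, so that a deliberate re-write of the same event still collides with (and overwrites) the earlier point.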
This works, but feels hacky. It also looks as if it may impact compression, based on the info here.
Are there other approaches for handling this situation?