Hi all, I’m new to working with InfluxDB, am trying to wrap my wetware around the storage model and best practices, and would love some advice.
I have the following scenario: I’m storing a stream of event objects, each of which has the same set of attributes. As a reduced example, let’s think of them as `{time, measure, location_group, location, val1, val2, val3}`.
These originate from live sources with ms accuracy (tagged at the source), as well as from batch imports from systems with much lower time granularity (minutes or tens of minutes; human survey info).
In an ideal world `location_group` and `location` would both be tags; however, `location` is a string with a potentially high number of variations, which makes this impossible without severely blowing out series cardinality. As a result, it is currently mapped to a field along with `val1`, `val2` and `val3`.
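For illustration, here is roughly how an event is written today. This is a minimal sketch using the influxdb-client Python library; the measurement name, connection details and example values are all placeholders:

```python
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# location_group is the only tag; the high-cardinality location string and the
# numeric values are stored as fields to keep series cardinality under control.
point = (
    Point("events")                               # the "measure"
    .tag("location_group", "river-north")
    .field("location", "Transect 14, left bank")  # string with many variations -> field
    .field("val1", 3.2)
    .field("val2", 17.0)
    .field("val3", 0.4)
    .time(1700000000123000000, WritePrecision.NS)
)
write_api.write(bucket="my-bucket", record=point)
```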
When querying with Flux it’s possible to pivot on `time` and `location_group` to restore the original events… unless there are two events with the same `location_group` at the same moment in time. This happens frequently with the low-granularity source (e.g. a reading at 09:30 am). When it does, events overwrite each other at storage time because they share the same timestamp, measurement and tag set.
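For reference, the reconstruction I’m doing looks roughly like this (a sketch via the Python client; the bucket name and time range are placeholders):

```python
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

flux = '''
from(bucket: "my-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "events")
  |> pivot(rowKey: ["_time", "location_group"], columnKey: ["_field"], valueColumn: "_value")
'''

# Each output row is one reconstructed event... unless two events in the same
# location_group share a timestamp, in which case only the last write survives.
tables = client.query_api().query(flux)
for table in tables:
    for record in table.records:
        print(record.values)
```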
To counteract this, one approach I’ve been experimenting with is to hash the location into 20 bits of noise that are appended to the timestamp prior to storage. This provides unique, deterministic time values for pivoting, at the expense of up to ~1.04 ms of error (which is fine for this use case).
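A minimal sketch of the idea (my own illustration; it assumes nanosecond write precision and that the source timestamps carry at most millisecond resolution, so the sub-millisecond bits are free to use):

```python
import hashlib

def jittered_time(ts_ns: int, location: str) -> int:
    """Fold a deterministic 20-bit hash of the location into the timestamp."""
    digest = hashlib.blake2b(location.encode("utf-8"), digest_size=8).digest()
    noise = int.from_bytes(digest, "big") & 0xFFFFF  # 20 bits: 0..1,048,575 ns (~1.05 ms)
    # Drop any sub-millisecond part of the original timestamp, then add the noise,
    # so identical (timestamp, location) pairs always map to the same stored time.
    return (ts_ns - ts_ns % 1_000_000) + noise
```

The same function would have to be applied on every ingest path, so that a deliberate re-write of the same event still collides with (and overwrites) the earlier point.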
This works, but feels hacky. It also looks as if it may impact compression, based on the info here.
Are there other approaches for handling this situation?