Strategies for linking time-series and relational data

mkarlesky · May 9, 2019, 4:57pm

I have two different use cases that share a need to link time-series data and relational data. I’m interested in strategies on how to best accomplish this.

The particulars aren’t especially important. But to paint a picture with an invented example… Imagine a manufacturing line where certain metrics are clearly time-series — e.g. defects, widgets produced — but these metrics are also tied to clearly relational data — e.g. a widget is built from a particular bill of material filled by specific lots of parts from different vendors. Say we wanted to correlate defects to vendors. It’s best to avoid forcing Influx to store complex relationships it’s not designed for. Perhaps it could be done in simple cases, but I think our use cases are over that threshold of simplicity.

At first blush it seems we need one or more unique IDs to tie things together. Along with this comes necessary but manageable complications in maintaining unique IDs and linked records in disconnected systems.

Generally, are there other ways to accomplish something like the above? Entirely different strategies than a unique ID to link records in different databases? If linking IDs are the best option, are there existing good practices or even tools to help accomplish this?

I’m interested in the larger philosophical thinking and design than drilling down into the specifics of our use cases (which are not yet well defined anyhow).

Thanks in advance for any guidance or resources.

Anaisdg · May 16, 2019, 7:10pm

Hello @mkarlesky,

Good question! I’m not entirely sure what the best practices are. You might want to take a look at these: SideloadNode | Kapacitor 1.5 Documentation
telegraf/plugins/inputs/postgresql_extensible at master · influxdata/telegraf · GitHub

You’re on the right track though. You would probably need some unique ID’s to tie things together. This reminds me of how trucking companies use geohashes with influxdb to get an approximation for the trucks’ position. Maybe you could apply a similar logic? Maybe you could have tags that describe a range of vendors and then use sideload to add the exact ID once you’ve identified a default? Using some process of elimination logic? Again, great question!

Anaisdg · May 16, 2019, 8:02pm

@mkarlesky
Here’s a more eloquent answer from a coworker “People will either store extra data needed as tags or do a DIY implementation to add a tag that points back to data in another system”.

mkarlesky · July 19, 2019, 3:50pm

Thank you, @Anaisdg. The more eloquent answer is the approach we’re taking. It’s nice to have some confirmation. It was hard to envision any other way to do it, but we thought it best to ask.

mkarlesky · July 30, 2019, 4:37pm

UPDATE: Flux’s ability to join multiple data sources is ultimately one of the best approaches for the use case I described.

Topic		Replies	Views
Feedback requested / tags and grouping Store influxdb , schema	0	486	December 31, 2021
Relate Data for downSample InfluxDB 2	2	285	June 14, 2022
Should i couple InfluxDB with another Database for long but rare info? influxdb	2	1269	September 26, 2017
Storing millions of unique ID's time-based Tasks influxdb , time-series	1	808	June 29, 2022
Grouping/correlating data InfluxDB 2	5	889	June 19, 2019

Strategies for linking time-series and relational data

Related topics