Architecture to combine a metadata source with InfluxDB time series data


Hi there,

As an employee of a software company, I am currently investigating big data technologies, especially time series databases, for a rather large industrial customer in an IoT scenario.

The customer wants to “go for big data” with all of the sensor data generated inside his factories. He aims to consolidate knowledge, react faster to certain events, and make better predictions with the aggregated data. It’s pretty much what everyone doing big data in IoT wants.

As a very first step, we will import the sensor data into a time series database, and we are currently evaluating InfluxDB as our tool of choice. Once we have decided on it, this step will of course (or hopefully?) be easy to implement.

In a second step, we need to enrich the sensor data with some kind of meta information. For instance: we have a factory. Each factory has multiple buildings, each building has multiple machines, and each machine has multiple sensors. To gain knowledge from the data, it must be possible to aggregate and work with the data in a structured manner, like so: “If the power consumption measured over all machines in building A exceeds the average power consumption of all buildings by a factor of 2, then do something… Trigger a script or such”. Or a much simpler use case: “Compute the total building power consumption minus the energy generated by the on-factory power plant. If that difference exceeds a certain threshold, make sure that our factory power plant gets some kind of notification to power up a bit more if possible.”
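To make the kind of structure I mean a bit more concrete, here is a minimal Python sketch of the first rule. Everything in it is made up for illustration: the `Node` class, the building, machine and sensor names, and the `latest` dictionary, which stands in for one already aggregated power reading per sensor. It says nothing about how InfluxDB or Kapacitor would do this.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, Iterator, List


@dataclass
class Node:
    """One element of the (hypothetical) metadata tree."""
    name: str
    kind: str  # "factory", "building", "machine" or "sensor"
    children: List["Node"] = field(default_factory=list)

    def sensors(self) -> Iterator[str]:
        """Yield the names of all sensor leaves below this node."""
        if self.kind == "sensor":
            yield self.name
        for child in self.children:
            yield from child.sensors()


def building_power(building: Node, latest: Dict[str, float]) -> float:
    """Sum the latest reading of every sensor inside one building."""
    return sum(latest[s] for s in building.sensors())


def buildings_over_factor(factory: Node, latest: Dict[str, float],
                          factor: float = 2.0) -> List[str]:
    """Return buildings whose power exceeds factor * average over all buildings."""
    totals = {b.name: building_power(b, latest) for b in factory.children}
    average = mean(list(totals.values()))
    return [name for name, total in totals.items() if total > factor * average]


# Made-up example data: three buildings with one machine each.
factory = Node("factory", "factory", [
    Node("A", "building", [Node("press", "machine",
                                [Node("s1", "sensor"), Node("s2", "sensor")])]),
    Node("B", "building", [Node("lathe", "machine", [Node("s3", "sensor")])]),
    Node("C", "building", [Node("oven", "machine", [Node("s4", "sensor")])]),
])
latest = {"s1": 50.0, "s2": 40.0, "s3": 10.0, "s4": 20.0}

print(buildings_over_factor(factory, latest))  # ['A']
```

Building A sums to 90, the average over all three buildings is 40, so A is the only one above twice the average. The point is only that the rule is expressed against the tree ("building A", "all buildings") and never against individual sensor IDs.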

We can expect the structure to be hierarchical: a tree, or to be more exact, multiple trees. We could have one tree for “logical data”, one tree for “financial aggregated data”, and so on…

We are not yet worrying about the time series aggregation calculation, which is a topic in itself (What happens if a sensor doesn’t send data? What if sensors sample at different intervals or with different offsets? How do we interpolate in between? …), but this will definitely become a topic in the near to mid-term future.

We now want to evaluate whether, and how best, we can deal with that structural metadata in the Influx environment. On the one hand, we have our metadata service holding the graph (and more metadata about the devices) in some kind of registry (most likely not Influx, as this is not time-dependent data but classical CRUD user/sensor management). On the other hand, we have the time series database. In the end, we want to kind of marry those two.

I took a short look at Kapacitor and liked that a user is able to write his own rules in a JavaScript-like syntax (although I would have liked it even more if it had been fully JavaScript compliant). The question is: could I combine writing the rule with my metadata?

Currently, I think the easiest way is to let a user who wants to create a rule perform two kinds of operations. First, he must create a computed time series with the data he cares about. To do so, he would send an API request to the metadata service along the lines of “Create a measurement for me which spans all of the buildings’ power-income sensors”, and the API would create a continuous query or a Kapacitor streaming script which then aggregates that data into a measurement. (The metadata service would do the “translation” from “building power-income-sensors” to “sensor-34589093-power + sensor-49853495-power + sensor-2389243-power + …”.) Once the service has done that translation, it would upload a ready-to-run Kapacitor script, so that Kapacitor knows everything it needs about the involved measurements.

Second, once the computed measurement from the previous script is stored in InfluxDB, the user can write his own Kapacitor script working on the generated measurement as he likes, e.g. to define his alerting rules. This way, there would be no need for changes or add-ons in Kapacitor or InfluxDB. However, from the user’s point of view, he would need to perform two actions to make his rule run instead of writing only one rule…
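As a rough sketch of the “translation” step, this is how I imagine the metadata service could turn a logical group into an InfluxQL (InfluxDB 1.x) continuous query. It rests on several assumptions that are not decided yet: all readings live in one measurement called `power` with a `sensor_id` tag and a `value` field, the database is called `factory`, and every sensor writes exactly one point per interval (otherwise `sum` over a time bucket would count some sensors more than once, which is the aggregation problem I postponed above). The `REGISTRY` dictionary is a stand-in for the real metadata service, with sensor IDs borrowed from the example above.

```python
import re

# Hypothetical registry content: logical group -> sensor IDs.
REGISTRY = {
    "building-power-income": [
        "sensor-34589093",
        "sensor-49853495",
        "sensor-2389243",
    ],
}

_SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")
_SAFE_INTERVAL = re.compile(r"^[0-9]+[smhdw]$")


def build_continuous_query(group, database="factory", interval="1m"):
    """Return an InfluxQL CREATE CONTINUOUS QUERY statement for one group."""
    sensor_ids = REGISTRY[group]
    # Refuse anything that could break out of the quoted identifiers.
    for name in [group, database] + sensor_ids:
        if not _SAFE_NAME.match(name):
            raise ValueError("unsafe identifier: {!r}".format(name))
    if not _SAFE_INTERVAL.match(interval):
        raise ValueError("unsafe interval: {!r}".format(interval))

    where = " OR ".join(
        "\"sensor_id\" = '{}'".format(sensor_id) for sensor_id in sensor_ids
    )
    return (
        'CREATE CONTINUOUS QUERY "cq_{group}" ON "{db}" BEGIN '
        'SELECT sum("value") INTO "{group}" FROM "power" '
        "WHERE {where} GROUP BY time({interval}) END"
    ).format(group=group, db=database, where=where, interval=interval)


print(build_continuous_query("building-power-income"))
```

The user would then only ever see the generated measurement `building-power-income` and write his alerting rule against it, without knowing which sensors are behind it. If the tree changes, the service would have to drop and recreate the query.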

I am writing to you because I think this kind of problem is quite common. We almost never have only raw sensor data; usually we have some kind of meta information at hand which describes the sensor in more detail and which we might want to use in our computations. I also think it is likely that this meta information is not stored alongside the time series data. (Although I can imagine storing this data inside my measurements in the form of tags if I want to use the information in computations.) I am wondering whether any of you have had to deal with the same issue and what kind of solutions you came up with. I would also like to know if there is a need for a better integrated solution: would it be worth the effort to write a plugin for Kapacitor which could extend the “identifier evaluation” and resolve an unknown identifier via a REST API or such?
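To illustrate what I mean by “identifier evaluation”, here is a minimal Python sketch of a preprocessing step that expands a placeholder in a rule into the concrete sensor names before the rule is handed on. The `${...}` placeholder syntax, the `resolve` lookup, the sensor names and the rule text itself are all invented for this sketch; none of it is TICKscript or an existing Kapacitor feature. In a real setup, `resolve` would presumably call the metadata service over REST.

```python
import re

# Hypothetical lookup table; stands in for a REST call to the metadata service.
_KNOWN_GROUPS = {
    "building-A.power": ["sensor-1", "sensor-2"],
}

# Matches placeholders of the (invented) form ${some.identifier}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(identifier):
    """Return the sensor names behind a logical identifier (KeyError if unknown)."""
    return _KNOWN_GROUPS[identifier]


def expand(rule_text, resolver=resolve):
    """Replace every ${identifier} with a parenthesised sum of sensor names."""
    def replace(match):
        sensor_names = resolver(match.group(1))
        return "(" + " + ".join('"{}"'.format(n) for n in sensor_names) + ")"
    return PLACEHOLDER.sub(replace, rule_text)


print(expand("${building-A.power} > 100"))
# ("sensor-1" + "sensor-2") > 100
```

With something like this, the user would write only one rule, and the resolution against the metadata would happen behind the scenes. The open question is whether that belongs in a plugin or in a thin layer in front of Kapacitor.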

For a solution where I don’t change the code of Influx/Kapacitor:
- Do you think it is better to write my metadata into its own measurement? E.g. I have the metadata “upstream resistance”, which was 100 Ohms up to 2017-01-01 and has been 1000 Ohms since then, and I want to make it usable from inside a Kapacitor rule about a measurement. Or:
- Do you think it is better to write my metadata into the tags of a measurement? Can I make a new write always use the last used tag set (fillna…)?
Or do you think I would benefit a lot from Influx/Kapacitor code changes/plugins:
- What should be done in there then?
- How much would it cost?
- Who else would benefit from or use such plugins?
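
To make the two options without code changes a bit more tangible, here is a small Python sketch that produces InfluxDB line protocol for both, using the “upstream resistance” example. The measurement names `sensor_meta` and `power`, the tag and field names, and the choice to let the first value start at the Unix epoch are assumptions made only for this sketch. In the tag variant, the writer looks up the value that was valid at the point's timestamp and attaches it to every point.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def ns(moment):
    """Convert a timezone-aware datetime to a nanosecond Unix timestamp."""
    return int(moment.timestamp()) * 1_000_000_000


# Metadata history: 100 Ohm up to 2017-01-01, 1000 Ohm since then.
# (Starting the first value at the epoch is an assumption of this sketch.)
HISTORY = [
    (datetime(1970, 1, 1, tzinfo=timezone.utc), 100),
    (datetime(2017, 1, 1, tzinfo=timezone.utc), 1000),
]


def resistance_at(moment):
    """Return the resistance that was valid at the given moment."""
    starts = [start for start, _ in HISTORY]
    return HISTORY[bisect_right(starts, moment) - 1][1]


def metadata_lines(sensor_id):
    """Option 1: metadata in its own measurement, one point per change."""
    return [
        "sensor_meta,sensor_id={} upstream_resistance_ohm={} {}".format(
            sensor_id, ohm, ns(start))
        for start, ohm in HISTORY
    ]


def tagged_line(sensor_id, value, moment):
    """Option 2: metadata as a tag on every data point."""
    return "power,sensor_id={},upstream_resistance_ohm={} value={} {}".format(
        sensor_id, resistance_at(moment), value, ns(moment))


for line in metadata_lines("sensor-1"):
    print(line)
print(tagged_line("sensor-1", 42.5, datetime(2016, 6, 1, tzinfo=timezone.utc)))
```

One thing I already see with the tag variant: tag values are strings and part of the series key, so every change of the metadata value would start a new series for that sensor.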

I would be very grateful for a response to any of these questions and would be happy if a discussion about this topic arose.

Best regards
