Immutable Storage for InfluxDB

#1

I am exploring some concepts around using Influx as an immutable datastore for a few different use cases. There are some scenarios in manufacturing where this could be an attractive feature.

Two concepts come to mind and I wanted to check to see if others had created or are working on similar solutions.

The first level of “protection” would be to disallow writing any value that precedes the latest entry. Said another way, don’t allow insertion or update of any “old” data. I speculate a window would need to be defined for what “old” data is. I have reviewed the configuration file and I can’t seem to find any obvious methods for implementing this restriction.
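To make the first idea concrete, here is a minimal Python sketch of such a window check. It assumes the check lives in a gateway or client wrapper sitting in front of Influx, since I have not found a server-side setting for it. `WriteWindowGuard` and its method names are hypothetical, made up purely for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class WriteWindowGuard:
    """Rejects points older than (newest accepted timestamp - window)."""

    def __init__(self, window: timedelta):
        self.window = window
        self.latest: Optional[datetime] = None  # newest timestamp accepted so far

    def accept(self, ts: datetime) -> bool:
        """Return True if a point with timestamp `ts` may be written."""
        if self.latest is not None and ts < self.latest - self.window:
            return False  # falls outside the window, so it counts as "old" data
        if self.latest is None or ts > self.latest:
            self.latest = ts  # advance the high-water mark
        return True


guard = WriteWindowGuard(window=timedelta(minutes=5))
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
guard.accept(now)                          # accepted, becomes the latest entry
guard.accept(now - timedelta(minutes=4))   # accepted, still inside the window
guard.accept(now - timedelta(minutes=6))   # rejected, older than the window
```

Note that a guard like this only narrows the problem: a write inside the window could still replace an existing point, so the window would have to be as small as late-arriving data allows.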

The second level of “protection” might involve something along the lines of calculating a hash of a data file once it is written to disk and all compaction and other maintenance are complete. This hash would then be persisted to an alternate storage location that is also immutable. Yes, maybe blah blah blockchain but that’s a detail of implementation. Now this hash can be used to verify the integrity of the data file.
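As a rough sketch of the hashing step, assuming SHA-256 and Python's standard `hashlib`, something like the following would compute the hash of a finished data file and later check it against the stored value. The function names are hypothetical, and where the expected hash is persisted is left out on purpose.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_hex: str) -> bool:
    """Return True if the file still matches the hash recorded earlier."""
    return hmac.compare_digest(sha256_file(path), expected_hex)
```

The recorded value is only as trustworthy as the place it is stored, which is why the alternate storage location matters as much as the hash itself.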

Curious if anyone else has solved one or both of these challenges or if you are working on these.

Thanks,
Andy

#2

Hi Andy,

I am also looking into making an already existing Influx database immutable.

Your second approach is what I am currently trying to build on my own, though I am just getting started.

My questions would be:
A) What kind of hash function would you use?
B) Why even have a data file, when you could send the calculated hash straight to immutable storage (like an Ethereum contract)?
C) How would you structure the hashing when new data is expected (continuous writes)?
D) What if a retention policy is set up? How do you handle the immutable storage on the other side when data has already been deleted?

Hope to get the conversation started.

Kind regards
Marco

#3

Cool that there are others with a similar interest. I bet there are more.

My responses
A) What kind of hash function would you use? --> Probably something basic like SHA-256

B) Why even have a data file, when you could send the calculated hash straight to immutable storage (like an Ethereum contract)? --> Not sure I understand this. The concept is that Influx writes to its storage file as always, and we just commit a hash of that file to immutable storage somewhere so that you can verify the file has not changed since the hash was calculated.

C) How would you structure the hashing when new data is expected (continuous writes)? --> My concept is that you don’t calculate and commit the hash until the data file has stopped changing. With Influx retention policies, compaction, sharding, etc., you’d have to wait until all of that was complete before capturing the hash.

D) What if a retention policy is set up? How do you handle the immutable storage on the other side when data has already been deleted? --> In these environments you would configure the RP to maintain data for as long as the agreement dictates. In regulated manufacturing (like pharmaceuticals) the data must typically be retained until the expiration date of the drug plus some number of years. Environmental data in chemical plants has different requirements. In a scenario where the data substantiates that a piece of equipment ran a particular way and met performance requirements, the two parties involved in the contract would have to agree on the retention period. This is an example of a fundamental difference between what seems to be the typical Influx use case, with short-term storage, and manufacturing historians, which can easily maintain data for many years, sometimes 10-15. A dramatically different use case.
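On C), one crude way to decide that a data file "has stopped changing" is to require that it has not been modified for some quiet period. The Python sketch below does only that. It is a heuristic under stated assumptions: it knows nothing about whether Influx has actually finished compaction, the `.tsm` suffix is an assumption about the storage engine in use, and the function names are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def is_quiescent(path: Path, quiet_seconds: float, now: Optional[float] = None) -> bool:
    """True if the file's last modification is at least `quiet_seconds` old."""
    if now is None:
        now = time.time()
    return now - path.stat().st_mtime >= quiet_seconds


def quiescent_files(directory: Path, quiet_seconds: float,
                    suffix: str = ".tsm", now: Optional[float] = None) -> List[Path]:
    """List files under `directory` that look settled enough to hash."""
    if now is None:
        now = time.time()
    return sorted(
        p for p in directory.rglob("*" + suffix)
        if p.is_file() and is_quiescent(p, quiet_seconds, now)
    )
```

The quiet period would need to be comfortably longer than whatever maintenance interval the database is configured with, and a file that changes after its hash was committed would have to be treated as an alarm rather than silently re-hashed.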

The overarching idea I think is layers of protection.

First, protect from the basic attack of a user inserting old data to overwrite already written data. Once you have protected against that, you need to protect against the file being directly manipulated. This is where the concept of the hash comes in. I don’t have a great idea for how you prevent manipulation while the file is open and being written to, short of recalculating the hash every x seconds or x minutes, but that doesn’t help because you don’t know whether it was Influx writing and changing data or someone else.
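One more layer worth sketching: the file hashes only help if the record of them cannot be quietly rewritten either. A simple way to make that record tamper-evident (not truly immutable) is to chain the entries, so each one includes the hash of the previous one. This is my own illustration rather than anything Influx provides, and the ledger layout and function names below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, file_name: str, file_sha256: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps([prev_hash, file_name, file_sha256]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(ledger: list, file_name: str, file_sha256: str) -> dict:
    """Append a file hash to the ledger, linked to the previous entry."""
    prev = ledger[-1]["entry_hash"] if ledger else GENESIS
    entry = {
        "file": file_name,
        "sha256": file_sha256,
        "prev": prev,
        "entry_hash": entry_hash(prev, file_name, file_sha256),
    }
    ledger.append(entry)
    return entry


def ledger_is_intact(ledger: list) -> bool:
    """Recompute every link; any edited entry breaks the chain."""
    prev = GENESIS
    for e in ledger:
        if e["prev"] != prev:
            return False
        if e["entry_hash"] != entry_hash(prev, e["file"], e["sha256"]):
            return False
        prev = e["entry_hash"]
    return True
```

On its own this does not detect someone truncating the tail or rebuilding the whole chain, so the most recent `entry_hash` would still need to be anchored somewhere outside the attacker's reach.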

More questions than answers.

-Andy