We’re using Influx to store sensor data. It’s a great fit.
We’re now wrestling with the best way to manage our “bad” data. Sometimes sensors are noisy. Sometimes a step in a maintenance procedure is missed and wonky measurements end up in the database. Any number of scenarios can cause known-bad data to be stored.
Our analytics and visualizations are directly impacted by this. In a previous manual, non-Influx workflow we’d identify ranges of bad data from logs (e.g. action X raised the temperature in zone Y outside its normal parameters over time span Z) or by simple inspection and investigation (“That doesn’t look right. Hey, Bob…”). That bad data would be culled before we processed the rest.
What’s a good strategy and tooling setup for handling data in a similar way in Influx? Quite honestly, we’d like to keep the bad data so we can look for patterns and improve our procedures.
We think a workable solution is to manually tag measurements with a category indicating their validity: good or bad (and, if bad, what kind of badness). Queries can then filter for either appropriately. The question, of course, is how to construct a setup that easily and safely lets us tag bad data and later edit those tags if necessary. Manually filtering out bad data at the visualization step is not practical in our usage.
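To make the idea concrete, here’s a minimal sketch of what we have in mind. The measurement, tag, and field names (`sensors`, `zone`, `validity`, `temperature`) are hypothetical, just to illustrate a point carrying its validity as a tag in InfluxDB line protocol:

```python
def to_line_protocol(measurement, zone, validity, temperature, ts_ns):
    """Format one sensor reading as InfluxDB line protocol.

    Each point carries a 'validity' tag, e.g. "good", "noisy",
    "maintenance", so queries can include or exclude it.
    """
    return (f"{measurement},zone={zone},validity={validity} "
            f"temperature={temperature} {ts_ns}")

good = to_line_protocol("sensors", "y1", "good", 21.4, 1700000000000000000)
bad = to_line_protocol("sensors", "y1", "noisy", 98.6, 1700000000060000000)

print(good)  # sensors,zone=y1,validity=good temperature=21.4 1700000000000000000

# Analytics queries would then filter on the tag, e.g. in InfluxQL:
#   SELECT mean("temperature") FROM "sensors" WHERE "validity" = 'good'
```

One caveat we’re aware of: in InfluxDB, tags are part of the series key, so “editing” a tag after the fact isn’t an in-place update — the point has to be rewritten under the new tag value and the old one removed. That’s part of why we’re asking about safe tooling for this.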
Is what we’re thinking reasonable, or is there a better way? Any thoughts on best practices and how to implement those practices with specific tools for Influx are much appreciated.