Coarsely speaking, our problem is that we monitor complex systems whose state is distributed across channels (sensors) and time, and we need to intelligently re-merge these streams of data at the server to create actionable information.
Let me give one such example: we monitor industrial data, and while generating alerts at the source is sometimes an option (e.g. sensor X emitting a message that component Y has failed), we absolutely require the ability to verify, functionally, that component Y has not failed.
In the above scenario, imagine we have signal and sense feedback on breaker A (the position it is commanded to be in versus the position it is actually in); that breaker powers motor B, which starts engine C. Engine C might have an intelligent “failed to start” flag of its own.
However, we want to be able to check continuously that if A was given a close signal at time t0, then A’s feedback reads high at time t1, motor B is spinning above a threshold RPM at time t2, and engine C is running at its designated RPM at time t3. This is one such thread of actionable information, but I could also be checking A and B combined with another effect D, and I could also be checking C against A’ and B’, meaning these aren’t readily pre-processable into their own “final destination” data streams. In other words, the problem isn’t merely that the data is ‘denormalized’.
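To make that concrete, here is a minimal sketch of what the tail end of such a check looks like. Everything in it is hypothetical: the field names (A_cmd, A_fb, B_rpm, C_rpm) and thresholds are made up, and it assumes the streams have already been downsampled, lag-shifted, and pivoted into one wide row per time window (that alignment is the workflow described further down).

```flux
// Hypothetical names and thresholds; input is assumed to already be one wide
// row per time window (downsampled, lag-shifted, pivoted).
check = (tables=<-) =>
    tables
        |> map(fn: (r) => ({
            _time: r._time,
            status:
                if r.A_cmd == 1.0 and r.A_fb == 1.0
                    and r.B_rpm >= 500.0 and r.C_rpm >= 1800.0
                then "OK"
                else "NOK"
        }))
        // null handling (e.g. `exists r.B_rpm`) elided for brevity
```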
All of these signals come from completely heterogeneous sensors, and we need to be able to time-align and time-shift them.
We currently achieve this workflow by downsampling, change detecting, time-shifting, pivoting, forward filling, and applying a multi-column logic statement to the pivoted table; we then unpivot this into a time-series column that reads “OK/NOK”. Doing these checks at ingest time can be painful because we get data arriving out of order on a 10-15 horizon (although we do pre-process some of our data using skylark and imbue “last-seen state” into parallel streams).
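In Flux terms, a rough sketch of that workflow, applied to the A/B/C example above, looks something like the following. The bucket (“plant”), measurement (“drivetrain”), field names, window size, and lag offsets are all illustrative, and the change-detection and null-handling steps are elided.

```flux
// Illustrative only: bucket, measurement, field names, window size, and lag
// offsets are hypothetical; change detection (e.g. difference()) is elided.
stream = (field, lag) =>
    from(bucket: "plant")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "drivetrain" and r._field == field)
        |> aggregateWindow(every: 1s, fn: last, createEmpty: true)  // downsample
        |> timeShift(duration: lag)                                 // per-stream lag

union(tables: [
    stream(field: "A_cmd", lag: 0s),
    stream(field: "A_fb", lag: -2s),
    stream(field: "B_rpm", lag: -10s),
    stream(field: "C_rpm", lag: -30s)
])
    |> group()  // collapse all streams into one table
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> fill(column: "A_fb", usePrevious: true)  // forward fill, one call per column
    |> fill(column: "B_rpm", usePrevious: true)
    |> fill(column: "C_rpm", usePrevious: true)
    // ...then the multi-column check from the earlier sketch, yielding OK/NOK
```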
Note that there are too many combinations of usable threads of information for us to just “write them all back to the database”. We need to be able to query and explore.
I can easily imagine an example along these lines where the chemical properties of a biological life-support system (e.g. aquaculture) are monitored and very long-lag markers need to be merged back into the individual streams.
If I were to summarize it, it would be that our measurements are multiple semi-stochastic views of an underlying process that cannot be directly observed.
I cannot overstate how valuable the ability to diagnose these things is in our market segment.
As an aside: given my understanding of how Flux works, isn’t the push-down query the only problem that needs solving? The in-memory part already works, does it not? Why reinvent that part in JS if it already works?
The value-add of Flux is its syntax and logical design intent. I would be absolutely happy with a “sub-optimally” performing Flux interpreter for tasks, even one that forces me to do my push-down queries correctly myself. Heck, as I type this, it occurs to me I’d be perfectly happy using Flux with a sql.from() |> construct where the push-down query is explicitly passed down to the IOx server.
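Purely to illustrate the shape I mean (everything here is hypothetical: the endpoint, the SQL text, and whether sql.from() could even point at IOx is exactly the open question), it would look roughly like this, with the heavy selection handed down as explicit SQL and only the reduced result flowing into the in-memory Flux steps:

```flux
import "sql"

// Hypothetical shape only: the driver, DSN, and SQL text are illustrative.
// The point is that the push-down is explicit SQL and everything after it is
// ordinary in-memory Flux.
sql.from(
    driverName: "postgres",
    dataSourceName: "postgresql://user:pass@iox.example.com:5432/plant",
    query: "SELECT time, a_fb FROM drivetrain WHERE time > now() - interval '1 hour'"
)
    |> rename(columns: {time: "_time", a_fb: "_value"})
    |> timeShift(duration: -2s)
    |> yield(name: "A_fb_aligned")
```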