I started working on this when I saw that the histogram() function wasn’t implemented in InfluxQL, before I discovered that it exists in Flux.
I continued because we’re not planning to migrate to InfluxDB 1.7 soon (which is the first version where Flux is integrated if I’m correct), and also because I feel like I’m close to making it work.
I’m posting here because I need help with a data combination operation.
My requirements are:
1. store histograms indefinitely
2. compute and store histograms for multi-field measurements
3. and of course, compute percentiles from histogram data
I haven’t worked yet on point 3, which will be in the alerting part of the project, but for the other two points, I’ve created a set of two TICK scripts.
A “data-quantizer” stream script which:
defines the logarithmic bin scheme for the histogram, based on an estimate of the data’s value range (the bin scheme extends dynamically if the value range has been underestimated)
computes the bin# into which each data point falls
and stores the “quantized” data points in a short-lived measurement
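To make the quantizing step concrete, here is a minimal Python sketch (not TICKscript) of one possible logarithmic bin scheme. The parameters `min_value` and `bins_per_decade` are assumptions chosen for illustration, not the scheme used in the actual script. Because the bin# is computed directly from the logarithm, values outside the estimated range simply get lower (negative) or higher bin numbers, which is one way to let the scheme extend on its own.

```python
import math


def bin_number(value, min_value=1.0, bins_per_decade=10):
    """Return the logarithmic bin# for a strictly positive value.

    Bin k covers the half-open range
    [min_value * 10**(k / bins_per_decade),
     min_value * 10**((k + 1) / bins_per_decade)).
    Both parameters are hypothetical and only illustrate the idea.
    """
    if value <= 0:
        raise ValueError("logarithmic bins need strictly positive values")
    return math.floor(bins_per_decade * math.log10(value / min_value))


if __name__ == "__main__":
    for v in (0.5, 1.0, 2.0, 10.0):
        print(v, "-> bin#", bin_number(v))
```

Values below `min_value` land in negative bins, so no point is dropped when the range estimate turns out to be too narrow.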
A “data-stacker” batch script which regularly:
extracts the quantized data points, and computes their distribution by bin#
extracts the last accumulated histogram data from the long duration measurement
adds the newly computed frequencies to the accumulated ones, bin# by bin#
stores the result as a new data point in the long duration measurement
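Now I’m stuck on the adding operation described in the third item of the “data-stacker” list above: adding the newly computed frequencies to the accumulated ones, bin# by bin#.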
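To show the result I’m after, here is a minimal Python sketch of the addition (it is not TICKscript and says nothing about how Kapacitor should do it). It assumes each histogram is a mapping from bin# to count; the function name `stack_histograms` and the sample values are made up for illustration.

```python
from collections import Counter


def stack_histograms(accumulated, new_counts):
    """Add new_counts to accumulated, bin# by bin#, and return a new dict.

    Bins present in only one of the two histograms are kept as they are.
    """
    total = Counter(accumulated)
    total.update(new_counts)  # Counter.update adds counts instead of replacing them
    return dict(total)


if __name__ == "__main__":
    last_accumulated = {0: 5, 1: 2}   # hypothetical last point of the long duration measurement
    newly_computed = {1: 3, 4: 1}     # hypothetical distribution of the latest quantized points
    print(stack_histograms(last_accumulated, newly_computed))
```

The important property is that a bin appearing for the first time (bin# 4 in the sample) must be added to the accumulated histogram, and a bin with no new points must keep its previous count.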
If you use Telegraf, you could try the histogram aggregator plugin to see if it suits your needs. If it does not, you could start from its code and write a new plugin that works as you intend. You’ll write the logic in Go, so it will be easier than figuring out a query that processes the data as intended.
Thanks @samaust for your suggestion, but I’m afraid I have neither the Go skills nor the time to edit a Telegraf plugin’s code.
One of my requirements is to keep the complete distribution of values ever collected for the whole life of my application (10-15 years), without having to keep so many years of history inside InfluxDB. Hence, I need perpetually stacking bins.
InfluxDB’s 64-bit signed integers allow incrementing a number by 1 every second for more than 292 billion years, so stacking forever shouldn’t be a problem.
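As a quick sanity check of that figure, here is a small Python sketch. It assumes a year of 365.25 days, and the function name is only illustrative.

```python
MAX_INT64 = 2**63 - 1                    # largest value of a 64-bit signed integer
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: average year of 365.25 days


def years_until_overflow(increments_per_second=1):
    """Years before a counter starting at 0 exceeds the int64 maximum."""
    return MAX_INT64 / (increments_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"{years_until_overflow() / 1e9:.0f} billion years")
```

Even at 1,000 increments per second per bin, the counter would still last roughly 292 million years.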
From what I understand from the Telegraf plugin, the stacking only occurs in the runtime context of the Telegraf daemon, and gets reset whenever it’s restarted.
I’ve also checked Flux’s histogram() function, and as far as I understand, it doesn’t allow that either. It builds a static histogram from a static input table.
I need a periodically refreshed histogram to perpetually store the distribution of incoming data!