Can't emulate a simple query from mysql in flux (calculating extra column)

Hey,

I recently started using influxDB 2.0.8 with flux (we are planning migration from MYSQL to influx) integrated with grafana. I want to replicate something like this in flux (__timeGroupAlias is a grafana function for grouping data into intervals in MYSQL):

SELECT
  $__timeGroupAlias(date, ${Interval}),
  MEAN(mem_fragmentation_ratio) AS "ALIAS_1",
  MEAN(hit_rate) AS "ALIAS_2",
  SUM(keyspace_hits_diff) / (SUM(keyspace_misses_diff) + SUM(keyspace_hits_diff))  AS "ALIAS_3"
FROM MY_TABLE
GROUP BY 1

But I don’t know how, I tried using pivot (it is not ideal as i need to calculate MEAN of MEANs instead of MEAN of SUMs, but it would be something to start with):

common = from(bucket: "redis_stats")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "redis_info")
  |> filter(fn: (r) => r["role"] == "cluster")
  |> filter(fn: (r) => r["clusterName"] == "mobile_cluster")
  |> drop(columns: ["clusterName", "role"])

new_columns = common
  |> filter(fn: (r) => r["_field"] == "keyspace_hits_diff" or r["_field"] == "keyspace_misses_diff")
  |> pivot(
    rowKey:["_time"],
    columnKey: ["_field"],
    valueColumn: "_value"
  )
  |> map(fn: (r) => ({r with
      _value: (float(v: r.keyspace_hits_diff) / float(v: (r.keyspace_hits_diff + r.keyspace_misses_diff))),
      _field: "current_hit_ratio"
    })
  )
  |> drop(columns: ["keyspace_hits_diff", "keyspace_misses_diff"])
  

old_columns = common
  |> filter(fn: (r) => r["_field"] == "mem_fragmentation_ratio" or r["_field"] == "hit_rate")

union(tables: [new_columns, old_columns])
  |> aggregateWindow(every: 10s, fn: mean, createEmpty: true)
  |> yield()

But this does not work:

In data Explorer _field is null instead of “current_hit_ratio”. (Btw there is also a bug with _value column values are shown in millions while they are floats of 0.74 and 0.53).
In Grafana there is no value bug, but name is set to redis_info insted of desired “current_hit_ratio”.

I am guessing that empty field may be because for table new_columns column “_field” does not belong to group key of that table. But I don’t know how to deal with that.

Is there a way to fix my approach and is there a better one. I can only think of custom aggregation function, but I would need to read more into that in order to judge, whether it’s a viable approach.

Also I don’t know how to alias fields in flux, but that is less important.

Hello @Mikolaj_G,
Can you please share the SQL query you want for the mean of means? I’d like to help you get exactly what you are looking for.
Why do you drop the cluster name? I think you might just want to group instead?
It would be super helpful if you could apply |> limit(n:5) to your common variable and export the results of just that data and share it with me.

I don’t know why the UI isn’t showing your data as expected. You might want to try hitting the customize button and changing the Y-Value Unit Prefix to none, although I’m assuming you’re right and that’s just a bug. I would hugely appreciate it if you could submit an issue here:

If I were to translate your SQL query directly I would do:

common = from(bucket: "redis_stats")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "redis_info")
  |> filter(fn: (r) => r["_field"] == "mem_fragmentation_ratio" or r["_field"] == "hit_rate"  or r["_field"] == "mem_fragmentation_ratio" or  r["_field"] == "keyspace_hits_diff"  or  r["_field"] == "keyspace_misses_diff"  )
|> fieldsAsCol() 

ALIAS_1 = common 
|> mean( column: "mem_fragmentation_ratio") 
|> rename(columns: {mem_fragmentation_ratio: "_value"})
|> yield(name:"ALIAS_1")

ALIAS_2 = common
|> mean( column: "hit_rate") 
|> rename(columns: {hit_rate: "_value"})
|> yield(name:"ALIAS_2")

ALIAS_3 = common
|> map(fn: (r) => ({r with
      _value: (float(v: r.keyspace_hits_diff) / float(v: (r.keyspace_hits_diff + r.keyspace_misses_diff))),
      _field: "current_hit_ratio"
    })
  )
|> yield(name:"ALIAS_3")

The union confuses me. I don’t see what part of your SQL query it addresses.
Also union() does not preserve the sort order of the rows within tables. A sort operation may be added if a specific sort order is needed.

Hello @Anaisdg,
Thank you very much for helping me out and I am sorry I could not respond earlier.

I don’t know how to send you CSV with results, the “Upload” option does not let me.

For now I was making dashboard for specific cluster in grafana and passed cluster name as “${Cluster}”. I dropped them because otherwise every plotted line had the following appended to its name: {clusterName="mobile_cluster", role="cluster"}. When every plotted line has exactly the same thing appended it does look bad, that is why I removed it.

Changing Y-Value Unit Prefix form default SI worked well, however I needed to change it to binary not to None (both SI and None had the same effect). I will add an issue on github shortly.

Thanks to your query I managed to figure out how to write what I’ve wanted (I didn’t know you can have multiple yields that is why I tried adding a join). This is the corresponding flux code to my original my SQL query:

import "influxdata/influxdb/schema"

common = from(bucket: "redis_stats")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "redis_info")
  |> filter(fn: (r) => r["role"] == "cluster")
  |> filter(fn: (r) => r["clusterName"] == "cookie_matching_cluster")
//  |> drop(columns: ["role", "clusterName"])

MEANS = common
  |> filter(fn: (r) => r["_field"] == "mem_fragmentation_ratio" or r["_field"] == "hit_rate")
  |> aggregateWindow(every: 10s, fn: mean, createEmpty: true)
  |> schema.fieldsAsCols()

SUMS = common
  |> filter(fn: (r) => r["_field"] == "keyspace_hits_diff" or r["_field"] == "keyspace_misses_diff")
  |> aggregateWindow(every: 10s, fn: sum, createEmpty: true)
  |> schema.fieldsAsCols()

// Results

TOTAL_HIT_RATE = MEANS 
//   |> drop(columns: ["mem_fragmentation_ratio"])
  |> rename(columns: {hit_rate: "_value"})
  |> yield(name:"Total hit rate")

CURRENT_MEMORY_FRAGMENTATION_RATIO = MEANS 
//   |> drop(columns: ["hit_rate"])
  |> rename(columns: {mem_fragmentation_ratio: "_value"})
  |> yield(name:"Current memory fragmentation ratio")

CURRENT_HIT_RATE = SUMS
  |> map(fn: (r) => ({ r with current_hit_rate: float(v: r.keyspace_hits_diff) / float(v: (r.keyspace_misses_diff + r.keyspace_hits_diff))}))
//   |> drop(columns: ["keyspace_hits_diff", "keyspace_misses_diff"])
  |> rename(columns: {current_hit_rate: "_value"})
  |> yield(name:"Current hit rate")

However, there are graphing issues with this query… When I run the query with dropping commented out I get first two graphs (first in InfluxDB explorer second in grafana). The second graph has ALL the columns, so I decided to drop all the columns but “_value” column (uncommented all drops), which leaves me with graphs three and four. Graph four is far from ideal as all plot lines have the same name… Eventually what kind of worked out for me was the query below, that produced the last graph (in grafana).

Note I had to merge all images into one as new users can have only one media file per post.

import "influxdata/influxdb/schema"

common = from(bucket: "redis_stats")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "redis_info")
  |> filter(fn: (r) => r["role"] == "cluster")
  |> filter(fn: (r) => r["clusterName"] == "cookie_matching_cluster")
  |> drop(columns: ["role", "clusterName"])

MEANS = common
  |> filter(fn: (r) => r["_field"] == "mem_fragmentation_ratio" or r["_field"] == "hit_rate")
  |> aggregateWindow(every: 10s, fn: mean, createEmpty: true)
  |> schema.fieldsAsCols()

SUMS = common
  |> filter(fn: (r) => r["_field"] == "keyspace_hits_diff" or r["_field"] == "keyspace_misses_diff")
  |> aggregateWindow(every: 10s, fn: sum, createEmpty: true)
  |> schema.fieldsAsCols()

// Results

TOTAL_HIT_RATE = MEANS 
  |> drop(columns: ["mem_fragmentation_ratio"])
  |> drop(columns: ["_stop", "_start"])
  |> yield(name:"Total hit rate")

CURRENT_MEMORY_FRAGMENTATION_RATIO = MEANS 
  |> drop(columns: ["hit_rate"])
  |> drop(columns: ["_stop", "_start"])
  |> yield(name:"Current memory fragmentation ratio")

CURRENT_HIT_RATE = SUMS
  |> map(fn: (r) => ({ r with current_hit_rate: float(v: r.keyspace_hits_diff) / float(v: (r.keyspace_misses_diff + r.keyspace_hits_diff))}))
  |> drop(columns: ["keyspace_hits_diff", "keyspace_misses_diff"])
  |> drop(columns: ["_stop", "_start"])
  |> yield(name:"Current hit rate")

It is what I originally wanted, but it feels like dropping “_start” and “_stop” columns should not be necessary, I didn’t see it in any example on grafana website. Am I doing something wrong here? There is almost no documentation on grafana-influxDB integration so most of what I am doing is guesswork…

Hello @Mikolaj_G,
I agree you shouldn’t have to drop those columns. Thanks for putting in the effort.
So just to check you’re getting a visualization problem with Grafana? The output is as you expect WITHOUT having to drop the start and stop columns in InfluxBD?

Hello @Anaisdg,

Thank you for you help so far!

Yes the issues are with graphs in grafana.

Yes the output is as expected using influxDB data explorer WITHOUT dropping any columns (first graph from the top in the last image), while in grafana the same query results in a complete mess (second graph from the top).

I have been using grafana for a while now and I have a lot of dashboards there, that is why I would really like to have everything in one place…