Hi there,
I was trying to replace this InfluxQL query with Flux: select count(distinct(tracking_id)) from em where time > now() - 15h group by time(1h)
I wrote a Flux query that looks like this, but somehow it produces totally different results:
from(bucket: "genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "em" and r.env == "production" and r._field == "tracking_id")
|> group(columns: ["tracking_id"])
|> aggregateWindow(every: 1h, fn: count)
We use Grafana, and I want to calculate a percentage between two measurements.
scott
March 30, 2020, 3:04pm
2
Just to make sure I understand correctly: you’re trying to count the number of distinct tracking_ids per hour? If so, this should do it:
from(bucket: "db/autogen")
|> range(start: $range)
|> filter(fn: (r) => r._measurement == "em")
|> distinct(column: "tracking_id")
|> group()
|> window(every: 1h)
|> count()
You also submitted this issue on the Flux repo that is a little further down the road and tries to join the two streams with updated field/tag names. (Is tracking_id/tid a field or a tag? If it’s a tag, it’ll change the query.)
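If it is a tag, the hourly distinct count would look more like this (a sketch, untested: tag values live in the group key, so the stream has to be ungrouped before deduplicating):
from(bucket: "db/autogen")
|> range(start: $range)
|> filter(fn: (r) => r._measurement == "em")
|> group() // merge the per-tag tables into one stream
|> window(every: 1h)
|> distinct(column: "tracking_id") // one row per unique tag value per window
|> count()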
I’m not sure I fully understand your use case. Can you describe the schema of the input data and your desired output?
Thanks for getting back…
Your suggestion does not work, as I need to join it.
I have two measurements; for one of them I need to count the number of unique tracking_ids (or tid, as in my example).
The other one more or less just counts the number of rows.
Then I want to join those two and calculate a percentage.
And I want to do this with several measurements.
Context: it’s email sending/opening events.
scott
March 30, 2020, 3:21pm
4
Do you need a percentage per unique tracking ID? Or just the number of unique tracking IDs / number of total rows/events?
I just need the total number of unique tracking ids per hour, for example. Or maybe per day.
scott
March 30, 2020, 3:55pm
6
Ok, a few more questions:
Is tracking_id/tid a field or a tag?
What percentage are you trying to calculate?
Sorry, the tracking_id is a field. I need to calculate the distinct number of tracking ids in one measurement and put it in relation to the total count of entries in another measurement.
Context: one measurement contains one entry per mail sent. The other contains one entry per mail opened. But mails are opened more than once, so I need a distinct count on the 2nd measurement.
This is the complete Flux query:
sm = from(bucket: "genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "monitoring" and r.env == "prod" and r.sent == "yes" and r._field == "tracking_id")
|> group(columns: ["tracking_id"])
|> aggregateWindow(every: 1h, fn: count)
op = from(bucket: "genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "email_opened" and r.env == "production" and r._field == "tracking_id")
|> group(columns: ["tracking_id"])
|> aggregateWindow(every: 1h, fn: count)
join(
tables: {sentMails:sm, openedMails:op},
on: ["_time", "_stop", "_start"]
)
|> map(fn: (r) => ({
_time: r._time,
_value: (float(v:r._value_openedMails) / float(v:r._value_sentMails))
})
)
And yes, there is a difference between prod and production…
Cheers,
Stephan
scott
March 30, 2020, 9:19pm
8
Try this:
sent_mail = from(bucket:"genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "monitoring" and r.env == "prod" and r.sent == "yes" and r._field == "tracking_id")
|> aggregateWindow(every: 1h, fn: count)
|> truncateTimeColumn(unit: 1h)
opened_mail = from(bucket:"genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "email_opened" and r.env == "production" and r._field == "tracking_id" )
|> aggregateWindow(every: 1h, fn: (tables=<-, column) => tables |> distinct() |> count())
|> truncateTimeColumn(unit: 1h)
join(
tables: {sentMails: sent_mail, openedMails: opened_mail},
on: ["_time"]
)
|> map(fn: (r) => ({
_time: r._time,
_value: (float(v:r._value_openedMails) / float(v:r._value_sentMails))
})
)
I do a couple of things in here:
I hijack the aggregateWindow fn parameter to do both a distinct and a count on each window.
I normalize the timestamps in case there are small differences between the time values returned from aggregateWindow in each stream.
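That anonymous function is equivalent to naming it like this (a sketch; any function passed to aggregateWindow’s fn parameter needs the (tables=<-, column) signature):
// deduplicate the values in each window, then count what survives
distinctCount = (tables=<-, column) =>
tables
|> distinct()
|> count()
// used as: |> aggregateWindow(every: 1h, fn: distinctCount)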
This does not work with InfluxDB 1.7.8 (sorry, did not mention that before):
sent_mail = from(bucket:"genios/autogen") |> range(start:-2h) |> filter(fn: (r) => r._measurement == "monitoring" and r.env == "prod" and r.sent == "yes" and r._field == "tracking_id") |> aggregateWindow(every: 1h, fn: count) |> truncateTimeColumn(unit: 1h)
Error: type error 1:230-1:248: undefined identifier "truncateTimeColumn"
scott
March 30, 2020, 9:44pm
10
Oh, ok. Try removing the truncateTimeColumn calls. You may be ok without those.
Ok… tried that. The single queries work and do return something, but the join fails:
> join(
tables: {sentMails: sent_mail, openedMails: opened_mail},
on: ["_time"]
)
>> map(fn: (r) => ({
_time: r._time,
_value: (float(v:r._value_openedMails) / float(v:r._value_sentMails))
})
)
Error: loc 0:0-0:0: missing right hand side of expression
I try to run it in the influx shell; via Grafana I do not get proper errors back.
This looks like a syntax error and I can’t tell where… I also tried writing it on one line with a lot of whitespace in between (had some issues with the shell not handling newlines properly), but it did not help…
scott
March 30, 2020, 10:23pm
12
Does the join (without the map) return anything? I’m wondering if the _time values are different and you don’t actually join anything…
If join does return something, try:
join(
tables: {sentMails: sent_mail, openedMails: opened_mail},
on: ["_time"]
)
|> map(fn: (r) => ({ r with
_value: float(v: r._value_openedMails) / float(v: r._value_sentMails)
})
)
Yes, the join itself seems to do something.
But your fix does not work; still the same error.
I am querying the last 2 hours and there is data there; mails are sent once an hour and opened… well, as for when they are opened, it might be that right now there are none, but I doubt it…
scott
March 30, 2020, 10:30pm
14
Do the _value_openedMails and _value_sentMails columns both exist in the join output?
yes… several times… the output is grouped weirdly:
Table: keys: [_field_openedMails, _field_sentMails, _measurement_openedMails, _measurement_sentMails, _start_openedMails, _start_sentMails, _stop_openedMails, _stop_sentMails, env_openedMails, env_sentMails, sent, solution_openedMails, solution_sentMails, type, username]
_field_openedMails:string _field_sentMails:string _measurement_openedMails:string _measurement_sentMails:string _start_openedMails:time _start_sentMails:time _stop_openedMails:time _stop_sentMails:time env_openedMails:string env_sentMails:string sent:string solution_openedMails:string solution_sentMails:string type:string username:string _value_sentMails:int _time:time _value_openedMails:int
------------------------- ----------------------- ------------------------------- ----------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------------------- ---------------------- ---------------------- --------------------------- ------------------------- ---------------------- ---------------------- -------------------------- ------------------------------ --------------------------
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user1 0 2020-03-30T21:00:00.000000000Z 1
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user1 0 2020-03-30T22:00:00.000000000Z 9
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user1 0 2020-03-30T22:28:50.449664670Z 2
Table: keys: [_field_openedMails, _field_sentMails, _measurement_openedMails, _measurement_sentMails, _start_openedMails, _start_sentMails, _stop_openedMails, _stop_sentMails, env_openedMails, env_sentMails, sent, solution_openedMails, solution_sentMails, type, username]
_field_openedMails:string _field_sentMails:string _measurement_openedMails:string _measurement_sentMails:string _start_openedMails:time _start_sentMails:time _stop_openedMails:time _stop_sentMails:time env_openedMails:string env_sentMails:string sent:string solution_openedMails:string solution_sentMails:string type:string username:string _value_sentMails:int _time:time _value_openedMails:int
------------------------- ----------------------- ------------------------------- ----------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------------------- ---------------------- ---------------------- --------------------------- ------------------------- ---------------------- ---------------------- -------------------------- ------------------------------ --------------------------
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user2 0 2020-03-30T21:00:00.000000000Z 1
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user2 0 2020-03-30T22:00:00.000000000Z 9
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user2 0 2020-03-30T22:28:50.449664670Z 2
Table: keys: [_field_openedMails, _field_sentMails, _measurement_openedMails, _measurement_sentMails, _start_openedMails, _start_sentMails, _stop_openedMails, _stop_sentMails, env_openedMails, env_sentMails, sent, solution_openedMails, solution_sentMails, type, username]
_field_openedMails:string _field_sentMails:string _measurement_openedMails:string _measurement_sentMails:string _start_openedMails:time _start_sentMails:time _stop_openedMails:time _stop_sentMails:time env_openedMails:string env_sentMails:string sent:string solution_openedMails:string solution_sentMails:string type:string username:string _value_sentMails:int _time:time _value_openedMails:int
------------------------- ----------------------- ------------------------------- ----------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------------------- ---------------------- ---------------------- --------------------------- ------------------------- ---------------------- ---------------------- -------------------------- ------------------------------ --------------------------
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user3 0 2020-03-30T21:00:00.000000000Z 1
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user3 0 2020-03-30T22:00:00.000000000Z 9
tracking_id tracking_id email_opened monitoring 2020-03-30T20:28:50.449664670Z 2020-03-30T20:28:50.449664670Z 2020-03-30T22:28:50.449664670Z 2020-03-30T22:28:50.449664670Z production prod yes WebApp1 WebApp2 search user3 0 2020-03-30T22:28:50.449664670Z
Just edited it; there’s more than that, but it all looks similar… Just noticed there were usernames in there. Now it’s a bit more anonymous.
scott
March 30, 2020, 10:48pm
18
Ok, I think I see the problem. All the _value_sentMails values are 0, and you can’t divide by 0. But I think the reason you’re getting all the 0s is that your data is grouped by all the different tag values, so you get a bunch of separate output tables when you aggregate everything.
Try this:
sent_mail = from(bucket:"genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "monitoring" and r.env == "prod" and r.sent == "yes" and r._field == "tracking_id")
|> group(columns: ["_field"])
|> aggregateWindow(every: 1h, fn: count)
opened_mail = from(bucket:"genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "email_opened" and r.env == "production" and r._field == "tracking_id" )
|> group(columns: ["_field"])
|> aggregateWindow(every: 1h, fn: (tables=<-, column) => tables |> distinct() |> count())
join(
tables: {sentMails: sent_mail, openedMails: opened_mail},
on: ["_time"]
)
|> map(fn: (r) => ({
_time: r._time,
_value: float(v:r._value_openedMails) / float(v:r._value_sentMails)
})
)
Hey, this is awesome! This works… (although truncateTimeColumn still does not work with my version of Flux, and I need to give group an array as columns).
Now I still need to understand it a bit better…
The sent_mail part is rather simple and did not change; the join also.
But what is that fn in aggregateWindow in opened_mail? I do not quite understand what it does…
Thanks a lot for your help!
Well, it works, kind of…
For whatever reason I get values > 1, which seems wrong. The unique part seems to fail somehow.
This is the query that needs to be replaced: SELECT count(distinct("tracking_id")) FROM "email_opened" WHERE ("env" = 'production') AND $timeFilter GROUP BY time(1h) fill(null)
As the tracking ids in sent mails and email_opened are the same, the result of dividing the number of unique opened_mail tids by the total number of mails sent should never be higher than 1.0, which would be a 100% opening rate…
BUT: as it turned out, the problem is the 1h grouping. Mails are sent at e.g. 12:00 and opened the next hour. If I set the aggregateWindow to 1d it looks way better. Sorry.
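For the record, the daily variant is just the last working query with the window size changed (a sketch along the same lines, untested):
sent_mail = from(bucket: "genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "monitoring" and r.env == "prod" and r.sent == "yes" and r._field == "tracking_id")
|> group(columns: ["_field"])
|> aggregateWindow(every: 1d, fn: count)
opened_mail = from(bucket: "genios/autogen")
|> range($range)
|> filter(fn: (r) => r._measurement == "email_opened" and r.env == "production" and r._field == "tracking_id")
|> group(columns: ["_field"])
|> aggregateWindow(every: 1d, fn: (tables=<-, column) => tables |> distinct() |> count())
join(tables: {sentMails: sent_mail, openedMails: opened_mail}, on: ["_time"])
|> map(fn: (r) => ({_time: r._time, _value: float(v: r._value_openedMails) / float(v: r._value_sentMails)}))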
system
Closed
March 31, 2020, 8:21am
21
This topic was automatically closed 60 minutes after the last reply. New replies are no longer allowed.