Problem with tasks and downsampling

Please help me to create following statistic - I need to count daily user registrations and output percentage of difference between current day and the average of 4 the previous weekdays.

So I’m trying to create task to downsample data first, like this:

|> from(bucket: “Test”)
|> range(start: 2020-01-01T11:24:00Z)
|> filter(fn: ® => r["_measurement"] == “paid”)
|> aggregateWindow(every: 1d, fn: count)
|> to(bucket: “d”)

But I see only one record with count instead of 30 or more rows (one per day) for existing data…

Hello @aksonov,
Welcome!

Hmm that’s very strange.
When I execute the same query for the last 30. days, I get 30 rows in each table where there is a table for each measurement.

Can you please share a screen shot of your raw data view? or table view?

Thanks

@aksonov ,
Alternatively, can you export some data to annotated csv? Maybe for just a couple of days and I can try to look at that?

Thank you.

How to do the export of data? When I run this query I’m getting 30 rows too (from bucket “Test”), but running task with that query returns just one row (“d” bucket)

It would be very useful for me to see some data example with similar use case (like user registration) and sample of downsampling task for another bucket where I can see number of registration per day

Okey, looks like I’ve found the problem - I have to add _measurement, _field fields to new bucket to make it work… But another problem - the daily count now is duplicated every time task is launched. Is it possible to update daily count every 10minutes for example and don’t create new record each time?

Hello @aksonov ,
Without some example input and output data and your flux query I’m afraid I can’t do much to help you. Can you please share your input data and expected output data and your flux queries?
Thanks :slight_smile:

Data:
paid,usr_id=1 username=“test1”
paid,usr_id=2 username=“test2”
paid,usr_id=3 username=“test3”
paid,usr_id=4 username=“test4”
paid,usr_id=5 username=“test5”
paid,usr_id=6 username=“test6”
paid,usr_id=7 username=“test7”

Query for task:
option task = {
name: “DailyCount”,
every: 1m,
offset: 0m
}

from(bucket: “Test”)
|> range(start: -1d)
|> filter(fn: ® =>
(r["_measurement"] == “paid”))
|> group()
|> aggregateWindow(every: 1d, fn: count)
|> set(key: “_measurement”, value: “paid”)
|> set(key: “_field”, value: “daily_count”)
|> to(bucket: “d”)

All I want is to have one record count per day (i.e. 7), not new record every minute (what I see now, 7 is inserted every minute)

Hello @aksonov,
So the line protocol you shared is the output of the task?
Can you please share your input data from(bucket: “Test”)
Can you try and use last() before the to() function?
https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/built-in/transformations/selectors/last/

Line protocol was input data (Test). Output is a new row with _value 7 (count) every minute.

Last() doesn’t have any effect

@aksonov,
You might have to specify the right column

last(column: "daily_count")

Also why are you using the following line? Since your measurement is already “paid”?

|> set(key: “_measurement”, value: “paid”)

You might also need to group() your data into one table instead of a stream of tables before applying the last() function.

I had to set _measurement because otherwise it says “_measurement” field is not found. Have you run that query by yourself? I tried to use last(column: “daily_count”) but it says “daily_count field is not found”. Could you give me exact query?

I would love to give you the exact query, but I’m having some trouble with the data you gave me. Yes I was able to run it by myself, but since you didn’t provide timestamps I had to write my own timestamps. I made the assumption that all of the data you gave me occurred in one day. And it works for me. Only one value is returned with the query you returned. So I can use the to() function to write one data point to any bucket of my choosing.


please note that the count is = 5 because I only wrote a subset of your data from id = 1 to id = 5. as I felt it was sufficient to try and understand your problem.

Can you please provide me with timestamps for your data? Or alternatively use the export to CSV button to export your raw input data to CSV so I can try it for myself?

Communicating data transformations can be hard! Thanks for sticking with me.

Thank you for your answer! Looks like here is some misunderstanding. That query really returns one row. But the task with that query inserts that row every time it runs. But I need something like “UPDATE” SQL, not “INSERT”. I need just to have ONE record per day (when task is executed every 5m to have actual data). Maybe I don’t understand Flux query language well…

Any response? Is it possible not to create new record every task launch?