Inconsistent aggregation results when grouping by time

shermster · March 13, 2017, 4:34pm

Hi,

I’m trying to group a set of call data records per hour. The sample dataset is available here

https://dl.dropboxusercontent.com/u/21030364/call_details_influxdb.zip

and can be unzipped and imported using

influx -import -path=call_details_influxdb_filtered -precision=s

When I run the following sample query

select count(asterisk_uniqueid) as call_count, mean(talk_time) as talk_time_average, sum(talk_time) as talk_time_total, min(talk_time) as talk_time_min, max(talk_time) as talk_time_max, mean(waiting_time) as waiting_time_average, sum(waiting_time) as waiting_time_total, min(waiting_time) as waiting_time_min, max(waiting_time) as waiting_time_max, mean(original_position) as original_position_average, min(original_position) as original_position_min, max(original_position) as original_position_max from call_details where time >= '2017-02-26' AND time <= '2017-03-10' AND queue='ob-7018' GROUP BY queue,call_disposition,time(1d) fill(0) order by time desc

I get a result set with two results - which is correct.

If I narrow the date range and run the following

select count(asterisk_uniqueid) as call_count, mean(talk_time) as talk_time_average, sum(talk_time) as talk_time_total, min(talk_time) as talk_time_min, max(talk_time) as talk_time_max, mean(waiting_time) as waiting_time_average, sum(waiting_time) as waiting_time_total, min(waiting_time) as waiting_time_min, max(waiting_time) as waiting_time_max, mean(original_position) as original_position_average, min(original_position) as original_position_min, max(original_position) as original_position_max from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY queue,call_disposition,time(1d) fill(0) order by time desc

I get three result sets. The set with tags: call_disposition=ANSWERED, queue=ob-7018 appears twice.

I really don’t understand why this is happening. Why aren’t they merging in the second scenario. Can someone please explain what is possibly going wrong?

Thanks,
Ian

michael · March 14, 2017, 2:41pm

Can you include the output you’re getting.

shermster · March 15, 2017, 6:53am

This is a simplified query of the above on the same dataset that demonstrates the issue I’m having

When querying from the 26 Feb to 10 March I get the correct results. One group for ANSWERED and one group for UNANSWERED (these are the two distinct call_dispositions)

select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-10' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc
name: call_details
tags: call_disposition=UNANSWERED
time                 call_count
----                 ----------
2017-03-10T00:00:00Z 0
2017-03-09T00:00:00Z 1
2017-03-08T00:00:00Z 3
2017-03-07T00:00:00Z 1
2017-03-06T00:00:00Z 2
2017-03-05T00:00:00Z 3
2017-03-04T00:00:00Z 2
2017-03-03T00:00:00Z 0
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 5
2017-02-28T00:00:00Z 0
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

name: call_details
tags: call_disposition=ANSWERED
time                 call_count
----                 ----------
2017-03-10T00:00:00Z 0
2017-03-09T00:00:00Z 9
2017-03-08T00:00:00Z 17
2017-03-07T00:00:00Z 16
2017-03-06T00:00:00Z 25
2017-03-05T00:00:00Z 41
2017-03-04T00:00:00Z 19
2017-03-03T00:00:00Z 0
2017-03-02T00:00:00Z 9
2017-03-01T00:00:00Z 31
2017-02-28T00:00:00Z 6
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

When querying from the 26 Feb to 2 March I get the incorrect results. There are 3 groups with the ANSWERED group repeated

> select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc
name: call_details
tags: call_disposition=ANSWERED
time                 call_count
----                 ----------
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 31
2017-02-28T00:00:00Z 0
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

name: call_details
tags: call_disposition=UNANSWERED
time                 call_count
----                 ----------
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 5
2017-02-28T00:00:00Z 0
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

name: call_details
tags: call_disposition=ANSWERED
time                 call_count
----                 ----------
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 0
2017-02-28T00:00:00Z 6
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

Why is the ANSWERED group merged in the first example and not in the second?

michael · March 15, 2017, 7:04am

Very weird. A few more asks.

What version of InfluxDB are you using?
Can you change the time format so that the results are specified in seconds instead of rfc3339 and rerun the query? I wan’t to write the same data into my local instance to test it out.
What are the results of SHOW SERIES, SHOW TAG KEYS, and SHOW MEASUREMENTS?

shermster · March 15, 2017, 7:19am

Version 1.2.0

The original data is available in that Dropbox link if you want to import into your own instance.

> select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc
name: call_details
tags: call_disposition=ANSWERED
time       call_count
----       ----------
1488412800 0
1488326400 31
1488240000 0
1488153600 0
1488067200 0

name: call_details
tags: call_disposition=UNANSWERED
time       call_count
----       ----------
1488412800 0
1488326400 5
1488240000 0
1488153600 0
1488067200 0

name: call_details
tags: call_disposition=ANSWERED
time       call_count
----       ----------
1488412800 0
1488326400 0
1488240000 6
1488153600 0
1488067200 0

> SHOW TAG KEYS
name: call_details
tagKey
------
agent
call_direction
call_disposition
global_server_id
queue
time_zone

> SHOW MEASUREMENTS
name: measurements
name
----
call_details

Show series exceeds the post limit on this forum. I’ve put the results here

http://pastebin.com/raw/VPfcxiQP

michael · March 15, 2017, 8:12am

This looks like a bug. Can you please open an issue on the InfluxDB repo.

michael · March 15, 2017, 8:20am

Note that without the ORDER BY time DESC it works correctly.

> select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0)
name: call_details
tags: call_disposition=ANSWERED
time			call_count
----			----------
1488067200000000000	0
1488153600000000000	0
1488240000000000000	6
1488326400000000000	31
1488412800000000000	0

name: call_details
tags: call_disposition=UNANSWERED
time			call_count
----			----------
1488067200000000000	0
1488153600000000000	0
1488240000000000000	0
1488326400000000000	5
1488412800000000000	0

michael · March 15, 2017, 8:21am

@shermster Sorry for having you run all of those queries before. I missed that you included the dataset in your original post.

shermster · March 15, 2017, 8:25am

Thanks Michael for investigating this. I can manage then without the time sorting.

I did open a bug about this yesterday so I will update it with a link to this discussion.

github.com/influxdata/influxdb

duplicate tags in aggregation result when grouping by time period

opened 12:23PM - 14 Mar 17 UTC

closed 08:57PM - 01 Feb 18 UTC

shermster

area/queries

### Bug report __System info:__ [InfluxDB 1.2.0 running on Centos 6] __Ste…ps to reproduce:__ There may be a simple explanation for this but I'm just struggling to understand the results I'm getting for aggregations grouped by a time period. 1. The data set being used is available here https://dl.dropboxusercontent.com/u/21030364/call_details_influxdb.zip and can be loaded using influx -import -path=call_details_influxdb_filtered -precision=s There are 10,000 records of timestamped call data records. 2. I'm getting inconsistent grouping results depending on the time period specified in the where clause. For example running the query from the 26 Feb to 10 March select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-10' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc This produces the correct result set with results for call_disposition ANSWERED and call_disposition UNANSWERED ``` name: call_details tags: call_disposition=UNANSWERED time call_count 2017-03-10T00:00:00Z 0 2017-03-09T00:00:00Z 1 2017-03-08T00:00:00Z 3 2017-03-07T00:00:00Z 1 2017-03-06T00:00:00Z 2 2017-03-05T00:00:00Z 3 2017-03-04T00:00:00Z 2 2017-03-03T00:00:00Z 0 2017-03-02T00:00:00Z 0 2017-03-01T00:00:00Z 5 2017-02-28T00:00:00Z 0 2017-02-27T00:00:00Z 0 2017-02-26T00:00:00Z 0 name: call_details tags: call_disposition=ANSWERED time call_count 2017-03-10T00:00:00Z 0 2017-03-09T00:00:00Z 9 2017-03-08T00:00:00Z 17 2017-03-07T00:00:00Z 16 2017-03-06T00:00:00Z 25 2017-03-05T00:00:00Z 41 2017-03-04T00:00:00Z 19 2017-03-03T00:00:00Z 0 2017-03-02T00:00:00Z 9 2017-03-01T00:00:00Z 31 2017-02-28T00:00:00Z 6 2017-02-27T00:00:00Z 0 2017-02-26T00:00:00Z 0 ``` However if I shorten the time period for example 26 Feb to 2 March select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc I then get three results ``` name: call_details **tags: call_disposition=ANSWERED** time call_count 2017-03-02T00:00:00Z 0 2017-03-01T00:00:00Z 31 2017-02-28T00:00:00Z 0 2017-02-27T00:00:00Z 0 2017-02-26T00:00:00Z 0 name: call_details tags: call_disposition=UNANSWERED time call_count 2017-03-02T00:00:00Z 0 2017-03-01T00:00:00Z 5 2017-02-28T00:00:00Z 0 2017-02-27T00:00:00Z 0 2017-02-26T00:00:00Z 0 name: call_details **tags: call_disposition=ANSWERED** time call_count 2017-03-02T00:00:00Z 0 2017-03-01T00:00:00Z 0 2017-02-28T00:00:00Z 6 2017-02-27T00:00:00Z 0 2017-02-26T00:00:00Z 0 ``` Why is this happening? I don't understand why there are two ANSWERED groups? Regards, Ian

Appreciate the quick help.

Topic		Replies	Views
Query Problem - including GROUP BY time() acts weird? Store influxdb , influxql	3	2261	May 4, 2017
GROUP BY time leaving tag - mixing aggregate and non-aggregate queries is not supported InfluxQL influxdb , influxql	7	2775	July 8, 2021
Grouped data shows a higher Sum than expected influxdb	6	1421	July 14, 2017
Group by time - why different grouping for mean() and integral() Telegraf	6	1739	April 14, 2021
Output from aggregateWindow and group time does not match row timestamps InfluxDB 2 time , python	2	261	October 16, 2023

Inconsistent aggregation results when grouping by time

Related topics