Inconsistent aggregation results when grouping by time

Hi,

I’m trying to group a set of call data records per hour. The sample dataset is available here

https://dl.dropboxusercontent.com/u/21030364/call_details_influxdb.zip

and can be unzipped and imported using

influx -import -path=call_details_influxdb_filtered -precision=s

When I run the following sample query

select count(asterisk_uniqueid) as call_count, mean(talk_time) as talk_time_average, sum(talk_time) as talk_time_total, min(talk_time) as talk_time_min, max(talk_time) as talk_time_max, mean(waiting_time) as waiting_time_average, sum(waiting_time) as waiting_time_total, min(waiting_time) as waiting_time_min, max(waiting_time) as waiting_time_max, mean(original_position) as original_position_average, min(original_position) as original_position_min, max(original_position) as original_position_max from call_details where time >= '2017-02-26' AND time <= '2017-03-10' AND queue='ob-7018' GROUP BY queue,call_disposition,time(1d) fill(0) order by time desc

I get a result set with two results - which is correct.

If I narrow the date range and run the following

select count(asterisk_uniqueid) as call_count, mean(talk_time) as talk_time_average, sum(talk_time) as talk_time_total, min(talk_time) as talk_time_min, max(talk_time) as talk_time_max, mean(waiting_time) as waiting_time_average, sum(waiting_time) as waiting_time_total, min(waiting_time) as waiting_time_min, max(waiting_time) as waiting_time_max, mean(original_position) as original_position_average, min(original_position) as original_position_min, max(original_position) as original_position_max from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY queue,call_disposition,time(1d) fill(0) order by time desc

I get three result sets. The set with tags: call_disposition=ANSWERED, queue=ob-7018 appears twice.

I really don’t understand why this is happening. Why aren’t they merging in the second scenario. Can someone please explain what is possibly going wrong?

Thanks,
Ian

Can you include the output you’re getting.

This is a simplified query of the above on the same dataset that demonstrates the issue I’m having

When querying from the 26 Feb to 10 March I get the correct results. One group for ANSWERED and one group for UNANSWERED (these are the two distinct call_dispositions)

select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-10' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc
name: call_details
tags: call_disposition=UNANSWERED
time                 call_count
----                 ----------
2017-03-10T00:00:00Z 0
2017-03-09T00:00:00Z 1
2017-03-08T00:00:00Z 3
2017-03-07T00:00:00Z 1
2017-03-06T00:00:00Z 2
2017-03-05T00:00:00Z 3
2017-03-04T00:00:00Z 2
2017-03-03T00:00:00Z 0
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 5
2017-02-28T00:00:00Z 0
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

name: call_details
tags: call_disposition=ANSWERED
time                 call_count
----                 ----------
2017-03-10T00:00:00Z 0
2017-03-09T00:00:00Z 9
2017-03-08T00:00:00Z 17
2017-03-07T00:00:00Z 16
2017-03-06T00:00:00Z 25
2017-03-05T00:00:00Z 41
2017-03-04T00:00:00Z 19
2017-03-03T00:00:00Z 0
2017-03-02T00:00:00Z 9
2017-03-01T00:00:00Z 31
2017-02-28T00:00:00Z 6
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

When querying from the 26 Feb to 2 March I get the incorrect results. There are 3 groups with the ANSWERED group repeated

> select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc
name: call_details
tags: call_disposition=ANSWERED
time                 call_count
----                 ----------
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 31
2017-02-28T00:00:00Z 0
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

name: call_details
tags: call_disposition=UNANSWERED
time                 call_count
----                 ----------
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 5
2017-02-28T00:00:00Z 0
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

name: call_details
tags: call_disposition=ANSWERED
time                 call_count
----                 ----------
2017-03-02T00:00:00Z 0
2017-03-01T00:00:00Z 0
2017-02-28T00:00:00Z 6
2017-02-27T00:00:00Z 0
2017-02-26T00:00:00Z 0

Why is the ANSWERED group merged in the first example and not in the second?

Very weird. A few more asks.

  1. What version of InfluxDB are you using?

  2. Can you change the time format so that the results are specified in seconds instead of rfc3339 and rerun the query? I wan’t to write the same data into my local instance to test it out.

  3. What are the results of SHOW SERIES, SHOW TAG KEYS, and SHOW MEASUREMENTS?

Version 1.2.0

The original data is available in that Dropbox link if you want to import into your own instance.

> select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0) order by time desc
name: call_details
tags: call_disposition=ANSWERED
time       call_count
----       ----------
1488412800 0
1488326400 31
1488240000 0
1488153600 0
1488067200 0

name: call_details
tags: call_disposition=UNANSWERED
time       call_count
----       ----------
1488412800 0
1488326400 5
1488240000 0
1488153600 0
1488067200 0

name: call_details
tags: call_disposition=ANSWERED
time       call_count
----       ----------
1488412800 0
1488326400 0
1488240000 6
1488153600 0
1488067200 0
> SHOW TAG KEYS
name: call_details
tagKey
------
agent
call_direction
call_disposition
global_server_id
queue
time_zone
> SHOW MEASUREMENTS
name: measurements
name
----
call_details

Show series exceeds the post limit on this forum. I’ve put the results here

http://pastebin.com/raw/VPfcxiQP

This looks like a bug. Can you please open an issue on the InfluxDB repo.

Note that without the ORDER BY time DESC it works correctly.

> select count(asterisk_uniqueid) as call_count from call_details where time >= '2017-02-26' AND time <= '2017-03-02' AND queue='ob-7018' GROUP BY call_disposition,time(1d) fill(0)
name: call_details
tags: call_disposition=ANSWERED
time			call_count
----			----------
1488067200000000000	0
1488153600000000000	0
1488240000000000000	6
1488326400000000000	31
1488412800000000000	0

name: call_details
tags: call_disposition=UNANSWERED
time			call_count
----			----------
1488067200000000000	0
1488153600000000000	0
1488240000000000000	0
1488326400000000000	5
1488412800000000000	0

@shermster Sorry for having you run all of those queries before. I missed that you included the dataset in your original post.

Thanks Michael for investigating this. I can manage then without the time sorting.

I did open a bug about this yesterday so I will update it with a link to this discussion.

Appreciate the quick help.