Query returns empty values for some fields when using GROUP BY

influxdb
#1

I used the command below to query the data:
influx -database 'mydb' -precision 'rfc3339' -execute 'select sum(count) as count, sum(request_sum) / sum(count) as request_avg, sum(total_sum) / sum(count) as total_avg from "atomic_rp"."atomic_detail_for_url" where time > now() - 10s group by domain,url,code slimit 10000' > /tmp/result.txt

Since every point has a value for each field, I expected sum(count), sum(request_sum) / sum(count), and sum(total_sum) / sum(count) to have values as well. However, the result looks wrong. For example:

name: atomic_detail_for_url
tags: code=200, domain=apihydra.com, url=http://apihydra.com/baseJson/getUserPrivilege
time                           count request_avg total_avg
----                           ----- ----------- ---------
2018-12-12T09:16:14.171895574Z                   491

As you can see, count and request_avg are empty; sometimes all of the fields are empty.
I am sure that every point has values for these fields.
So what could cause this result?

My InfluxDB version is 1.2, running as a single instance.
It runs on a Linux server with 32 cores and 64 GB of memory.

#2

Someone else will have to shed some light on this, but I just wanted to mention that 1.2 is going to be 2 years old in January. Is there a specific reason why you are running this version?

Also, are you sure you are ingesting that data?
Is there any chance that whatever is collecting and/or writing isn’t collecting all your data points?
What does your collection config look like?

For example, if you use a shared log (access and app logs), maybe it isn’t collecting the response time because the entry wasn’t an access log item.

#3

I never changed anything in the config files, so the values should be the defaults.
I’m sure that every field has a value when the points are inserted into InfluxDB, since we only ever insert and never update. After I saw the weird result, I also ran another query to confirm that the points have values.
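For reference, a check of this kind could look like the sketch below. The exact query that was run was not posted, so the selected fields, the tag filter (taken from the example series in #1), the limit, and the helper name run_check are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical verification query: read the raw points for one series
# without any aggregation, to confirm that every field has a value.
CHECK_QUERY=$(cat <<'EOF'
select count, request_sum, total_sum from "atomic_rp"."atomic_detail_for_url" where time > now() - 10s and "domain" = 'apihydra.com' and "code" = '200' limit 20
EOF
)

# run_check is a hypothetical helper; it needs the influx 1.x CLI on PATH
# and a reachable server, so it is defined here but not called.
run_check() {
    influx -database 'mydb' -precision 'rfc3339' -execute "$CHECK_QUERY"
}
```

Calling run_check against the live server prints the raw rows; any row with a blank field would point to an ingestion problem rather than a query problem.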

#4

More information

What I have found is that the empty values always occur in the first N fields. For example:

item1: the first field is empty
time                           count request_avg total_avg request_avg1 count1
----                           ----- ----------- --------- ------------ ------
2018-12-13T02:36:09.185974696Z       7           7         7            1

item2: the first 3 fields are empty
time                           count request_avg total_avg request_avg1 count1
----                           ----- ----------- --------- ------------ ------
2018-12-13T02:36:09.185974696Z                             1            21

item3: the first 4 fields are empty
time                           count request_avg total_avg request_avg1 count1
----                           ----- ----------- --------- ------------ ------
2018-12-13T02:36:09.185974696Z                                          1

Note that count1 should be equal to count, and request_avg1 should be equal to request_avg.

Here are my questions:

  1. What is the processing flow for a query? Are the fields calculated from last to first?
  2. Is it possible that the query is interrupted for some reason after the last N fields have been calculated?
#5

More information

If I change the time range
from
where time > now() - 10s
to
where time > now() - 11s and time < now() - 1s
the issue goes away.
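Put together, the adjusted command could look like the sketch below. It is the query from #1 with only the time predicate changed; the variable name QUERY and the helper name run_query are assumptions added for illustration.

```shell
#!/bin/sh
# Same aggregation as in #1, but the window is shifted back by one second
# so that the most recent second of data is excluded.
QUERY='select sum(count) as count, sum(request_sum) / sum(count) as request_avg, sum(total_sum) / sum(count) as total_avg from "atomic_rp"."atomic_detail_for_url" where time > now() - 11s and time < now() - 1s group by domain,url,code slimit 10000'

# run_query is a hypothetical helper; it needs the influx 1.x CLI on PATH
# and a reachable server, so it is defined here but not called.
run_query() {
    influx -database 'mydb' -precision 'rfc3339' -execute "$QUERY" > /tmp/result.txt
}
```

The window is still 10 seconds wide; it just ends one second in the past instead of at now().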

So I think there is a bug somewhere in the workflow between the cache, the WAL, and TSM.