SELECTing both mean + percentiles produces non-deterministic results

When I’m doing selects like:

SELECT mean(“Percent_Processor_Time”), sum(“Percent_Processor_Time”) FROM “CPU” WHERE …

I’m getting deterministic results as expected, like:

host | mean | sum
serverA | 12 | 34
serverB | 56 | 78

However, when I do selects like:

SELECT mean(“Percent_Processor_Time”), percentile(“Percent_Processor_Time”, 95) FROM “CPU” WHERE …

I’m getting an incorrect «table», and further the behavior is not deterministic:

host | mean | percentile
serverA | 12 | 34
serverB | 56 | __
serverB | __ | 78

^ For some of the GROUP BYs it shows all results aligned into one single row, for some other GROUP BYs it shows only some of the results. If I select mean + 3 percentiles, it appears I’m getting basically all possible combinations where for each row we have at least one result, seemingly at random.

I’m seeing the same bug. Here’s the dummy data to reproduce the issue:

cpu,host=a value=1 1434155562000000000
cpu,host=a value=2 1434265562000000000
cpu,host=b value=3 1434375562000000000
cpu,host=b value=4 1422568543702900257

And the query which doesn’t work:

select sum(value), percentile(value, 75) from cpu group by host

(the sum function can be replaced with mean/median/mode/percentile and the query still wont work)

which returns

host | sum | percentile
a    | 3   | __
b    | 7   | 4
a    | __  | 2

.

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "cpu",
                    "tags": {
                        "host": "a"
                    },
                    "columns": [
                        "time",
                        "sum",
                        "percentile"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            3,
                            null
                        ]
                    ]
                },
                {
                    "name": "cpu",
                    "tags": {
                        "host": "b"
                    },
                    "columns": [
                        "time",
                        "sum",
                        "percentile"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            7,
                            4
                        ]
                    ]
                },
                {
                    "name": "cpu",
                    "tags": {
                        "host": "a"
                    },
                    "columns": [
                        "time",
                        "sum",
                        "percentile"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            null,
                            2
                        ]
                    ]
                }
            ]
        }
    ]
}

However the query does work if the percentile is changed to be any value below 75 (i.e. 74):

    select sum(value), percentile(value, 74) from cpu group by host

Another way the bug doesn’t appear is when I used similar timestamps (i.e. omitting the timestamps when inserting the data):

cpu,host=a value=1 1234567890000000001
cpu,host=a value=2 1234567890000000002
cpu,host=b value=3 1234567890000000003
cpu,host=b value=4 1234567890000000004

Conclusion
So to sum up, it appears with random different timestamps when grouping with at least 2 aggregate functions (one being percentile) in the select clause.

My Workaround
I query percentiles separately (one at a time) and add them to the previous results manually

Thanks for identifying how to reproduce this bug. Would you mind opening an issue on GitHub?

Hi Mark,

I actually found a similar issue on GitHub, so I posted a coment over there too: