Unexpected values returned for count() and GROUP BY time query

My VDSL modem does not support SNMP so I scrape the html stats page every 5 minutes with a python script and push it to Influxdb via POST. Grafana is happily graphing my line rate, attenuation and SNR data from this db. I am having issues with visualizing the Error data I could use some pointers on.

The modem presents errors as “ES, SES, UAS” (errored seconds, severely errored seconds, unavailable seconds) so these data points are ones that would only increase over time (unless reset). When I add visualization (Grafana) for any of these I get data points showing some single errors over time but the actual data does not show an increase in the count.

My database looks like this:

select * from vdsl_stats
name: vdsl_stats
time attDown attUp esErrDown esErrUp location rateDown rateUp sesErrDown sesErrUp snrDown snrUp uasErrDown uasErrUp


1575858309067795983 42.8 32.8 389 97 home 61763000 23840000 6 34 5.5 6.1 364 364
1575858609160093857 42.8 32.8 389 97 home 61763000 23840000 6 34 5.6 6.1 364 364
1575858910692556909 42.8 32.8 389 97 home 61763000 23840000 6 34 5.6 6.1 364 364
1575859209217305088 42.8 32.8 389 97 home 61763000 23840000 6 34 5.6 6.1 364 364
1575859509277762155 42.8 32.8 389 97 home 61763000 23840000 6 34 5.6 6.1 364 364
1575859809481228449 42.8 32.8 389 97 home 61763000 23840000 6 34 5.6 6.1 364 364
1575860109736123306 42.8 32.8 389 97 home 61763000 23840000 6 34 5.6 6.1 364 364

Taking esErrDown as an example, that value hasn’t changed form 389 in the entire database (ok only a few hours old so far) yet:

select count(“esErrDown”) from “vdsl_stats” GROUP BY time(1m)
name: vdsl_stats
time count


1575859200000000000 1
1575859260000000000 0
1575859320000000000 0
1575859380000000000 0
1575859440000000000 0
1575859500000000000 1
1575859560000000000 0
1575859620000000000 0
1575859680000000000 0
1575859740000000000 0
1575859800000000000 1

There is some sort of pattern going on there I suspect to do with time (5m collection vs 1m grouping?) or precision in the database (can I set that lower given I’m only collecting a point every 5m?). Or am I collecting or querying the data incorrectly?

Thanks

Your suspect is right.
You are asking for a count (which returns the number of non-null field values), for every minute, but your data are gathered every five, that’s why you see a value only every 5min, if you have no data point in a given minute than the count is 0.

What are you expecting exactly? which is the desired output?

Goal is to graph the rate of errors over time, so it appears count() is wrong thing to use here. What should I use instead?

I think I have it. I changed Grafana to use mean() aggregation and non_negative_difference() transformation and the rate looks good now.

fwiw the InfluxDB query looks like this:

SELECT non_negative_difference(mean(“esErrUp”)) FROM “vdsl_stats” WHERE time >= now() - 24h GROUP BY time(2m) fill(previous)

3rd reply is the charm?

Checking other dashboards that graph interface errors, I removed mean() in favour of last(). Might be more accurate.

I’m glad that you found a solution.
I will just point out that (to me) it does not make lot of sense to group by a time unit that is smaller than the data gathering frequency, especially if the difference is just of 3min. (you gather data every 5m but group by 2m)
Also having less values on your chart x axis will probably make it more readable.

If the problem has been solved you should be able to mark this question as closed/solved

Ack that, I’ve forced the rate to 5m in the Grafana query.