Grouping data shows date from before start of storing metrics

Hi,

I have a problem with how influxdb is grouping data. Let me start by saying that I am using a project called SBFSpot to read data from an invertor (with solar panels attached). Al this data is written to csv files. Telegraf reads these csv’s and put them inside influxdb. I read powergeneration, currents, voltage relay status,… The tool creates 3 csv’s:

  • Month data: contains one line per day with power generated for that day.
  • Spot data: Live data from the invertor like currents ac/dc, voltages ac/dc, power, relay status. Should be measured every minute. But often the values are just 0 (still looking into that).
  • Day data, contains power generated in the last 5 min.

Data collection started on 8th november 2019. Before that there should be no data at al as I didnt have the solar panels yet. I also recently updated chronograf, influxdb and grafana and besides that I cleaned up a bit in the db and reimported all the data.

Now over to the issue. My goal is to show the power generated every month. So I need to make a sum of the last field of the file that contains the month data. To get there I used several steps. First show what data is present:

SELECT “kWh” FROM “file” WHERE “month”=~ /kas.lan*/ FILL(null)
name: file
time kWh


1573171200000000000 11.357
1573257600000000000 4.1
1573344000000000000 10.617
1573430400000000000 5.793

1576800000000000000 0.63
1576886400000000000 2.339

This looks fine to me. Start on 8th and ends on the 21th or 22. To calculate generated power last few months I’ll make a sum of the data and group by month. Time interval is 30 days:

SELECT sum(“kWh”) FROM “file” WHERE “month”=~ /kas.lan*/ GROUP BY time(30d) FILL(null)
name: file
time sum


1570752000000000000 15.456999999999999
1573344000000000000 119.755
1575936000000000000 36.087

Here is already something strange. Epoch time is Friday, October 11, 2019 12:00:00 AM but collection hadnt even started yet. When I group by month I get this:

SELECT sum(“kWh”) FROM “file” WHERE “month”=~ /kas.lan*/ GROUP BY time(30d), “month” FILL(null)
name: file
tags: month=kas.lan-201911.csv
time sum


1570752000000000000 15.456999999999999
1573344000000000000 93.5
1575936000000000000

name: file
tags: month=kas.lan-201912.csv
time sum


1573344000000000000 26.255
1575936000000000000 36.087

3 or 2 values for one month? When making the sum it seems to be correct though… Grafana does show me the correct graph using this query:

Chronograf does also show me 3 values:

“time”,“file.sum_kWh”
“2019-10-11T00:00:00.000Z”,“15.456999999999999”
“2019-11-10T00:00:00.000Z”,“119.755”
“2019-12-10T00:00:00.000Z”,“36.087”

Anyone know why this is happening?

No one? Should I create a bug for this on https://github.com/influxdata/influxdb/issues?

I think the problem is caused by the “GROUP BY time(30d)” in combination with GROUP BY “month”, below my theory

Influx has no clue about what a month is (it lacks time functions), so it will group your data in 30 days intervals, and then by month, (a month is just a tag… so a string value)

The time you see is the minimum time for the defined range.

therefore from the result set I can tell you that:
The query produces 3 ranges:

  • from 2019-10-11:12:00:00 to 2019-11-10 11:59:59
  • from 2019-11-10 12:00:00 to 2019-12-10 11:59:59
  • from 2019-12-10 12:00:00 to now (since 30 days have not passed yet)
    Every interval covers exactly 30 days

if you also select the month in your query (I expect) you will see something like this:
time | month | kwh
2019-10-11:12:00:00 | 11 | 15.456999999999999
2019-11-10 12:00:00 | 11 | 119.755
2019-12-10 12:00:00 | 12 | 36.087

As you might have noticed November has been split into two different rows, because of the time range, the first row covers data up to 2019-11-10 11:59:00 (and you have data since 8th November).

I would do the following:

  • filter by time (WHERE) so you get the data of the selected interval
  • Do not group by time, since you won’t be able to control the range, group only by month

Grafana also offers some smart Time units but I doubt those will help you in this situation

That’s completely correct. But when using chronograf I just get 2 dots on a graph (no bar gauge). The dots are the correct data. It just doesn’t display it correctly while grafana does (albeit with a different query). I want to graph the power production in the last 6 month. So that should give me 6 bars with each production per month. Any tips on how to do this specifically in chronograf?

Yesterday I found out an issue (link to post) with chronograf charts, the first point was plotted but somehow hidden in the chart.
can you check if the problem is the same?

I’m not an expert of Chronograf but for what I Know:
About how to do that in a reliable way in chronograf, I’m not sure since it is not “time aware” (months, etc).
maybe with a combination of WHERE, LIMIT and ORDER BY but I don’t like it and I also doubt the result will be helpful (even because it will be sorted in the opposite way)

SELECT {...} WHERE time > now() - 186d GROUP BY {...} LIMIT 6 ORDER BY {tag/time} DESC
# in the worst case you have a range of 31d * 6 months = 186d
# LIMIT 6 -> only top 6 points returned
# ORDER BY {tag} DESC so the LIMIT gets only the 6 most recent points

There is still a problem, you can’t define what to put in the X axis, therefore you won’t be able to set visualize the result strictly by month.

EDIT: a solution based on a continuous query requires group by time() so it’s not a viable option in this case