Hello,
After some work I ended up with the correct data in influxdb.
The query is the following:
import "strings"
from(bucket: "connections")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => not strings.containsStr(v: r["interface_name"], substr: "Management"))
|> filter(fn: (r) => not strings.containsStr(v: r["interface_name"], substr: "Loopback"))
|> filter(fn: (r) => r["_field"] == "/interfaces/interface/state/oper-status")
|> filter(fn: (r) => r["_value"] == "UP")
|> window(every: v.windowPeriod)
|> truncateTimeColumn(unit: v.windowPeriod)
|> group(columns: ["_time", "_measurement"])
|> count(column: "interface_name")
|> duplicate(column: "interface_name", as: "_value")
|> drop(columns: ["_start", "_stop", "interface_name"])
|> group(columns: ["_measurement"])
|> rename(columns: {"_measurement": "switch_hostname"})
|> sort(columns: ["_time"], desc: false)
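For reference, here is a possibly cheaper formulation (a sketch only — I have not verified it is exactly equivalent, and with windows larger than the 10 s reporting interval the counts may differ): putting the `_field`/`_value` predicates first lets the storage engine prune series before Flux processes them, and `aggregateWindow()` replaces the window/truncate/group/count chain with a single aggregation pass:

```flux
import "strings"

from(bucket: "connections")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  // Pushdown-friendly predicates first: _field/_value comparisons can be
  // evaluated by the storage engine, so far less data reaches Flux.
  |> filter(fn: (r) => r["_field"] == "/interfaces/interface/state/oper-status" and r["_value"] == "UP")
  // String matching cannot be pushed down; run it on the reduced set.
  |> filter(fn: (r) => not strings.containsStr(v: r["interface_name"], substr: "Management")
      and not strings.containsStr(v: r["interface_name"], substr: "Loopback"))
  |> group(columns: ["_measurement"])
  // One aggregation instead of window + truncateTimeColumn + regroup + count.
  |> aggregateWindow(every: v.windowPeriod, fn: count, createEmpty: false)
  |> rename(columns: {"_measurement": "switch_hostname"})
```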
This gives a correct graph in both the Influx web UI and Grafana. However, I have data points every 10 seconds for 3 switches (resulting in 3 lines on a single graph). As soon as I extend the query range beyond 15 minutes, to 1h or more, it simply times out in Grafana. The Influx web UI shows the query running for >60 seconds before timing out.
This is being run on an AWS EC2 instance, a t3.large, so 2 vCPUs and 8 GB of memory. I can see the CPU spike to ~35%, but I suspect the memory is simply being exhausted. I will try to confirm this.
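One way to confirm the memory theory (a sketch, assuming a systemd package install; adjust paths and unit names for your setup): sample influxd's resident memory while the heavy query runs, then check the kernel log for OOM-killer activity afterwards.

```shell
# Sample influxd's resident set size (RSS) while the query runs.
pid=$(pgrep -x influxd || true)
if [ -n "$pid" ]; then
    ps -o rss= -p "$pid" | awk '{printf "influxd RSS: %.1f MiB\n", $1/1024}'
else
    echo "influxd not running"
fi
# After a crash, look for OOM-killer evidence in the kernel log.
journalctl -k --no-pager 2>/dev/null | grep -iE 'out of memory|oom-kill' \
    || echo "no OOM-killer events logged"
```

If the kernel log is clean, the crash is the process exiting on its own rather than being killed by the OOM killer.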
However, after running this query 2 or 3 times I suddenly start getting "unauthorized" on my queries. The logs show this:
Apr 03 08:21:48 - systemd[1]: influxdb.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 03 08:21:48 - systemd[1]: Unit influxdb.service entered failed state.
Apr 03 08:21:48 - systemd[1]: influxdb.service failed.
Apr 03 08:21:48 - systemd[1]: influxdb.service holdoff time over, scheduling restart.
Apr 03 08:21:48 - systemd[1]: Stopped InfluxDB is an open-source, distributed, time series database.
Apr 03 08:21:48 - systemd[1]: Starting InfluxDB is an open-source, distributed, time series database...
Alright, so instead of gracefully handling the load, the server is crashing outright. What's more, after the crash the server goes into setup mode, forgetting all the users and all the tokens, basically making it a useless server.
This cannot be the intended way to handle an out-of-memory condition caused by an unoptimized query.
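For the OOM behaviour specifically, influxd 2.x does expose query memory limits that make an over-budget Flux query fail with an error instead of growing until the process dies. A sketch with illustrative (untuned) values, settable in the influxd config file or as the corresponding `--query-...` flags:

```toml
# Illustrative values only — tune for your workload and available RAM.
query-concurrency = 2                   # queries executing at once
query-queue-size = 10                   # queries allowed to wait in the queue
query-initial-memory-bytes = 10485760   # 10 MiB allocated to each query up front
query-memory-bytes = 52428800           # 50 MiB cap per individual query
query-max-memory-bytes = 209715200      # 200 MiB shared across all running queries
```

With these set, the expensive query above should return a memory-limit error rather than taking the whole server down.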
So from this one post I have three issues:
- How does a simple query like this cause such a huge query delay?
- Why doesn't the server gracefully handle out-of-memory queries?
- Why does the server go into setup mode after crashing on such queries?
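On the setup-mode point, my assumption (not verified) is that losing users and tokens after a crash means influxd can no longer find or read its metadata stores. A quick check, using the default paths for a Linux package install (yours may differ): verify the bolt and sqlite files sit on persistent storage and are owned by the influxdb user.

```shell
# Default metadata store locations for a package install; if these are
# missing or on non-persistent storage, a restart would land in setup mode.
ls -l /var/lib/influxdb/influxd.bolt /var/lib/influxdb/influxd.sqlite 2>/dev/null \
    || echo "metadata stores not found at the default path"
```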
I’m the only one using this server and the only one running these queries, so I’m 100% sure nobody else is hitting it.
Version is 2.7.5