I’m trying to keep track of switchport status for a large number of switches. These switches use Arista’s streaming telemetry solution, which only pushes updates if the status changes. For switchports in our network this is very infrequent: state sometimes won’t change for months.
I would like to visualize the port status history in Grafana using a discrete measurements panel. The challenge I have is that quite often, the time period shown in the graph (for example 1 month) does not contain any data points in the database. I can do some tricks by adjusting the query so the time period queried goes back further than the time window shown in the graph, but since port status sometimes won’t change for months, I’d have to go back a long time to be sure I still have measurements.
I checked with Arista if there’s a solution on the data sending side, but unfortunately there isn’t.
So now I was thinking of doing something with Kapacitor, perhaps using StateDurationNode to introduce additional measurements by repeating the previous value. If that were possible (for example once a day or week), I could use the query with adjusted time periods.
Would this be feasible? Has anyone done something similar? Or is there a better solution I missed?
Hello @teunvink,
If you’re pulling data periodically, then StateDurationNode is a great solution for alerting on how long a system has been in the same state. The state is defined via a lambda expression. For each consecutive point for which the expression evaluates as true, the state duration will be incremented by the duration between points. When a point evaluates as false, the state duration is reset. StateDurationNode doesn’t introduce new points or repeat the previous value; the stateDuration node only computes the duration of a given state.
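To make the behaviour described above concrete, here is a minimal Python sketch of the same bookkeeping. It is not Kapacitor code: the function name `state_durations`, the sample points, and the choice of -1 as the "not in state" marker are assumptions made for illustration. The point it shows is that the output has exactly as many points as the input.

```python
from datetime import datetime, timedelta


def state_durations(points, in_state):
    """Annotate each (time, value) point with how long in_state has held.

    The result has the same length as the input: no points are added.
    Using -1.0 for points where the predicate is false is an assumption
    of this sketch.
    """
    result = []
    state_start = None
    for ts, value in points:
        if in_state(value):
            if state_start is None:
                state_start = ts  # first point of a new run
            duration = (ts - state_start).total_seconds()
        else:
            state_start = None  # predicate false: reset the run
            duration = -1.0
        result.append((ts, value, duration))
    return result


# Hypothetical port status samples: 1 = up, 0 = down.
t0 = datetime(2020, 5, 22, 10, 0, 0)
points = [
    (t0, 1),
    (t0 + timedelta(minutes=5), 1),
    (t0 + timedelta(minutes=10), 0),
    (t0 + timedelta(minutes=15), 1),
]
annotated = state_durations(points, lambda v: v == 1)
durations = [d for _, _, d in annotated]
```

If the source only sends a point when the state changes, there is nothing to annotate between changes, which is why this kind of node cannot fill the gaps on its own.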
I think maybe ChangeDetectNode might be more in line with what you’re looking for? I’m not that familiar with Grafana, but perhaps then you could just visualize your alert history.
If this is just a visualization problem, and your main problem is just injecting old values, you might also consider just using fill(previous)? Maybe you can help me by explaining why you can’t use fill previous?
Hi @Anaisdg, thanks for taking the time to reply.
Let me try to explain this a bit more: the problem with fill(previous) is that I don’t know how far back I need to go to get the last value. Some switchports change state many times a day (due to unstable hardware connected to them), while others can stay at one value (so no updates in the database) for months or even longer. My problem boils down to the fact that only state changes are logged.
By default, Grafana queries InfluxDB for the time window being displayed. So if I want to show a discrete map of the link status for the last month, Grafana would query the last month of values. If the link state didn’t change within that month, fill(previous) won’t help, since there just isn’t any value in the time window to start with. I could “cheat” by adjusting the Influx query so it goes back further than the display window, but the question then is how far back I need to go. For some ports a few days or weeks could be enough; for others even one year won’t be enough. And going back too far could mean that for some ports I’d be gathering many, many more measurements than I need to show within the time window I want to display.
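The effect described above can be reproduced with a small Python sketch. It is an illustration only, not how InfluxDB implements fill(previous): the function name `fill_previous`, the integer timestamps, and the sample data are assumptions. It forward-fills onto a regular grid while only seeing points inside the window.

```python
DAY = 86400  # seconds


def fill_previous(points, start, end, step):
    """Forward-fill (time, value) points onto a regular grid in [start, end).

    Only points inside the window are visible, mirroring a query that is
    restricted to the dashboard's time range.
    """
    visible = sorted(p for p in points if start <= p[0] < end)
    filled = []
    last = None
    idx = 0
    t = start
    while t < end:
        # Consume every visible point at or before this grid step.
        while idx < len(visible) and visible[idx][0] <= t:
            last = visible[idx][1]
            idx += 1
        filled.append((t, last))
        t += step
    return filled


# Hypothetical ports: one changed state once on day 0, one changed again on day 102.
stable = [(0, 1)]
flapping = [(0, 1), (102 * DAY, 0)]

stable_filled = fill_previous(stable, 100 * DAY, 105 * DAY, DAY)
flapping_filled = fill_previous(flapping, 100 * DAY, 105 * DAY, DAY)
```

For the stable port every grid step stays empty, because the only point lies before the window. For the flapping port the steps before day 102 are empty for the same reason.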
So what I am thinking about is making sure I have at least one measurement every day, so that I can be certain what my ‘search window’ in the query should be. I could do this with some python and influxDB code, but I’d rather use Kapacitor since we’re already using that to do some other data manipulation.
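As a rough illustration of the "at least one measurement every day" idea, the Python sketch below takes the last known value per port and re-emits it with the current timestamp as InfluxDB line protocol. It deliberately leaves out the database client: how the last values are fetched and how the lines are written are not shown. The function names, the `max_age` parameter, and the sample data are assumptions, and the field is written as a float, which would need to match the existing field type.

```python
def escape_tag(text):
    """Escape commas, equals signs and spaces for line protocol tag values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def escape_measurement(text):
    """Escape commas and spaces for line protocol measurement names."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def heartbeat_lines(last_values, now, max_age, measurement="eos.operstatus"):
    """Build line protocol that repeats stale values at the current time.

    last_values maps (host, intf) to (timestamp_s, value). A port is only
    repeated when its newest point is at least max_age seconds old.
    Timestamps are in seconds, so the write needs second precision.
    """
    lines = []
    for (host, intf), (ts, value) in sorted(last_values.items()):
        if now - ts < max_age:
            continue  # a recent point exists, nothing to repeat
        lines.append(
            f"{escape_measurement(measurement)},"
            f"host={escape_tag(host)},intf={escape_tag(intf)} "
            f"value={value} {now}"
        )
    return lines


# Hypothetical state: Ethernet2 last changed two days ago, Ethernet3 an hour ago.
now = 1590316413
last_values = {
    ("testswitch.lab", "Ethernet2"): (now - 2 * 86400, 1),
    ("testswitch.lab", "Ethernet3"): (now - 3600, 0),
}
lines = heartbeat_lines(last_values, now, max_age=86400)
```

With a repeat interval of one day, a dashboard query would then never need to look back more than one day before the start of the displayed window.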
ChangeDetectNode doesn’t seem to do what I’m looking for; I think it’s mostly useful in other situations. It feels as if StateDurationNode could be used to detect whether a port status hasn’t changed for a day and, if so, insert a new datapoint using InfluxDBOutNode. I haven’t found the time to test this yet, and since I’m still not that experienced with the possibilities of InfluxDB and Kapacitor, I was mostly hoping for some advice on how to handle this type of problem.
Hello @teunvink,
Thank you for explaining. I see where you’re going with it now. You could also consider writing a Flux function that returns the timestamp of the last value, and using that as the start time of your query. When you do find the time to test your idea, I encourage you to share it! Thanks.
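The suggestion above is about Flux; the same idea can be sketched in Python to show the logic. Everything here is an assumption made for illustration: the function names, the sample timestamps, and the generated InfluxQL, which uses epoch literals with an `s` suffix. The sketch does no input sanitising, so it is not suitable for untrusted values.

```python
def query_lower_bound(change_times, window_start):
    """Return the time of the last change at or before window_start.

    Falls back to window_start when no earlier change is known.
    """
    earlier = [t for t in change_times if t <= window_start]
    return max(earlier) if earlier else window_start


def build_query(measurement, host, intf, lower, upper):
    """Build an InfluxQL query whose lower bound includes the seed point."""
    return (
        f'SELECT "value" FROM "{measurement}" '
        f'WHERE "host" = \'{host}\' AND "intf" = \'{intf}\' '
        f'AND time >= {lower}s AND time < {upper}s'
    )


# Hypothetical change timestamps (epoch seconds) and a window of 6000 to 7000.
change_times = [1000, 5000, 9000]
lower = query_lower_bound(change_times, 6000)
query = build_query("eos.operstatus", "testswitch.lab", "Ethernet2", lower, 7000)
```

The lower bound moves back exactly as far as the last change before the window, so each port only fetches the one extra point it needs.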
Thanks again for the pointers. I ran into batch-mode tick scripts (I’ve been using only stream mode so far) and thought I could do something like this (actual time values should be a lot longer, set to 5m for testing) to create a batch job which checks the last value and writes it to the output database.
dbrp "opentsdb"."retention_30d"

batch
    |query('''SELECT last(value) FROM "opentsdb"."retention_30d"."eos.operstatus" WHERE ("host" = 'testswitch.lab' AND "intf" = 'Ethernet2')''')
        .period(5m)
        .offset(5m)
        .every(5m)
        .groupBy(*)
    |influxDBOut()
        .cluster('writer')
        .database('outdb')
        .retentionPolicy('retention_400d')
        .measurement('eos.operstatus')
        .precision('s')
Although the script is started without problems, it doesn’t do a thing (yet), and I don’t see any data appearing in the output database.
The query I’m using works:
> SELECT last(value) FROM "opentsdb"."retention_30d"."eos.operstatus" WHERE ("host" = 'testswitch.lab' AND "intf" = 'Ethernet2')
name: eos.operstatus
time                 last
----                 ----
2020-05-22T10:33:33Z 1
I fail to see what I’m doing wrong here or what I can do to debug this. log() seems to be unavailable for these operations. Any pointers on how to proceed would be appreciated.
Wouldn’t you want a continuous query, for example one that runs once a day and captures the last known state, maybe with the elapsed time, and keep that as your starting point? Your chart could plot both as two series, the midnight value and the aggregated value over the day, and stack them if needed.
I was actually looking into something like that, @rvdheij, but so far the timestamp of the original value would be copied (and not replaced with the current timestamp), so the problem would remain.
I tried something like this:
CREATE CONTINUOUS QUERY operstatus ON opentsdb_400d BEGIN SELECT last(value) AS value INTO opentsdb_400d.autogen."eos.operstatus" FROM opentsdb.retention_30d."eos.operstatus" GROUP BY time(30d), * END