How to make http_response telegraf input plugin include unresponsive web servers

Hello friends,
I am new to both InfluxDB and Telegraf, and I am trying to monitor my websites and internal web servers. The data shows up fine in the InfluxDB UI, but when I try to access it in Grafana I don't see the one website that is currently unresponsive. I am testing with my internal Apache server, whose service I have stopped. I have tried the following query:

from(bucket: "system_stat")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "http_response")
  |> filter(fn: (r) => r["_field"] == "result_type")
  |> filter(fn: (r) => r["status_code"] == "200" and r["status_code"] == "null")
  |> filter(fn: (r) => r["host"] == "grafana")
  |> filter(fn: (r) => r["method"] == "GET")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> yield(name: "last")

Is there a way to make http_response send data to InfluxDB even if the web server is not responding?
Thanks

Hello @Amanuel_Elhanan,
Welcome!
Wait, I’m sorry, I’m a little confused. So you’re getting all of your data in InfluxDB, but you’re missing a small subset in Grafana?
Metrics collected while Telegraf cannot reach InfluxDB are added to the metric buffer and sent when the connection is re-established.

Hey @Anaisdg thanks for replying
I want to include rows with a null status_code in Grafana so that I can label them as “down”. I am currently testing in the InfluxDB UI and copying the script to Grafana.
I am using http_response_code as the filtering field for now, and for a web service that is not responding, its value is omitted from the result.

That’s why I wanted to filter on status_code as well: to include the additional website that is currently not showing because its web server is stopped for testing purposes. Is there a way I can combine the two and show the whole result?

I am new to InfluxDB and Telegraf, so sorry if I am asking too easy a question :nerd_face:
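
For reference, one way the two cases might be combined is sketched below. In the first query, a row can never have a status_code that is both "200" and "null", so joining the two tests with `and` matches nothing; a missing tag is also not the string "null" and is normally tested with `exists`. The sketch assumes that points written for an unresponsive server simply carry no status_code tag, and that result_type is still written for them, as the InfluxDB UI output suggests. The bucket, host, and field names are taken from the query above.

```flux
// Sketch only: assumes "down" points have no status_code tag at all.
from(bucket: "system_stat")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "http_response")
  |> filter(fn: (r) => r["_field"] == "result_type")
  // Keep rows without a status_code tag (no HTTP response) as well as rows tagged "200".
  // The exists test comes first so the comparison only runs when the tag is present.
  |> filter(fn: (r) => not exists r["status_code"] or r["status_code"] == "200")
  |> filter(fn: (r) => r["host"] == "grafana")
  |> filter(fn: (r) => r["method"] == "GET")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> yield(name: "last")
```

Because "up" and "down" points for the same server differ in their tags, they come back as separate series; grouping by the server tag would be one way to merge them.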

Hello @Amanuel_Elhanan,
You can create a deadman check.


Hello @Anaisdg, I tried the first option you sent me (monitor.deadman() function | Flux 0.x Documentation), but InfluxDB showed me an error. Here is the Flux script I used:


import "influxdata/influxdb/monitor"
import "experimental"

from(bucket: "system_stat")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "http_response")
  |> filter(fn: (r) => r["_field"] == "http_response_code")
  |> filter(fn: (r) => r["host"] == "grafana")
  |> filter(fn: (r) => r["method"] == "GET")
  |> group(columns: ["server"])
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
  |> monitor.deadman(t: experimental.addDuration(d: -1m, from: now()))
  |> yield(name: "last")

Error:

I am trying to use the exact script on Grafana, but I wanted to check it here first.

Hello @Amanuel_Elhanan,
Thank you, that’s an issue in the docs. You should use subDuration instead:

import "influxdata/influxdb/monitor"
import "experimental"

from(bucket: "noaa")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "average_temperature")
  |> filter(fn: (r) => r["_field"] == "degrees")
  |> filter(fn: (r) => r["location"] == "coyote_creek")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> monitor.deadman(t: experimental.subDuration(d: 5m, from: now())) 
  |> yield(name: "mean")

Does that work for you?
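
Applied to the http_response data from earlier in the thread, the same check might look like the sketch below. It assumes that monitor.deadman() keeps the most recent row of each input table and adds a boolean `dead` column, so that a final filter leaves only the servers that have stopped reporting. The one-hour range and five-minute threshold are arbitrary illustrative values.

```flux
import "influxdata/influxdb/monitor"
import "experimental"

// Sketch only: the range and threshold are placeholders, not recommendations.
from(bucket: "system_stat")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "http_response")
  |> filter(fn: (r) => r["_field"] == "http_response_code")
  // One table per monitored URL, regardless of status_code or result tags.
  |> group(columns: ["server"])
  // Marks a group as dead if its newest point is older than five minutes.
  |> monitor.deadman(t: experimental.subDuration(d: 5m, from: now()))
  // Keep only the servers that have gone quiet.
  |> filter(fn: (r) => r.dead)
  |> yield(name: "dead_servers")
```

Note that a server only appears here if it wrote at least one http_response_code point inside the queried range, so the range has to be wider than the deadman threshold.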

I made this issue:

Hello @Anaisdg, thanks a lot, it works now. I tried it in InfluxDB and it worked fine, but unfortunately it doesn’t work when I implement it in Grafana. I might use InfluxDB itself for alerting, I guess.


@Amanuel_Elhanan,
Thanks for the update. Yeah, you have to use InfluxDB for that type of task and alerting. Do you mind telling me more about your use case? I’m always curious to learn about what users are doing with InfluxDB, but people usually just want to get help and go :stuck_out_tongue:

No, I won’t leave like that :joy:
What I am trying to achieve is this: I have some web applications running on different servers, such as Collabora Online office, Nextcloud, a TURN server… and I want to monitor them and get alerts when they are not responding. Currently, I am using http_response to get their status, and if the status_code is not 200, I send an alert to my Telegram group through Grafana’s contact point setup. Everything else is working fine, but when the server itself is down, nothing is sent to InfluxDB, and as a result Grafana won’t show any result for that system.

Since I can’t use the same InfluxDB script in Grafana’s alert policy, I am thinking of using other measurements, like uptime, to monitor the systems.

Hello @Anaisdg, I am sorry to bring this up again.
I really need this to work in Grafana. I know, as you mentioned, that I can monitor servers that are down inside InfluxDB, but I need Telegraf to send the server’s status to Grafana when it’s down. Should I raise this with Grafana or with Telegraf?
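
One avenue that may be worth trying first is sketched below. It assumes the http_response plugin writes an integer `result_code` field on every check (0 for success, non-zero for failures such as a refused connection or a timeout), which would mean a point exists even when the server is down. If that holds for the Telegraf version in use, a Grafana query could turn it into a simple up/down value without monitor.deadman(). The bucket name comes from the earlier queries; the 1/0 encoding is an illustrative choice.

```flux
// Sketch only: assumes result_code is written for failed checks too (0 = success).
from(bucket: "system_stat")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "http_response")
  |> filter(fn: (r) => r["_field"] == "result_code")
  // Merge "up" and "down" points, which differ in their tags, into one table per URL.
  |> group(columns: ["server"])
  // Regrouping does not guarantee time order, so sort before taking the newest point.
  |> sort(columns: ["_time"])
  |> last()
  // Encode the latest check as 1 (up) or 0 (down) so an alert can fire when the value is below 1.
  |> map(fn: (r) => ({r with _value: if r._value == 0 then 1 else 0}))
  |> yield(name: "up")
```

If no result_code points show up in the InfluxDB UI for the stopped server, the assumption does not hold for this setup and the question is better raised with the Telegraf maintainers.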