Telegraf Plugin for scraping data from HTML page body content

Hi,

I am looking for a plugin to scrape the body content from a URL.

For example:

URL = http://192.168.2.13/js/status.js

Response:

var version=“H4.01.38Y1.0.09W1.0.08”;var m2mMid=“632788327”;var wlanMac=“BC:54:F9:F2:78:5C”;var m2mRssi=“44%”;var wanIp=“192.168.2.13”;var nmac=“BC54F9F2785F”;var fephy=“off”;var webData=“NLBN402017AL2144,NL2-V1.0-45943,V5.3-90170,omnik4000tl2,4000,900,396,201488,1,”; …

Data to extract:

Rated power = 4000 W
Current power = 900 W
Yield today = 396 kWh
Total yield = 201488 kWh
Alerts =
Last updated = 1 min ago

The data I need is in the webData part, which I can parse with the regex processor.

Is this possible with a standard Telegraf plugin?

You can use the http input plugin to request a web page and get the response. However, the response has to be in a format Telegraf can parse, such as JSON, CSV, or single values, so that the data can be extracted correctly and easily.

The other option is to use the exec input plugin to run a script that fetches the file with curl or wget, parses it, and prints the result in a format Telegraf understands (for example, InfluxDB line protocol).
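A minimal Python sketch of the exec approach follows. It assumes the device returns the `status.js` content shown above and that the `webData` values come in the order used in the grok/CSV answer below (serial, device, version, name, rated power, current power, yield today, total yield, last updated). The measurement name `omnik`, the script name `omnik_status.py` and the lowercase field names are hypothetical choices; the regex accepts straight as well as curly double quotes because the sample response contains curly ones. The script takes the URL as its only argument and prints one line of InfluxDB line protocol, so an `[[inputs.exec]]` section with `commands = ["python3 /path/to/omnik_status.py http://192.168.2.13/js/status.js"]` and `data_format = "influx"` should be able to pick it up.

```python
#!/usr/bin/env python3
"""Hypothetical omnik_status.py: fetch status.js and print webData as line protocol."""
import re
import sys
import urllib.request

# Match var webData="..."; with either straight or curly double quotes.
WEBDATA_RE = re.compile(r'var webData=["\u201c]([^"\u201d]*)["\u201d];')

# Hypothetical field names, in the order the values appear in webData.
COLUMNS = ["sn", "device", "version", "name", "power_rated",
           "power_current", "yield_today", "yield_total", "last_updated"]
FLOAT_FIELDS = {"power_rated", "power_current", "yield_today", "yield_total"}
INT_FIELDS = {"last_updated"}


def parse_webdata(body: str) -> dict:
    """Return a dict mapping COLUMNS to the raw strings found in webData."""
    match = WEBDATA_RE.search(body)
    if match is None:
        raise ValueError("webData not found in response")
    values = match.group(1).split(",")
    if len(values) < len(COLUMNS):
        raise ValueError(f"expected at least {len(COLUMNS)} values, got {len(values)}")
    # zip() stops at len(COLUMNS), ignoring the empty value after the trailing comma.
    return dict(zip(COLUMNS, values))


def _tag(value: str) -> str:
    # Line protocol: commas, equals signs and spaces in tag values must be escaped.
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _str_field(value: str) -> str:
    # Line protocol: string field values are double-quoted; escape \ and ".
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(data: dict, measurement: str = "omnik") -> str:
    """Format parsed values as one line of line protocol (Telegraf adds the timestamp)."""
    fields = []
    for key in COLUMNS:
        if key == "name":
            continue  # used as a tag instead of a field
        if key in FLOAT_FIELDS:
            fields.append(f"{key}={float(data[key])}")
        elif key in INT_FIELDS:
            fields.append(f"{key}={int(data[key])}i")
        else:
            fields.append(f"{key}={_str_field(data[key])}")
    return f"{measurement},name={_tag(data['name'])} {','.join(fields)}"


def main(url: str) -> int:
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    print(to_line_protocol(parse_webdata(body)))
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Called by inputs.exec as: python3 omnik_status.py http://192.168.2.13/js/status.js
    sys.exit(main(sys.argv[1]))
```

Running it without an argument does nothing, which keeps the parsing functions importable for testing. If the page uses a different character set than UTF-8, the decode step is the place to adjust.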

Thanks for the help!

Then it will have to be the second option, because this is not a supported format!

You could also try the http input plugin with the grok parser to extract the webData part, and then use the parser processor to split the inner CSV, like this:

```toml
[[inputs.http]]
  ...
  ## Capture the contents of webData into a string field called "value"
  data_format = "grok"
  grok_patterns = ['''var webData="%{DATA:value}";''']

[[processors.parser]]
  ## Parse the "value" field as one CSV record and keep only the parsed metric
  parse_fields = ["value"]
  drop_original = true
  data_format = "csv"
  csv_column_names = ["SN", "device", "version", "name", "power_rated", "power_current", "yield_today", "yield_total", "last_updated", "alerts"]
  csv_column_types = ["string", "string", "string", "string", "float", "float", "float", "float", "int", "string"]
  ## Use the inverter name as a tag
  csv_tag_columns = ["name"]
```

which, for your example, produces

```
file,name=omnik4000tl2 SN="NLBN402017AL2144",alerts="",device="NL2-V1.0-45943",last_updated=1i,power_current=900,power_rated=4000,version="V5.3-90170",yield_today=396,yield_total=201488 1666879627513394591
```

Note that I replaced the curly double quotes in your example with straight ones, so you may need to adapt the quotes in the grok pattern.
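To double-check how `csv_column_names` and `csv_column_types` line up with the `webData` string, the split can be reproduced with Python's `csv` module. This is only a sanity check outside Telegraf, assuming the captured `value` is exactly the string between the quotes; it shows that the trailing comma yields a tenth, empty value, which is why the list has ten names and `alerts` ends up as an empty string.

```python
import csv
import io

# Inner CSV as captured by the grok pattern (the "value" field).
WEBDATA = "NLBN402017AL2144,NL2-V1.0-45943,V5.3-90170,omnik4000tl2,4000,900,396,201488,1,"

# Same names and types as csv_column_names / csv_column_types in the config.
COLUMNS = ["SN", "device", "version", "name", "power_rated", "power_current",
           "yield_today", "yield_total", "last_updated", "alerts"]
TYPES = ["string", "string", "string", "string", "float", "float",
         "float", "float", "int", "string"]
CONVERT = {"string": str, "float": float, "int": int}


def split_webdata(text: str) -> dict:
    """Split one CSV record and convert each value by its declared type."""
    row = next(csv.reader(io.StringIO(text)))
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(row)}")
    return {name: CONVERT[kind](value)
            for name, kind, value in zip(COLUMNS, TYPES, row)}


if __name__ == "__main__":
    for key, value in split_webdata(WEBDATA).items():
        print(f"{key} = {value!r}")
```

If the device ever drops the trailing comma, the length check fails, which would also be a hint to shorten the column lists in the processor config.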


Hi,

Thanks for this solution, it looks very good!