Hi @helenosheaa,
this is the config … from the “relay” node.
# Telegraf Configuration
#
# THIS FILE IS MANAGED BY PUPPET
#
[global_tags]
dc = "fc"
domain = "example.com"
rack = "r02"
role = "srv"
[agent]
hostname = "fc-r02-srv-monproxy"
omit_hostname = false
interval = "600s"
round_interval = false
metric_batch_size = 20000
metric_buffer_limit = 200000
collection_jitter = "0s"
flush_interval = "600s"
flush_jitter = "0s"
precision = ""
logfile = ""
debug = true
quiet = false
#
# OUTPUTS:
#
[[outputs.influxdb]]
database = "telegraf"
metric_buffer_limit = "25000"
password = "spinat"
retention_policy = ""
timeout = "180s"
urls = ["https://graph-01.example.com:8086"]
username = "telegraf"
write_consistency = "any"
[[outputs.prometheus_client]]
collectors_exclude = ["gocollector", "process"]
expiration_interval = "300s"
export_timestamp = false
ip_range = ["192.168.43.0/24", "127.0.0.1/8"]
listen = ":9273"
metric_version = 2
path = "/metrics"
string_as_label = false
tls_cert = "/etc/ssl/private/example_chain.crt"
tls_key = "/etc/ssl/private/example.com.key"
[inputs.http_listener]
max_body_size = "0"
max_line_size = "0"
read_timeout = "10s"
service_address = ":8086"
write_timeout = "10s"
[inputs.snmp]
agents = ["172.21.1.1:161"]
auth_password = "spargel"
auth_protocol = "SHA"
interval = "1m"
priv_password = "karotte"
priv_protocol = "AES"
sec_level = "authPriv"
sec_name = "monitoring"
version = 3
[[inputs.snmp.field]]
is_tag = true
name = "switchname"
oid = "RFC1213-MIB::sysName.0"
[[inputs.snmp.table]]
inherit_tags = ["switchname"]
name = "network_interface"
oid = "IF-MIB::ifTable"
[[inputs.snmp.table.field]]
is_tag = true
name = "ifName"
oid = "IF-MIB::ifName"
[[inputs.snmp.table]]
inherit_tags = ["switchname"]
name = "network_interface_x"
oid = "IF-MIB::ifXTable"
[[inputs.snmp.table.field]]
is_tag = true
name = "ifName"
oid = "IF-MIB::ifName"
[[inputs.snmp.table]]
inherit_tags = ["switchname"]
name = "network_interface_stats"
oid = "EtherLike-MIB::dot3StatsTable"
[[inputs.snmp.table.field]]
is_tag = true
name = "ifName"
oid = "IF-MIB::ifName"
[[inputs.cpu]]
percpu = false
totalcpu = true
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "udev"]
[[inputs.diskio]]
[[inputs.io]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.netstat]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
The config of a normal node is the same, except for the InfluxDB output, which points at the relay host (fc-r02-srv-monproxy), and the buffer limits etc.
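To make the differences easier to spot, here are the relay settings that differ, with the value from a normal node as a comment on each line. All values are taken from the two configs in this post; the layout is only a sketch, not a complete config.

```toml
# Relay (fc-r02-srv-monproxy); each comment shows the value on a normal node
[agent]
  interval = "600s"             # normal node: "10s"
  round_interval = false        # normal node: true
  metric_batch_size = 20000     # normal node: 1000
  metric_buffer_limit = 200000  # normal node: 10000
  flush_interval = "600s"       # normal node: "10s"
  debug = true                  # normal node: false

[[outputs.prometheus_client]]
  expiration_interval = "300s"  # normal node: "60s"
```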
I tried a lot … and changed the buffer / timeout etc. values … The InfluxDB values on the Grafana dashboard look valid, while for Prometheus (Thanos) …
What I find strange … the identical config for a normal node … works perfectly if I scrape the metrics directly:
# Telegraf Configuration
#
# THIS FILE IS MANAGED BY PUPPET
#
[global_tags]
dc = "default"
domain = "example.com"
rack = "default"
role = "git"
[agent]
hostname = "git"
omit_hostname = false
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logfile = ""
debug = false
quiet = false
#
# OUTPUTS:
#
[[outputs.influxdb]]
database = "telegraf"
password = "kartoffel"
skip_database_creation = true
timeout = "180s"
urls = ["https://graph-01.example.com:8086"]
username = "telegraf"
[[outputs.prometheus_client]]
collectors_exclude = ["gocollector", "process"]
expiration_interval = "60s"
export_timestamp = false
ip_range = ["192.168.43.0/24", "127.0.0.1/8"]
listen = ":9273"
metric_version = 2
path = "/metrics"
string_as_label = false
tls_cert = "/etc/ssl/private/example_local_chain.crt"
tls_key = "/etc/ssl/private/example.local.key"
#
# INPUTS:
#
[[inputs.cpu]]
percpu = false
totalcpu = true
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "udev"]
[[inputs.diskio]]
[[inputs.io]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.netstat]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
This works perfectly … as Thanos (Prometheus) can reach this node directly.
Very strange. I also have no idea … where I can check … The logs look fine as well:
...
May 3 13:26:10 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:26:10Z D! [outputs.prometheus_client] Wrote batch of 20000 metrics in 230.060484ms
May 3 13:26:10 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:26:10Z D! [outputs.prometheus_client] Buffer fullness: 325 / 200000 metrics
May 3 13:26:11 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:26:11Z D! [outputs.influxdb] Wrote batch of 20000 metrics in 922.251916ms
May 3 13:26:11 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:26:11Z D! [outputs.influxdb] Buffer fullness: 614 / 200000 metrics
...
May 3 13:28:19 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:19Z D! [outputs.prometheus_client] Wrote batch of 20000 metrics in 536.046541ms
May 3 13:28:19 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:19Z D! [outputs.prometheus_client] Buffer fullness: 286 / 200000 metrics
May 3 13:28:20 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:20Z D! [outputs.influxdb] Wrote batch of 20000 metrics in 917.353407ms
May 3 13:28:20 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:20Z D! [outputs.influxdb] Buffer fullness: 2457 / 200000 metrics
May 3 13:28:50 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:50Z D! [outputs.prometheus_client] Wrote batch of 20000 metrics in 205.839428ms
May 3 13:28:50 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:50Z D! [outputs.prometheus_client] Buffer fullness: 1349 / 200000 metrics
May 3 13:28:51 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:51Z D! [outputs.influxdb] Wrote batch of 20000 metrics in 861.643249ms
May 3 13:28:51 fc-r02-srv-monproxy telegraf[20824]: 2021-05-03T11:28:51Z D! [outputs.influxdb] Buffer fullness: 2065 / 200000 metrics
From the Git example node …
May 3 13:29:50 git telegraf[26924]: 2021-05-03T11:29:50Z D! [outputs.prometheus_client] Wrote batch of 40 metrics in 8.747049ms
May 3 13:29:50 git telegraf[26924]: 2021-05-03T11:29:50Z D! [outputs.prometheus_client] Buffer fullness: 14 / 10000 metrics
May 3 13:29:50 git telegraf[26924]: 2021-05-03T11:29:50Z D! [outputs.influxdb] Wrote batch of 54 metrics in 80.20002ms
May 3 13:29:50 git telegraf[26924]: 2021-05-03T11:29:50Z D! [outputs.influxdb] Buffer fullness: 27 / 10000 metrics
May 3 13:30:00 git telegraf[26924]: 2021-05-03T11:30:00Z D! [outputs.prometheus_client] Wrote batch of 44 metrics in 1.771804ms
May 3 13:30:00 git telegraf[26924]: 2021-05-03T11:30:00Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
May 3 13:30:00 git telegraf[26924]: 2021-05-03T11:30:00Z D! [outputs.influxdb] Wrote batch of 30 metrics in 20.38124ms
May 3 13:30:00 git telegraf[26924]: 2021-05-03T11:30:00Z D! [outputs.influxdb] Buffer fullness: 37 / 10000 metrics
....
May 3 13:30:10 git telegraf[26924]: 2021-05-03T11:30:10Z D! [outputs.prometheus_client] Wrote batch of 41 metrics in 4.241971ms
May 3 13:30:10 git telegraf[26924]: 2021-05-03T11:30:10Z D! [outputs.prometheus_client] Buffer fullness: 36 / 10000 metrics
May 3 13:30:10 git telegraf[26924]: 2021-05-03T11:30:10Z D! [outputs.influxdb] Wrote batch of 41 metrics in 23.595417ms
May 3 13:30:10 git telegraf[26924]: 2021-05-03T11:30:10Z D! [outputs.influxdb] Buffer fullness: 37 / 10000 metrics
I’ve attached an example “node” output, which gets relayed. pmox-01_prom_output.txt (132.7 KB)
I’ve changed export_timestamp to true … to see if it helps. Changing the scrape interval does not help either. The collection_interval / agent interval / flush_interval were not touched.
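For reference, that test change is a single key. A minimal sketch, assuming it was made in the [[outputs.prometheus_client]] section of the relay config (all other keys stay as posted above):

```toml
[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
  # Changed from false to true as a test; assumed to be set on the relay
  export_timestamp = true
```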
It would be great … if someone has an idea.
cu denny