Telegraf - multiple instances of jolokia2_agent are not working

In my telegraf.conf, to avoid client timeout errors on a few of the URLs, I tried to split the Jolokia agent input into two groups, each with a different response timeout value, as follows.

PRIMARY (MAIN)

[[inputs.jolokia2_agent]]
response_timeout = "250ms"
urls = [ # Around 200 URLs
]

SECONDARY (NEW)

[[inputs.jolokia2_agent]]
response_timeout = "350ms"
urls = [ # Around 10 URLs
]

When I run the above config, metrics are collected only for the SECONDARY (NEW) group, not for the PRIMARY (MAIN) group.

Could you please help me resolve this?

NOTE: A larger response timeout value reduces the client timeout errors in telegraf.log. I want to apply the larger timeout only to the problematic URLs, not to all of them.

Are they both in the same config file? Or are both the config files actually loaded?

Yes, same config file.

Do you get any errors in the log?

Can you paste the config file as is?

When I try to copy the config file content, the forum pops up a dialog stating “Sorry, new users can only put 2 links in a post.”, so I have copied it here part by part…

# Global tags can be specified here in key="value" format.

[global_tags]

# Configuration for telegraf agent

[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logfile = "/var/opt/logs/MYTelegraf/telegraf.log"
logfile_rotation_interval = "1d"
logfile_rotation_max_archives = 60
hostname = ""
omit_hostname = false

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for sending metrics to InfluxDB

[[outputs.influxdb]]
urls = ["http://121.2.2.1:8086"]
database = "newmonitor"
skip_database_creation = false

## HTTP Basic Auth

username = "zeus"
password = "appMkd1p6_dp"

Just put a space somewhere in the URLs so that they no longer look like URLs.

Between the http and the :// should be good.

Antony.


###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

# Read formatted metrics from one or more HTTP endpoints

[[inputs.http]]
urls = [
"http ://a.b.c.d:13098/myapp1/info",
"http ://p.q.r.s:11097/myapp2/info",
"http ://m.n.o.p:13098/myapp3/info",
"http ://s.t.u.v:11097/myapp4/info"
## TOTAL 203 INFO URLs listed here
]
## Amount of time allowed to complete the HTTP request
timeout = "250ms"
name_override = "app_info"
data_format = "json"
json_query = "build"
json_string_fields = ["name", "version"]

[[inputs.http]]
urls = [
"http ://a.b.c.d:13098/myapp1/health",
"http ://p.q.r.s:11097/myapp2/health",
"http ://m.n.o.p:13098/myapp3/health",
"http ://s.t.u.v:11097/myapp4/health"
## TOTAL 203 HEALTH URLs listed here
]
## Amount of time allowed to complete the HTTP request
timeout = "250ms"
name_override = "app_health"
data_format = "json"
json_string_fields = ["status"]

# Read JMX metrics through Jolokia

[[inputs.jolokia2_agent]]
response_timeout = "250ms"
urls = [
"http ://a.b.c.d:13098/myapp1/jolokia",
"http ://p.q.r.s:11097/myapp2/jolokia"
## TOTAL 193 Jolokia URLs listed here
]

[[inputs.jolokia2_agent]]
response_timeout = "350ms"
urls = [
"http ://m.n.o.p:13098/myapp3/jolokia",
"http ://s.t.u.v:11097/myapp4/jolokia"
## TOTAL 10 Jolokia URLs listed here
]

[[inputs.jolokia2_agent.metric]]
    name  = "java_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

[[inputs.jolokia2_agent.metric]]
    name  = "java_memory"
    mbean = "java.lang:type=Memory"
    paths = ["HeapMemoryUsage", "NonHeapMemoryUsage"]

[[inputs.jolokia2_agent.metric]]
    name  = "java_os"
    mbean = "java.lang:type=OperatingSystem"
    paths = ["ProcessCpuLoad", "ProcessCpuTime", "SystemCpuLoad", "TotalPhysicalMemorySize", "TotalSwapSpaceSize", "FreePhysicalMemorySize", "FreeSwapSpaceSize"]

# windows hosts don't have those metrics, hence splitting inputs
[[inputs.jolokia2_agent.metric]]
    name  = "java_os"
    mbean = "java.lang:type=OperatingSystem"
    paths = ["OpenFileDescriptorCount", "MaxFileDescriptorCount"]

[[inputs.jolokia2_agent.metric]]
    name     = "java_memory_pool"
    mbean    = "java.lang:name=*,type=MemoryPool"
    paths    = ["Usage", "PeakUsage", "CollectionUsage"]
    tag_keys = ["name"]

[[inputs.jolokia2_agent.metric]]
    name     = "java_garbage_collector"
    mbean    = "java.lang:name=*,type=GarbageCollector"
    paths    = ["CollectionTime", "CollectionCount"]
    tag_keys = ["name"]

[[inputs.jolokia2_agent.metric]]
    name  = "java_last_garbage_collection"
    mbean = "java.lang:name=*,type=GarbageCollector"
    paths = ["LastGcInfo"]
    tag_keys = ["name"]

[[inputs.jolokia2_agent.metric]]
    name  = "java_threading"
    mbean = "java.lang:type=Threading"
    paths = ["TotalStartedThreadCount", "ThreadCount", "DaemonThreadCount", "PeakThreadCount"]

[[inputs.jolokia2_agent.metric]]
    name  = "java_class_loading"
    mbean = "java.lang:type=ClassLoading"
    paths = ["LoadedClassCount", "UnloadedClassCount", "TotalLoadedClassCount"]

[[inputs.jolokia2_agent.metric]]
    name  = "app_info"
    mbean = "org.springframework.boot:name=infoEndpoint,type=Endpoint"
    paths = ["Data/build/name", "Data/build/version"]

[[inputs.jolokia2_agent.metric]]
    name  = "app_health"
    mbean = "org.springframework.boot:name=healthEndpoint,type=Endpoint"
    paths = ["Data/status", "Data.diskSpace.total", "Data.diskSpace.free"]

[[inputs.jolokia2_agent.metric]]
    name = "tomcat_threads"
    mbean  = "Tomcat:name=*,type=ThreadPool"
    paths = ["maxThreads", "currentThreadsCount", "currentThreadsBusy"]
    tag_keys = ["name"]

[[inputs.jolokia2_agent.metric]]
    name  = "app_db_connection_pool"
    mbean = "com.app.analytics.cal:type=DataSource,name=*"
    tag_keys = ["name"]

###############################################################################
#                            PROCESSOR PLUGINS                                #
###############################################################################

[[processors.regex]]
order = 1

[[processors.regex.tags]]
key = "jolokia_agent_url"
pattern = "^http://[^/]+/([^/]+)/jolokia$"
replacement = "${1}"
result_key = "service_name"

[[processors.regex.tags]]
key = "jolokia_agent_url"
pattern = "^http://([^:]+):\\d+/[^/]+/jolokia$"
replacement = "${1}"
result_key = "instance_host"

[[processors.regex.tags]]
key = "jolokia_agent_url"
pattern = "^http://([^/]+)/[^/]+/jolokia$"
replacement = "${1}"
result_key = "instance_address"

[[processors.regex.tags]]
key = "url"
pattern = "^http://[^/]+/([^/]+)/(info|health)$"
replacement = "${1}"
result_key = "service_name"

[[processors.regex.tags]]
key = "url"
pattern = "^http://([^:]+):\\d+/[^/]+/(info|health)$"
replacement = "${1}"
result_key = "instance_host"

[[processors.regex.tags]]
key = "url"
pattern = "^http://([^/]+)/[^/]+/(info|health)$"
replacement = "${1}"
result_key = "instance_address"

[[processors.regex]]
order = 2

[[processors.regex.tags]]
key = "url"
pattern = "^http://([^/]+/[^/]+/)(info|health)$"
replacement = "http://${1}jolokia"
result_key = "jolokia_agent_url"

[[processors.rename]]
order = 3
namepass = ["app_info", "app_health"]

[[processors.rename.replace]]
  field = "name"
  dest = "Data.build.name"

[[processors.rename.replace]]
  field = "version"
  dest = "Data.build.version"

[[processors.rename.replace]]
  field = "status"
  dest = "Data.status"

You can upload it as an attachment or put it in a preformatted text block. Please update your post so that we can analyse it…

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})


# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Log at debug level.
  # debug = false
  ## Log only error level messages.
  # quiet = false

  ## Log target controls the destination for logs and can be one of "file",
  ## "stderr" or, on Windows, "eventlog".  When set to "file", the output file
  ## is determined by the "logfile" setting.
  # logtarget = "file"

  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  logfile = "/var/opt/logs/newappsTelegraf/telegraf.log"

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  logfile_rotation_interval = "1d"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  # logfile_rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  logfile_rotation_max_archives = 60

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################


# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  urls = ["http://127.0.0.1:8086"]

  ## The target database for metrics; will be created as needed.
  ## For UDP url endpoint database needs to be configured on server side.
  database = "monitoring"

  ## The value of this tag will be used to determine the database.  If this
  ## tag is not set the 'database' option is used as the default.
  # database_tag = ""

  ## If true, the 'database_tag' will not be included in the written metric.
  # exclude_database_tag = false

  ## If true, no CREATE DATABASE queries will be sent.  Set to true when using
  ## Telegraf with a user without permissions to create databases or when the
  ## database already exists.
  skip_database_creation = false

  ## Name of existing retention policy to write to.  Empty string writes to
  ## the default retention policy.  Only takes effect when using HTTP.
  # retention_policy = ""

  ## The value of this tag will be used to determine the retention policy.  If this
  ## tag is not set the 'retention_policy' option is used as the default.
  # retention_policy_tag = ""

  ## If true, the 'retention_policy_tag' will not be included in the written metric.
  # exclude_retention_policy_tag = false

  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
  ## Only takes effect when using HTTP.
  # write_consistency = "any"

  ## Timeout for HTTP messages.
  # timeout = "5s"

  ## HTTP Basic Auth
  username = "XXXXXX"
  password = "yyyyyyyy"

  ## HTTP User-Agent
  # user_agent = "telegraf"

  ## UDP payload size is the maximum packet size to send.
  # udp_payload = "512B"

  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## HTTP Proxy override, if unset values the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"

  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## HTTP Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "identity"

  ## When true, Telegraf will output unsigned integers as unsigned values,
  ## i.e.: "42u".  You will need a version of InfluxDB supporting unsigned
  ## integer values.  Enabling this option will result in field type errors if
  ## existing data has been written.
  # influx_uint_support = false

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################


# Read formatted metrics from one or more HTTP endpoints

[[inputs.http]]
    urls = [
        "http ://a.b.c.d:13098/myapp1/info",
        "http ://p.q.r.s:11097/myapp2/info",
        "http ://m.n.o.p:13098/myapp3/info",
        "http ://s.t.u.v:11097/myapp4/info"
    ]
    ## Amount of time allowed to complete the HTTP request
    timeout = "250ms"

    name_override = "app_info"
    data_format = "json"
    json_query = "build"
    json_string_fields = ["name", "version"]

[[inputs.http]]
    urls = [
        "http ://a.b.c.d:13098/myapp1/health",
        "http ://p.q.r.s:11097/myapp2/health",
        "http ://m.n.o.p:13098/myapp3/health",
        "http ://s.t.u.v:11097/myapp4/health"
    ]
    ## Amount of time allowed to complete the HTTP request
    timeout = "250ms"

    name_override = "app_health"
    data_format = "json"
    json_string_fields = ["status"]

# # Read JMX metrics through Jolokia

[[inputs.jolokia2_agent]]
    response_timeout="250ms"
    urls = [
        "http ://a.b.c.d:13098/myapp1/jolokia",
        "http ://p.q.r.s:11097/myapp2/jolokia"
	## TOTAL 193 Jolokia URL's listed here
    ]

[[inputs.jolokia2_agent]]
    response_timeout="350ms"
    urls = [
        "http ://m.n.o.p:13098/myapp3/jolokia",
        "http ://s.t.u.v:11097/myapp4/jolokia"
	## TOTAL 10 Jolokia URL's listed here
    ]


    [[inputs.jolokia2_agent.metric]]
        name  = "java_runtime"
        mbean = "java.lang:type=Runtime"
        paths = ["Uptime"]

    [[inputs.jolokia2_agent.metric]]
        name  = "java_memory"
        mbean = "java.lang:type=Memory"
        paths = ["HeapMemoryUsage", "NonHeapMemoryUsage"]

    [[inputs.jolokia2_agent.metric]]
        name  = "java_os"
        mbean = "java.lang:type=OperatingSystem"
        paths = ["ProcessCpuLoad", "ProcessCpuTime", "SystemCpuLoad", "TotalPhysicalMemorySize", "TotalSwapSpaceSize", "FreePhysicalMemorySize", "FreeSwapSpaceSize"]

    # windows hosts don't have those metrics, hence splitting inputs
    [[inputs.jolokia2_agent.metric]]
        name  = "java_os"
        mbean = "java.lang:type=OperatingSystem"
        paths = ["OpenFileDescriptorCount", "MaxFileDescriptorCount"]

    [[inputs.jolokia2_agent.metric]]
        name     = "java_memory_pool"
        mbean    = "java.lang:name=*,type=MemoryPool"
        paths    = ["Usage", "PeakUsage", "CollectionUsage"]
        tag_keys = ["name"]

    [[inputs.jolokia2_agent.metric]]
        name     = "java_garbage_collector"
        mbean    = "java.lang:name=*,type=GarbageCollector"
        paths    = ["CollectionTime", "CollectionCount"]
        tag_keys = ["name"]

    [[inputs.jolokia2_agent.metric]]
        name  = "java_last_garbage_collection"
        mbean = "java.lang:name=*,type=GarbageCollector"
        paths = ["LastGcInfo"]
        tag_keys = ["name"]

    [[inputs.jolokia2_agent.metric]]
        name  = "java_threading"
        mbean = "java.lang:type=Threading"
        paths = ["TotalStartedThreadCount", "ThreadCount", "DaemonThreadCount", "PeakThreadCount"]

    [[inputs.jolokia2_agent.metric]]
        name  = "java_class_loading"
        mbean = "java.lang:type=ClassLoading"
        paths = ["LoadedClassCount", "UnloadedClassCount", "TotalLoadedClassCount"]

    [[inputs.jolokia2_agent.metric]]
        name  = "app_info"
        mbean = "org.springframework.boot:name=infoEndpoint,type=Endpoint"
        paths = ["Data/build/name", "Data/build/version"]

    [[inputs.jolokia2_agent.metric]]
        name  = "app_health"
        mbean = "org.springframework.boot:name=healthEndpoint,type=Endpoint"
        paths = ["Data/status", "Data.diskSpace.total", "Data.diskSpace.free"]

    [[inputs.jolokia2_agent.metric]]
        name = "tomcat_threads"
        mbean  = "Tomcat:name=*,type=ThreadPool"
        paths = ["maxThreads", "currentThreadsCount", "currentThreadsBusy"]
        tag_keys = ["name"]

    [[inputs.jolokia2_agent.metric]]
        name  = "app_db_connection_pool"
        mbean = "com.agl.analytics.dpa:type=DataSource,name=*"
        tag_keys = ["name"]

###############################################################################
#                            PROCESSOR PLUGINS                                #
###############################################################################

[[processors.regex]]
  order = 1

  [[processors.regex.tags]]
    key = "jolokia_agent_url"
    pattern = "^http://[^/]+/([^/]+)/jolokia$"
    replacement = "${1}"
    result_key = "service_name"

  [[processors.regex.tags]]
    key = "jolokia_agent_url"
    pattern = "^http://([^:]+):\\d+/[^/]+/jolokia$"
    replacement = "${1}"
    result_key = "instance_host"

  [[processors.regex.tags]]
    key = "jolokia_agent_url"
    pattern = "^http://([^/]+)/[^/]+/jolokia$"
    replacement = "${1}"
    result_key = "instance_address"

  [[processors.regex.tags]]
    key = "url"
    pattern = "^http://[^/]+/([^/]+)/(info|health)$"
    replacement = "${1}"
    result_key = "service_name"

  [[processors.regex.tags]]
    key = "url"
    pattern = "^http://([^:]+):\\d+/[^/]+/(info|health)$"
    replacement = "${1}"
    result_key = "instance_host"

  [[processors.regex.tags]]
    key = "url"
    pattern = "^http://([^/]+)/[^/]+/(info|health)$"
    replacement = "${1}"
    result_key = "instance_address"

[[processors.regex]]
    order = 2

    [[processors.regex.tags]]
    key = "url"
    pattern = "^http://([^/]+/[^/]+/)(info|health)$"
    replacement = "http://${1}jolokia"
    result_key = "jolokia_agent_url"

[[processors.rename]]
    order = 3
    namepass = ["app_info","app_health"]

    [[processors.rename.replace]]
      field = "name"
      dest = "Data.build.name"

    [[processors.rename.replace]]
      field = "version"
      dest = "Data.build.version"

    [[processors.rename.replace]]
      field = "status"
      dest = "Data.status"

The formatting isn’t completely correct yet, but the first Jolokia input doesn’t have any metrics specified.

A) I have copied my telegraf.conf into the preformatted text block above; please check.
B) With the first Jolokia input alone (response timeout 250 ms), it works fine for us: it fetches the complete Jolokia response and saves the metrics in InfluxDB.

But when I add the second Jolokia input (response timeout 350 ms), metrics are fetched only for the second input, not for the first.

Are we allowed to have multiple inputs.jolokia2_agent sections? Does any other metric config need to be amended along with jolokia2_agent? Is there a maximum size for the jolokia2_agent URL array?

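As said, the first Jolokia input does not define any metrics. In TOML, each [[inputs.jolokia2_agent.metric]] table belongs to the most recently declared [[inputs.jolokia2_agent]], so in your file all of the metrics are attached only to the second input. The metric definitions need to be present in both Jolokia input sections for it to work.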
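
For illustration, here is a minimal sketch of that layout. It assumes the placeholder URLs from the post and shows only two of the metric definitions for brevity; the remaining metric blocks would be repeated under each input in the same way. The key point is that every [[inputs.jolokia2_agent.metric]] block has to appear after the [[inputs.jolokia2_agent]] header it belongs to and before the next one.

```toml
# First group: the bulk of the URLs, shorter timeout.
[[inputs.jolokia2_agent]]
  response_timeout = "250ms"
  urls = [
    "http://a.b.c.d:13098/myapp1/jolokia",   # placeholder hosts from the post
    "http://p.q.r.s:11097/myapp2/jolokia",
  ]

  # These metric tables attach to the FIRST input because they come
  # before the next [[inputs.jolokia2_agent]] header.
  [[inputs.jolokia2_agent.metric]]
    name  = "java_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

  [[inputs.jolokia2_agent.metric]]
    name  = "java_memory"
    mbean = "java.lang:type=Memory"
    paths = ["HeapMemoryUsage", "NonHeapMemoryUsage"]

# Second group: the slow URLs, longer timeout.
[[inputs.jolokia2_agent]]
  response_timeout = "350ms"
  urls = [
    "http://m.n.o.p:13098/myapp3/jolokia",
    "http://s.t.u.v:11097/myapp4/jolokia",
  ]

  # The same metric tables, repeated so that they attach to the SECOND input.
  [[inputs.jolokia2_agent.metric]]
    name  = "java_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

  [[inputs.jolokia2_agent.metric]]
    name  = "java_memory"
    mbean = "java.lang:type=Memory"
    paths = ["HeapMemoryUsage", "NonHeapMemoryUsage"]
```

The indentation is only for readability; TOML decides ownership by order, not by how far a block is indented.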


Thanks for your suggestion, it worked…

Can you mark it as answered please?