How to run the Telegraf jolokia2 plugin in a multi-tenant environment?

#1

Hello there,

My scenario is as follows: I have applications deployed in clusters, and the servers of each cluster are distributed across different machines. You can have a look at the attached diagram.

I would like to provide dashboards where users can choose which application to monitor. They would first select the application from a list and could then select a specific server.

My question is which option would be best for implementing this:

  1. One Telegraf instance per machine
  2. Multiple Telegraf instances running on the same machine, each pointing to its specific server

For option 1, my telegraf.conf would look something like this:

# URL of the first target
[[inputs.jolokia2_agent]]
  urls = [
          "http://server_a_1:8001/jolokia"
         ]

  ## List of metrics
  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]
.../...
  ## Name of the application
  [inputs.jolokia2_agent.tags]
    application = "application1"

# URL of the second target
[[inputs.jolokia2_agent]]
  urls = [
          "http://server_b_1.com:8002/jolokia"
         ]

  ## List of metrics
  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]
.../...
  ## Name of the application
  [inputs.jolokia2_agent.tags]
    application = "application2"

The drawback is that I have to repeat the metrics part for every server on the machine.

With option 2, I have to run multiple instances of Telegraf. I have quickly tested it, and the overhead is minimal:

CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O             BLOCK I/O           PIDS
telegraf-application-1     0.00%               10.62 MiB / 4.564 GiB   0.23%               955 MB / 1.35 GB    328 kB / 0 B        0
telegraf-application-2     0.00%               13.96 MiB / 4.564 GiB   0.30%               2.4 GB / 3.19 GB    688 kB / 0 B        0
telegraf-application-3     0.01%               14.21 MiB / 4.564 GiB   0.30%               2.4 GB / 3.19 GB    184 kB / 0 B        0
telegraf-application-4     0.00%               14.39 MiB / 4.564 GiB   0.31%               2.4 GB / 3.19 GB    135 kB / 0 B        0

Any thoughts on this?

Thanks in advance,

Luis

#2

Both methods are valid, and the two are often combined. One improvement you can make to option 2 is to run a single Telegraf that combines all the Jolokia agents into a single plugin by listing them in the urls array. Alternatively, you can define multiple instances of the plugin if the metric definitions differ:

[[inputs.jolokia2_agent]]
  # snip

[[inputs.jolokia2_agent]]
  # snip

This should reduce Telegraf's overhead and make it a bit easier to set up the init scripts.
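
As a rough sketch of the single-plugin variant, assuming both servers from the original post expose the same MBeans, the agents could be listed together in one urls array. Keep in mind that any tags table declared on this plugin would apply to every URL in the list.

```toml
# Sketch only: one plugin instance polling both agents from the original post.
[[inputs.jolokia2_agent]]
  urls = [
    "http://server_a_1:8001/jolokia",
    "http://server_b_1.com:8002/jolokia"
  ]

  # Shared metric definition, declared once for every URL above
  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]
```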

#3

Hello @daniel,

Thank you very much for your prompt reply.

I am not sure I understand you. With your suggestion we are turning option 2 into option 1, aren't we?

The problem with having all the agent URLs in the same array is that I can no longer distinguish which application I am monitoring.

Cheers,

Luis

#4

Sorry, I misunderstood your question, so just ignore my last comment. Here are a few other ideas you could consider:

Instead of adding a tag with the application name, you could combine them into one plugin and just use the url tag, applied to all metrics, to differentiate the services. To avoid confusion you would need to use a URL that uniquely indicates the service, which might be tricky.

In the near future we will have a processor that can add tags based on other tags, and it should be possible to use it to add the application tag. Here is the pull request: https://github.com/influxdata/telegraf/pull/3773

I notice you are using containers, which more or less solves the init script problem. We use Kubernetes in our cloud offering, and I believe our team deploys Telegraf as a sidecar in each Kubernetes pod. In our model, each application would be a pod with its own Telegraf instance. Additionally, my understanding is that we deploy a Telegraf as a daemon set to collect system-level stats such as memory and CPU usage. So we would have a total of 6 Telegraf instances in this example. I think this may be the best solution when dealing with highly dynamic environments.

#5

Hello Daniel,

Thank you for your answer.

In the end we solved this using one telegraf.conf file containing the main configuration plus the output (InfluxDB), and one configuration file per server process on the machine. Each of the latter contains the URL of the Jolokia endpoint plus the inputs.jolokia2_agent.metric and inputs.jolokia2_agent.tags sections:

/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

This configuration is generated automatically (Puppet).
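
For illustration, one of the per-server files could look roughly like the sketch below. The file name is hypothetical, and the URL, metric and tag values are simply the ones from my first post.

```toml
# Hypothetical file: /etc/telegraf/telegraf.d/application1.conf
# One file like this per server process; Telegraf loads every file
# found in the directory passed via -config-directory.
[[inputs.jolokia2_agent]]
  urls = ["http://server_a_1:8001/jolokia"]

  ## List of metrics
  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

  ## Name of the application (kept last in the plugin definition)
  [inputs.jolokia2_agent.tags]
    application = "application1"
```

Since each file defines its own plugin instance, the application tag stays attached to the right server, while the agent settings and the output live only once in the main telegraf.conf.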

We will have a look at the tags processor, it looks very interesting.

Cheers,

Luis
