My scenario is as follows: I have applications deployed in clusters, and the servers of each cluster are distributed across different machines. Please have a look at the attached diagram.
I would like to provide dashboards where users can choose which application to monitor. They would first select the application from a list and could then select a specific server.
My question is: which of these two options would be best for implementing this?
1. One Telegraf instance per machine
2. Multiple Telegraf instances running on the same machine, each pointing to its own server
For option 1, my telegraf.conf would look something like this:
```toml
# URL of the first target
[[inputs.jolokia2_agent]]
  urls = [
    "http://server_a_1:8001/jolokia"
  ]

  ## List of metrics
  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]
  # .../...

  ## Name of the application
  [inputs.jolokia2_agent.tags]
    application = "application1"

# URL of the second target
[[inputs.jolokia2_agent]]
  urls = [
    "http://server_b_1.com:8002/jolokia"
  ]

  ## List of metrics
  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]
  # .../...

  ## Name of the application
  [inputs.jolokia2_agent.tags]
    application = "application2"
```
The drawback is that I have to repeat the metrics section for every server on the machine.
With option 2, I have to run multiple Telegraf instances. I tested it quickly, and the overhead is minimal.
Both methods are valid, and the two are often combined. One improvement you can make to option 2 is to run a single Telegraf that merges all the Jolokia agent plugins into one plugin by listing every agent in the urls array. If the metric definitions differ, you can define multiple instances of the plugin instead:
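A minimal sketch of the combined form, assuming both agents from the question expose the same MBeans. With one plugin instance there is no per-URL `application` tag. My understanding of the plugin docs is that it tags each metric with the agent URL it was read from (`jolokia_agent_url`). The second block, with the hypothetical `server_c_1` and an illustrative heap-memory metric, shows when a separate instance would still be needed.

```toml
# One jolokia2_agent instance polls every agent in the urls array,
# so the metric list only has to be written once.
[[inputs.jolokia2_agent]]
  urls = [
    "http://server_a_1:8001/jolokia",
    "http://server_b_1.com:8002/jolokia"
  ]

  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

# A separate instance is only needed for an agent whose metric list differs.
# server_c_1 is a hypothetical third server used for illustration.
[[inputs.jolokia2_agent]]
  urls = ["http://server_c_1:8003/jolokia"]

  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_memory"
    mbean = "java.lang:type=Memory"
    paths = ["HeapMemoryUsage"]
```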
Sorry, I misunderstood your question, so please ignore my last comment. Here are a few other ideas you could consider:
Instead of adding a tag with the application name, you could combine everything into one plugin and use the URL tag, which is applied to all metrics, to tell the services apart. To avoid confusion, you would need URLs that uniquely identify each service, which might be tricky.
I notice you are using containers, which more or less solves the init script problem. We use Kubernetes in our cloud offering, and I believe our team deploys Telegraf as a sidecar in each Kubernetes pod. In our model, each application would be a pod with its own Telegraf instance. My understanding is that we also deploy Telegraf as a DaemonSet to collect system-level stats such as memory and CPU usage, so this example would have a total of 6 Telegraf instances. I think this may be the best solution for highly dynamic environments.
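A per-pod sidecar config could look roughly like the sketch below. It assumes the following:
- The Jolokia agent listens on port 8001 inside the pod. Containers in a pod share a network namespace, so `localhost` reaches it.
- The pod spec sets a hypothetical `APP_NAME` environment variable.
- InfluxDB is reachable under a hypothetical `influxdb` service name.

Telegraf substitutes `${VAR}` environment variables in its config files, so the same file can be reused for every application pod.

```toml
# Hypothetical sidecar config: one Telegraf per application pod.
# APP_NAME is an assumed environment variable set in the pod spec.
[global_tags]
  application = "${APP_NAME}"

[[inputs.jolokia2_agent]]
  # The app container and the sidecar share the pod's network namespace.
  urls = ["http://localhost:8001/jolokia"]

  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

[[outputs.influxdb]]
  # "influxdb" is a hypothetical Kubernetes service name.
  urls = ["http://influxdb:8086"]
  database = "telegraf"
```

The DaemonSet instance would carry the host-level inputs (for example `[[inputs.cpu]]` and `[[inputs.mem]]`) instead of the Jolokia input.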
In the end, we solved this with one telegraf.conf file holding the main configuration plus the output (InfluxDB), and one additional configuration file per server process on the machine. Each of these per-server files contains the URL of the Jolokia endpoint, the inputs.jolokia2_agent.metric section, and the inputs.jolokia2_agent.tags section:
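A sketch of that layout, assuming a Linux package install. In that setup Telegraf reads `/etc/telegraf/telegraf.conf` and then every `*.conf` file in `/etc/telegraf/telegraf.d`, the directory passed with `--config-directory`. The per-server file name, InfluxDB URL, database name and interval are illustrative. Both files are shown in one block and separated by comments.

```toml
# ---- /etc/telegraf/telegraf.conf (main config + output) ----
[agent]
  interval = "10s"   # illustrative collection interval

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]   # illustrative InfluxDB URL
  database = "telegraf"

# ---- /etc/telegraf/telegraf.d/server_a_1.conf (one file per server process) ----
[[inputs.jolokia2_agent]]
  urls = ["http://server_a_1:8001/jolokia"]

  [[inputs.jolokia2_agent.metric]]
    name  = "jvm_runtime"
    mbean = "java.lang:type=Runtime"
    paths = ["Uptime"]

  # Keep the tags table last: keys after this header belong to it.
  [inputs.jolokia2_agent.tags]
    application = "application1"
```

With this split, adding a server process means dropping one more `.conf` file into `telegraf.d` and restarting Telegraf. The output settings stay in a single place.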