Telegraf: The Go Collection Agent


Originally published at: https://www.influxdata.com/blog/telegraf-the-go-collection-agent/

Collection agents are a key part of any monitoring infrastructure: they can run on your hosts, collecting data about your systems and applications, or they can operate remotely, gathering data over the network via endpoints exposed by your applications.

In 2015, InfluxData released Telegraf: a lightweight, plugin-driven collection agent written in Go. While many developers were already using various collection agents to send metrics to InfluxDB and other time series databases, InfluxData wanted a project that was designed around the idea of tagged measurements, the data model used in databases like InfluxDB and OpenTSDB.

Telegraf features a plugin-based architecture and was designed so that users could easily contribute functionality to the project without knowing much about the codebase. There are four distinct types of plugins in Telegraf: Input, Processor, Aggregator, and Output plugins, each handling one part of the data path. Since its inception in 2015, the number of Telegraf plugins has grown to over 150, the majority of which have been contributed by the community, giving Telegraf the ability to collect metrics from a wide variety of inputs and write them to a wide variety of outputs.

A few example plugins:

  • postgresql — An input plugin which collects performance metrics about your Postgres installation. It uses data from the built-in pg_stat_database and pg_stat_bgwriter views.
  • apache — An input plugin which collects server performance information using the mod_status module of the Apache HTTP Server.
  • prometheus — An input plugin which gathers metrics from HTTP servers exposing metrics in Prometheus format.
  • influxdb — An output plugin which writes to InfluxDB via HTTP or UDP.
  • opentsdb — An output plugin which writes to an OpenTSDB instance using either the "telnet" or HTTP mode.

Using Telegraf

If you'd like to run Telegraf, first install it by following the installation instructions in the Telegraf documentation.

Telegraf has the ability to generate a config file based on the plugins you specify. You can create a new config using the -sample-config argument when running Telegraf:

$ telegraf -sample-config -input-filter apache -output-filter influxdb > telegraf.conf

which will create a telegraf.conf file in your working directory. Next, you can launch Telegraf using the new config:

$ telegraf -config telegraf.conf

If you're running Telegraf as a service, you'll need to place the newly generated config file in the appropriate directory and restart Telegraf. On Debian / Ubuntu, for example, the config file goes into /etc/telegraf (as /etc/telegraf/telegraf.conf), and Telegraf itself can be restarted using systemctl restart telegraf.

Interfaces & Writing Plugins

Plugins are implemented in Telegraf using Go interfaces. Each plugin type is defined by an interface, and writing a new plugin is as easy as implementing that interface and importing your plugin into Telegraf.
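Go interfaces are satisfied implicitly: any type with the right methods implements the interface, with no explicit declaration. The minimal sketch below uses simplified, hypothetical stand-ins for Telegraf's Input and Accumulator interfaces (the real ones live in the telegraf package, and the real AddFields also accepts optional timestamps), along with a hypothetical Demo plugin. The `var _ Input = (*Demo)(nil)` line is a common Go idiom that makes the build fail if the type ever stops satisfying the interface.

```go
package main

// Accumulator is a simplified, hypothetical stand-in for telegraf.Accumulator
// (the real AddFields also takes optional timestamps).
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

// Input mirrors the three methods of Telegraf's Input interface.
type Input interface {
	SampleConfig() string
	Description() string
	Gather(acc Accumulator) error
}

// Demo is a hypothetical plugin; it never declares that it implements Input.
type Demo struct{ calls int }

func (d *Demo) SampleConfig() string { return "" }

func (d *Demo) Description() string { return "A demo input" }

// Gather emits one "demo" measurement per call, keeping a call count as state.
func (d *Demo) Gather(acc Accumulator) error {
	d.calls++
	acc.AddFields("demo", map[string]interface{}{"calls": d.calls}, map[string]string{})
	return nil
}

// Compile-time check: the build fails if *Demo stops implementing Input.
var _ Input = (*Demo)(nil)

// countingAcc is a hypothetical accumulator that only counts AddFields calls.
type countingAcc struct{ n int }

func (c *countingAcc) AddFields(string, map[string]interface{}, map[string]string) { c.n++ }
```

Because the check is a package-level variable of the interface type, removing or renaming one of Demo's methods turns into a compile error rather than a runtime surprise.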

If you've created a new plugin, or fixed or added functionality in an existing plugin, submit a PR on GitHub and share your hard work! Contributors need to complete a few additional steps, such as adding the appropriate documentation and tests; you can find more information about them in the contributing documentation in the Telegraf repo.

That document also has detailed information about implementing each of the plugin interfaces that are part of Telegraf.

Getting Started

If you're interested in developing your own Telegraf plugin, you'll need Go 1.5+ installed on your machine. The Golang website has a nice Getting Started guide with installation instructions and links to the downloads page.

Once you have Go installed, download the Telegraf source code using go get, check out a new branch in git for your work, and try building a binary using make:

$ go get github.com/influxdata/telegraf
$ cd $GOPATH/src/github.com/influxdata/telegraf
$ git checkout -b MyPlugin
$ make

The source code will be downloaded into the src directory of your GOPATH, and the make command will create a Telegraf binary under the bin directory.

Let's take a look at an existing example. Telegraf has a plugin called Trig, which is often used for demo purposes and emits data based on the trigonometric functions sine and cosine. We'll walk through creating the Trig plugin in the next section.

Input Plugins

First, add the following line to telegraf/plugins/inputs/all/all.go:

_ "github.com/influxdata/telegraf/plugins/inputs/trig"

to import your plugin and ensure that Telegraf can run the code you write. The blank identifier (_) imports the package only for its side effects, which here means running its init function. Because this file needs to be edited by anyone creating a plugin, you might run into merge conflicts when you're ready to merge your code. However, it's essential to add this line before starting local development; without it, the code you write won't run.

Telegraf's input plugins need to implement just three methods, which together make up the Input interface. From the Godoc:

type Input interface {
     // SampleConfig returns the default configuration of the Input
     SampleConfig() string
     // Description returns a one-sentence description on the Input
     Description() string
     // Gather takes in an accumulator and adds the metrics that the Input
     // gathers. This is called every "interval"
     Gather(Accumulator) error
}

Both Description and SampleConfig are used by Telegraf to generate configuration files. Telegraf configurations are written using TOML; each section is prefaced by a one-line comment containing the string returned by the Description() function, followed by the configuration variables for the plugin itself, returned by the SampleConfig() function.
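As a rough illustration of how those two methods fit together, the sketch below renders one config section from a plugin's Description() and SampleConfig(). The renderSection helper and the trigConfig type are hypothetical; Telegraf's own config generator does more (section headers, commenting, ordering), but the result has the same shape as the Trig section generated later in this walkthrough.

```go
package main

import (
	"fmt"
	"strings"
)

// configSource is the subset of the Input interface used to build a section.
type configSource interface {
	Description() string
	SampleConfig() string
}

// renderSection is a hypothetical helper: a comment line from Description(),
// the [[inputs.NAME]] table header, then the SampleConfig() body.
func renderSection(name string, p configSource) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# %s\n", p.Description())
	fmt.Fprintf(&b, "[[inputs.%s]]\n", name)
	// The sample config starts with a newline; drop it so the body
	// follows the header directly.
	b.WriteString(strings.TrimLeft(p.SampleConfig(), "\n"))
	return b.String()
}

// trigConfig is a hypothetical type reproducing Trig's two config methods.
type trigConfig struct{}

func (trigConfig) Description() string {
	return "Inserts sine and cosine waves for demonstration purposes"
}

func (trigConfig) SampleConfig() string {
	return `
  ## Set the amplitude
  amplitude = 10.0
`
}
```

Calling renderSection("trig", trigConfig{}) yields the comment line, the [[inputs.trig]] header, and the two indented config lines, in that order.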

The Trig plugin's configuration has a single variable, amplitude:

# Inserts sine and cosine waves for demonstration purposes 
[[inputs.trig]] 
  ## Set the amplitude 
  amplitude = 10.0

and here is the implementation from trig.go:

var TrigConfig = ` 
  ## Set the amplitude 
  amplitude = 10.0 
` 

func (s *Trig) SampleConfig() string { 
     return TrigConfig 
} 

func (s *Trig) Description() string { 
     return "Inserts sine and cosine waves for demonstration purposes" 
}

Note: When writing the TOML configuration, make sure that you use two spaces to indent a line, rather than a tab, so that your entries line up nicely with the others when Telegraf generates a config.

The last interface method for an input plugin is Gather. This is where we'll do all the work associated with collecting data from an input. At this point, Gather simply returns nil because we haven't added any code to it yet, but we can go ahead and build Telegraf and test generating a configuration to make sure that everything works.

In your working directory, type:

$ make
$ telegraf -sample-config -input-filter trig -output-filter influxdb -debug

The -debug flag will add additional information to the output to help identify any issues you might encounter.

You should see this at the bottom of the output:

# 
# INPUTS: 
# 

# Inserts sine and cosine waves for demonstration purposes 
[[inputs.trig]] 
  ## Set the amplitude
  amplitude = 10.0

Since we've added a configuration parameter in the config, we'll also need to add a corresponding property to our struct. You'll see the following lines in trig.go:

type Trig struct { 
     x float64 
     Amplitude float64 
}

Here, Amplitude is the value we're defining in our config, and x is a variable used to store the plugin's state between collection intervals.

With our configuration tested and working, and the appropriate properties added to our struct, we're ready to create the implementation for the Gather method.

Telegraf periodically collects metrics and "flushes" them to your outputs. Both the collection and flush intervals are set in the [agent] section of the Telegraf configuration, using the interval and flush_interval variables. If you set interval to 1s and flush_interval to 10s, for example, the Trig plugin will generate a new point every second, but you won't see the points appear in the database until Telegraf flushes them, which happens every ten seconds.

The Gather() method takes a single argument, a telegraf.Accumulator, which handles the creation of new measurements based on the data you've collected. You create a measurement by calling the accumulator's AddFields(measurement, fields, tags) method, which takes three arguments: measurement, a string naming the measurement; fields, a map[string]interface{}, which maps string keys to values of any type; and tags, a map[string]string, which maps string keys to string values. An optional timestamp can also be passed as a final argument.
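To make the argument order concrete, here is a hypothetical recording accumulator, the kind of stand-in you might use in a test. It is not Telegraf's implementation; it simply mirrors the (measurement, fields, tags) order of AddFields, omits the optional timestamp, and stores each call so you can inspect it.

```go
package main

// metric is a hypothetical record of a single AddFields call.
type metric struct {
	Measurement string
	Fields      map[string]interface{}
	Tags        map[string]string
}

// recordingAccumulator is a hypothetical stand-in for telegraf.Accumulator
// that keeps every metric it is given instead of sending it anywhere.
type recordingAccumulator struct {
	metrics []metric
}

// AddFields uses the same (measurement, fields, tags) order as Telegraf's
// method; the optional timestamp argument is omitted in this sketch.
func (a *recordingAccumulator) AddFields(measurement string, fields map[string]interface{}, tags map[string]string) {
	a.metrics = append(a.metrics, metric{Measurement: measurement, Fields: fields, Tags: tags})
}
```

Because fields is a map[string]interface{}, a single call can mix value types, for example float64 readings alongside an int counter, while tags are always strings.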

Let's take a look at Trig's Gather method:

func (s *Trig) Gather(acc telegraf.Accumulator) error {
     sinner := math.Sin((s.x*math.Pi)/5.0) * s.Amplitude 
     cosinner := math.Cos((s.x*math.Pi)/5.0) * s.Amplitude 

     fields := make(map[string]interface{}) 
     fields["sine"] = sinner 
     fields["cosine"] = cosinner 

     tags := make(map[string]string) 

     s.x += 1.0 
     acc.AddFields("trig", fields, tags) 

     return nil 
}
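The arithmetic in Gather determines the shape of the waves. Because the argument is x*π/5 and x increases by 1 on each call, both waves complete a full cycle every 2π / (π/5) = 10 calls, so with a 1s collection interval the pattern repeats every ten seconds. The sketch below isolates that arithmetic in a hypothetical trigPoint helper so it can be checked without running Telegraf.

```go
package main

import "math"

// trigPoint reproduces the arithmetic in Trig's Gather for step x.
// The argument x*π/5 advances by π/5 per call, so both waves repeat
// every 2π / (π/5) = 10 calls.
func trigPoint(x, amplitude float64) (sine, cosine float64) {
	arg := (x * math.Pi) / 5.0
	return math.Sin(arg) * amplitude, math.Cos(arg) * amplitude
}

// approxEqual compares floats with a small absolute tolerance to absorb
// rounding in math.Sin and math.Cos.
func approxEqual(a, b float64) bool { return math.Abs(a-b) < 1e-9 }
```

With an amplitude of 10, step 0 gives a sine of 0 and a cosine of 10, step 5 gives a sine of (approximately) 0 and a cosine of -10, and step 10 returns to the values of step 0, up to floating-point rounding.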

First, we compute a new point on both the sine and cosine waves from the current state (x) and the amplitude defined in our configuration. We then create our fields and tags maps and populate them, increment x so that the next call to Gather produces the next point, and finally call AddFields to generate our measurement.

Note: It's important to design a good schema when you're writing a Telegraf plugin. One common issue to keep in mind is what's called "Series Cardinality", the number of unique series stored in your database. If you generate a large number of unique tag values, for example by assigning a UUID to each measurement as a tag, you can quickly reach a large number of series, which will increase the memory usage and degrade the performance of the database.

The last thing required for the Trig plugin is a starting state, which we'll define in the init function of trig.go, where the plugin also registers itself with Telegraf:

func init() {
     inputs.Add("trig", func() telegraf.Input { return &Trig{x: 0.0} }) 
}

This function registers the plugin with Telegraf under the name trig, passing along a function that creates a new Trig instance with its starting state. The Telegraf agent then iterates through all of the active plugins and calls each one's Gather method every time the collection interval elapses.
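The registration pattern can be sketched in a few lines of self-contained Go. Everything here is a simplified, hypothetical model: Input, Accumulator, Creator, registry, Add, and gatherOnce stand in for Telegraf's inputs package and agent loop rather than reproducing them. It shows two ideas: init registers a constructor under a name, and each collection tick calls Gather on every active plugin, whose state (like Trig's x) persists between calls.

```go
package main

// Accumulator and Input are simplified, hypothetical stand-ins for
// telegraf.Accumulator and telegraf.Input.
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

type Input interface {
	Gather(acc Accumulator) error
}

// Creator builds a fresh plugin instance, like the function passed to inputs.Add.
type Creator func() Input

// registry maps plugin names to creators (hypothetical).
var registry = map[string]Creator{}

// Add registers a creator under a name, sketching the inputs.Add pattern.
func Add(name string, c Creator) { registry[name] = c }

// counter is a hypothetical plugin whose state, like Trig's x, persists between ticks.
type counter struct{ n int }

func (c *counter) Gather(acc Accumulator) error {
	c.n++
	acc.AddFields("counter", map[string]interface{}{"n": c.n}, map[string]string{})
	return nil
}

// init runs automatically when the package is loaded, after package-level
// variables such as registry are initialized, so the plugin is registered
// before any agent code looks it up.
func init() {
	Add("counter", func() Input { return &counter{n: 0} })
}

// gatherOnce sketches one collection tick: call Gather on every active plugin
// and collect any errors instead of stopping at the first one.
func gatherOnce(active []Input, acc Accumulator) []error {
	var errs []error
	for _, in := range active {
		if err := in.Gather(acc); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

// countingAcc is a hypothetical accumulator that only counts metrics.
type countingAcc struct{ n int }

func (a *countingAcc) AddFields(string, map[string]interface{}, map[string]string) { a.n++ }
```

Because the plugin instance is created once and then reused on every tick, three calls to gatherOnce leave the counter's n field at 3, just as Trig's x advances by one per interval.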

It's time to test! Let's re-generate our config and run Telegraf:

$ make 
$ telegraf -sample-config -input-filter trig -output-filter influxdb > telegraf.conf.test 
$ telegraf -config telegraf.conf.test -debug

The debug output will show the plugin running and collecting metrics.

You should also write some tests for your plugin at this point; if you want to contribute your plugin upstream, they're a requirement. Tests are run on every new build to catch any regressions or other issues that might have been introduced. In Go, tests are fairly easy to write, and testing support is built into the language's standard toolchain. You'll write test functions that take a *testing.T argument and use it to assert the behavior of your plugin. You can find the tests for trig.go in the corresponding trig_test.go file in the Telegraf repository on GitHub.

Contributing

In addition to tests, you'll also need to create a README.md file for your plugin with information about what your plugin does. You'll also need to make sure that you have a LICENSE file and a sample of the input / output format. If your plugin has additional dependencies, you'll also want to add those. Again, if you're interested in contributing, please check out the excellent guide on GitHub.