Plugin Modification Guidance


#1

I am interested in using the SQL Server input plugin and we currently store all of our metric data in ElasticSearch. Because of some of the plugin design choices, the output to elasticsearch is not very useful. In it’s current form you couldn’t easily see transactions per second, grouped by databases (for example) because the database name is part of the measure name. If the database name was a tag, this would all be possible.

I had an idea to add a configuration option to the plugin that would rearrange some of the data if you intended to use ElasticSearch as your output plugin. So the option might be “FormatForElasticSearch”. Is this generally frowned upon?

Thanks


#2

We wouldn’t want any options specific to a certain output, but it sounds like this might be something we want to change in general. Can you show some examples before and after in line protocol format?


#3

My general idea (for the SQL performance counters) is just to move the instance into the tags, and the object name to the beginnning of the measure.

A perf counter consists of three parts: object - counter - instance. This is a hierarchy, so it makes sense to keep things in this order.

This example is related to lock waits. Here is the current form:

Lock Waits/sec | _Total | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | AllocUnit | Locks,host=pigdog,servername=35dbdfd82180,type='Performance counters' value=0i 1505351841000000000
Lock Waits/sec | Application | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | Database | Locks,type='Performance counters',host=pigdog,servername=35dbdfd82180 value=0i 1505351841000000000
Lock Waits/sec | Extent | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | File | Locks,type='Performance counters',servername=35dbdfd82180,host=pigdog value=0i 1505351841000000000
Lock Waits/sec | HoBT | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | Key | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | Metadata | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | Object | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | OIB | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | Page | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | RID | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Lock Waits/sec | RowGroup | Locks,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000

Above, Locks is the object, Lock Waits/sec is the counter, and _Total, AllocUnit,etc. are the instances. But in the current format the instance is incorporated into the name of the measure, and the object is at the end of the name of the measure.

I am not sure about other backends and visualization systems, but using Elasticsearch that is no way for me to graph these lock waits out, and create a series per instance. I would have to explicitly add each measure to the graph. When dealing with perf counters that use a database as the instance name, this makes things very difficult (I have over 300 databases on some instances).

If we moved the instance into the tags, and moved the object to the beginning of the measure name, it would make more logic sense, and be a lot easier to deal with when graphing:

Locks | Lock Waits/sec,instance=_Total,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=AllocUnit,host=pigdog,servername=35dbdfd82180,type='Performance counters' value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Application,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Database,type='Performance counters',host=pigdog,servername=35dbdfd82180 value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Extent,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=File,type='Performance counters',servername=35dbdfd82180,host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=HoBT,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Key,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Metadata,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Object,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=OIB,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=Page,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=RID,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000
Locks | Lock Waits/sec,instance=RowGroup,servername=35dbdfd82180,type='Performance counters',host=pigdog value=0i 1505351841000000000

The reason I was thinking a config option made sense was for backward compatibility. While I think this new format would be easier to use, it would completely break things for people that have been using this plugin. If that isn’t a concern, the changes could all be made in the embedded TSQL code in the plugin.

Please tell me if you need any more detail, or examples.


#4

If we were modeling this today I would probably start with something like this (omitting some counters):

sqlserver_locks,servername=35dbdfd82180,type='Performance counters',host=pigdog _total_waits_per_sec=0i,alloc_unit_waits_per_sec=0i,application_waits_per_sec=0i,database_waits_per_sec=0i 1505351841000000000

I’m not sure if this would look to different from what is visible in SQLServer, there are advantages to having it match closely to the source, but this would be a more typical layout to what most plugins produce.

Overall though what you are proposing is an improvement for all outputs, and we can make changes to the output format so long as we provide a way to ease people over to it. We should consider if there are any other changes we might like to make at the same time, since we don’t want to be frequently tweaking the output, and then we can have an option like: metric_version = 2.

Can you open a feature request issue on the github page and then we can discuss this in more depth.


#5

Sounds good, I’ll get that started today.


#6

Created: https://github.com/influxdata/telegraf/issues/3233 I think it gives us enough info to get started, I linked to this thread as well.