InfluxDB schema - new Series, Measurement, Database or Server?

schema
influxdb
#1

I want to collect:

  • Server KPIs (cpu/ram/disk/etc);
  • Web Access Logs (method/path/status/etc);
  • Services up-status (serviceX is running, serviceY is down, etc);

I am already collecting Server KPIs into an InfluxDB server similar to:

InfluxDB
--> database_xyz
----> measurement_CPU
------> tag layer: "web"
        tag server: "webserver-1"
        value cpu_idle: 93
        value cpu_something: 4
----> measurement_RAM
------> tag layer: "web"
        tag server: "webserver-1"
        value RAM_Free: 65536
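
To make the layout above concrete, here is a minimal sketch of how those two points would look in InfluxDB line protocol (measurement, then tags, then fields). The helper names `to_line_protocol` and `format_field` are made up for illustration, and escaping of spaces and commas is deliberately left out.

```python
def format_field(value):
    """Format a field value the way line protocol expects (simplified)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"          # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'  # strings are quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Render one point as: measurement,tag=v,... field=v,... [timestamp]."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    line = f"{measurement},{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


tags = {"layer": "web", "server": "webserver-1"}
cpu_line = to_line_protocol("measurement_CPU", tags, {"cpu_idle": 93, "cpu_something": 4})
ram_line = to_line_protocol("measurement_RAM", tags, {"RAM_Free": 65536})
print(cpu_line)
print(ram_line)
```

Both points share the same tag set, which is what later allows grouping CPU and RAM by `server` or `layer`.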

Now I want to also collect, say, the Web Access Logs but I’m unsure how exactly InfluxDB is meant to be structured.

1) Should we add everything to the same measurement but filtered by Tags?

2) Should we create a different Measurement per … type of log (ex: ServerKPIs, WebAccessLogs) or per type of sub-log (ex: ServerKPI-CPU, ServerKPI-RAM)?

3) Should we create a different Database altogether per type of log?

4) Should we create a different InfluxDB server per type of log?


  1. This just feels so clunky and ugly… However, it looks completely feasible to me and I can’t think of any technical drawbacks. Would this give me the most flexibility for processing the data later (ex: CPU used per Request/s)?

  2. Feels pretty standard to me, which is why I originally went with this structure. Am I losing any kind of ability to combine data?

  3. Seems quite reasonable to separate such disparate logs; after all, they really are different logs. Would I lose the ability to combine logs (ex: CPU used per Request/s)?

  4. If I just create a new InfluxDB server (same machine), later scaling the system to remove bottlenecks should be quite easy: each InfluxDB service can be moved to its own dedicated machine. Would I still be able to combine logs in a tool like Grafana?

#2

@paulo I would say option 2 would be your best bet. But I would suggest you look at the way Telegraf stores this same data. It might also save you a bunch of time. There is a logparser plugin that might help out. There are also cpu and mem plugins along with around 100 other ones.

#3

@jackzampolin,

What about retention policies? What if I want a different retention policy for Server KPIs and Web Access Logs? I’m not able to assign RPs to measurements, right? I’m thinking options #2 and #3 are probably the best ones. I feel I’m being too much of a perfectionist, but in theory it’s still an interesting question to settle.


I am indeed using Telegraf to collect Server KPIs, which is itself a kind of variant of option #2. I also tried using Telegraf to collect Web Access Logs with the LogParser input plugin, but it turned out to be missing a feature that is critical for avoiding Series explosion, so I made a Feature Request for it:

I ended up going with LogStash instead (unfortunately, as I now require 2 tools), uploading to InfluxDB. That one was also missing a critical feature, so I created another Feature/Bug Request for it:

So I ended up with the old classic, LogStash + ElasticSearch :frowning:. I wanted one DB and one collector… I’m still hoping that the LogStash InfluxDB output plugin will end up “fixing” that issue and release me from having to rely on an ElasticSearch DB, but for now I’m stuck.

#4

@paulo How many series were you creating? “Series explosion” shouldn’t be a big issue after TSI gets merged.

What would you need in the logparser to make it work for your usecase?

#5

I would also go with option 2. It keeps log data in one database, but separates different types of logs by putting them in different measurements.

If you want to parse different log files and store them in different measurements, then you have to define [[inputs.logparser]] once for each log file you want to parse, because a single [[inputs.logparser]] instance can only put the parsed logs under one measurement.

    [[inputs.logparser]]
      files = ["/var/log/nginx/api_access.log"]
      [inputs.logparser.grok]
        patterns = ["%{COMBINED_LOG_FORMAT}"]
        measurement = "api_access.log"

You would repeat this input plugin block and store each file under a different measurement. This is the only way to put different log files under different measurements.

#6

Hi Luv, what do you think of Retention policies? They’re per Database, correct? With Option #2 I won’t be able to set different RPs to different kinds of logs. Not sure that bothers me all that much, though.
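
For reference, in InfluxDB 1.x a retention policy belongs to a database, but a write can name the policy it targets, so two kinds of logs in one database can still be kept for different durations. Below is a minimal sketch that only builds the InfluxQL statement and the `/write` URL as strings. The policy names `kpis_90d` and `access_7d`, the durations, and the `localhost:8086` address are assumptions for illustration.

```python
from urllib.parse import urlencode


def create_rp_statement(name, database, duration, replication=1, default=False):
    """Build an InfluxQL CREATE RETENTION POLICY statement (1.x syntax)."""
    stmt = (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        stmt += " DEFAULT"
    return stmt


def write_url(base_url, database, retention_policy):
    """Build the 1.x HTTP write URL that targets a specific retention policy."""
    return f"{base_url}/write?" + urlencode({"db": database, "rp": retention_policy})


kpi_stmt = create_rp_statement("kpis_90d", "database_xyz", "90d", default=True)
log_stmt = create_rp_statement("access_7d", "database_xyz", "7d")
log_url = write_url("http://localhost:8086", "database_xyz", "access_7d")
print(kpi_stmt)
print(log_stmt)
print(log_url)
```

Queries against a non-default policy then need the fully qualified form `"database"."policy"."measurement"`.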

#7

Hi Jack.


How many series were you creating?

In terms of number of Series, many of the URLs have a Session ID in the PATH, so it would be pretty much one whole set of Series each time a user gets to the website that day.

To be honest, though, I don’t know how much this would impact the current engine. But it appears to be strongly discouraged and, from what I understand of the engine, highly undesirable. Is that something the DB engine could easily live with? We don’t want to implement something that’s discouraged only to find ourselves, a few weeks later, in a deep hole that I dug for us with a bad design.


“Series explosion” shouldn’t be a big issue after […]

As for the upcoming new engine, we cannot wait on «promises of the future». We need the logs «today», not someday in a future that might never come. As I said, I don’t really know how much that «Series explosion» would impact the server, but if it is substantially harmful to the engine as it is now, then we have to work around it in some way.


What would you need in the logparser to make it work for your usecase?

At the moment, we would need Telegraf’s logparser to be able to do these 2 things:

# Transform data #
For example, be able to transform this:
/getPersonalAccount/324651-1234-1234-1234-123456678/
Into this:
/getPersonalAccount/{GUID}/
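
A minimal sketch of that transformation as a pre-processing step, written in plain Python rather than as a Telegraf feature. The pattern is an assumption: it treats any path segment made of five hyphen-separated groups of hex digits as an identifier, which covers the example above even though it is not a strict UUID.

```python
import re

# Assumption: an identifier is a whole path segment consisting of five
# hyphen-separated groups of hex digits (looser than a strict UUID).
ID_SEGMENT = re.compile(r"(?<=/)[0-9A-Fa-f]+(?:-[0-9A-Fa-f]+){4}(?=/|$)")


def normalize_path(path, placeholder="{GUID}"):
    """Replace identifier-like path segments with a fixed placeholder."""
    return ID_SEGMENT.sub(placeholder, path)


normalized = normalize_path("/getPersonalAccount/324651-1234-1234-1234-123456678/")
print(normalized)
```

With the identifier collapsed, the path can be used as a tag without producing a new series per session.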

# Aggregate or resolve time-conflicted lines #
We have some high-traffic web servers that serve many requests whose “Tags” are all identical and which happen in the exact same second. In InfluxDB all but one of them will be lost, and with them the ability to calculate the frequency of Requests/s and proper statistics (average/percentiles/min/max/etc) of Response Times.

I would need Telegraf to have the ability to either aggregate the requests with strong statistical capabilities, or the ability to resolve those time-bucket conflicts by detecting the conflict and adding a conflict-resolving Tag.
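
To illustrate the aggregation option, here is a rough sketch that buckets requests by second and tag set before anything is written, so one point per bucket carries the count and response-time statistics. The input shape (timestamp, tags, response time in ms), the field names, and the nearest-rank p95 are all assumptions for illustration, not an existing Telegraf capability.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_per_second(requests):
    """Group (epoch_seconds, tags, response_ms) tuples by second and tag set."""
    buckets = defaultdict(list)
    for ts, tags, response_ms in requests:
        key = (int(ts), tuple(sorted(tags.items())))
        buckets[key].append(response_ms)

    points = []
    for (second, tag_items), times in sorted(buckets.items()):
        ordered = sorted(times)
        p95_index = math.ceil(0.95 * len(ordered)) - 1  # nearest-rank percentile
        points.append({
            "time": second,
            "tags": dict(tag_items),
            "fields": {
                "count": len(ordered),
                "mean_ms": mean(ordered),
                "min_ms": ordered[0],
                "max_ms": ordered[-1],
                "p95_ms": ordered[p95_index],
            },
        })
    return points


tags = {"path": "/getPersonalAccount/{GUID}/", "status": "200"}
requests = [
    (1500000000.10, tags, 12.0),
    (1500000000.55, tags, 30.0),
    (1500000000.90, tags, 18.0),
    (1500000001.20, tags, 40.0),
]
points = aggregate_per_second(requests)
for point in points:
    print(point)
```

The `count` field gives Requests/s directly, at the cost of losing the individual request lines.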

#8

@paulo A single instance can handle ~5M series pretty comfortably, so there is some headroom depending on how many series you are creating daily and how long you want your retention period to be. Currently each series key is stored in memory to look up the array of values on disk. That means the more series, the larger the RAM requirement. One way a lot of folks deal with this is having the high cardinality data downsampled into much lower cardinality data.
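
As a rough illustration of where the series count comes from: a series is identified by its measurement plus its tag set, so every distinct tag value combination adds one. The sketch below counts distinct series keys for a hypothetical `web_access` measurement, once with a per-session path as a tag and once with the identifier collapsed. The measurement name, tag names, and the 1000 sessions are assumptions.

```python
def series_key(measurement, tags):
    """Simplified series key: measurement plus the sorted tag set."""
    return measurement + "," + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))


def count_series(points):
    """Count distinct series among (measurement, tags) pairs."""
    return len({series_key(measurement, tags) for measurement, tags in points})


# 1000 sessions, each with its own identifier inside the path tag
raw_points = [
    ("web_access", {"server": "webserver-1",
                    "path": f"/getPersonalAccount/{i:06d}-1234-1234-1234-123456678/"})
    for i in range(1000)
]
# The same requests with the identifier replaced by a placeholder
normalized_points = [
    ("web_access", {"server": "webserver-1", "path": "/getPersonalAccount/{GUID}/"})
    for _ in range(1000)
]

raw_series = count_series(raw_points)
normalized_series = count_series(normalized_points)
print(raw_series, normalized_series)
```

Multiplying the per-day figure by the retention period in days gives a first estimate to compare against the headroom mentioned above.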

The promises of the future are only there because you could start collecting your data now at a lower retention period, and once the changes come in lengthen the retention period to infinite.

The logparser can currently do that, you just have to write the proper parsing rules to enable that behavior.

Time conflicted lines are best taken care of by properly tagging the data to avoid timestamp collisions. Session_id would take care of that requirement. Also you could increase the precision of the emitted timestamps. I would say that arbitrarily adding a tag is a bad idea as a way to resolve those conflicts.
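
The effect of timestamp precision can be shown with a toy model. This is not InfluxDB itself, only a dict keyed by measurement, tag set, and timestamp, where a later point with the same key overwrites the earlier field values. The measurement name, tags, and sample timestamps are assumptions.

```python
def write_points(points):
    """Toy store: points sharing (measurement, tag set, timestamp) are merged,
    and a later value for the same field replaces the earlier one."""
    stored = {}
    for measurement, tags, timestamp, fields in points:
        key = (measurement, tuple(sorted(tags.items())), timestamp)
        stored.setdefault(key, {}).update(fields)
    return stored


tags = {"path": "/getPersonalAccount/{GUID}/", "status": "200"}
# Three requests inside the same second, as nanosecond timestamps
stamps_ns = [1500000000_100000000, 1500000000_550000000, 1500000000_900000000]
response_times = [12.0, 30.0, 18.0]

second_precision = [
    ("web_access", tags, ts // 1_000_000_000, {"response_ms": ms})
    for ts, ms in zip(stamps_ns, response_times)
]
nanosecond_precision = [
    ("web_access", tags, ts, {"response_ms": ms})
    for ts, ms in zip(stamps_ns, response_times)
]

kept_at_seconds = write_points(second_precision)
kept_at_nanos = write_points(nanosecond_precision)
print(len(kept_at_seconds), len(kept_at_nanos))
```

Higher precision only helps if the log source actually records sub-second timestamps; two requests logged with the same timestamp would still collide.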

#9

Hi Jack,

The promises of the future are only there because

Just to be clear, I didn’t mean that as any kind of criticism of your comment; I was just speaking literally.

I’ve grown quite bitter and skeptical of promises from the likes of “next version all bugs will be fixed, all imaginary features will be there including curing cancer, just stick with us another year, another year, another year, ad infinitum”.

^ you can probably guess some of the high profile company names I’m thinking of who’ve been fooling us for decades.


The logparser can currently do that, you just have to write the proper parsing rules to enable that behavior.

Are you sure that’s the case? I couldn’t find anything and ended up opening a Feature Request with them:
https://github.com/influxdata/telegraf/issues/2667
It was apparently accepted, which suggests to me that it’s not yet possible!?

Maybe that feature request is then useless. How are you thinking that can be done? Can you give me some kind of example?


Time conflicted lines are best taken care of by properly tagging the data to avoid timestamp collisions. Session_id would take care of that requirement. […] I would say that arbitrarily adding a tag is a bad idea as a way to resolve those conflicts.

I’m a bit confused. You’ve said it’s best to avoid conflicts by tagging, but also that adding a tag to resolve conflicts is a bad idea? InfluxDB’s documentation says that if we need to resolve timestamp conflicts we should add an extra Tag for it.

Also, you seem to be referring to the idea of having the likes of Session ID in some Tags as something desirable. Isn’t that highly or at least moderately undesirable? It would create a huge amount of Series caused by a variable (which theoretically should then be used as a Field instead of as a Tag) and make most of the Series exist merely for periods of 1~20 minutes.