I am getting very high cardinality from the docker input measurements on AWS, where the lifecycle of containers is short: some cron-type jobs, plus crash/restart scenarios. Running on AWS using the Container service (ECS) makes a few of the tag values random each time a container starts.
Top measurements:
20471 docker_container_blkio
15889 docker_container_cpu
5103 docker_container_mem
5012 docker_container_net
I can see that a couple of tags added based on Docker labels will increase the cardinality:
com.amazonaws.ecs.task-definition-version
com.amazonaws.ecs.task-arn
I can exclude these via the configuration, which will reduce cardinality somewhat. The container name is another issue, as it contains a generated identifier, for example: container_name=ecs-ion-integration-49-receptionist-c6f18886a79fc4ba2f00
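For reference, this is roughly what I have in mind, assuming a Telegraf version whose docker input supports the `docker_label_exclude` option (a sketch, not tested against every release):

```toml
# Sketch only: drop the high-cardinality ECS labels at the input,
# before they ever become tags.
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  docker_label_exclude = [
    "com.amazonaws.ecs.task-arn",
    "com.amazonaws.ecs.task-definition-version"
  ]
```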
Looking at the code, it appears the input simply adds the container name as a tag.
Does anyone have a solution to this?
I'm thinking the only way to solve this is a generic tag exclusion, not just one on labels: something like "tag_exclude" in the configuration that would remove any listed tags before the measurement is pushed?
I would at the very least make sure you exclude the provider-specific Docker tags you show above. I've run into that issue with our Kubernetes integration.
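Note that Telegraf's generic metric filtering already provides what you describe, spelled `tagexclude`, and it can be set on any input. A minimal sketch, assuming the tags are named exactly as the labels above:

```toml
# Hedged sketch: Telegraf's generic tagexclude filter applied to the
# docker input, dropping the random ECS tags before metrics are written.
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  tagexclude = [
    "com.amazonaws.ecs.task-arn",
    "com.amazonaws.ecs.task-definition-version"
  ]
```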
This post is about a year old; hopefully it’ll get some visibility. I’m running into a similar issue with Docker Swarm. I believe the cardinality issues I’m running up against have to do with the way Docker Swarm names containers when they are deployed as part of a Service within a Stack.
For example, when a container for a service named app within a stack named api is deployed, the container is named something like: api_app.1.77h3ypzz1jtnm6uxmu7qdto93. For services with more than one replica, this expands out to api_app.2.[...], api_app.3.[...], api_app.4.[...], etc.
When this is happening across dozens of stacks, each made up of a handful of services, this alone adds up. Throw in a stack that is redeployed multiple times a day, and cardinality starts taking a hit quickly. This blew out our max cardinality on InfluxDB last week and all stats came to a grinding halt. So, what to do?
I have two questions:
What’s the recommended way of handling this? Is using tagexclude still the best way to go about this?
and
In my case I'm less concerned about the container's unique id. In the example above, getting stats for api_app.1 instead of the id-suffixed container name is plenty fine for my metrics needs. Is there a way for the inputs.docker plugin to take care of this manipulation? While tagexclude will remove tags altogether, I still want the container name; it's just one part of the name that is causing the cardinality to blow up.
Below is a screenshot showing a small handful of the service ids currently in my database:
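One possibility, assuming a Telegraf version that ships the `regex` processor, is to rewrite the tag instead of dropping it. A sketch that strips the trailing task id from Swarm-style names (the 25-character lowercase-alphanumeric pattern is an assumption about the id format, so verify it against your own container names):

```toml
# Sketch: rewrite container_name from "api_app.1.77h3ypzz1jtnm6uxmu7qdto93"
# to "api_app.1" by stripping the trailing dot-separated task id.
[[processors.regex]]
  [[processors.regex.tags]]
    key = "container_name"
    # Assumes the id is a 25-char lowercase alphanumeric suffix.
    pattern = "^(.*)\\.[0-9a-z]{25}$"
    replacement = "${1}"
```

With this in place the per-replica name (api_app.1, api_app.2, ...) survives as a tag value while the churn from redeploys collapses into a bounded set of series.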