Best way to organize influxdb data for monitoring stats for 5K docker containers?

Antonio_Aguilar · July 15, 2017, 11:33pm

I have a dashboard application that monitors the stats for approx. 5K docker containers. The dashboard application is currently using ElasticSearch and I would like to transition to InfluxDB.

Our data collection process is as follows:

Containers are created in groups of 200 by an automated script. Each group is hosted on a high-spec server, and it can take several hours to bootstrap all the containers. Groups and containers have unique IDs.
For each container we monitor CPU, memory, and network I/O, plus measurements for the application running inside the container. Each container sends measurements every 30 seconds.
We have a script that continuously listens for container stats and is responsible for logging the data to ElasticSearch.
Once all the containers are running on their respective servers, we push data traffic to stress the application running inside each container.
We do this for approx. 3-4 days non-stop and simulate various conditions, to evaluate the performance of our application over time under stress.
Once our tests are completed, we shut down all containers using an automated script, then repeat the whole process with a new set of containers.
We use the dashboard application to monitor the containers in real time while the test is ongoing, and to review the data after the test has concluded. Results need to be persisted for at least 3 months so that other teams can review them.
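
For a rough sense of scale, the numbers above can be turned into a write rate. This is only a back-of-the-envelope sketch: it counts one "report" per container per 30-second interval, and how many InfluxDB points each report becomes depends on a schema that is not described here. The function names are made up for illustration.

```python
# Back-of-the-envelope write volume for the process described above.
CONTAINERS = 5000        # approx. total containers (from the post)
GROUP_SIZE = 200         # containers per group (from the post)
REPORT_INTERVAL_S = 30   # each container reports every 30 seconds
TEST_DAYS = 4            # upper end of the 3-4 day test run


def reports_per_second(containers: int, interval_s: int) -> float:
    """Average number of container reports arriving per second."""
    return containers / interval_s


def reports_per_test(containers: int, interval_s: int, days: int) -> int:
    """Total number of container reports over one full test run."""
    reports_per_container = days * 24 * 3600 // interval_s
    return containers * reports_per_container


if __name__ == "__main__":
    print(round(reports_per_second(CONTAINERS, REPORT_INTERVAL_S), 1))
    print(reports_per_test(CONTAINERS, REPORT_INTERVAL_S, TEST_DAYS))
    print(reports_per_test(GROUP_SIZE, REPORT_INTERVAL_S, TEST_DAYS))
```

Under these assumptions a 4-day run produces roughly 167 reports per second overall, and each group of 200 containers accounts for about 2.3 million reports.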

In our current solution, for each group of containers, we create an index in ElasticSearch. Then, we organize the data for each container under that index. Then, we can easily search container stats by using the group id and container id.

So, my questions are:

Should I use a single database to store all container data, and let the dashboard application filter/query the container data by group ID?
Should I create one database per group, and let the dashboard application query multiple databases to display container data?

Do you have any suggestions?

Regards,

Antonio.

sbains · July 19, 2017, 4:41pm

I don’t think storing data in different databases will have an advantage, but the InfluxDB team can comment on this. It is a time series database, so the data is organized by timestamp; the memory mappings should be identical either way. That said, if you host these databases on different machines, it can definitely make a difference.
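
To make the single-database option concrete, here is a minimal sketch in which the group ID and container ID are stored as tags and the dashboard filters on them. The measurement name `container_stats`, the tag keys `group_id` and `container_id`, the field names, and both helper functions are assumptions for illustration; none of them come from this thread. The code only builds InfluxDB line protocol and InfluxQL strings and does not talk to a server.

```python
def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces, as line protocol requires
    for tag keys, tag values and field keys."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Build one line-protocol point. Assumes float-only fields and a
    measurement name that needs no escaping."""
    tag_part = ",".join(
        f"{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(k)}={float(v)!r}" for k, v in fields.items()
    )
    return f"{measurement},{tag_part} {field_part} {ts_ns}"


def group_query(group_id: str, container_id: str) -> str:
    """InfluxQL query for one container in one group. Assumes the IDs are
    plain alphanumerics, so no string escaping is done here."""
    return (
        'SELECT mean("cpu_pct") FROM "container_stats" '
        f"WHERE \"group_id\" = '{group_id}' "
        f"AND \"container_id\" = '{container_id}' "
        "AND time > now() - 1h GROUP BY time(1m)"
    )


if __name__ == "__main__":
    print(to_line(
        "container_stats",
        {"group_id": "g1", "container_id": "c42"},
        {"cpu_pct": 12.5, "mem_mb": 256.0},
        1500000000000000000,
    ))
    print(group_query("g1", "c42"))
```

One thing to weigh with this layout: tags are indexed, so every new container ID adds series to the same database, and each test run brings a fresh set of roughly 5K containers.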

Antonio_Aguilar · July 19, 2017, 5:54pm

I decided to create a database per container group; each container group has about 200 containers. It was actually very straightforward to implement. The automated script first creates the database in InfluxDB, using a random string as the database name (btw: InfluxDB did not accept a database name that starts with a number, at least when the name is unquoted). Then, each of the containers in the group has a reference to the group ID (in this case the database name in InfluxDB) and can start sending stats to InfluxDB. The dashboard application can then access the list of groups (database names) and query InfluxDB to visualize the stats.
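
A minimal sketch of that naming step, for anyone following the same route. The helper names, the 12-character length, and the `keep_90d` retention policy name are assumptions, and 90 days stands in for the "at least 3 months" requirement mentioned earlier. The code only generates a name and an InfluxQL 1.x `CREATE DATABASE` statement as a string; sending it to the server is left out.

```python
import secrets
import string


def new_group_db_name(length: int = 12) -> str:
    """Random database name that always starts with a letter, which avoids
    the leading-digit problem mentioned above."""
    first = secrets.choice(string.ascii_lowercase)
    alphabet = string.ascii_lowercase + string.digits
    rest = "".join(secrets.choice(alphabet) for _ in range(length - 1))
    return first + rest


def create_database_statement(name: str, keep: str = "90d") -> str:
    """InfluxQL statement that creates the database together with a
    default retention policy of the given duration."""
    return (
        f'CREATE DATABASE "{name}" '
        f'WITH DURATION {keep} REPLICATION 1 NAME "keep_{keep}"'
    )


if __name__ == "__main__":
    db_name = new_group_db_name()
    print(create_database_statement(db_name))
```

Setting the retention policy at creation time means each group's data expires on its own schedule without a separate cleanup script.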
