Need help with Prometheus Counters and transient hosts (tags)

I’m having real trouble aggregating counters into a total count across multiple cloud hosts, where hosts come and go as autoscaling kicks in.

Working with counters over transient hosts must be a common scenario, but I couldn’t find any information on the web.

For example, with the query
SELECT SUM(*) FROM "measurement" WHERE time > :dashboardTime: AND time < :upperDashboardTime: GROUP BY time(:interval:) FILL(null)

the graph drops at 14:52 because one of the hosts is removed, but that host’s contribution should still be included in subsequent time periods, as if all the work had been done by a single host.
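A minimal Python sketch of why the plain sum drops, assuming each host reports one cumulative counter value per interval (the host names, intervals, and values are made up for illustration). Once host `b` stops reporting, its accumulated count vanishes from every later interval.

```python
from collections import defaultdict

# Hypothetical samples: host -> {interval: cumulative counter value}.
# Host "b" is removed by the autoscaler after interval 2.
samples = {
    "a": {0: 0, 1: 5, 2: 10, 3: 15, 4: 20},
    "b": {0: 0, 1: 4, 2: 8},
}


def naive_total(samples):
    """Add up the raw counter values of whichever hosts reported in each
    interval, roughly what SUM() grouped by time does when every host
    writes one point per interval."""
    totals = defaultdict(int)
    for series in samples.values():
        for t, value in series.items():
            totals[t] += value
    return dict(sorted(totals.items()))


print(naive_total(samples))  # {0: 0, 1: 9, 2: 18, 3: 15, 4: 20}
```

The total falls from 18 to 15 at interval 3 even though no work was undone; host `b`'s 8 counts simply stop being added.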

Any ideas?

Hello @GreySpike,
Did you run the query before or after the host was added again?
I think you might want to use InfluxDB 2.0 and run the query periodically as a task.

Hi Anaisdg,

Thanks for taking the trouble to respond.

These are ephemeral hosts in GCP and will never re-appear with the same hostname once destroyed by the orchestration tool.

This is a common pattern for managing hosts in the cloud - “cows not pets” - so I had hoped that there was a common solution design.

I’ve been experimenting with fill(previous) to carry the count forward through the display window, but that causes side effects elsewhere, so the numbers are still incorrect from my perspective.
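A rough Python sketch of the forward-fill idea, assuming each host's counter is carried forward separately before summing (the data and the `forward_filled_total` helper are hypothetical). It also reproduces one behavior described in the InfluxQL documentation: fill(previous) only fills from values inside the query's time range, so a host that disappeared before the window starts contributes nothing.

```python
# Hypothetical samples: host -> {interval: cumulative counter value}.
# Host "b" stops reporting after interval 2.
samples = {
    "a": {0: 0, 1: 5, 2: 10, 3: 15, 4: 20},
    "b": {0: 0, 1: 4, 2: 8},
}


def forward_filled_total(samples, start, end):
    """Sum per-host counters for each interval in [start, end], carrying
    each host's last seen value forward. Only points inside the window are
    used, mirroring how fill(previous) cannot see values before the range."""
    last = {host: None for host in samples}
    totals = {}
    for t in range(start, end + 1):
        total = 0
        for host, series in samples.items():
            if t in series:
                last[host] = series[t]
            if last[host] is not None:
                total += last[host]
        totals[t] = total
    return totals


# Window covers host b's lifetime: its last value (8) is carried forward.
print(forward_filled_total(samples, 0, 4))  # {0: 0, 1: 9, 2: 18, 3: 23, 4: 28}
# Window starts after host b is gone: its work is lost again.
print(forward_filled_total(samples, 3, 4))  # {3: 15, 4: 20}
```

Forward-filling therefore only helps while a departed host's last point is still inside the dashboard window, which is one way the totals can still come out wrong as the window moves.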

My next thought is to differentiate each host’s counter, sum the resulting rates across all hosts, and then integrate that sum, but I haven’t tried this yet.
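A minimal Python sketch of that differentiate-then-integrate plan. It uses per-interval differences and a running sum, the discrete equivalent of rates and an integral (with true per-second rates you would multiply each rate by the interval length before summing). Host names, data, and the helpers `increments` and `integrated_total` are hypothetical, and a drop in a host's counter is assumed to be a restart, so the new value is counted as fresh work. InfluxQL's NON_NEGATIVE_DIFFERENCE() and CUMULATIVE_SUM() are the closest built-ins, though NON_NEGATIVE_DIFFERENCE() simply drops negative steps instead of counting the post-reset value.

```python
from collections import defaultdict

# Hypothetical samples: host -> {interval: cumulative counter value}.
# Host "b" is removed after interval 2; host "c" is added at interval 3.
samples = {
    "a": {0: 0, 1: 5, 2: 10, 3: 15, 4: 20},
    "b": {0: 0, 1: 4, 2: 8},
    "c": {3: 3, 4: 7},
}


def increments(series):
    """Per-interval increase of one host's counter ("differentiate")."""
    out = {}
    prev = None
    for t in sorted(series):
        value = series[t]
        if prev is None or value < prev:
            # Assumption: a new host, or a restarted one, counts from 0.
            out[t] = value
        else:
            out[t] = value - prev
        prev = value
    return out


def integrated_total(samples):
    """Sum every host's increments per interval, then keep a running total
    ("integrate"), so work done by departed hosts stays in the total."""
    per_interval = defaultdict(int)
    for series in samples.values():
        for t, inc in increments(series).items():
            per_interval[t] += inc
    running = 0
    totals = {}
    for t in sorted(per_interval):
        running += per_interval[t]
        totals[t] = running
    return totals


print(integrated_total(samples))  # {0: 0, 1: 9, 2: 18, 3: 26, 4: 35}
```

The total never decreases when host `b` leaves, and its final value equals the combined work of all three hosts (20 + 8 + 7). Because each host's first sample is counted in full, the sketch assumes every host's counter starts at 0 when the host joins.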