Thank you so much for your previous post and for this one as well. Based on the previous one I already managed to achieve what I was looking for: removing series from the database. No clue how I came up with `DELETE FROM` instead of the (successful) `DROP SERIES`.
I (partly auto-)generated, and carefully reviewed twice, a .txt file with one `DROP SERIES` statement per line and fed it to the CLI with `influx -precision rfc3339 -database 'homeassistant' -username <uname> -password <pwd> < "/share/deletion_list.txt"`. It contains ~600 lines, and while the CPU stays calm, it actually stresses the disk quite a lot.
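For anyone wanting to build a similar file, here is a minimal Python sketch of the generation step. It is not my exact script: the entity IDs are made up, and it assumes the series to remove can be identified by an `entity_id` tag, which may not match every setup.

```python
def drop_series_statements(entity_ids):
    """Return one InfluxQL DROP SERIES statement per entity ID."""
    statements = []
    for entity_id in entity_ids:
        # Escape backslashes and single quotes so the string literal stays valid.
        escaped = entity_id.replace("\\", "\\\\").replace("'", "\\'")
        # Assumption: the series are tagged with "entity_id".
        statements.append(f"DROP SERIES WHERE \"entity_id\" = '{escaped}'")
    return statements


if __name__ == "__main__":
    # Hypothetical entity IDs, for illustration only.
    example_ids = ["outdoor_temperature", "washing_machine_power"]
    for statement in drop_series_statements(example_ids):
        print(statement)
```

Redirecting the output into a file gives one statement per line, which can then be reviewed by hand before it is piped into `influx`.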
While the bulk deletion is running I can watch the progress indirectly by using
SELECT numSeries FROM "_internal"."monitor"."database" WHERE "database"='homeassistant' ORDER BY time DESC LIMIT 1
which gives `numSeries`. And that is decreasing really, really slowly, because of everything you said: hot and cold data, and it takes some time to find and purge. So your post really helped me (better) understand how it works under the hood.
One last question while I wait for the bulk deletion to complete. It might be a little off-topic, but it is of HUGE interest to me (only asking here as you seem to be very skilled):
Are you aware of a `SELECT` statement (or any other possibility) to get a list of series (and also measurements) with the “most data” (in terms of records or storage used), starting with the largest set and sorted in descending order?
In classic SQL I would do this; unfortunately I’m not skilled enough to translate this syntax into something that works for InfluxDB:
SELECT entity_id, COUNT(*) as count FROM states GROUP BY entity_id ORDER BY count DESC LIMIT 100
This way I’d like to identify which series are (actually, or very likely based on the record counts) consuming the most disk space. The actual problem: the database grew really REALLY big and I need to sort things out. With that list I could also decide which of those “top scorers” actually need to be stored in InfluxDB at all.
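The closest I could sketch myself so far: as far as I understand, InfluxQL can only `ORDER BY time`, so the sorting would have to happen outside the database. Assuming a query along the lines of `SELECT COUNT("value") FROM /.*/ GROUP BY "entity_id"` (the `value` field and the `entity_id` tag are my guesses about the schema) and the JSON shape returned by the InfluxDB 1.x HTTP `/query` endpoint, a bit of Python could do the ranking. The sample response below is made up, and `rank_series` is just a name I picked.

```python
def rank_series(response, limit=100):
    """Rank series by point count, largest first.

    `response` is the decoded JSON of an InfluxDB 1.x /query call for a
    COUNT(...) query grouped by the "entity_id" tag (assumed schema).
    """
    rows = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            count_index = series["columns"].index("count")
            entity_id = series.get("tags", {}).get("entity_id", "")
            for values in series["values"]:
                rows.append((series["name"], entity_id, values[count_index]))
    # Sort by the count column, descending, then keep the top entries.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


# Made-up response, only to show the expected shape.
sample = {
    "results": [{
        "series": [
            {"name": "W", "tags": {"entity_id": "washing_machine_power"},
             "columns": ["time", "count"],
             "values": [["1970-01-01T00:00:00Z", 120]]},
            {"name": "°C", "tags": {"entity_id": "outdoor_temperature"},
             "columns": ["time", "count"],
             "values": [["1970-01-01T00:00:00Z", 4500]]},
        ]
    }]
}

for measurement, entity_id, count in rank_series(sample):
    print(measurement, entity_id, count)
```

Record counts are only a proxy for disk usage, so this would at best point to likely candidates, not exact storage numbers.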