Hi there,
I’m trying to understand what happened with my InfluxDB instance because all data prior to Feb 4 disappeared, and I suspect it may be related to changes I made to the retention policies.
My version:
InfluxDB shell version: 1.8.10
System context
Earlier that day some system maintenance was done:
475 2026-02-04 07:54:54 : apt update
478 2026-02-04 07:55:26 : apt upgrade
483 2026-02-04 08:01:18 : apt autoremove
InfluxDB logs
Later in the logs I can see this error related to disk space:
Feb 04 09:30:00 monitor influxd-systemd-start.sh[521]: ts=2026-02-04T08:30:00.005840Z lvl=info msg="Executing query" log_id=0zVLdLXl000 service=query query="SELECT mean(used_percent) AS used_percent FROM telegraf_intern.autogen.disk WHERE (host =~ /^proxmox$/ AND fstype = 'ext4') AND time >= now() - 6h AND time <= now() - 30s GROUP BY time(1m), path"
Feb 04 09:30:00 monitor influxd-systemd-start.sh[521]: ts=2026-02-04T08:30:00.306211Z lvl=info msg="Error writing snapshot" log_id=0zVLdLXl000 engine=tsm1 error="error opening new segment file for wal (1): write /var/lib/influxdb/wal/telegraf/autogen/1709/_00024.wal: no space left on device"
Feb 04 09:30:01 monitor influxd-systemd-start.sh[521]: ts=2026-02-04T08:30:01.306270Z lvl=info msg="Error writing snapshot" log_id=0zVLdLXl000 engine=tsm1 error="error opening new segment file for wal (1): write /var/lib/influxdb/wal/telegraf/autogen/1709/_00024.wal: no space left on device"
So at that moment the disk was full.
Investigation commands
Later in the timeline, I found this in the history:
470 2026-02-04 08:18:09 : ncdu /
This one was actually run earlier, during the maintenance, but I don’t think anyone deleted data with the delete (d) key in the ncdu interface.
460 2026-02-04 10:47:11 : influx
461 2026-02-04 10:47:37 : df -h
462 2026-02-04 10:52:58 : du -sh /var/lib/influxdb/data/*
463 2026-02-04 10:54:14 : du -sh /var/lib/influxdb/data/telegraf/* | sort -h
464 2026-02-04 10:54:14 : du -sh /var/lib/influxdb/data/telegraf_intern/* | sort -h
465 2026-02-04 10:54:39 : du -sh /var/lib/influxdb/data/telegraf/autogen/* | sort -h
466 2026-02-04 10:54:39 : du -sh /var/lib/influxdb/data/telegraf_intern/autogen/* | sort -h
Retention policy changes
Scrolling up in the InfluxDB shell history, I can see that these commands were executed, in this order. Unfortunately I cannot tell when they ran (neither the day nor the time), how much time passed between them, or where they fall relative to the rest of the logs.
USE telegraf;
CREATE RETENTION POLICY "18months" ON "telegraf" DURATION 78w REPLICATION 1 DEFAULT;
USE telegraf_intern;
CREATE RETENTION POLICY "18months" ON "telegraf_intern" DURATION 78w REPLICATION 1 DEFAULT;
USE icinga2;
CREATE RETENTION POLICY "18months" ON "icinga2" DURATION 78w REPLICATION 1 DEFAULT;
SHOW RETENTION POLICIES ON telegraf;
SHOW RETENTION POLICIES ON telegraf_intern;
SHOW RETENTION POLICIES ON icinga2;
DROP RETENTION POLICY "autogen" ON telegraf;
DROP RETENTION POLICY "autogen" ON telegraf_intern;
DROP RETENTION POLICY "dades_18_mesos" ON telegraf_intern;
DROP RETENTION POLICY "autogen" ON icinga2;
What I was trying to do
I noticed there were several retention policies:
- autogen
- dades_18_mesos
- others
InfluxDB seemed to be storing more than 18 months of data, so I assumed the policies might be overlapping.
My reasoning was:
- autogen has infinite retention
- dades_18_mesos had 18 months, for example
So I thought that since autogen was infinite, it effectively included the 18-month data anyway.
Because of that, I:
- Created new 18-month retention policies
- Set them as DEFAULT
- Then dropped the autogen policies
I assumed that data from the last 18 months would remain under the new policies, and only older data would be deleted.
Problem
All data prior to Feb 4 disappeared.
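In case it helps to reproduce what I am seeing, this is a minimal sketch of how the remaining data can be checked per retention policy. The measurement disk and the field used_percent are taken from the query in the log above; I am assuming they also exist under the new 18months policy.

```sql
-- Which policies exist now, and which one is DEFAULT?
SHOW RETENTION POLICIES ON "telegraf_intern"

-- Oldest remaining point in the new policy (FIRST returns the point with the oldest timestamp).
SELECT FIRST("used_percent") FROM "telegraf_intern"."18months"."disk"

-- Shards per database and retention policy, with their start and end times.
SHOW SHARDS
```

Naming the policy explicitly in the FROM clause avoids relying on whichever policy is currently DEFAULT.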
Question
Did dropping the autogen retention policy delete all the data that was stored under it?
Even though I created the new 18months retention policy before dropping autogen, could that still have caused the loss of historical data?
Or could the full disk have played a role in this, or even the apt upgrade?
Any clarification about how InfluxDB handles retention policy deletion and data migration (or lack of it) would be very helpful.
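If the answer is that data stays under the policy it was written to and is not moved when a new DEFAULT policy is created, then I assume a copy step like the one below would have been needed before the drops. This is only a sketch based on the SELECT ... INTO syntax in the InfluxQL 1.x documentation. I did not run it, and on a large database it would probably have to be split into smaller time ranges.

```sql
-- Copy the last 78 weeks of every measurement from autogen into the new policy.
-- :MEASUREMENT keeps the original measurement names; GROUP BY * keeps tags as tags.
SELECT * INTO "telegraf"."18months".:MEASUREMENT
FROM "telegraf"."autogen"./.*/
WHERE time > now() - 78w
GROUP BY *

-- Only after verifying the copy:
-- DROP RETENTION POLICY "autogen" ON "telegraf"
```

The same statement would have to be repeated for telegraf_intern and icinga2 with their own database names.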
