InfluxDB v1.7.6
Seeing huge memory spikes roughly every hour (see image), which result in OOM and InfluxDB being killed.

Memory increases from 14 GB to 47 GB, causing the OOM killer to kick in and kill InfluxDB.
The problem can be reproduced by running a DROP SHARD.
The retention policy is below:
show retention policies;
name          duration  shardGroupDuration replicaN default
----          --------  ------------------ -------- -------
cmk_retention 2160h0m0s 24h0m0s            1        true
I also noticed that I appear to have an orphaned shard: it can be seen in /influxdb/data/cmk/cmk_retention but does not show up in 'show shard groups'. If I attempt to drop this shard, memory increases as mentioned above, but the shard fails to be removed:
drop shard 666;
ERR: no data received
Can this shard simply be deleted from /influxdb/data/cmk/cmk_retention?
And why does 'drop shard' result in massive memory usage?
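For reference, this is roughly how I'm spotting the orphan (a sketch only; it assumes the influx CLI is on the PATH and that the database is the second column of SHOW SHARDS output, as it is in 1.7):

  # Shard IDs the meta store knows about for the cmk database
  influx -execute 'SHOW SHARDS' | awk '$1 ~ /^[0-9]+$/ && $2 == "cmk" {print $1}' | sort > known_shards.txt
  # Shard directories actually on disk for this retention policy
  ls /influxdb/data/cmk/cmk_retention | sort > disk_shards.txt
  # Lines only in the second file are on disk but unknown to InfluxDB, i.e. orphans
  comm -13 known_shards.txt disk_shards.txt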
Thanks
Mark
I am seeing a similar issue too. Using the same version of InfluxDB (v1.7.6).
Though I did not drop the shard manually.
@markdollemore Did you try deleting the shard from the filesystem? We are encountering the same problem with InfluxDB 1.7.3 and 1.7.9.
It only happens with some of the shards. For the same database and retention policy, one shard can easily be removed with the DROP SHARD command, and then the next one triggers an OOM.
Hi,
I am having the same problem, and because of it the "max-values-per-tag" limit keeps being reached. I configured the retention policy to remove data after 1 hour, and I also have a CQ that runs every 10 minutes to downsample the data from one_day -> one_week.
Roughly every 2 hours, graphs in Grafana start showing almost no data. To fix this, I have to log in to the server, remove the shard, and then restart the InfluxDB service.
Is there any workaround for this?
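For reference, the CQ is roughly of the following shape (a sketch only; the database name "mydb" and the mean() aggregation are placeholders for my real, anonymized setup):

  CREATE CONTINUOUS QUERY "cq_downsample_10m" ON "mydb"
  BEGIN
    SELECT mean(*) INTO "mydb"."one_week".:MEASUREMENT
    FROM "mydb"."one_day"./.*/
    GROUP BY time(10m), *
  END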
Below is my config for the database in InfluxDB:
name     duration shardGroupDuration replicaN default
----     -------- ------------------ -------- -------
autogen  0s       168h0m0s           1        false
one_day  1h0m0s   1h0m0s             1        true
one_week 168h0m0s 24h0m0s            1        false
id  database retention_policy shard_group start_time           end_time             expiry_time          owners
--  -------- ---------------- ----------- ----------           --------             -----------          ------
117 xxxxxxxx one_day          117         2020-01-26T09:00:00Z 2020-01-26T10:00:00Z 2020-01-26T11:00:00Z
118 xxxxxxxx one_day          118         2020-01-26T10:00:00Z 2020-01-26T11:00:00Z 2020-01-26T12:00:00Z
119 xxxxxxxx one_day          119         2020-01-26T11:00:00Z 2020-01-26T12:00:00Z 2020-01-26T13:00:00Z
120 xxxxxxxx one_day          120         2020-01-26T12:00:00Z 2020-01-26T13:00:00Z 2020-01-26T14:00:00Z
114 xxxxxxxx one_week         114         2020-01-26T00:00:00Z 2020-01-27T00:00:00Z 2020-02-03T00:00:00Z
Current system date/time:
Sun Jan 26 11:51:20 UTC 2020
Regards,
Mudasir Mirza.
I have tested this, and deleting the shard folder from the filesystem does indeed work. After restarting InfluxDB, the DROP SHARD command can be run and will now complete without causing an OOM, successfully removing all traces of the shard.
I have done this twice on separate occasions and it worked perfectly each time. However, I would recommend making a backup of the InfluxDB storage filesystem before attempting this, just in case.
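For anyone else hitting this, the procedure was roughly as follows (a sketch: the path and shard ID 666 come from the earlier posts, and the systemd service name is an assumption; adjust both to your setup):

  # Stop InfluxDB and back up the storage filesystem first, just in case
  sudo systemctl stop influxdb
  sudo cp -a /influxdb/data /influxdb/data.bak
  # Delete the problem shard's directory from the filesystem
  sudo rm -rf /influxdb/data/cmk/cmk_retention/666
  # Restart, then DROP SHARD so the remaining metadata is cleaned up too
  sudo systemctl start influxdb
  influx -execute 'DROP SHARD 666'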