DROP SHARD and retention policy deletion check cause massive memory spike and OOM

InfluxDB v1.7.6

Seeing huge memory spikes roughly every hour (see image below), which result in OOM and InfluxDB being killed.

[image: memory usage graph showing hourly spikes]

Memory increases from 14 GB to 47 GB, which causes the OOM killer to kick in and kill InfluxDB.

The problem can be reproduced by running DROP SHARD <id>.

The retention policy is below:

show retention policies;
name          duration  shardGroupDuration replicaN default
----          --------  ------------------ -------- -------
cmk_retention 2160h0m0s 24h0m0s            1        true
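
For reference, that policy corresponds to roughly this statement (the database name cmk is taken from the data path below):

influx -execute 'CREATE RETENTION POLICY "cmk_retention" ON "cmk" DURATION 2160h REPLICATION 1 SHARD DURATION 24h DEFAULT'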

I also noticed that I appear to have an orphaned shard: it can be seen in /influxdb/data/cmk/cmk_retention but does not show up in SHOW SHARD GROUPS. If I attempt to drop this shard, memory increases as described above, but the shard fails to be removed:

drop shard 666;
ERR: no data received
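
For reference, a quick way to see the orphan (a sketch using the paths above; assumes the influx CLI is on the PATH):

ls /influxdb/data/cmk/cmk_retention                 # 666 appears as a directory on disk...
influx -execute 'SHOW SHARDS' | grep cmk_retention  # ...but is missing from the meta store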

Can this shard simply be deleted from /influxdb/data/cmk/cmk_retention?

And why does DROP SHARD result in massive memory usage?

Thanks

Mark


I am seeing a similar issue with the same version of InfluxDB (v1.7.6), though I did not drop the shard manually.

@markdollemore Did you try deleting the shard from the filesystem? We are encountering the same problem with InfluxDB 1.7.3 and 1.7.9.

It only happens with some of the shards. For the same database and retention policy, one shard can easily be removed using the DROP SHARD command, and then the next one triggers an OOM.
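
A quick way to watch influxd's memory while dropping a shard and catch the spike (a sketch; assumes a Linux host with the influx CLI on the PATH, and the shard id is just an example):

influx -execute 'DROP SHARD 666' &
DROP_PID=$!
# poll influxd's resident set size while the drop is running
while kill -0 "$DROP_PID" 2>/dev/null; do
    ps -o rss= -C influxd | awk '{printf "influxd RSS: %.1f GB\n", $1/1048576}'
    sleep 5
done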

Hi,

I am having the same problem, and because of this data the "max-values-per-tag" limit keeps being reached. I configured a retention policy to remove data after 1 hour, and I also have a CQ that runs every 10 minutes to downsample the data from one_day -> one_week.
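
The CQ looks roughly like this (a sketch; the field selection and measurement names are guesses, and the database name is redacted as in the shard list below):

influx -execute 'CREATE CONTINUOUS QUERY "cq_downsample" ON "xxxxxxxx"
RESAMPLE EVERY 10m
BEGIN
  SELECT mean(*) INTO "xxxxxxxx"."one_week".:MEASUREMENT
  FROM "xxxxxxxx"."one_day"./.*/
  GROUP BY time(10m), *
END'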

About every 2 hours, graphs in Grafana start showing almost no data. To fix this, I have to log in to the server, remove the shard, and then restart the InfluxDB service to make it work.
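
The manual fix looks roughly like this (a sketch; the data path assumes a default install, and the shard id is just one from the list below):

systemctl stop influxdb
rm -rf /var/lib/influxdb/data/xxxxxxxx/one_day/117   # remove the stuck shard directory
systemctl start influxdb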

Is there any workaround for this?

Below is my configuration for the database in InfluxDB:

name     	duration 	shardGroupDuration 	replicaN 	default
----     	-------- 	------------------ 	-------- 	-------
autogen  	0s       	168h0m0s           	1        	false
one_day  	1h0m0s   	1h0m0s             	1        	true
one_week 	168h0m0s 	24h0m0s            	1        	false


id  	database   retention_policy 	shard_group 	start_time           	end_time             	expiry_time          owners
--  	--------   ---------------- 	----------- 	----------           	--------             	-----------          ------
117 	xxxxxxxx   one_day          	117         	2020-01-26T09:00:00Z 	2020-01-26T10:00:00Z 	2020-01-26T11:00:00Z
118 	xxxxxxxx   one_day          	118         	2020-01-26T10:00:00Z 	2020-01-26T11:00:00Z 	2020-01-26T12:00:00Z
119 	xxxxxxxx   one_day          	119         	2020-01-26T11:00:00Z 	2020-01-26T12:00:00Z 	2020-01-26T13:00:00Z
120 	xxxxxxxx   one_day          	120         	2020-01-26T12:00:00Z 	2020-01-26T13:00:00Z 	2020-01-26T14:00:00Z
114 	xxxxxxxx   one_week         	114         	2020-01-26T00:00:00Z 	2020-01-27T00:00:00Z 	2020-02-03T00:00:00Z


Current system date/time:
Sun Jan 26 11:51:20 UTC 2020

Regards,
Mudasir Mirza.

I have tested this, and deleting the shard folder from the filesystem does indeed work. After restarting InfluxDB, the DROP SHARD command can then be run; it now completes without causing an OOM and successfully removes all traces of the shard.
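
The steps, roughly (a sketch; the service name assumes a systemd install, and the path and shard id are taken from the posts above):

systemctl stop influxdb
rm -rf /influxdb/data/cmk/cmk_retention/666   # delete the orphaned shard directory
systemctl start influxdb
influx -execute 'DROP SHARD 666'              # now completes and cleans up the remaining metadata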

I have done this twice on separate occasions and it worked perfectly every time. However, I would recommend making a backup of the InfluxDB storage filesystem before attempting this, just in case.