Hi everyone,
We are running into an issue with InfluxDB 2.7.4 on an Ubuntu Linux VM.
After an inconsistent amount of time, delete operations start to time out and CPU usage gradually increases.
We access InfluxDB from a C# application, and there are no intermediate network proxies between the application and InfluxDB.
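For context, here is a minimal sketch of the kind of delete call our application makes (placeholder URL, token, org, bucket, and predicate; the real code differs, but it ultimately hits the same POST /api/v2/delete endpoint shown in the metrics below, and the timeout surfaces on the client side of this call):

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class DeleteExample
{
    static async Task Main()
    {
        // Placeholder endpoint and credentials, for illustration only.
        using var http = new HttpClient
        {
            BaseAddress = new Uri("http://localhost:8086"),
            Timeout = TimeSpan.FromSeconds(30) // the timeouts we see surface here, as TaskCanceledException
        };
        http.DefaultRequestHeaders.Add("Authorization", "Token my-token");

        // Delete a day of data for one measurement (placeholder predicate).
        var body = new StringContent(
            "{\"start\":\"2024-01-01T00:00:00Z\"," +
            "\"stop\":\"2024-01-02T00:00:00Z\"," +
            "\"predicate\":\"_measurement=\\\"my_measurement\\\"\"}",
            Encoding.UTF8, "application/json");

        // InfluxDB answers 204 No Content on success, matching the histogram below.
        var response = await http.PostAsync("/api/v2/delete?org=my-org&bucket=my-bucket", body);
        Console.WriteLine((int)response.StatusCode);
    }
}
```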
Investigating the logs and metrics shows all recorded delete operations finishing in a relatively short time, with no obvious errors.
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="0.025"} 105993
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="0.05"} 117921
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="0.1"} 120840
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="0.25"} 122566
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="0.5"} 122990
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="1"} 123320
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="2.5"} 123394
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="5"} 123396
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="10"} 123396
http_api_request_duration_seconds_bucket{handler="platform",method="POST",path="/api/v2/delete",response_code="204",status="2XX",user_agent="unknown",le="+Inf"} 123396
Since these metrics are only written after the response has been sent, and the histogram above shows that none of the 123,396 recorded delete requests took longer than 5 seconds, this leads me to believe the timeouts are delete requests that get stuck somewhere inside InfluxDB and never make it back to the HTTP layer, and that CPU usage climbs because something internal is babysitting a growing number of stuck requests/goroutines.
Our only workaround right now is to restart InfluxDB periodically, but this doesn't mitigate the problem entirely.
Has anyone seen similar behavior, and does anyone have a suggestion for a fix or workaround?
Thanks,
-Steven