I have a home-grown InfluxDB cluster with 3 servers running InfluxDB 1.8.4 and 2 running InfluxDB 2.1.1. In front of all of these servers there is a custom application that duplicates every incoming write request, so all the servers receive exactly the same data.
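In case the setup matters, the fan-out layer does essentially the following (a simplified sketch in Go of what it does, not the actual code; hostnames, db/bucket/org names and auth handling are placeholders):

```go
// Simplified sketch of the fan-out layer: every incoming write request is
// replayed verbatim to each backend, so all servers see the same points.
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

// Placeholder backend write endpoints; the real list mixes
// 1.8 (/write) and 2.x (/api/v2/write) URLs.
var backends = []string{
	"http://influx-18-a:8086/write?db=mydb",
	"http://influx-21-a:8086/api/v2/write?bucket=mydb&org=myorg",
}

func fanOut(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "read error", http.StatusBadRequest)
		return
	}
	// Duplicate the exact payload to every backend.
	for _, url := range backends {
		req, _ := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
		req.Header.Set("Content-Type", r.Header.Get("Content-Type"))
		req.Header.Set("Authorization", r.Header.Get("Authorization"))
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("write to %s failed: %v", url, err)
			continue
		}
		resp.Body.Close()
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/write", fanOut)
	http.HandleFunc("/api/v2/write", fanOut)
	log.Fatal(http.ListenAndServe(":8086", nil))
}
```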
In the 2 weeks since I added the InfluxDB 2.1.1 servers, it has happened twice that one of them suddenly slowed to a crawl and started returning {"code":"internal error","message":"unexpected error writing points to database: timeout"}
to almost all requests. Once the server slows down it doesn’t recover on its own. A simple restart “fixes” the issue, and after the restart it happily catches up on all the data that couldn’t be written. The whole time, the other 2.1.1 server and all the 1.8.4 ones kept running just fine. The only difference between the 2.1.1 server that has this issue and the other one is that the “problematic” one also receives a small amount of query traffic, while the other one receives writes only.
I have enabled debug logging on both 2.1.1 servers, but unfortunately that didn’t help: everything looks quite normal (to my eyes) and more or less the same between the two. Any suggestions/ideas on how I can investigate this strange slowdown? Thanks.
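(For reference, debug logging was turned on via influxd’s log-level setting; the --log-level=debug flag or INFLUXD_LOG_LEVEL=debug environment variable should be equivalent.)

```toml
# influxd config file (InfluxDB 2.x)
log-level = "debug"
```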