My InfluxDB v2.3 OSS process on Ubuntu 22.04 seems stuck after repeated, hours-long write operations (millions of rows per measurement, >1,000 measurements) using the Python influxdb-client (v1.29.1) with the following code/settings:
with InfluxDBClient(timeout=120000, verify_ssl=False, enable_gzip=True) as client_v2:
    with client_v2.write_api(write_options=WriteOptions(batch_size=5_000, flush_interval=10_000)) as inf2_write_api:
After several hours of this, the machine running my script started printing the following errors more and more often (I had to translate some from German to English):
The retriable error occurred during request. Reason: '('Connection aborted.', ConnectionResetError(10054, 'An existing connection was closed by the remote host.', None, 10054, None))'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x000001CD2836BC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'.
The batch item wasn't processed successfully because: HTTPSConnectionPool(host='xxx.xx.xxx.xx', port=8086): Max retries exceeded with url: /api/v2/write?o…
Failed to establish a new connection: [WinError 10065] A socket operation was attempted to an unreachable host.
Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.
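For context, the "retriable" errors above mean the client keeps retrying a failed batch with a growing delay before finally reporting "The batch item wasn't processed successfully". A minimal stdlib-only sketch of that retry pattern (the function name and parameters here are illustrative, not the actual influxdb-client internals):

```python
import time

def write_with_retry(write_fn, batch, max_retries=5, base_delay=0.1, backoff=2.0):
    """Call write_fn(batch); on ConnectionError, retry with exponential backoff.

    Raises the last ConnectionError once max_retries attempts are exhausted,
    which corresponds to the "batch item wasn't processed" message above.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return write_fn(batch)
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(delay)
            delay *= backoff
```

The real client exposes similar knobs (retry counts and delays) through its write options; the point of the sketch is only that each failed batch multiplies the total wall-clock time of the job.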
Rebooting the server and stopping/restarting the process with "sudo service influxdb stop/start" did not help. Checking the process status with "sudo service influxdb status" gives the following:
Looking at the activation timestamps, I assume influxd starts over again and again:
influxdb.service - InfluxDB is an open-source, distributed, time series datab>
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor pres>
Active: activating (start) since Mon 2022-09-05 15:44:24 CEST; 1min 17s ago
Active: activating (start) since Mon 2022-09-05 15:45:55 CEST; 1min 2s ago
Active: activating (start) since Mon 2022-09-05 15:47:25 CEST; 4s ago
Fetching the profiles with curl shows the following:
curl -o profiles.tar.gz "https://localhost:8086/debug/pprof/all?cpu=30s"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to localhost port 8086 after 0 ms: Connection refused
It looks like your DB is not reachable. You can test the connection to your DB with something like:
curl: (7) Failed to connect to localhost port 8086 after 0 ms: Connection refused
The hardware might not be adequate for such a large write operation. But it was a single event, with at most 1-2 write operations running at the same time, and it apparently ran stably for several hours, even though each measurement took several minutes to write. Even so, I would not expect the process to collapse entirely.
It looks like the machine tries to start and activate the process again and again. Whenever I check the service status, it appears to have just restarted:
● influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Wed 2022-09-07 09:52:38 CEST; 48s ago
Docs: https://docs.influxdata.com/influxdb/
Cntrl PID: 594547 (influxd-systemd)
Tasks: 2 (limit: 9460)
Memory: 20.5M
CPU: 1min 8.290s
CGroup: /system.slice/influxdb.service
├─594547 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
└─594691 sleep 1
Sep 07 09:53:18 computer_name.provider.net influxd-systemd-start.sh[594547]: InfluxDB API at https://localhost:8086/ready unavailable after
@bednar, I'm about to set up a new server. I'm just afraid that it might end up like the first one.
Monitoring the InfluxDB process
What would you recommend to monitor? Or how can I make sure not to overload the server?
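One common approach is to poll the InfluxDB v2 /health endpoint from a watchdog script and alert (or pause the writer) when the service stops responding. A minimal stdlib-only sketch; the base URL, timeout, and function name are assumptions, not a recommendation from the thread:

```python
import json
import urllib.error
import urllib.request

def influx_is_healthy(base_url="http://localhost:8086", timeout=5):
    """Return True if the InfluxDB v2 /health endpoint reports status 'pass'.

    Any connection failure (refused, timeout, malformed response) counts as
    unhealthy, which is exactly the state described above where curl reports
    "Connection refused".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.load(resp)
            return body.get("status") == "pass"
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

InfluxDB v2 also exposes Prometheus-format internals at /metrics, which is useful for tracking memory and write load over time; on the OS side, watching memory/OOM activity via journalctl for the influxdb unit would show whether the restart loop is resource-related.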
Writing large amounts of data (e.g. a dataframe with 3 million rows): @bednar, I've seen your example on GitHub for ingesting a large dataframe. There you use the plain write_api() without any non-default WriteOptions. The InfluxDB docs, however, recommend batch_size=5000 and gzip compression. Is there any best practice to date?