InfluxDBv2.3 (Ubuntu 22.04) is stuck after large writing operation (python)

My InfluxDB v2.3 OSS process on Ubuntu 22.04 seems to be stuck after repeated, hours-long write operations (millions of rows per measurement, more than 1000 measurements) using the Python influxdb-client (v1.29.1) with the following code and settings:

with InfluxDBClient(timeout=120000, verify_ssl=False, enable_gzip=True) as client_v2:
    with client_v2.write_api(write_options=WriteOptions(batch_size=5_000, flush_interval=10_000)) as inf2_write_api:

After several hours of running, the machine executing my script began reporting the following errors more and more often (I translated some of them from German to English):

  • The retriable error occurred during request. Reason: '('Connection aborted.', ConnectionResetError(10054, 'An existing connection was closed by the remote host.', None, 10054, None))'.
  • The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x000001CD2836BC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'.
  • The batch item wasn't processed successfully because: HTTPSConnectionPool(host='xxx.xx.xxx.xx', port=8086): Max retries exceeded with url: /api/v2/write?o…
  • Failed to establish a new connection: [WinError 10065] Host could not be reached at a socket process
  • Failed to establish a new connection: [WinError 10060] A connection failed, because target did not respond after a certain time, or the established connection was faulty, because the connected host did not respond.'.

Rebooting the server and stopping and restarting the process with "sudo service influxdb stop/start" did not help. Checking the process status with "sudo service influxdb status" returns the following:

CGroup: /system.slice/influxdb.service
├─123968 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
└─124087 sleep 1

Sep 05 10:37:33 computer_name.provider.net influxd-systemd-start.sh[123969]: ts=2022-09-05T08:37:32.736459Z lvl=info msg="Opened shard" log_id=0ckC_v6G000 service=storage-engine service=>
Sep 05 10:37:33 computer_name.provider.net influxd-systemd-start.sh[123968]: InfluxDB API at https://localhost:8086/ready unavailable after 20 attempts…
Sep 05 10:37:34 computer_name.provider.net influxd-systemd-start.sh[123968]: /usr/lib/influxdb/scripts/influxd-systemd-start.sh: line 28: 123969 Killed /usr/bin/influxd
Sep 05 10:37:34 computer_name.provider.net influxd-systemd-start.sh[123968]: InfluxDB API at https://localhost:8086/ready unavailable after 21 attempts…
Sep 05 10:37:35 computer_name.provider.net influxd-systemd-start.sh[123968]: InfluxDB API at https://localhost:8086/ready unavailable after 22 attempts…
Sep 05 10:37:36 computer_name.provider.net influxd-systemd-start.sh[123968]: InfluxDB API at https://localhost:8086/ready unavailable after 23 attempts…

The default InfluxDB config.toml has been modified by adding the SSL certificate and key options. The certificate is self-signed.

Can I provide additional information?
What did I do wrong?
Is there any chance of saving the data?
I would be grateful for any help!

Looking at the activation times, I assume InfluxDB starts again and again:

  • influxdb.service - InfluxDB is an open-source, distributed, time series datab>
    Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor pres>
    Active: activating (start) since Mon 2022-09-05 15:44:24 CEST; 1min 17s ago
  • Active: activating (start) since Mon 2022-09-05 15:45:55 CEST; 1min 2s ago
  • Active: activating (start) since Mon 2022-09-05 15:47:25 CEST; 4s ago
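
To put numbers on the restart loop, here is a minimal sketch (not part of the original output) that computes the gaps between the three activation times shown above. The helper name `restart_gaps` is made up for illustration. The gaps come out at roughly 90 seconds each, which would be consistent with systemd's default start timeout of 90 s, but that is only an assumption and is not confirmed by the logs.

```python
from datetime import datetime

# "Active: activating (start) since ..." times copied from the status output above
activation_times = [
    "2022-09-05 15:44:24",
    "2022-09-05 15:45:55",
    "2022-09-05 15:47:25",
]


def restart_gaps(timestamps):
    """Return the seconds between consecutive activation times (hypothetical helper)."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]


print(restart_gaps(activation_times))  # [91, 90]
```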

@bednar do you have any insight here? Thank you for your help!

Sorry, I was not sure where best to post this, so I also opened a bug issue on GitHub with some additional information. Maybe it helps: "InfluxDBv2.3 (Ubuntu 22.04) refuses connections after large recurring writing operation (python client)", Issue #23712 in influxdata/influxdb on GitHub.

With some advice I could do more analysis. I would be really happy if it were possible to save the database and especially all the data. Thank you.

Hi @paparovich,

The curl request for the profiles shows the following:
curl -o profiles.tar.gz "https://localhost:8086/debug/pprof/all?cpu=30s"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 8086 after 0 ms: Connection refused

It looks like your DB is not reachable. You can test the connection to your DB with something like:

curl -i -X GET http://localhost:8086/ping
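
The same kind of check can be scripted from Python. The following is a minimal sketch using only the standard library; the function name `is_reachable` and its parameters are hypothetical. It assumes the server in this thread is served over HTTPS with a self-signed certificate (as the `https://localhost:8086/ready` lines in the service log suggest), so certificate verification is switched off by default, which is only appropriate for local testing.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0, verify_tls=False):
    """Return True if any HTTP response comes back from url, False otherwise."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: self-signed certificate, so skip verification (testing only).
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is at least listening.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, TLS failure, malformed reply, ...
        return False


if __name__ == "__main__":
    # URL taken from the service log in this thread; adjust host and port as needed.
    print(is_reachable("https://localhost:8086/ready"))
```

A result of False corresponds to the "Connection refused" that curl reports: nothing is listening on the port, so the problem is on the server side rather than in the client settings.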

Is your DB properly sized? For more information, see "Hardware sizing guidelines" in the InfluxDB OSS 1.8 documentation.

Regards

Hi @bednar, curl gives me:

curl: (7) Failed to connect to localhost port 8086 after 0 ms: Connection refused

The hardware might not be adequate for such a large write operation. But it was a single event with at most 1-2 write operations running at the same time, and it apparently worked stably for several hours, although each measurement took several minutes. Even so, I would not expect the process to collapse entirely.

It looks like the machine tries to start and activate the process again and again. Whenever I check the service status, it appears to have just started:

● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2022-09-07 09:52:38 CEST; 48s ago
       Docs: https://docs.influxdata.com/influxdb/
Cntrl PID: 594547 (influxd-systemd)
      Tasks: 2 (limit: 9460)
     Memory: 20.5M
        CPU: 1min 8.290s
     CGroup: /system.slice/influxdb.service
             ├─594547 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             └─594691 sleep 1

Sep 07 09:53:18 computer_name.provider.net influxd-systemd-start.sh[594547]: InfluxDB API at https://localhost:8086/ready unavailable after

@bednar, I'm about to set up a new server. I am just afraid that it might end up like the first one.

Monitoring InfluxDB process
What would you recommend monitoring? And how can I make sure I do not overload the server?

Writing large amounts of data (e.g. a DataFrame with 3 million rows).
@bednar I've seen your example on GitHub for ingesting a large DataFrame. There you use the plain write_api() without any non-default WriteOptions. The InfluxDB documentation, however, recommends batch_size=5000 and gzip compression. Is there a current best practice?

Hi @paparovich,

It depends on your data. You can also use something like:

from datetime import datetime

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

"""
Configuration
"""
url = 'http://localhost:8086'
token = 'my-token'
org = 'my-org'
bucket = 'my-bucket'
batch_size = 1000

data_frame = ...

"""
Ingest DataFrame
"""
print()
print("=== Ingesting DataFrame via batching API ===")
print()
startTime = datetime.now()

with InfluxDBClient(url=url, token=token, org=org) as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)
    for start in range(0, len(data_frame), batch_size):
        # prepare chunk
        chunk = data_frame[start:start + batch_size]
        print(f"{start}-{start + batch_size}...")
        # write chunk
        write_api.write(bucket=bucket,
                        record=chunk,
                        data_frame_tag_columns=['tag'],
                        data_frame_measurement_name="measurement_name")

print()
print(f'Import finished in: {datetime.now() - startTime}')
print()
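
The chunked loop above could also be wrapped with a simple retry and backoff so that a failed chunk is retried after a pause instead of aborting the import. The following is a minimal sketch, not part of the client library: `chunk_bounds`, `write_with_retry` and the `write_chunk` callable are hypothetical names, and whether pausing between retries would have avoided the problem described in this thread is not established.

```python
import time


def chunk_bounds(total_rows, batch_size):
    """Yield (start, stop) index pairs that cover total_rows in batches."""
    for start in range(0, total_rows, batch_size):
        yield start, min(start + batch_size, total_rows)


def write_with_retry(write_chunk, start, stop,
                     max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call write_chunk(start, stop), retrying with exponential backoff.

    write_chunk is a hypothetical callable supplied by the caller. Returns the
    number of retries that were needed; re-raises after max_retries failures.
    """
    for attempt in range(max_retries + 1):
        try:
            write_chunk(start, stop)
            return attempt
        except Exception:  # in real use, narrow this to the client's error types
            if attempt == max_retries:
                raise
            # Wait 1 s, 2 s, 4 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In the loop above, `write_chunk` could be a small function that calls `write_api.write(...)` with `record=data_frame[start:stop]`, and the loop itself would iterate over `chunk_bounds(len(data_frame), batch_size)`.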

Regards