My InfluxDB v2.3 OSS process on Ubuntu 22.04 seems stuck after repeated, hours-long write operations (millions of rows per measurement, >1,000 measurements) using the Python influxdb-client (v1.29.1) with the following code/settings:
with InfluxDBClient(timeout=120000, verify_ssl=False, enable_gzip=True) as client_v2:
    with client_v2.write_api(write_options=WriteOptions(batch_size=5_000, flush_interval=10_000)) as inf2_write_api:
After several hours of this, the machine running my script started printing the following errors more and more often (I had to translate some from German to English):
The retriable error occurred during request. Reason: '('Connection aborted.', ConnectionResetError(10054, 'An existing connection was closed by the remote host.', None, 10054, None))'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x000001CD2836BC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'.
The batch item wasn't processed successfully because: HTTPSConnectionPool(host='xxx.xx.xxx.xx', port=8086): Max retries exceeded with url: /api/v2/write?o…
Failed to establish a new connection: [WinError 10065] A socket operation was attempted to an unreachable host.
Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.
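For context, the "retriable" errors above mean the client keeps retrying a failed batch with a growing delay before finally reporting "The batch item wasn't processed successfully". A minimal stdlib-only sketch of that retry pattern (the function name and parameters here are illustrative, not the actual influxdb-client internals):

```python
import time

def write_with_retry(write_fn, batch, max_retries=5, base_delay=0.1, backoff=2.0):
    """Call write_fn(batch); on ConnectionError, retry with exponential backoff.

    Raises the last ConnectionError once max_retries attempts are exhausted,
    which corresponds to the "batch item wasn't processed" message above.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return write_fn(batch)
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(delay)
            delay *= backoff
```

The real client exposes similar knobs (retry counts and delays) through its write options; the point of the sketch is only that each failed batch multiplies the total wall-clock time of the job.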
Rebooting the server and stopping/restarting the process with "sudo service influxdb stop/start" did not help. Checking the process status with "sudo service influxdb status" gives the following:
Looking at the activation timestamps, I assume influxd starts over again and again:
influxdb.service - InfluxDB is an open-source, distributed, time series datab>
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor pres>
Active: activating (start) since Mon 2022-09-05 15:44:24 CEST; 1min 17s ago
Active: activating (start) since Mon 2022-09-05 15:45:55 CEST; 1min 2s ago
Active: activating (start) since Mon 2022-09-05 15:47:25 CEST; 4s ago
Fetching the profiles with curl shows the following:
curl -o profiles.tar.gz "https://localhost:8086/debug/pprof/all?cpu=30s"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to localhost port 8086 after 0 ms: Connection refused
It looks like your DB is not reachable. You can test the connection to your DB with something like:
curl: (7) Failed to connect to localhost port 8086 after 0 ms: Connection refused
The hardware might not be adequate for such a large write operation. But it was a single event, with at most 1-2 write operations running at the same time, and it apparently ran stably for several hours, even though each measurement took several minutes to write. Even so, I would not expect the process to collapse entirely.
It looks like the machine tries to start and activate the process again and again. Whenever I check the service status, it appears to have just restarted:
● influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Wed 2022-09-07 09:52:38 CEST; 48s ago
Docs: https://docs.influxdata.com/influxdb/
Cntrl PID: 594547 (influxd-systemd)
Tasks: 2 (limit: 9460)
Memory: 20.5M
CPU: 1min 8.290s
CGroup: /system.slice/influxdb.service
├─594547 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
└─594691 sleep 1
Sep 07 09:53:18 computer_name.provider.net influxd-systemd-start.sh[594547]: InfluxDB API at https://localhost:8086/ready unavailable after
@bednar, I'm about to set up a new server. I'm just afraid that it might end up like the first one.
Monitoring the InfluxDB process
What would you recommend to monitor? Or how can I make sure not to overload the server?
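One common approach is to poll the InfluxDB v2 /health endpoint from a watchdog script and alert (or pause the writer) when the service stops responding. A minimal stdlib-only sketch; the base URL, timeout, and function name are assumptions, not a recommendation from the thread:

```python
import json
import urllib.error
import urllib.request

def influx_is_healthy(base_url="http://localhost:8086", timeout=5):
    """Return True if the InfluxDB v2 /health endpoint reports status 'pass'.

    Any connection failure (refused, timeout, malformed response) counts as
    unhealthy, which is exactly the state described above where curl reports
    "Connection refused".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.load(resp)
            return body.get("status") == "pass"
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

InfluxDB v2 also exposes Prometheus-format internals at /metrics, which is useful for tracking memory and write load over time; on the OS side, watching memory/OOM activity via journalctl for the influxdb unit would show whether the restart loop is resource-related.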
Writing large amounts of data (e.g. a dataframe with 3 million rows): @bednar, I've seen your example on GitHub for ingesting a large dataframe. There you use the plain write_api() without any non-default WriteOptions. The InfluxDB docs, however, recommend batch_size=5000 and gzip compression. Is there any best practice to date?