Hi everyone,
First of all I would like to praise InfluxDB. It runs great on the RasPi 3B and has been doing so for about 9 months now. It is running on Raspbian and everything is up to date. I really appreciate that armhf packages for Debian are built natively; this makes keeping up to date very easy.
While things have been great for the last 9 months or so, InfluxDB now seems to have failed on me. The whole system runs from an SSD that is only 7% full, so disk space is not the issue. Nothing at all has changed: I have not done any system updates for days, nor even logged into the system for a couple of days. I am using Grafana 6.0.2 to display data from InfluxDB.
Symptoms:
- The influxd process sits at between 100% and 350% CPU usage, which is of course not normal
- It does not respond to any HTTP queries or data posts
- When I try to open the "influx" console I get:
Failed to connect to http://localhost:8086: Get http://localhost:8086/ping: dial tcp [::1]:8086: connect: connection refused
Please check your connection settings and ensure 'influxd' is running.
Of course influxd is running.
- I cannot find any logs. I have not modified the config file in this respect; the folder /var/log/influxdb exists but is empty (see the checks I list below the startup log excerpt).
- Grafana also cannot show data. When it tries to access the DB it reports “Network Error: Bad Gateway(502)”, although Influx is running on the same machine at localhost:8086.
- I tried stopping Grafana and doing a manual backup to get some output:
sudo influxd backup -portable ~/krftwrk_backup/
2019/03/25 19:02:28 backing up metastore to /home/alessio/krftwrk_backup/meta.00
2019/03/25 19:02:58 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:57698->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2s and retrying (0)…
2019/03/25 19:03:29 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:57810->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2s and retrying (1)…
2019/03/25 19:04:01 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:57914->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2s and retrying (2)…
2019/03/25 19:04:33 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58018->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2s and retrying (3)…
2019/03/25 19:05:05 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58122->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2s and retrying (4)…
2019/03/25 19:05:33 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58254->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2s and retrying (5)…
2019/03/25 19:06:05 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58334->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 3.01s and retrying (6)…
2019/03/25 19:06:38 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58470->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 11.441s and retrying (7)…
2019/03/25 19:07:19 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58602->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 43.477s and retrying (8)…
2019/03/25 19:08:33 Download shard 0 failed copy backup to file: err=read tcp 127.0.0.1:58854->127.0.0.1:8088: read: connection reset by peer, n=0. Waiting 2m45.216s and retrying (9)…
- When I run influxd manually I get entries like:
2019-03-25T18:20:06.737327Z info Opened file {"log_id": "0EPX8OA0000", "engine": "tsm1", "service": "filestore", "path": "/var/lib/influxdb/data/machinestats/autogen/470/000000001-000000001.tsm", "id": 0, "duration": "6.258ms"}
2019-03-25T18:20:06.741604Z info Opened shard {"log_id": "0EPX8OA0000", "service": "store", "trace_id": "0EPX8O_0000", "op_name": "tsdb_open", "index_version": "inmem", "path": "/var/lib/influxdb/data/machinestats/autogen/459", "duration": "18.055ms"}
2019-03-25T18:20:06.748781Z info Opened file {"log_id": "0EPX8OA0000", "engine": "tsm1", "service": "filestore", "path": "/var/lib/influxdb/data/machinestats/autogen/481/000000001-000000001.tsm", "id": 0, "duration": "3.113ms"}
2019-03-25T18:20:06.750204Z info Opened shard {"log_id": "0EPX8OA0000", "service": "store", "trace_id": "0EPX8O_0000", "op_name": "tsdb_open", "index_version": "inmem", "path": "/var/lib/influxdb/data/machinestats/autogen/470", "duration": "21.187ms"}
2019-03-25T18:20:06.754831Z info Opened file {"log_id": "0EPX8OA0000", "engine": "tsm1", "service": "filestore", "path": "/var/lib/influxdb/data/machinestats/autogen/492/000000001-000000001.tsm", "id": 0, "duration": "2.349ms"}
2019-03-25T18:20:06.762807Z info Opened shard {"log_id": "0EPX8OA0000", "service": "store", "trace_id": "0EPX8O_0000", "op_name": "tsdb_open", "index_version": "inmem", "path": "/var/lib/influxdb/data/machinestats/autogen/481", "duration": "20.889ms"}
2019-03-25T18:20:06.770042Z info Opened file {"log_id": "0EPX8OA0000", "engine": "tsm1", "service": "filestore", "path": "/var/lib/influxdb/data/machinestats/autogen/505/000000001-000000001.tsm", "id": 0, "duration": "3.404ms"}
2019-03-25T18:20:06.771709Z info Opened shard {"log_id": "0EPX8OA0000", "service": "store", "trace_id": "0EPX8O_0000", "op_name": "tsdb_open", "index_version": "inmem", "path": "/var/lib/influxdb/data/machinestats/autogen/492", "duration": "21.151ms"}
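For reference, these are the extra checks I have been running to gather more information. I am assuming here that the systemd unit is called influxdb (which is what the Raspbian package seems to install) and that InfluxDB is on its default :8086 bind address:

# /var/log/influxdb is empty, so I assume the output goes to the systemd journal instead
sudo systemctl status influxdb
sudo journalctl -u influxdb -n 200 --no-pager

# check whether anything is actually listening on the HTTP port while influxd is busy
sudo ss -tlnp | grep 8086

# the /ping endpoint should answer 204 No Content when the HTTP service is up
curl -sv http://localhost:8086/ping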
To be honest I am not quite sure where to look, and I am stumped because nothing has changed on the machine. Does anyone have hints on where to look? The DB might be corrupted, but I do not see why: there were no unplanned reboots and the power supply is fine.
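If the data files really were corrupted, I assume influx_inspect could confirm it. This is what I plan to run next, with the service stopped and assuming the default data location under /var/lib/influxdb used by the Debian package:

# stop the service so the TSM files are not being written to while checking
sudo systemctl stop influxdb
# walk the storage root and verify the checksums of all TSM blocks
influx_inspect verify -dir /var/lib/influxdb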
Thanks in advance