InfluxDB crashes every Monday at 0:00 UTC

Hi,
we run InfluxDB 2.4 (open-source version) on Linux CentOS 7. We insert tags into InfluxDB from machines via NodeRED, which is connected to KepwareEx (data chain: machines->KepwareEx->OPC UA->NodeRED->InfluxDB->Grafana). There are approx. 300 inserts per second. The bucket's data retention is set to 1 year.

Problem: Every Monday at 0:00 UTC (we are at UTC+2), InfluxDB stops responding and returns HTTP 5xx errors. I cannot connect to the bucket via the web GUI, and we have to restart the influxdb service. We had the same issue with version 2.0; we then upgraded to 2.4, and the issue still persists. We monitor InfluxDB and the Linux server with Prometheus, but it shows nothing useful.

Have you had a look at the logs? Can you share them?

Not completely sure, but given the retention of 1 year, I expect the compaction of the index to run once a week, and it may lead to an out-of-memory error.
This process can be limited via config options like storage-max-concurrent-compactions.
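
To illustrate why a 1-year retention would produce a weekly cycle: for retention periods longer than six months, InfluxDB's default shard group duration is 7 days. The sketch below is not InfluxDB code; it assumes that a shard group's start is the write timestamp truncated to a multiple of that duration, counted from 0001-01-01T00:00:00Z (which is a Monday). The function name `shard_group_start` and the sample timestamp are made up for the example. Under that assumption every weekly boundary falls on a Monday at 00:00 UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: boundaries are multiples of the shard group duration counted
# from 0001-01-01T00:00:00Z, which happens to be a Monday.
ZERO_TIME = datetime(1, 1, 1, tzinfo=timezone.utc)

# Assumption: default shard group duration for a retention period > 6 months.
SHARD_GROUP_DURATION = timedelta(days=7)


def shard_group_start(ts: datetime) -> datetime:
    """Return the start of the (assumed) weekly shard group containing ts."""
    elapsed = ts - ZERO_TIME
    return ts - (elapsed % SHARD_GROUP_DURATION)


# Hypothetical write in the middle of a week (a Wednesday afternoon, UTC).
sample = datetime(2022, 9, 14, 15, 30, tzinfo=timezone.utc)
start = shard_group_start(sample)

# weekday() == 0 means Monday.
print(start.isoformat(), "weekday:", start.weekday())
```

If the actual shard group duration of the bucket differs from 7 days, the boundaries move accordingly, so it is worth checking the bucket settings before relying on this.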

The log was lost: InfluxDB kept failing after the service restart, so we rebooted the machine, and journaling was not set to persistent (the journal was deleted by the reboot). It is now set to persistent, so I can post something next Monday.

My influxdb config:

{
        "assets-path": "",
        "bolt-path": "/var/lib/influxdb/influxd.bolt",
        "e2e-testing": false,
        "engine-path": "/var/lib/influxdb/engine",
        "feature-flags": null,
        "flux-log-enabled": false,
        "hardening-enabled": false,
        "http-bind-address": ":8086",
        "http-idle-timeout": 180000000000,
        "http-read-header-timeout": 10000000000,
        "http-read-timeout": 0,
        "http-write-timeout": 0,
        "influxql-max-select-buckets": 0,
        "influxql-max-select-point": 0,
        "influxql-max-select-series": 0,
        "instance-id": "",
        "log-level": "info",
        "metrics-disabled": false,
        "nats-max-payload-bytes": 0,
        "nats-port": 0,
        "no-tasks": false,
        "pprof-disabled": false,
        "query-concurrency": 1024,
        "query-initial-memory-bytes": 0,
        "query-max-memory-bytes": 0,
        "query-memory-bytes": 0,
        "query-queue-size": 1024,
        "reporting-disabled": false,
        "secret-store": "bolt",
        "session-length": 60,
        "session-renew-disabled": false,
        "sqlite-path": "/var/lib/influxdb/influxd.sqlite",
        "storage-cache-max-memory-size": 1073741824,
        "storage-cache-snapshot-memory-size": 26214400,
        "storage-cache-snapshot-write-cold-duration": "10m0s",
        "storage-compact-full-write-cold-duration": "4h0m0s",
        "storage-compact-throughput-burst": 50331648,
        "storage-max-concurrent-compactions": 0,
        "storage-max-index-log-file-size": 1048576,
        "storage-no-validate-field-size": false,
        "storage-retention-check-interval": "30m0s",
        "storage-series-file-max-concurrent-snapshot-compactions": 0,
        "storage-series-id-set-cache-size": 0,
        "storage-shard-precreator-advance-period": "30m0s",
        "storage-shard-precreator-check-interval": "10m0s",
        "storage-tsm-use-madv-willneed": false,
        "storage-validate-keys": false,
        "storage-wal-fsync-delay": "0s",
        "storage-wal-max-concurrent-writes": 0,
        "storage-wal-max-write-delay": 600000000000,
        "storage-write-timeout": 10000000000,
        "store": "disk",
        "testing-always-allow-setup": false,
        "tls-cert": "",
        "tls-key": "",
        "tls-min-version": "1.2",
        "tls-strict-ciphers": false,
        "tracing-type": "",
        "ui-disabled": false,
        "vault-addr": "",
        "vault-cacert": "",
        "vault-capath": "",
        "vault-client-cert": "",
        "vault-client-key": "",
        "vault-client-timeout": 0,
        "vault-max-retries": 0,
        "vault-skip-verify": false,
        "vault-tls-server-name": "",
        "vault-token": ""
}
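
For reference, the compaction limit suggested earlier in the thread corresponds to a single key in a config like the one above. The following is only a sketch, assuming the configuration is kept as JSON; the helper name `limit_compactions` and the value 1 are illustrative, not a tuning recommendation. As far as I know, the value 0 shown above does not mean "no compactions" but lets InfluxDB choose a limit based on the number of CPU cores.

```python
import json


def limit_compactions(config_json: str, limit: int = 1) -> str:
    """Return a copy of a JSON config with the compaction limit set.

    `limit` is a hypothetical example value; pick one that suits the host.
    """
    config = json.loads(config_json)
    config["storage-max-concurrent-compactions"] = limit
    return json.dumps(config, indent=8)


# Two keys copied from the config above; the rest is omitted for brevity.
excerpt = '{"storage-max-concurrent-compactions": 0, "store": "disk"}'
print(limit_compactions(excerpt))
```

The service has to be restarted for a changed config file to take effect.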

OOM is not the cause (graph 1): memory only starts rising after InfluxDB has stopped responding (500 responses from node-fetch to /api/v2/write go up and 204 write responses go down), so data piles up in the queue in NodeRED (last 2 graphs). The problem starts every Monday around 2:05 am UTC+2.

Here is the log:

After "Reindexing WAL data" there are "Flux query failed" messages from Grafana (which is checking some alerts), and from InfluxDB I see: lvl=warn msg="internal error not returned to client"

Nothing else … any advice?

[root@svsk0101 ~]# journalctl -S "2022-09-12 00:50:00" -U "2022-09-12 02:30:00"
-- Logs begin at Mon 2022-09-05 09:50:48 CEST, end at Mon 2022-09-12 07:59:14 CEST. --
Sep 12 01:01:01 svsk0101 systemd[1]: Created slice user-0.slice.
Sep 12 01:01:01 svsk0101 systemd[1]: Starting user-0.slice.
Sep 12 01:01:01 svsk0101 CROND[30077]: (root) CMD (run-parts /etc/cron.hourly)
Sep 12 01:01:01 svsk0101 systemd[1]: Started Session 58 of user root.
Sep 12 01:01:01 svsk0101 systemd[1]: Starting Session 58 of user root.
Sep 12 01:01:01 svsk0101 run-parts(/etc/cron.hourly)[30080]: starting 0anacron
Sep 12 01:01:01 svsk0101 anacron[30086]: Anacron started on 2022-09-12
Sep 12 01:01:01 svsk0101 anacron[30086]: Normal exit (0 jobs run)
Sep 12 01:01:01 svsk0101 run-parts(/etc/cron.hourly)[30088]: finished 0anacron
Sep 12 01:01:01 svsk0101 run-parts(/etc/cron.hourly)[30090]: starting 0yum-hourly.cron
Sep 12 01:15:11 svsk0101 run-parts(/etc/cron.hourly)[16998]: finished 0yum-hourly.cron
Sep 12 01:15:11 svsk0101 systemd[1]: Removed slice user-0.slice.
Sep 12 01:15:11 svsk0101 systemd[1]: Stopping user-0.slice.
Sep 12 01:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:17:00.354626Z lvl=info msg="Retention policy deletion check (start)" log_id=0cpebx6W000 service=retention op_name=retention_delete_check op_event=start
Sep 12 01:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:17:00.356010Z lvl=info msg="Retention policy deletion check (end)" log_id=0cpebx6W000 service=retention op_name=retention_delete_check op_event=end op_elapsed=
Sep 12 01:29:56 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:29:56.352939Z lvl=info msg="Cache snapshot (start)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 01:29:56 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:29:56.487131Z lvl=info msg="Snapshot for path written" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb
Sep 12 01:29:56 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:29:56.487165Z lvl=info msg="Cache snapshot (end)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=134
Sep 12 01:47:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:47:00.354460Z lvl=info msg="Retention policy deletion check (start)" log_id=0cpebx6W000 service=retention op_name=retention_delete_check op_event=start
Sep 12 01:47:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:47:00.354668Z lvl=info msg="Retention policy deletion check (end)" log_id=0cpebx6W000 service=retention op_name=retention_delete_check op_event=end op_elapsed=
Sep 12 01:59:59 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:59:59.940299Z lvl=info msg="index opened with 8 partitions" log_id=0cpebx6W000 service=storage-engine index=tsi
Sep 12 01:59:59 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:59:59.940879Z lvl=info msg="Reindexing TSM data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=502
Sep 12 01:59:59 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-11T23:59:59.940892Z lvl=info msg="Reindexing WAL data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=502
Sep 12 02:00:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:00.030844Z lvl=info msg="index opened with 8 partitions" log_id=0cpebx6W000 service=storage-engine index=tsi
Sep 12 02:00:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:00.031491Z lvl=info msg="Reindexing TSM data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=499
Sep 12 02:00:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:00.031521Z lvl=info msg="Reindexing WAL data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=499
Sep 12 02:00:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:00.056250Z lvl=info msg="index opened with 8 partitions" log_id=0cpebx6W000 service=storage-engine index=tsi
Sep 12 02:00:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:00.056986Z lvl=info msg="Reindexing TSM data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=498
Sep 12 02:00:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:00.057004Z lvl=info msg="Reindexing WAL data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=498
Sep 12 02:00:04 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:04.265651Z lvl=info msg="index opened with 8 partitions" log_id=0cpebx6W000 service=storage-engine index=tsi
Sep 12 02:00:04 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:04.268975Z lvl=info msg="Reindexing TSM data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=506
Sep 12 02:00:04 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:00:04.268992Z lvl=info msg="Reindexing WAL data" log_id=0cpebx6W000 service=storage-engine engine=tsm1 db_shard_id=506
Sep 12 02:01:01 svsk0101 systemd[1]: Created slice user-0.slice.
Sep 12 02:01:01 svsk0101 systemd[1]: Starting user-0.slice.
Sep 12 02:01:01 svsk0101 systemd[1]: Started Session 59 of user root.
Sep 12 02:01:01 svsk0101 systemd[1]: Starting Session 59 of user root.
Sep 12 02:01:01 svsk0101 CROND[14185]: (root) CMD (run-parts /etc/cron.hourly)
Sep 12 02:01:01 svsk0101 run-parts(/etc/cron.hourly)[14188]: starting 0anacron
Sep 12 02:01:01 svsk0101 anacron[14194]: Anacron started on 2022-09-12
Sep 12 02:01:01 svsk0101 anacron[14194]: Normal exit (0 jobs run)
Sep 12 02:01:01 svsk0101 run-parts(/etc/cron.hourly)[14196]: finished 0anacron
Sep 12 02:01:01 svsk0101 run-parts(/etc/cron.hourly)[14198]: starting 0yum-hourly.cron
Sep 12 02:08:21 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:08:21.951015144+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:08:21 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:08:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:08:46.947090143+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:08:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:08:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:08:46.948573Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:08:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:08:52.254908949+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:08:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:09:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:00.111467Z lvl=info msg="Cache snapshot (start)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 02:09:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:00.122266Z lvl=info msg="Snapshot for path written" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb
Sep 12 02:09:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:00.122304Z lvl=info msg="Cache snapshot (end)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=10.
Sep 12 02:09:01 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:01.056883Z lvl=info msg="Cache snapshot (start)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 02:09:01 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:01.060706Z lvl=info msg="Snapshot for path written" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb
Sep 12 02:09:01 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:01.060726Z lvl=info msg="Cache snapshot (end)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=3.8
Sep 12 02:09:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:09:22.259337374+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:09:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:09:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:09:46.949569726+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:09:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:46.949688Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:09:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:09:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:09:52.26435544+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" query
Sep 12 02:09:52 svsk0101 grafana-server[1291]: r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n/
Sep 12 02:09:54 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:54.352552Z lvl=info msg="Cache snapshot (start)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 02:09:54 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:54.379240Z lvl=info msg="Snapshot for path written" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb
Sep 12 02:09:54 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:09:54.379278Z lvl=info msg="Cache snapshot (end)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=26.
Sep 12 02:10:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:10:00.353068Z lvl=info msg="Cache snapshot (start)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 02:10:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:10:00.427776Z lvl=info msg="Snapshot for path written" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb
Sep 12 02:10:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:10:00.427817Z lvl=info msg="Cache snapshot (end)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=74.
Sep 12 02:10:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:10:22.269086076+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:10:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:10:36 svsk0101 run-parts(/etc/cron.hourly)[26986]: finished 0yum-hourly.cron
Sep 12 02:10:36 svsk0101 systemd[1]: Removed slice user-0.slice.
Sep 12 02:10:36 svsk0101 systemd[1]: Stopping user-0.slice.
Sep 12 02:10:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:10:46.9488609+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" query=
Sep 12 02:10:46 svsk0101 grafana-server[1291]: w()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _mea
Sep 12 02:10:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:10:46.949960Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:10:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:10:52.273377195+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:10:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:11:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:11:22.278197374+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:11:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:11:41 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:11:41.947088764+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:11:41 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:11:41.947177Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:11:41 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:11:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:11:52.283299916+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:11:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:12:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:12:22.287889915+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:12:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:12:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:12:46.947596446+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:12:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:12:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:12:46.948824Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:12:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:12:52.292701191+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:12:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:13:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:13:22.297537698+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:13:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:13:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:13:46.947344502+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:13:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:13:46.947522Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:13:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:13:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:13:52.302605327+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:13:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:14:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:14:22.307811205+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:14:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:14:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:14:46.948930972+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:14:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:14:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:14:46.954278Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:14:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:14:52.314096329+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:14:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:15:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:15:22.319044469+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:15:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:15:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:15:46.947990004+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:15:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:15:46.948334Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:15:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:15:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:15:52.323842704+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:15:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:16:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:16:22.329287539+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:16:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:16:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:16:46.948462523+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:16:46 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:16:46.948570Z lvl=warn msg="internal error not returned to client" log_id=0cpebx6W000 handler=error_logger error="context canceled"
Sep 12 02:16:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
Sep 12 02:16:52 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:16:52.334282669+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:16:52 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.059911Z lvl=info msg="Cache snapshot (start)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.068507Z lvl=info msg="Snapshot for path written" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.068559Z lvl=info msg="Cache snapshot (end)" log_id=0cpebx6W000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=8.6
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.354282Z lvl=info msg="Retention policy deletion check (start)" log_id=0cpebx6W000 service=retention op_name=retention_delete_check op_event=start
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.356300Z lvl=info msg="Deleted shard group" log_id=0cpebx6W000 service=retention op_name=retention_delete_check db_instance=8716f6e9e41fde66 db_shard_grou
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.358186Z lvl=info msg="Deleted shard group" log_id=0cpebx6W000 service=retention op_name=retention_delete_check db_instance=d374d7eddb702911 db_shard_grou
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.362215Z lvl=info msg="Deleted shard" log_id=0cpebx6W000 service=retention op_name=retention_delete_check db_instance=d374d7eddb702911 db_shard_id=476 db_
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.364924Z lvl=info msg="Deleted shard" log_id=0cpebx6W000 service=retention op_name=retention_delete_check db_instance=8716f6e9e41fde66 db_shard_id=490 db_
Sep 12 02:17:00 svsk0101 influxd-systemd-start.sh[1295]: ts=2022-09-12T00:17:00.366853Z lvl=info msg="Retention policy deletion check (end)" log_id=0cpebx6W000 service=retention op_name=retention_delete_check op_event=end op_elapsed=
Sep 12 02:17:22 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:17:22.338949243+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:17:22 svsk0101 grafana-server[1291]: \r\n    |> last()\r\n\r\nD = C\r\n\t|> map( fn: (r) => ({\r\n\t\tr with \"alarm\" : if r._value < r.min or r._value > r.max then 1.0 else 0.0 } ) )\r\n    \r\nGRAFANA_ALARM_VALUE = D\r\n
Sep 12 02:17:46 svsk0101 grafana-server[1291]: logger=tsdb.influx_flux t=2022-09-12T02:17:46.948846918+02:00 level=warn msg="Flux query failed" err="Post \"http://localhost:8086/api/v2/query?org=TDK\": context deadline exceeded" quer
Sep 12 02:17:46 svsk0101 grafana-server[1291]: now()) )\r\n  |> filter(fn: (r) => r[\"tag\"] =~ /[Tt]hermometer/)\r\n  |> last()\r\n  //|> keep( columns: [ \"_time\", \"_value\", \"_measurement\"])\r\n  |> map( fn:(r) => ({ r with _m
...
...
...

I came here with the same symptom on InfluxDB v2.4.0. For me it happens on Sunday night at 8 pm local time, which is 00:00 Monday UTC.


I have 24G of RAM and 4G of swap. I added more swap, but I am not sure how to size things properly. I do have a fair bit of data, I suppose. I also have a local telegraf agent running in the VM that unloads its collected information once I restart influxdb2.
I will also try

storage-max-concurrent-compactions = 0

in /etc/influxdb/config.toml


Sep 11 23:59:02 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:02.737785Z lvl=info msg="Cache snapshot (start)" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 11 23:59:03 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:03.033635Z lvl=info msg="Snapshot for path written" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/engine/data/fe5285a70f105db0/autogen/369 duration=295.847ms
Sep 11 23:59:03 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:03.033888Z lvl=info msg="Cache snapshot (end)" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=296.109ms
Sep 11 23:59:14 influxdb loki-linux-amd64[771]: level=info ts=2022-09-11T23:59:14.505899621Z caller=table_manager.go:169 msg="uploading tables"
Sep 11 23:59:23 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:23.738899Z lvl=info msg="Cache snapshot (start)" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 11 23:59:24 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:24.030899Z lvl=info msg="Snapshot for path written" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/engine/data/fe5285a70f105db0/autogen/369 duration=292.009ms
Sep 11 23:59:24 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:24.031330Z lvl=info msg="Cache snapshot (end)" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=292.438ms
Sep 11 23:59:57 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:57.531593Z lvl=info msg="index opened with 8 partitions" log_id=0cn0mGUW000 service=storage-engine index=tsi
Sep 11 23:59:57 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:57.532335Z lvl=info msg="Reindexing TSM data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=387
Sep 11 23:59:57 influxdb influxd-systemd-start.sh[819]: ts=2022-09-11T23:59:57.532420Z lvl=info msg="Reindexing WAL data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=387
Sep 12 00:00:00 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:00.520397Z lvl=info msg="index opened with 8 partitions" log_id=0cn0mGUW000 service=storage-engine index=tsi
Sep 12 00:00:00 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:00.521032Z lvl=info msg="Reindexing TSM data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=385
Sep 12 00:00:00 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:00.521125Z lvl=info msg="Reindexing WAL data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=385
Sep 12 00:00:00 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:00.557149Z lvl=info msg="index opened with 8 partitions" log_id=0cn0mGUW000 service=storage-engine index=tsi
Sep 12 00:00:00 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:00.557542Z lvl=info msg="Reindexing TSM data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=386
Sep 12 00:00:00 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:00.557649Z lvl=info msg="Reindexing WAL data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=386
Sep 12 00:00:08 influxdb systemd[1]: Starting Discard unused blocks on filesystems from /etc/fstab...
Sep 12 00:00:08 influxdb systemd[1]: Starting Rotate log files...
Sep 12 00:00:08 influxdb systemd[1]: Starting Daily man-db regeneration...
Sep 12 00:00:08 influxdb logrotate[53119]: error: Ignoring influxdb because it is writable by group or others.
Sep 12 00:00:08 influxdb systemd[1]: man-db.service: Succeeded.
Sep 12 00:00:08 influxdb systemd[1]: Finished Daily man-db regeneration.
Sep 12 00:00:08 influxdb systemd[1]: logrotate.service: Succeeded.
Sep 12 00:00:08 influxdb systemd[1]: Finished Rotate log files.
Sep 12 00:00:08 influxdb fstrim[53117]: /: 73.3 GiB (78673907712 bytes) trimmed on /dev/disk/by-uuid/b3992e1c-58d6-4a55-8c84-72cad5f516b3
Sep 12 00:00:08 influxdb systemd[1]: fstrim.service: Succeeded.
Sep 12 00:00:08 influxdb systemd[1]: Finished Discard unused blocks on filesystems from /etc/fstab.
Sep 12 00:00:11 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:11.807929Z lvl=info msg="index opened with 8 partitions" log_id=0cn0mGUW000 service=storage-engine index=tsi
Sep 12 00:00:11 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:11.808357Z lvl=info msg="Reindexing TSM data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=388
Sep 12 00:00:11 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:11.808369Z lvl=info msg="Reindexing WAL data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=388
Sep 12 00:00:14 influxdb loki-linux-amd64[771]: level=info ts=2022-09-12T00:00:14.505858643Z caller=table_manager.go:169 msg="uploading tables"
Sep 12 00:00:26 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:26.037237Z lvl=error msg="Unable to write gathered points" log_id=0cn0mGUW000 service=scraper scraper-name="new target" error=timeout
Sep 12 00:00:26 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:26.295457Z lvl=info msg="index opened with 8 partitions" log_id=0cn0mGUW000 service=storage-engine index=tsi
Sep 12 00:00:26 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:26.295879Z lvl=info msg="Reindexing TSM data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=389
Sep 12 00:00:26 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:26.295889Z lvl=info msg="Reindexing WAL data" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 db_shard_id=389
Sep 12 00:00:28 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:28.301193Z lvl=info msg="Cache snapshot (start)" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
Sep 12 00:00:28 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:28.595528Z lvl=info msg="Snapshot for path written" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/engine/data/7a038e6d102418c6/autogen/389 duration=294.325ms
Sep 12 00:00:28 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:28.595558Z lvl=info msg="Cache snapshot (end)" log_id=0cn0mGUW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=294.382ms
Sep 12 00:00:36 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:36.029085Z lvl=error msg="Unable to write gathered points" log_id=0cn0mGUW000 service=scraper scraper-name="new target" error=timeout
Sep 12 00:00:46 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:46.036073Z lvl=error msg="Unable to write gathered points" log_id=0cn0mGUW000 service=scraper scraper-name="new target" error=timeout
Sep 12 00:00:56 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:56.038366Z lvl=error msg="Unable to write gathered points" log_id=0cn0mGUW000 service=scraper scraper-name="new target" error=timeout
Sep 12 00:01:06 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:01:06.036601Z lvl=error msg="Unable to write gathered points" log_id=0cn0mGUW000 service=scraper scraper-name="new target" error=timeout
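
In a journal dump like the one above, the influxd messages are mixed with systemd, loki and grafana output. As a rough sketch (not an official tool), the Python below keeps only the influxd lines logged at lvl=error. It assumes the unit tag is influxd-systemd-start.sh, as in these logs, and that the message body is key=value pairs with optional double quotes. The names parse_influxd_line and influxd_errors are made up for this example.

```python
import re
import shlex

# Matches journal lines written by the influxd unit as it appears in this thread.
INFLUXD_LINE = re.compile(r"influxd-systemd-start\.sh\[\d+\]: (ts=.*)$")


def parse_influxd_line(line):
    """Return the key=value fields of an influxd journal line, or None for other lines."""
    match = INFLUXD_LINE.search(line)
    if match is None:
        return None
    try:
        tokens = shlex.split(match.group(1))
    except ValueError:  # e.g. a line truncated in the middle of a quoted value
        return None
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def influxd_errors(lines):
    """Yield the parsed fields of every influxd line logged at lvl=error."""
    for line in lines:
        fields = parse_influxd_line(line)
        if fields and fields.get("lvl") == "error":
            yield fields


# Three lines copied from the log above: loki, an influxd error, an influxd info.
SAMPLE = [
    'Sep 12 00:00:14 influxdb loki-linux-amd64[771]: level=info ts=2022-09-12T00:00:14.505858643Z caller=table_manager.go:169 msg="uploading tables"',
    'Sep 12 00:00:26 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:26.037237Z lvl=error msg="Unable to write gathered points" log_id=0cn0mGUW000 service=scraper scraper-name="new target" error=timeout',
    'Sep 12 00:00:26 influxdb influxd-systemd-start.sh[819]: ts=2022-09-12T00:00:26.295457Z lvl=info msg="index opened with 8 partitions" log_id=0cn0mGUW000 service=storage-engine index=tsi',
]

for fields in influxd_errors(SAMPLE):
    print(fields["ts"], fields["service"], fields["msg"], fields["error"])
```

To run it on a real journal, the lines could be read from a saved journalctl export instead of SAMPLE.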

Hello @mdtancsa
Did trying that option help?

@Jarda_K,
I’m not sure what’s going on here. I’m asking around and I’ll get back to you as soon as I hear back. Half of the company is meeting, though, so there might be a little delay. I appreciate your patience.

Hi @Anaisdg, thanks for checking in! I have not seen the problem again yet, but it seems to happen every Monday at 00:00:01 UTC. Is there a way to force the app to do whatever cleanup it normally does at that time? Or should I just wait?
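
On the timing, here is one possible explanation, offered as an assumption and not a confirmed diagnosis. As far as I know, buckets with a retention period longer than 6 months get a default shard group duration of 7 days, and the log above shows several shards (385 to 389) being opened right around 00:00 UTC on Monday. If the weekly shard groups are aligned to Monday 00:00 UTC, the boundary can be computed as in the small Python sketch below. The function name week_start and the UTC-4 offset for the "Sunday 8 pm" report are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

WEEK = timedelta(days=7)
# 0001-01-01 is a Monday in the proleptic Gregorian calendar, so truncating
# relative to it yields Monday 00:00 UTC boundaries.
ZERO = datetime(1, 1, 1, tzinfo=timezone.utc)


def week_start(t):
    """Return the Monday 00:00 UTC boundary at or before the aware datetime t."""
    t = t.astimezone(timezone.utc)
    return ZERO + ((t - ZERO) // WEEK) * WEEK


# 02:00 local time at UTC+2, as reported in the first post.
first_report = datetime(2022, 9, 12, 2, 0, tzinfo=timezone(timedelta(hours=2)))
# Sunday 8 pm local time; the UTC-4 offset is an assumption.
second_report = datetime(2022, 9, 11, 20, 0, tzinfo=timezone(timedelta(hours=-4)))

for report in (first_report, second_report):
    boundary = week_start(report)
    print(report.isoformat(), "->", boundary.isoformat(), "next:", (boundary + WEEK).isoformat())
```

Both reports land exactly on the same boundary, 2022-09-12 00:00 UTC, which fits the idea that the trouble starts when a new weekly shard group is created.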

Hi there,
we are trying to restart InfluxDB automatically via NodeRED when it stops responding (this is not a solution, but we do not want to lose data). It works like this: when a read from NodeRED fails, NodeRED restarts InfluxDB after 1 minute. The result is this:

At 2 a.m. InfluxDB fails, NodeRED restarts it, and after approx. 12 minutes it fails again, four times in a row. I think there is some problem with shards. Any ideas?
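
For comparison, the same restart-on-failure idea can be sketched outside NodeRED. The Python below is only an illustration under several assumptions: that InfluxDB answers GET /health on port 8086 with HTTP 200 when it is fine, that the systemd unit is called influxdb, and that a cooldown between restarts is wanted so the service is not restarted again while it is still starting up. The thresholds (6 failed checks at 10 s intervals, which is about 1 minute, and a 15 minute cooldown) and all function names are hypothetical.

```python
import http.client
import subprocess
import time
import urllib.request

HEALTH_URL = "http://localhost:8086/health"  # port taken from http-bind-address
SERVICE = "influxdb"  # assumed systemd unit name


def is_healthy(url=HEALTH_URL, timeout=5.0):
    """Return True if the health endpoint answers with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


def should_restart(consecutive_failures, now, last_restart,
                   threshold=6, cooldown=900.0):
    """Decide on a restart: enough failures in a row and no recent restart."""
    if consecutive_failures < threshold:
        return False
    return last_restart is None or now - last_restart >= cooldown


def watch(interval=10.0):
    """Poll the health endpoint forever and restart the service when needed."""
    failures, last_restart = 0, None
    while True:
        failures = 0 if is_healthy() else failures + 1
        now = time.monotonic()
        if should_restart(failures, now, last_restart):
            print("restarting", SERVICE)
            subprocess.run(["systemctl", "restart", SERVICE], check=False)
            last_restart, failures = now, 0
        time.sleep(interval)
```

Calling watch() as root (or from a systemd service) would start the loop. Like the NodeRED flow, this only works around the symptom and does not explain why the failures repeat after each restart.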