Hi, we have many buckets with retention set to 60 days, so the default shard group duration is 1d.
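For context, this is roughly how we confirm that per bucket. It is only a minimal sketch against the v2 HTTP API: the URL and token are placeholders, and the `retentionRules` field names are taken from the documented bucket schema, so adjust them if your version differs.

```python
# Minimal sketch: list each bucket's retention and shard group duration via the
# v2 HTTP API. URL and token are placeholders; field names may need adjusting.
import json
import urllib.request

INFLUX_URL = "http://localhost:8086"   # assumption: adjust to your instance
TOKEN = "YOUR_API_TOKEN"               # assumption: a token that can read buckets

req = urllib.request.Request(
    f"{INFLUX_URL}/api/v2/buckets?limit=100",
    headers={"Authorization": f"Token {TOKEN}"},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    payload = json.load(resp)

for bucket in payload.get("buckets", []):
    for rule in bucket.get("retentionRules", []):
        retention_days = rule.get("everySeconds", 0) / 86400
        # shardGroupDurationSeconds may be 0/absent when the server default applies
        shard_hours = rule.get("shardGroupDurationSeconds", 0) / 3600
        print(f"{bucket['name']}: retention {retention_days:.0f}d, "
              f"shard group duration {shard_hours:.0f}h")
```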
In some cases the newly created shard file (corrupted?) cannot be written to by the metrics provider, and only an InfluxDB restart solves the issue. This always happens at the 00:00 switch from the old shard to the new one. Besides that, we noticed that memory usage increases during this period.
Could you please advise what the reason could be, or how we can eliminate and/or monitor this situation?
InfluxDB OSS 2.7 running on OpenShift with NFS storage.
Thanks
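One thing we could do in the meantime is poll the Prometheus-style `/metrics` endpoint around midnight and watch the write counter and resident memory together. This is a rough sketch under assumptions: the endpoint path and the metric names (beyond `storage_writer_ok_points`, which is quoted from the issue below) need to be confirmed against your own `/metrics` output.

```python
# Minimal monitoring sketch: poll InfluxDB's Prometheus-style /metrics endpoint
# around the 00:00 shard rollover and report the write-counter delta and
# resident memory. The metric names below are assumptions; confirm them against
# your own /metrics output (if the write metric is exposed as a histogram,
# watch its _sum series instead).
import time
import urllib.request

INFLUX_METRICS_URL = "http://localhost:8086/metrics"  # assumption: adjust host/port
WRITE_METRIC = "storage_writer_ok_points"              # quoted from the issue below
MEMORY_METRIC = "process_resident_memory_bytes"        # standard Go process metric
POLL_SECONDS = 60

def scrape(metric_name: str) -> float:
    """Sum all samples of a metric from the plain-text exposition format."""
    total = 0.0
    with urllib.request.urlopen(INFLUX_METRICS_URL, timeout=10) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith(metric_name + " ") or line.startswith(metric_name + "{"):
                total += float(line.rsplit(" ", 1)[-1])
    return total

previous = scrape(WRITE_METRIC)
while True:
    time.sleep(POLL_SECONDS)
    current = scrape(WRITE_METRIC)
    rate = (current - previous) / POLL_SECONDS
    rss_gib = scrape(MEMORY_METRIC) / 2**30
    print(f"write rate ~{rate:.0f} points/s, resident memory ~{rss_gib:.2f} GiB")
    # A write rate that drops to ~0 right after midnight while resident memory
    # keeps growing matches the behaviour described here and is worth alerting on.
    previous = current
```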
Same issue as the one described here (just found it now):
(GitHub issue opened 10:45 AM, 29 Nov 2022 UTC)
__Steps to reproduce:__
List the minimal actions needed to reproduce the behavior.
1. Run influxdb2
2. Insert metrics (with telegraf)
3. Wait for some time
__Expected behavior:__
Things keep working
__Actual behavior:__
InfluxDB2 main index ("telegraf") stops reading/writing data.
Other indexes work fine, including one that is 5m aggregates of the telegraf raw index (obviously this one does not get any new data).
We have had this in the past randomly, but in the last few weeks it has happened every few days.
In the past it seemed to happen at 00:00 UTC when influx did some internal DB maintenance, but now it happens at random times.
__Environment info:__
* System info: Linux 3.10.0-1160.66.1.el7.x86_64 x86_64
* InfluxDB version: InfluxDB v2.3.0+SNAPSHOT.090f681737 (git: 090f681737) build_date: 2022-06-16T19:33:50Z
* Other relevant environment details: CentOS 7 on vmware - lots of spare IO, CPU, memory.
Our database is 170GB, mostly metrics inserted every 60s, some every 600s.
storage_writer_ok_points is around 2.5k/s for 7mins, then ~25k/s for 3mins for the every-600s burst.
VM has 32G RAM, 28G of which is in buffers/cache.
4 cores, and typically sits at around 90% idle.
~ 24IOPS, 8MiB/s
__Config:__
```
bolt-path = "/var/lib/influxdb/influxd.bolt"
engine-path = "/var/lib/influxdb/engine"
flux-log-enabled = "true"
```
We have enabled flux-log to see if specific queries are causing this, but that doesn't seem to be the case.
__Logs:__
Include snippet of errors in log.
__Performance:__
I captured a 10s pprof which I will attach.
I also have a core dump and a 60s dump of debug/pprof/trace (not sure whether the trace has sensitive info, but I can share it privately - the core dump certainly will).
Hello @Suhanbongo,
Thanks for posting the issue. If there's already an existing issue for this, I'd refer to that one for help. Thank you