Thanks! I’ve been testing some things as well. So far, raising compact_throughput has only managed to crash the server (excessive memory usage). That said, I was also playing around with the cache settings at the same time, so those may have contributed to the memory problems I created.
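For reference, these are the knobs I was touching in the [data] section of influxdb.conf. The values shown here are the InfluxDB 1.x defaults, not a recommendation - I was raising compact-throughput above the default when the crash happened:

```toml
[data]
  # Per-second rate limit for TSM compaction writes; this is the
  # setting I was raising. "48m" is the shipped default.
  compact-throughput = "48m"
  compact-throughput-burst = "48m"

  # Cache settings I was experimenting with at the same time,
  # which may have caused the memory problems instead.
  cache-max-memory-size = "1g"
  cache-snapshot-memory-size = "25m"
```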
Another look at the config file made me think that the actual culprit, at least for most of the dips/peaks, might be the retention policy enforcer. The dip/peak cycles occur exactly every 30 minutes, and the retention policy enforcer is set to run every 30 minutes. This can’t explain all of the dips and peaks - not all of them occur exactly on the 30-minute mark - but it can explain most of them.
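This is the section of influxdb.conf I mean; check-interval controls how often the enforcer wakes up, and "30m" is the default, which matches my cycle exactly:

```toml
[retention]
  enabled = true
  # The enforcer runs on this interval - it lines up with my 30-minute dip/peak cycle
  check-interval = "30m"
```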
Whenever the retention policy enforcer runs, I get these log lines:
Oct 09 08:30:22 influxdb influxd: ts=2019-10-09T06:30:22.784600Z lvl=info msg="Retention policy deletion check (start)" log_id=0ILzUuF0000 service=retention trace_id=0INoE3G0000 op_name=retention_delete_check op_event=start
Oct 09 08:30:22 influxdb influxd: ts=2019-10-09T06:30:22.785102Z lvl=info msg="Retention policy deletion check (end)" log_id=0ILzUuF0000 service=retention trace_id=0INoE3G0000 op_name=retention_delete_check op_event=end op_elapsed=0.524ms
It doesn’t seem to run excessively long or do anything weird, but the times logged by InfluxDB line up exactly with the dips/peaks I see in my write throughput.
While I was typing this, InfluxDB did a few dip/peak cycles that weren’t exactly on the 30m mark, but I do see the following log lines at the exact time of one of those dips/peaks:
Oct 09 08:44:59 influxdb influxd: ts=2019-10-09T06:44:59.779626Z lvl=info msg="Cache snapshot (start)" log_id=0ILzUuF0000 engine=tsm1 trace_id=0INp3a0l000 op_name=tsm1_cache_snapshot op_event=start
Oct 09 08:44:59 influxdb influxd: ts=2019-10-09T06:44:59.944076Z lvl=info msg="Snapshot for path written" log_id=0ILzUuF0000 engine=tsm1 trace_id=0INp3a0l000 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/data/mijnbaopt/forever/2421 duration=164.465ms
Oct 09 08:44:59 influxdb influxd: ts=2019-10-09T06:44:59.944134Z lvl=info msg="Cache snapshot (end)" log_id=0ILzUuF0000 engine=tsm1 trace_id=0INp3a0l000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=164.521ms
So maybe it’s the retention policy enforcer combined with the TSM cache snapshotter? Note that not every off-the-30m dip/peak has a cache snapshot log associated with it, and cache snapshot logs also appear without a corresponding dip/peak in my write throughput.
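To check that correlation a bit more systematically than eyeballing, I sketched a small script that pulls the ts= field out of influxd log lines and flags the ones that fall within a few seconds of a dip. The dip timestamps themselves would have to come from wherever you record throughput (Grafana export, etc.) - those are an assumption here, not something the logs give you:

```python
import re
from datetime import datetime, timedelta

TS_RE = re.compile(r"ts=(\S+)")

def parse_log_ts(line):
    """Extract the ts= timestamp from an influxd log line as a datetime."""
    m = TS_RE.search(line)
    if not m:
        return None
    # influxd logs RFC3339Nano-style timestamps like 2019-10-09T06:44:59.779626Z
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")

def events_near(log_lines, dip_times, window=timedelta(seconds=5)):
    """Return the log lines whose timestamp falls within `window` of any dip."""
    hits = []
    for line in log_lines:
        ts = parse_log_ts(line)
        if ts and any(abs(ts - dip) <= window for dip in dip_times):
            hits.append(line)
    return hits
```

Feeding it the journalctl output plus a list of dip times should show quickly whether every dip has a retention or snapshot event next to it, or only some of them.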
What’s weird to me is that, if that’s true, why would InfluxDB just stop accepting writes while it is doing some IO to disk?
Does this information help in any way?