InfluxDB 2 beta 8: high CPU usage, and queries are stuck in queue

Several days ago, InfluxDB started returning many 400 errors while querying, saying the queue length was exceeded.
I found that the query queue was almost exhausted.

CPU usage is also higher than on other instances.

I dumped the CPU, goroutine, and trace profiles, plus the vars.

I think GC is running very frequently, but I have not figured out why.

(The trace file is too large, so I can’t upload it.)

and zoom in:

goroutine.tar.gz (9.9 KB) profiles.tar.gz (174.3 KB) vars.txt (17.2 KB)
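
For readers who want to collect the same kind of data, here is a minimal sketch of how the CPU, goroutine, and trace profiles might be fetched over HTTP. It assumes the instance exposes the standard Go `net/http/pprof` handlers under `/debug/pprof/` on its HTTP port; the base URL, durations, and function names are illustrative and should be checked against your own deployment.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Standard Go net/http/pprof paths and query parameters. Whether a given
# InfluxDB build serves them under /debug/pprof/ is an assumption to verify.
PROFILES = {
    "cpu": ("profile", {"seconds": 30}),      # CPU profile sampled for 30 s
    "goroutine": ("goroutine", {"debug": 1}),  # human-readable goroutine dump
    "trace": ("trace", {"seconds": 5}),       # execution trace for 5 s
}


def profile_url(base_url, name):
    """Build the URL for one of the profiles listed in PROFILES."""
    path, params = PROFILES[name]
    url = urljoin(base_url.rstrip("/") + "/", "debug/pprof/" + path)
    return url + "?" + urlencode(params)


def download_profile(base_url, name, out_path, timeout=120):
    """Fetch one profile and write the raw response bytes to out_path."""
    with urlopen(profile_url(base_url, name), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as out:
        out.write(data)
```

Calling `download_profile("http://localhost:8086", "cpu", "cpu.pb.gz")` would then save a CPU profile that `go tool pprof` can open; the host and port here are placeholders.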

Hello @STRRL,
I believe some of these issues are being addressed. Have you tried updating to 2.0 rc0? I’m also asking someone from the storage team to take a look. Thank you.

Thanks @Anaisdg! I will try it.

Should I do anything after updating to 2.0-rc0, such as rebuilding the TSI index?
Or can I just replace the influxd binary and restart?

Hello @STRRL,
If restarting isn’t a problem/you don’t care about historical data…then I think that’s probably the fastest way.
Are you upgrading from 2.x to the rc, or from 1.x?

Thanks

I prepared a new machine to try rc0, and it has worked well so far. I will keep watching it for at least 7 days.

I will upgrade InfluxDB from 2.0 beta 8 to rc0. I can ignore the historical data, so I don't think I need to do anything else.

Thanks @Anaisdg !

Hi @Anaisdg, I ran into another problem: many metrics, such as query_control_queueing_active, storage_tsm_files_disk_bytes, storage_compactions_active, etc., are missing in InfluxDB 2.0 rc0.

I think this might be caused by changes to the storage backend. These metrics are very useful for tuning and profiling InfluxDB. Will they come back in the future?
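
One way to pin down exactly which metrics disappeared is to save the Prometheus-format metrics output from each version and compare the names. The sketch below assumes the output follows the Prometheus text exposition format (for example, as served by a `/metrics` endpoint); the helper names are hypothetical, and the expected metric names are the ones mentioned above.

```python
import re

# A metric name is the leading identifier on each sample line, before any
# "{labels}" or the value. Histogram and summary samples carry suffixes such
# as _bucket, _sum, and _count, which this sketch keeps as-is.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that are absent, sorted by name."""
    present = metric_names(exposition_text)
    return sorted(name for name in expected if name not in present)
```

Running `missing_metrics(text, [...])` against the saved output of the new version gives a concrete list to include in a bug report.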

@STRRL -

I believe you are right about the metrics. There was some swapping of the storage tier. Could you file a bug report on GitHub about the missing metrics? I also think the metrics are useful. They might not come back in exactly the same form because of the storage tier changes.

Thanks for noticing this!

Let us know if you are still seeing the high CPU usage with the rc.


OK, I will open an issue.

I have not run any queries on the new instance yet; it needs time to accumulate enough data points. I will test rc0 as soon as the data is ready.
