Hi,
I’ve just experienced the following problem for the second time.
My applications are hanging when attempting to query InfluxDB.
If I connect manually with the influx binary, I can see the TCP connection is established but I do not get a prompt. It just hangs (seemingly) indefinitely, with no timeout etc.
I restarted my application after the first occurrence, thinking it was some sort of transient thing, but it has happened again today.
Nothing shows in the logs.
The systems have plenty of resources (disk/inodes/cpu/ram) - but it seems that there might be a deadlock somewhere.
Does anybody have any recommendations on how I can diagnose this further?
I’m using InfluxDB from the official Docker image, and have been building this application for a couple of months without this issue. I’m using Python and the Python influx library to read/write into InfluxDB from other containers using Docker Compose. Nothing significant has changed in the time since this issue arose, apart from my application code, which (as far as the DB is concerned) is trivial. I have a ~30MB dataset and write a handful of points every few minutes.
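For context, my read/write path is roughly like this (a simplified sketch using the influxdb Python client; the hostname, database, measurement, tags and fields below are placeholders, not my real names):

from influxdb import InfluxDBClient

# Hostname is the docker-compose service name (placeholder); default InfluxDB port.
client = InfluxDBClient(host='influxdb', port=8086, database='mydb')

# Writes: a handful of points every few minutes (measurement/tags/fields are placeholders).
client.write_points([{
    "measurement": "readings",
    "tags": {"source": "app"},
    "fields": {"value": 0.42},
}])

# Reads: this is the kind of call that hangs when the problem occurs.
result = client.query('SELECT * FROM "readings" ORDER BY time DESC LIMIT 10')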
@robshep Can you share both the version of InfluxDB you are running and the logs from the time period?
Thank you, Jack.
[I] 2017-07-06T09:23:00Z InfluxDB starting, version 1.2.2, branch master, commit 1bcf3ae74c6b9c4897dab68d513d056277eb24f7
[I] 2017-07-06T09:23:00Z Go version go1.7.4, GOMAXPROCS set to 4
Running from influxdb:1.2.2-alpine
Running in Docker version 17.06.0-ce, build 02c1d87 on OS X
The container is still running and I can exec into it etc.
Apparently new users cannot upload attachments, so I’ve uploaded it to Pastebin.
https://pastebin.com/NwDdLdzj (It is gzipped text, then base64 encoded.)
It is still in its “hung” state, and I’ve been trying to work out what’s stuck, but there are limited diagnostic tools in the Alpine image.
Thanks for any insight.
@robshep I’m going through these logs now. Can you upgrade to the most recent version of InfluxDB? We have fixed a couple of issues that might cause something like this. If the issue persists, please send the process a SIGQUIT to obtain a stack trace while it is hanging and share the output.
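If you’re running it under Docker, something along these lines should work from the host (the container name here is just an example):

docker kill --signal=QUIT my_influxdb_container
docker logs my_influxdb_container

The Go runtime writes the goroutine dump to stderr, so it should show up in the container logs.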
Thank you, Jack.
Apologies, I hadn’t realised that was a new release.
I’m running this release now.
Strangely, when I issued:
kill -QUIT 1
from inside the container, it actually restarted the whole Docker composition. I think it must be a bug in Docker for this to occur.
Even stranger, the influx log seems to show it starting before handling the SIGQUIT, then starting again.
Odd, take a look: https://pastebin.com/QYpSbdVz
This log starts where the other one left off, at 16:02.
Nothing else happened until I issued kill -QUIT 1 at 22:58:21.
Cheers
Rob