InfluxDB v2 stopped working, error: nats: no servers available for connection

Hi there,

I don’t know where to begin. I have a really simple setup: 1 database for home automation / monitoring and 1 for system logs via Telegraf, just for fun. After several weeks and months of service, it basically stopped working, throwing errors that influx cannot establish http connections or something like that.
After a bit of googling, it was suggested to increase the ulimits for max open files, which back then was only 1024. So I increased it, and I no longer got that error but a different one instead. Trying to revert to the old state did not change the error. I also tried to move the folders around, but that did not help either.

As of now it seems as if my data/engine folder, the actual database data, is the problem. I moved the engine folder via environment variable and it booted up again, but of course my data is gone.

So here is my setup and the logs:

Logs via pastebin

docker compose setup:

  # INFLUXDB
  influxdb:
    container_name: influxdbv2
    image: influxdb:latest
    restart: unless-stopped
    # user: 0:1001
    ports:
      - $INFLUXDB_PORT:8086
    volumes:
      # - $DOCKERDIR/influxdbv2/influxdb:/var/lib/influxdb
      - $DOCKERDIR/influx-test/data:/var/lib/influxdb2
      - $DOCKERDIR/influx-test/config:/etc/influxdb2
      # - /etc/ssl:/etc/ssl/
    environment:
      # - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=$INFLUXDB_ADMIN_USERNAME
      - DOCKER_INFLUXDB_INIT_PASSWORD=$INFLUXDB_ADMIN_PASSWORD
      - DOCKER_INFLUXDB_INIT_ORG=$INFLUXDB_ORG
      - DOCKER_INFLUXDB_INIT_BUCKET=$INFLUXDB_BUCKETNAME
      - INFLUXD_ENGINE_PATH=/var/lib/influxdb2/engine
      # - INFLUXDB_HTTP_HTTPS_ENABLED=true
      - INFLUXD_TLS_CERT=/etc/influxdb2/influxdb-selfsigned.crt
      - INFLUXD_TLS_KEY=/etc/influxdb2/influxdb-selfsigned.key

ulimit -a

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 22941
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 60000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 22941
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

/etc/security/limits.conf
> * soft nofile 60000
> * hard nofile 65535
> * hard nproc 65635

thanks!

> Hi there,

Hi, 0.629 recurring…

> I don’t know where to begin.

A good place to start is being clear and precise about the information you’re
trying to tell us :slight_smile:

> I have a really simple setup: 1 database for home automation / monitoring
> and 1 for system logs via Telegraf, just for fun.
>
> After several weeks and months of service, it basically stopped working,
> throwing errors that influx cannot establish http connections or something
> like that.

Sorry, but “or something like that” is not especially helpful when we’re
trying to identify the problem.

Do you still have a copy of exactly what those errors were (maybe in
historical log files)?

> After a bit of googling,

I wonder what you Googled for and what you found.

> it was suggested to increase the ulimits for max open files, which back then
> was only 1024. So I increased it, and I no longer got that error but a
> different one instead.

So, what was the other error you got?

> Trying to revert to the old state did not change the error. I also
> tried to move the folders around,

Can you give us more detail about what you moved and where?

> but that did not help either.
>
> As of now it seems as if my data/engine folder, the actual database data, is
> the problem. I moved the engine folder via environment variable

Please explain what that means.

> and it booted up again, but of course my data is gone.

Maybe we can help you find it again if you tell us what you moved, from where,
to where.

> docker compose setup:

I don’t know whether this is significant, but I personally have no experience
of Docker, so if that is (part of) the cause of the problem, someone else will
need to comment on it.

> ulimit -a

I’m not convinced that ulimits are the problem until I see what the error
message was and what advice you followed to deal with it.

Help us to understand what you did (if we can reproduce the problem ourselves,
that’s great) and we can try to help you deal with the problem.

Regards,

Antony.

Hi Antony,
thanks for your response and sorry for my frustration. I have tried now for 3 days on my own to get this working but it wouldn’t work. Sadly this is not my first rodeo on the new 2.0 version of influx. Never had trouble with 1.x

I will try to provide a better explanation.

> Sorry, but “or something like that” is not especially helpful when we’re
> trying to identify the problem.
>
> Do you still have a copy of exactly what those errors were (maybe in
> historical log files)?

Since I cannot post my original logs because Docker threw them out of the window, I can only refer to some other threads.
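
For context on the lost logs: output kept by Docker's default json-file driver is deleted together with the container, which is typically what happens when compose recreates a service. A hedged sketch of sending the container's output to the host journal instead, assuming a systemd host where Docker's journald logging driver is available and showing only the relevant keys:

```yaml
services:
  influxdb:
    image: influxdb:latest
    # Assumption: the host runs systemd-journald, so log lines
    # outlive the container that produced them.
    logging:
      driver: journald
```

With this in place the lines can be read on the host with `journalctl CONTAINER_NAME=influxdbv2`.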

So my first error was this:

2014/02/27 00:00:25 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s

Just like posted here a few years back by someone else.
Someone else had the same problem in 2020 when using v2.0, and he said it was resolved by increasing the maximum number of allowed open files. Since I also only had a limit of 1024, it made sense to try it. There were also other threads about it, but I can’t find them in my browser history.
Then I increased the ulimit with some guide I cannot find anymore. It was basically like this. As mentioned above, I tried many different things with the /etc/security/limits.conf file: I increased it only for the user, then also for root, then for user and docker group, then docker only, etc. I tried every combination and also did some reboots in between.
From the very first change I made, influx started spewing out different errors, the ones you see in the initial logs. I wasn’t able to get back to the original errors. Even when I rolled back to the original state of /etc/security/limits.conf, the influxdb errors stayed the same and did not revert.
So that brought me to this github post.
I picked up the idea from the initial post over there and tried to figure out whether it might be a weird parameter issue, where I need to specify the path of this so-called engine explicitly. The engine apparently is the data folder, so literally my database. So I tried some other locations for that folder, adding or removing the parameter, and double-checking the permissions of those folders. But that did not help either.
So what I meant with “the data is of course gone” was my realization that this engine path is indeed the database data [data/, wal/], and of course I cannot move it, otherwise influx would not be able to open the database. So I am starting to think that my database might be corrupted, but honestly, I have no ideas anymore.

Thanks

“Thanks for your response and sorry for my frustration. I have tried now for 3
days on my own to get this working but it wouldn’t work. Sadly this is not my
first rodeo on the new 2.0 version of influx. Never had trouble with 1.x”

In that case I too am going to bow out of trying to assist here, since I have
no experience of InfluxDB 2.0 (other than a very unexpected and unwelcome
automatic update by my package manager some months ago when the release tags
apparently got mixed up - fortunately I did nothing with my data files and was
able to revert to a comfortable 1.8 version again), and I hope that someone
else here with familiarity with InfluxDB 2.0 can help you in working out what
has happened and how to deal with it.

Best wishes,

Antony.

@17over27 would you mind updating your influxd deployment to pass --log-level debug, and pasting the resulting logs? As you’ve seen, this NATS issue has been lurking for a while now. I haven’t been able to reproduce it myself, so any logging you can provide would be a great help.
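
For the compose setup posted earlier, one way to do this without changing the container command is the environment-variable form of the flag: influxd reads `INFLUXD_LOG_LEVEL` as the equivalent of `--log-level`. A minimal sketch, showing only the relevant keys and assuming the rest of the service definition stays as posted:

```yaml
services:
  influxdb:
    image: influxdb:latest
    environment:
      # Equivalent to passing --log-level debug to influxd
      - INFLUXD_LOG_LEVEL=debug
```

After recreating the container, `docker logs influxdbv2` should include the debug-level lines.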

Gladly, yet it merely prints one single debugging line