Getting Timeout [500] Errors Indefinitely When Writing and Querying in InfluxDB 1.8

Hi,
I am new to InfluxDB. I have installed InfluxDB v1.8 on EC2 (8 CPU, 32 GB RAM).

  • Shard group duration is 2 days; retention policy duration is 30 days.
  • I am ingesting data with a series cardinality of 20,000-50,000. Each point has 3 tags and 7 fields. Timestamp precision is minutes.
  • I can easily ingest 20,000 points/sec from inside the EC2 box (not over the internet), and I have successfully tried up to 150,000 points/sec. Points are distributed over the last 2 days. Response time is usually less than 100 ms.
  • I can aggregate data for the last 2 h (~50,000 points) within 200 ms.
  • Reads and writes both work fine when done separately.
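
To make the data shape above concrete, here is a minimal sketch of how such points could be formatted in InfluxDB line protocol. The measurement name `metrics`, the tag keys `host`, `region`, `sensor`, and the field keys `f0` to `f6` are hypothetical placeholders, not the real schema; only the counts (3 tags, 7 fields) and the minute-precision timestamp come from the description.

```python
def make_point(measurement, tags, fields, ts_minutes):
    """Format one point in line protocol with a minute-precision timestamp."""
    # Tags are sorted by key, which the line protocol docs recommend.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # Fields are written as floats, e.g. f0=1.0
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {ts_minutes}"


def make_batch(n_series, ts_minutes):
    """Build one newline-separated batch with one point per series.

    All names below are hypothetical placeholders for the real schema.
    """
    lines = []
    for i in range(n_series):
        tags = {"host": f"h{i % 100}", "region": f"r{i % 10}", "sensor": f"s{i}"}
        fields = {f"f{j}": i + j for j in range(7)}  # 7 fields per point
        lines.append(make_point("metrics", tags, fields, ts_minutes))
    return "\n".join(lines)
```

A batch built this way would be sent as the body of a POST to the 1.x `/write` endpoint with `precision=m`, so that the trailing integer is read as minutes since the epoch.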

However, I run into problems in the following situation:

  • I query data at 10 queries/sec and ingest data at 10,000 points/sec, both at the same time. Writes fail with 500 timeout errors, and query response time goes from milliseconds up to 30 seconds. Sometimes the whole EC2 instance freezes. This happens frequently.
  • After restarting the EC2 instance, I was able to run the above queries and writes again, but some time later the same issue came back.
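
The mixed load described above could be reproduced with something like the following sketch. It is not the actual client code: `write_fn` and `query_fn` are hypothetical callables that would each send one write batch or one query to InfluxDB, and the rates and duration are parameters. It only shows the pattern of two paced loops running at the same time while recording per-call latency.

```python
import threading
import time


def run_at_rate(fn, per_second, duration_s, latencies):
    """Call fn() at roughly per_second calls/sec for duration_s seconds."""
    interval = 1.0 / per_second
    next_call = time.monotonic()
    deadline = next_call + duration_s
    while next_call < deadline:
        start = time.monotonic()
        fn()  # one write batch or one query (hypothetical callable)
        latencies.append(time.monotonic() - start)
        next_call += interval
        pause = next_call - time.monotonic()
        if pause > 0:
            time.sleep(pause)  # if fn() is slow, the loop falls behind instead


def run_mixed_load(write_fn, query_fn, writes_per_s, queries_per_s, duration_s):
    """Run writes and queries concurrently; return both latency lists."""
    write_lat, query_lat = [], []
    threads = [
        threading.Thread(target=run_at_rate,
                         args=(write_fn, writes_per_s, duration_s, write_lat)),
        threading.Thread(target=run_at_rate,
                         args=(query_fn, queries_per_s, duration_s, query_lat)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return write_lat, query_lat
```

With 10,000 points/sec sent as, say, 2 batches of 5,000 points per second (the batch size is an assumption), this would be called as `run_mixed_load(write_fn, query_fn, 2, 10, 60)`, and the two returned lists show how write and query latency drift once both run together.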

What could be causing these indefinite timeouts? I don't know whether this is due to something InfluxDB is doing behind the scenes. How can I avoid it? Any help is appreciated.