InfluxDB spawns too many threads and is killed by OOM when I execute too many queries, but it should skip them

InfluxDB OSS version: InfluxDB 2.0.7 (git: 2a45f0c037) build_date: 2021-06-04T19:17:40Z (running in docker container)
Hardware: 64 GB RAM, 2 TB SSD (AWS EC2 m5.4xlarge)


My InfluxDB 2.0 instance runs in a Docker container and hits OOM when I execute hundreds of queries from many processes.
I've tried changing the influxd configuration fields: a concurrent-query limit of 2, a query queue size of 1, a query memory limit of 100000000 bytes, and an initial query memory allocation of 10000000 bytes. But influxd still creates too many threads (when it should reject the excess queries and put just one of them in the queue), uses too much RAM, and is killed within seconds of the queries starting.

Redefined fields in config:

query-concurrency: 2
query-initial-memory-bytes: 10000000
query-max-memory-bytes: 3200000000
query-memory-bytes: 100000000
query-queue-size: 1
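To make the expectation concrete, here is a minimal sketch (in Python, not influxd's actual code) of the admission behavior these settings should produce: with query-concurrency: 2 and query-queue-size: 1, two queries run, one waits, and everything else is rejected immediately instead of spawning a thread.

```python
import queue
import threading

class QueryController:
    """Toy model of the expected admission control for
    query-concurrency=2 / query-queue-size=1 (an assumption,
    not a reimplementation of influxd)."""

    def __init__(self, concurrency: int, queue_size: int):
        self.slots = threading.Semaphore(concurrency)   # running slots
        self.waiting = queue.Queue(maxsize=queue_size)  # bounded wait queue

    def submit(self, q: str) -> str:
        if self.slots.acquire(blocking=False):
            return "running"           # a concurrency slot was free
        try:
            self.waiting.put_nowait(q)
            return "queued"            # parked in the bounded queue
        except queue.Full:
            return "rejected"          # queue full: skip the query outright

ctrl = QueryController(concurrency=2, queue_size=1)
results = [ctrl.submit(f"q{i}") for i in range(5)]
print(results)  # ['running', 'running', 'queued', 'rejected', 'rejected']
```

Under this model, a burst of a hundred queries would cost at most two worker threads plus one queued entry; the observed behavior (one thread per query) suggests the limits are not being enforced at admission time.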

Result of influxd print-config:

assets-path: ""
bolt-path: /root/.influxdbv2/influxd.bolt
e2e-testing: false
engine-path: /root/.influxdbv2/engine
feature-flags: {}
http-bind-address: :8086
http-idle-timeout: 3m0s
http-read-header-timeout: 10s
http-read-timeout: 0s
http-write-timeout: 0s
influxql-max-select-buckets: 0
influxql-max-select-point: 0
influxql-max-select-series: 0
key-name: ""
log-level: info
metrics-disabled: false
nats-max-payload-bytes: 1048576
nats-port: -1
no-tasks: false
pprof-disabled: false
query-concurrency: 2
query-initial-memory-bytes: 10000000
query-max-memory-bytes: 3200000000
query-memory-bytes: 100000000
query-queue-size: 1
reporting-disabled: false
secret-store: bolt
session-length: 60
session-renew-disabled: false
storage-cache-max-memory-size: 200000000
storage-cache-snapshot-memory-size: 100000000
storage-cache-snapshot-write-cold-duration: 10m0s
storage-compact-full-write-cold-duration: 4h0m0s
storage-compact-throughput-burst: 50331648
storage-max-concurrent-compactions: 0
storage-max-index-log-file-size: 1048576
storage-retention-check-interval: 30m0s
storage-series-file-max-concurrent-snapshot-compactions: 0
storage-series-id-set-cache-size: 0
storage-shard-precreator-advance-period: 30m0s
storage-shard-precreator-check-interval: 10m0s
storage-tsm-use-madv-willneed: false
storage-validate-keys: false
storage-wal-fsync-delay: 0s
store: bolt
testing-always-allow-setup: false
tls-cert: ""
tls-key: ""
tls-min-version: "1.2"
tls-strict-ciphers: false
tracing-type: ""
vault-addr: ""
vault-cacert: ""
vault-capath: ""
vault-client-cert: ""
vault-client-key: ""
vault-client-timeout: 0s
vault-max-retries: 0
vault-skip-verify: false
vault-tls-server-name: ""
vault-token: ""

Query template (each query covers a 30-minute range):

from(bucket: "{bucket}") 
  |> range(start: {start}, stop: {stop})
  |> filter(fn: (r) => r["_measurement"] == "{measurement}")
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> filter(fn: (r) => r["{tag1}"] == "{value1}")
  |> filter(fn: (r) => r["{tag2}"] == "{value2}")
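For reference, this is how the template above gets instantiated before being sent to the server; all the substituted values here are hypothetical placeholders, not my real bucket or tag names.

```python
# Fill the Flux query template with example values (all hypothetical).
template = """from(bucket: "{bucket}")
  |> range(start: {start}, stop: {stop})
  |> filter(fn: (r) => r["_measurement"] == "{measurement}")
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> filter(fn: (r) => r["{tag1}"] == "{value1}")
  |> filter(fn: (r) => r["{tag2}"] == "{value2}")"""

flux = template.format(
    bucket="buck", start="-30m", stop="now()",
    measurement="sensors", tag1="host", value1="srv1",
    tag2="region", value2="eu",
)
print(flux)
```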

Cardinality: 435, calculated with the following Flux query:

from(bucket: "buck")
  |> range(start: -1y)
  |> last()
  |> toString()
  |> group()
  |> count()

5 seconds after Influx started processing the queries (and 5 seconds before it was killed by OOM):

Hello @Admin_Topflow,
Welcome. I don’t know how to configure skipping and queuing queries. Let me ask the team. Thanks.

Hi @Admin_Topflow !

Are your other processes running queries getting any kind of error response before influxd crashes? Or are all the queries completing successfully until the server crashes?

Also I’m curious to learn a little bit more about the kind of data you are querying so that I can reproduce this kind of OOM crash. Would it be possible to get a representative sample of the data you are using?


Is there any update on query scheduling?