Query Performance is Poor vs InfluxDB2 - What am I missing?

Hello,

I am in the process of migrating my data from InfluxDB2 to a newly installed InfluxDB3 instance on the same hardware. The server is a Dell T710 with dual Xeon processors (16 cores) and 48 GB of RAM. Migrating my data went very well, no issues. Then I tried a simple query to get a feel for the performance and the result was very poor. I can only assume I have something set up incorrectly, so I'm hoping someone here can point me in the right direction.

The test table is named "powermonitor_realpower". It has a tag named "channel", which is a value between 0 and 9, and a tag named "date", which I am not using. It also has the usual "time" column and a "value" field, which is a floating-point number. The table has a row for each channel (0-9) every minute.
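For reference, each row looks roughly like this in line protocol (the numeric values and the date tag value are just illustrative; the timestamp is 2025-05-01T00:00:00-04:00 in nanoseconds):

powermonitor_realpower,channel=0,date=2025-05-01 value=123.4 1746072000000000000
powermonitor_realpower,channel=1,date=2025-05-01 value=456.7 1746072000000000000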

My query is a simple count:

influxdb3 query --database Power "SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1')"
+----------+
| count(*) |
+----------+
| 692500   |
+----------+

That's not many rows, so I would expect to see a response within a few seconds. Instead, it's taking ~70-80 seconds to receive the result.

I don't see anything in the logs that would indicate why it's taking so long. In fact, one of the final log messages shows "end2end_duration_secs=1.50945614", which looks more like what I would expect, but the client does not see the response for 70 to 80 seconds. Maybe someone here can help interpret these log entries and point me in the right direction? Since I just loaded a large amount of data into the database, does it take time for it to get organized?

Dec 29 20:59:01 t710 python3[2444285]: 2025-12-30T00:59:01.103792Z  INFO influxdb3_server::http: handling query_sql database=Power query_str=SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1') format=Pretty
Dec 29 20:59:01 t710 python3[2444285]: 2025-12-30T00:59:01.103855Z  INFO influxdb3_query_executor: executing sql query database=Power query=SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1') params=None
Dec 29 20:59:01 t710 python3[2444285]: 2025-12-30T00:59:01.103939Z  INFO iox_query::query_log: query when="received" id=194140d7-16b9-428b-8e69-3abe84e7c377 namespace_id=7 namespace_name="Power" query_type="sql" query_text=SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1') query_params=Params { } issue_time=2025-12-30T00:59:01.103932109+00:00 success=false running=true cancelled=false
Dec 29 20:59:02 t710 python3[2444285]: 2025-12-30T00:59:02.092858Z  WARN datafusion::physical_planner: Physical input schema should be the same as the one converted from logical input schema, but did not match for logical plan:
Dec 29 20:59:02 t710 python3[2444285]: Projection:
Dec 29 20:59:02 t710 python3[2444285]:   Filter: powermonitor_realpower.time > TimestampNanosecond(1746072000000000000, None) AND (powermonitor_realpower.channel = Dictionary(Int32, Utf8("0")) OR powermonitor_realpower.channel = Dictionary(Int32, Utf8("1")))
Dec 29 20:59:02 t710 python3[2444285]:     TableScan: powermonitor_realpower projection=[channel, time], partial_filters=[powermonitor_realpower.time > TimestampNanosecond(1746072000000000000, None), powermonitor_realpower.channel = Dictionary(Int32, Utf8("0")) OR powermonitor_realpower.channel = Dictionary(Int32, Utf8("1"))]
Dec 29 20:59:02 t710 python3[2444285]: 2025-12-30T00:59:02.188835Z  WARN iox_query::physical_optimizer::dedup::split: cannot split dedup operation, fanout too wide n_split=34662 max_dedup_split=100
Dec 29 20:59:02 t710 python3[2444285]: 2025-12-30T00:59:02.613027Z  INFO iox_query::query_log: query when="planned" id=194140d7-16b9-428b-8e69-3abe84e7c377 namespace_id=7 namespace_name="Power" query_type="sql" query_text=SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1') query_params=Params { } issue_time=2025-12-30T00:59:01.103932109+00:00 partitions=0 parquet_files=0 deduplicated_partitions=0 deduplicated_parquet_files=0 plan_duration_secs=1.509076609 success=false running=true cancelled=false
Dec 29 20:59:02 t710 python3[2444285]: 2025-12-30T00:59:02.613086Z  INFO iox_query::query_log: query when="permit" id=194140d7-16b9-428b-8e69-3abe84e7c377 namespace_id=7 namespace_name="Power" query_type="sql" query_text=SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1') query_params=Params { } issue_time=2025-12-30T00:59:01.103932109+00:00 partitions=0 parquet_files=0 deduplicated_partitions=0 deduplicated_parquet_files=0 plan_duration_secs=1.509076609 permit_duration_secs=7.6563e-5 success=false running=true cancelled=false
Dec 29 20:59:02 t710 python3[2444285]: 2025-12-30T00:59:02.613390Z  INFO iox_query::query_log: query when="success" id=194140d7-16b9-428b-8e69-3abe84e7c377 namespace_id=7 namespace_name="Power" query_type="sql" query_text=SELECT count(*) FROM powermonitor_realpower WHERE time > '2025-05-01T00:00:00-04:00' AND channel IN ('0', '1') query_params=Params { } issue_time=2025-12-30T00:59:01.103932109+00:00 partitions=0 parquet_files=0 deduplicated_partitions=0 deduplicated_parquet_files=0 plan_duration_secs=1.509076609 permit_duration_secs=7.6563e-5 execute_duration_secs=0.000279265 end2end_duration_secs=1.50945614 compute_duration_secs=7.8009e-5 max_memory=0 ingester_metrics.latency_to_plan_secs=0.0 ingester_metrics.latency_to_full_data_secs=0.0 ingester_metrics.response_rows=0 ingester_metrics.partition_count=0 ingester_metrics.response_size=0 success=true running=false cancelled=false
Dec 29 20:59:07 t710 python3[2444285]: 2025-12-30T00:59:07.010587Z  INFO influxdb3_wal::object_store: flushing WAL buffer to object store host="primary-node" n_ops=2 min_timestamp_ns=1767056100000000000 max_timestamp_ns=1767056346665791310 wal_file_number=64026
Dec 29 21:00:07 t710 python3[2444285]: 2025-12-30T01:00:07.013239Z  INFO influxdb3_wal::object_store: flushing WAL buffer to object store host="primary-node" n_ops=2 min_timestamp_ns=1767056160000000000 max_timestamp_ns=1767056406746139599 wal_file_number=64027

Thanks!

K

Well, one problem I discovered after my post… Apparently InfluxDB3 does NOT like bulk loading a large amount of historic data. The result is that it consumed all available physical RAM and then all available virtual memory until it brought the entire server down.

I find it very concerning that it does not manage memory better! This must be a bug or design flaw. I tried setting a systemd service limit of 50% of physical memory to leave something for the rest of the server to use, but it continued to consume all available virtual memory until the OOM killer finally killed it. At least that protects the server, but it can't be good for the InfluxDB3 process to be abruptly killed and restarted repeatedly. I'm not sure exactly what it's doing, but the log shows thousands of "influxdb3_wal::object_store: replaying WAL file" entries.
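For reference, the kind of limit I tried looks roughly like this as a systemd drop-in (values assumed for a 48 GB host; the exact directives depend on your systemd/cgroup version, and MemorySwapMax is only an idea for keeping it out of swap):

# /etc/systemd/system/influxdb3-core.service.d/override.conf
[Service]
MemoryMax=24G
MemorySwapMax=0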

This may be the root cause of my problem, so I'll start fresh and load more slowly this time.
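By "load slower" I mean splitting the line protocol export into chunks and writing them with a pause in between, roughly like this (file name, chunk size, and pause are placeholders, and I'm assuming the influxdb3 write subcommand accepts --database and --file; check influxdb3 write --help for your version):

# Split the exported line protocol into ~100k-line chunks, then write them
# one at a time with a pause so the WAL/snapshot machinery can keep up.
split -l 100000 power_export.lp chunk_
for f in chunk_*; do
  influxdb3 write --database Power --file "$f"
  sleep 10
done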

I deleted all my InfluxDB3 data and started fresh. This time I loaded my historic InfluxDB2 data more slowly (about 7 million rows). After the writes, the InfluxDB3 process was sitting at around 55% memory usage, and from the logs I can see it's primarily buffered WAL data. Then later in the day it started writing Parquet files and it just fell apart. It started consuming all available memory again and getting killed by the kernel's OOM killer every 5 to 7 minutes or so. :frowning:

This is what it's doing when it consumes all the memory:

2025-12-30T16:28:28.481093-04:00 t710 python3[902473]: 2025-12-30T20:28:28.481011Z  INFO influxdb3_write::write_buffer::queryable_buffer: Persisting 100 rows for db id 2 and table id 1 and chunk 1734448200000000000 to file primary-node/dbs/2/1/2024-12-17/15-10/0000000888.parquet

This is where the OOM killer terminates it. It's consuming an incredible amount of memory, apparently unconstrained:

2025-12-30T16:17:59.025375-04:00 t710 kernel: InfluxDB 3 Core invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
2025-12-30T16:17:59.025419-04:00 t710 kernel: oom_kill_process+0x118/0x280
2025-12-30T16:17:59.025718-04:00 t710 kernel: [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
2025-12-30T16:17:59.025724-04:00 t710 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=influxdb3-core.service,mems_allowed=0-1,oom_memcg=/system.slice/influxdb3-core.service,task_memcg=/system.slice/influxdb3-core.service,task=influxdb3,pid=869740,uid=990
2025-12-30T16:17:59.025726-04:00 t710 kernel: Memory cgroup out of memory: Killed process 869740 (influxdb3) total-vm:80653864kB, anon-rss:41354632kB, file-rss:23040kB, shmem-rss:0kB, UID:990 pgtables:143200kB oom_score_adj:0
2025-12-30T16:18:04.055813-04:00 t710 kernel: oom_reaper: reaped process 869740 (influxdb3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
2025-12-30T16:18:04.072616-04:00 t710 systemd[1]: influxdb3-core.service: Failed with result 'oom-kill'.

InfluxDB3 does not appear capable of handling a large amount of data over a relatively short time (i.e., bulk loading historic data). I've tried adjusting several of the memory-related parameters and nothing seems to stop it from consuming all available memory.

Ideas? Or is this a flaw/bug I’m seeing?

I gave up… I switched to QuestDB, which so far is working much better.

Hi @KenR, I just returned from holidays, so firstly, Happy New Year! Are you using InfluxDB 3 Core or Enterprise? InfluxDB 3 is designed to handle large volumes of data and to query it well, however some settings need to be adjusted. For writing lots of data, it's best to batch writes instead of sending huge single files; tuning --exec-mem-pool-bytes and lowering --force-snapshot-mem-threshold should also help. Let me know if you have any questions, and you can also join us live in our technical office hour so we can troubleshoot over Zoom: InfluxData
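For example, here is a rough sketch of what I mean by adjusting those settings (the values are only illustrative for a 48 GB host and the data-dir path is a placeholder; please confirm the exact flag names and defaults with influxdb3 serve --help for your version):

# Cap the query executor's memory pool (~8 GiB here) and force the WAL buffer
# to snapshot to Parquet earlier (here at 50% of available memory).
influxdb3 serve \
  --node-id primary-node \
  --object-store file \
  --data-dir /var/lib/influxdb3 \
  --exec-mem-pool-bytes 8589934592 \
  --force-snapshot-mem-threshold 50%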