InfluxDB memory usage high

Hi Team,

I have influxdb-1.7.8-1.x86_64 running on "Red Hat Enterprise Linux Server release 7.5 (Maipo)".

Over the last few days we have noticed that memory usage is suddenly high. These are the hardware parameters:

              total        used        free      shared  buff/cache   available
Mem:            94G         93G        439M        120M        661M        384M
Swap:           19G        3.6G         16G

Total CPUs allocated: 8

Please suggest how we can control this issue.

What did you change in the last few days? Check whether a continuous query is running.

I have been facing this issue for a long time; I even raised a request on GitHub.

Currently 900+ Telegraf agents are reporting to InfluxDB.

I have checked "SHOW QUERIES" but it returns no results.

Is there any other option to check for long-running queries?

You can try the Telegraf InfluxDB input plugin; it may give you more information about what's happening in the DB instance.
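For reference, a minimal sketch of what enabling that plugin in telegraf.conf might look like (the URL assumes InfluxDB is listening on the default port on the same host):

[[inputs.influxdb]]
  # poll the InfluxDB /debug/vars endpoint for internal runtime metrics
  urls = ["http://localhost:8086/debug/vars"]
  # time limit for each HTTP request
  timeout = "5s"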

Hi, as suggested I have enabled the InfluxDB plugin in Telegraf.
It looks like normal utilization, but we are still facing high memory utilization on the server, and eventually the server goes into a hang state.

Please suggest; let me know if any other details are required.

You should try to understand what's happening when the memory spikes to 100% (and I can't help with that from here). Is a continuous query running?
(By the way, 13 min of average query duration looks like a lot to me.)

Try in influx console:

SHOW CONTINUOUS QUERIES

13 min for an average query is very high. Are these write queries or reads? You can visualize it too. You should figure out which query/queries take long and why. Those long queries are probably causing the high memory usage.

Thanks for the reply!!!

I am also trying to find long-running queries on InfluxDB. Whenever I run these two queries, they return no records:
"SHOW QUERIES" and "SHOW CONTINUOUS QUERIES"

Is there any way to find the query history, or all queries executed on InfluxDB?

Have a look at the data behind the "Average Query Duration" panel that you see in Grafana. I think the measurement contains more than that, though I'm not sure at what level of detail; if you are lucky it might contain the query text.

If you have no continuous query, then somewhere in a report you have a huge query that might need some work. Query performance is also influenced by the retention policy settings (shard duration), but first you need to find what is causing the issue.
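As an example, the "Average Query Duration" panel is often computed from the queryExecutor measurement in the _internal database with something like the query below (the queryDurationNs and queriesExecuted fields are the ones exposed by the 1.x monitor; adjust the time range and grouping as needed):

SELECT non_negative_derivative(mean("queryDurationNs"), 1s) / non_negative_derivative(mean("queriesExecuted"), 1s) AS "avg_query_duration_ns"
FROM "_internal".."queryExecutor"
WHERE time > now() - 24h
GROUP BY time(5m)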

Can you provide the structure of the measurement about queries?

SHOW MEASUREMENTS ON telegraf
name: measurements
name
----
cpu
cpu_util
disk
diskio
kernel
mem
mem_util
msr_atl_c360_agg_info_log
msr_error_code_info
net
net_response
oracle_session_longrunning
procCheck
processes
redis
redis_keyspace
swap
swap_util
system
win_cpu
win_disk
win_diskio
win_mem
win_net
win_services
win_swap
win_system

SHOW MEASUREMENTS ON chronograf
name: measurements
name
----
alerts

SHOW MEASUREMENTS ON _internal
name: measurements
name
----
cq
database
httpd
queryExecutor
runtime
shard
subscriber
tsm1_cache
tsm1_engine
tsm1_filestore
tsm1_wal
write

I meant which tags and fields are available in the measurement about queries (it should be the “queryExecutor” measurement), but you will see which one is used in the Grafana query…
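To check, you could inspect that measurement directly in the influx console; a minimal sketch, assuming the _internal monitoring database is enabled:

USE _internal
SHOW FIELD KEYS FROM "queryExecutor"
SHOW TAG KEYS FROM "queryExecutor"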

This is what I have in the _internal db

Sadly, since no tags are available to narrow down the search for the guilty query, you will probably need to open some reports (or run whatever usually runs against InfluxDB), starting with the most used ones (since the problem occurs often), see which one takes something like 13 minutes to load, and from there analyze its queries.
Maybe someone is querying a huge time range, or there might be some heavy calculation…

Let us know if you find something.

Thanks for the reply!!!

That 13 min was the average over the last 24 hrs.

Can you please help me find out all queries executed in InfluxDB?

I mean, is there any way to get the executed query history?

3 options:

  1. Run SHOW QUERIES while the CPU is at 100%; it will return all queries currently executing.
  2. Check the HTTP request log: it tracks the requests, and you can get the "query history" from its data. If it's not already configured, have a look at the docs or at the section "2- Define Logging Settings" of this blog post (the blog itself); see the config sketch after this list.
  3. Manually do what users or the system do daily, and check the response time and memory consumption.

The best one is probably option 2.
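If you go with option 2, a minimal sketch of the relevant [http] section in influxdb.conf (the log path is just an example; access-log-path is the 1.7.x setting that sends request logging to its own file):

[http]
  # keep the HTTP API enabled and turn on request logging
  enabled = true
  log-enabled = true
  # write HTTP access/request logs to a dedicated file instead of the main log
  access-log-path = "/var/log/influxdb/access.log"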

Another useful setting in this case is "log-queries-after".
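That one lives in the [coordinator] section of influxdb.conf; a hedged example (the 10s threshold is only illustrative):

[coordinator]
  # queries running longer than this threshold are logged as slow queries
  log-queries-after = "10s"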

Thanks for the reply,

I have observed the InfluxDB logs for a few days and below is my analysis:

  • As 900+ Telegraf agents are reporting to InfluxDB, there are many POST requests.
  • I can see GET requests from the Grafana server and a few scheduled queries; these are all SELECT queries, so is it possible they impact InfluxDB performance?

  • 38669 out of 58058 GET requests from server A on 14 Jan 2020.
  • 26004 out of 40489 GET requests from server A on 13 Jan 2020.
  • 52185 out of 92094 GET requests from server A on 15 Jan 2020.
  • 63644 out of 96290 GET requests from server A on 16 Jan 2020.

Queries like:
Jan 17 10:50:08 N2VL-PD-FLU01 influxd: [httpd] 10.5.98.200 - - [17/Jan/2020:10:50:08 +0530] “GET /query?db=telegraf&epoch=s&q=SELECT+mean%28%22used_percent%22%29+from+%22telegraf%22.%22autogen%22.%22mem%22+WHERE+%28%22time%22+%3E%3D+%272020-01-10T05%3A14%3A00.000000000Z%27+and+%22time%22+%3C%3D+%272020-01-10T05%3A18%3A59.599999999Z%27+AND+%22IP%22+%3D+%2710.56.4.52%27%29+GROUP+BY+%22IP%22 HTTP/1.1” 200 151 “-” “python-requests/2.21.0” 06ce1767-38e9-11ea-a512-005056b67104 148463
Jan 17 10:50:08 N2VL-PD-FLU01 influxd: [httpd] 10.5.98.200 - - [17/Jan/2020:10:50:08 +0530] “GET /query?db=telegraf&epoch=s&q=SELECT+%28100-mean%28%22usage_idle%22%29%29+from+%22telegraf%22.%22autogen%22.%22cpu%22+WHERE+%28%22time%22+%3E%3D+%272020-01-11T05%3A14%3A00.000000000Z%27+and+%22time%22+%3C%3D+%272020-01-11T05%3A18%3A59.599999999Z%27+AND+%22IP%22+%3D+%2710.135.0.235%27%29+GROUP+BY+%22IP%22 HTTP/1.1” 200 153 “-” “python-requests/2.21.0” 06cd8de1-38e9-11ea-a510-005056b67104 152424
Jan 17 10:50:08 N2VL-PD-FLU01 influxd: [httpd] 10.5.98.200 - - [17/Jan/2020:10:50:08 +0530] “GET /query?db=telegraf&epoch=s&q=SELECT+%28100-mean%28%22usage_idle%22%29%29+from+%22telegraf%22.%22autogen%22.%22cpu%22+WHERE+%28%22time%22+%3E%3D+%272020-01-11T05%3A14%3A00.000000000Z%27+and+%22time%22+%3C%3D+%272020-01-11T05%3A18%3A59.599999999Z%27+AND+%22IP%22+%3D+%2710.92.204.57%27%29+GROUP+BY+%22IP%22 HTTP/1.1” 200 151 “-” “python-requests/2.21.0” 06e28664-38e9-11ea-a521-005056b67104 25637

So can you please confirm whether GET queries impact the server performance or increase the load on InfluxDB?

Those 3 queries are simple, and I doubt they are the cause of the problem.
I imagine those queries are made by a Grafana chart; all of them have the following pattern:

SELECT mean("used_percent") FROM "telegraf"."autogen"."mem" WHERE ("time" >= '2020-01-10T05:14:00.000000000Z' AND "time" <= '2020-01-10T05:18:59.599999999Z' AND "IP" = '10.56.4.52') GROUP BY "IP"

You can run it yourself and check the response time, but I doubt this is the problem.