How can I reduce the memory usage of InfluxDB 1.7.2?

I’m running the TICK image on DigitalOcean. I set this up as a logging server for a web application I run. It worked great originally, but over time I noticed performance in Chronograf tanking. About a month ago my web application was unable to send cURL requests to Influx and I was getting all sorts of errors. The influx process would sit at around 100% CPU utilization and connections would time out.

The first thing I tried was changing the indexes from inmem to tsi1. This fixed the CPU problem but now influx was using every last bit of available memory; I couldn’t even reboot the server from my SSH client.

I temporarily increased the memory capacity from 1GB to 2GB, but I don’t have any interest in paying an extra $5/mo for that. I can’t think of any reason why influx would need that much memory.

How can I reduce its memory usage? I have two main measurements with about 20M rows between them. The larger one has two continuous queries (by hour and by date). The smaller one does not have any. I have not configured any retention policies.

Would this be as simple as creating a retention policy that will truncate some of the older data? Perhaps keeping these raw tables at just a few million records would work? I’m more familiar with MySQL, so forgive my inexperience here. Thanks for the help!
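
For readers following along, a retention policy of the kind asked about here might be sketched in InfluxQL roughly as follows. The database name `logs`, the policy name `90d`, and the 90-day duration are placeholders, not values from this setup. Note that a retention policy expires data by age (whole shard groups at a time), not by row count, so it cannot directly cap a measurement at "a few million records".

```sql
-- Hypothetical names: database "logs", policy "90d".
-- Create a policy that keeps 90 days of data and make it the default for new writes.
CREATE RETENTION POLICY "90d" ON "logs" DURATION 90d REPLICATION 1 DEFAULT

-- Alternatively, shorten the existing default policy in place.
-- Data older than the new duration becomes eligible for deletion.
ALTER RETENTION POLICY "autogen" ON "logs" DURATION 90d

-- Confirm the result.
SHOW RETENTION POLICIES ON "logs"
```

Existing points stay in the policy they were written to, so data already in `autogen` is only affected by the `ALTER` form.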

What is your series cardinality? Sometimes this can make a difference in memory usage.

> show series cardinality
cardinality estimation
----------------------
18422

> show series exact cardinality
name: api_log (this one has 12M records)
count
-----
191

name: api_log_cq_date
count
-----
63

name: api_log_cq_hour
count
-----
64

name: otherservice_api_log (this one has 8M records)
count
-----
70937

name: small_a_api_log
count
-----
3676

name: small_b_api_log
count
-----
2546

name: small_c_api_log
count
-----
4710

Hi @katy, just wanted to see if you had any other thoughts on this one. Thank you!

Thanks for your patience! I’m looking into it now. Your cardinality isn’t too high, so that should be okay. Sometimes, people limit the amount of memory InfluxDB can use because of this, but I’m chatting with the team to see what we can do.

Can you run pmap on the process, @ziebelje?

Important: I had to get this up and running so I dropped the measurement with 8M records (otherservice_api_log). This dropped my memory usage from something like 1.2GB to maybe 800MB. It’s risen a bit since then. I still have this server configured to 2GB of memory but I would like to bring it back down to 1GB.

pmap: https://pastebin.com/NZ9Awuuy

Hi @ziebelje,

According to pmap, InfluxDB is using 347MB of RAM, which is close to the 35.3% of the 908MiB of RAM in use, as seen in htop. I’d say the pmap and htop snapshots were taken at slightly different times, hence the small discrepancy.

It’s actually using 703M. The RES column from htop specifies the actual amount of physical memory in use. That is 77% of the in-use memory and 35% of the available memory.

I’m not sure how to interpret the pmap results. At the bottom it seems to indicate that nearly 3G is in use.

I know this is apples to oranges, but my MySQL server, which has a very actively used 200GB table, is sitting at 571M. My Influx server is almost 100% writes; I only open the Chronograf dashboard a couple of times a day.

RES is the resident set size, which includes mmapped memory. That memory remains available to the kernel and other processes; it is not locked up by InfluxDB.

The total at the bottom of pmap shows 347MiB is used by InfluxDB.

It’s important to remember that InfluxDB will use whatever memory is available to it in order to optimise reads and writes.

If you wish, you can try to create a cgroup to restrict it; but from what I can see - your install will run fine with 1GiB.

I’d recommend reducing it again. If you get lockups, send us a new pmap to inspect and we can request some additional debugging information.

Thanks for the explanation. However, as soon as I open up Chronograf and run a query the memory spikes:

ptop: https://pastebin.com/JhuN0LYA

Interestingly, if I run the queries on that dashboard directly from the console, only 200M of additional memory is used, so I’m not sure what overhead Chronograf has.

I don’t really want to create a cgroup; I always wanted this to just be an easy logging solution that required very little configuration. If there’s no trivial way to specify a max memory usage in the Influx config, maybe I just need to reduce the cardinality or amount of data I’m storing.

This mostly seems to be an issue with this one measurement. The api_log measurement with 12M records performs incredibly well. I can read the past year of data rapidly and without high memory usage.

If I paste some more detailed information here would you be able to suggest some improvements to the schema? Is there anything specific you want to see?

OK. Let’s see if we can work this out. It is the weekend, so I won’t be able to reply super fast, but I will reply as I can.

Can you provide the schemas for your databases, including retention policies?
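
The information requested here can usually be pulled from the `influx` console with statements along these lines. This is only a sketch, and the database name `logs` is a placeholder.

```sql
-- Hypothetical database name: "logs".
SHOW RETENTION POLICIES ON "logs"
SHOW MEASUREMENTS ON "logs"
SHOW TAG KEYS ON "logs"
SHOW FIELD KEYS ON "logs"
-- Shard groups per database and retention policy, with start, end, and expiry times.
SHOW SHARDS
```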

I’d like to know your ingestion rate too. Use the internal stats if you need.
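
A rough sketch of reading the write rate from the internal stats, assuming the `_internal` monitoring database is enabled (the default in 1.x) and that its `write` measurement exposes a cumulative `pointReq` counter:

```sql
-- Points written per second, averaged over 1-minute windows for the last hour.
-- "pointReq" is assumed to be a cumulative counter, so take the derivative of its max per window.
SELECT non_negative_derivative(max("pointReq"), 1s)
FROM "_internal"."monitor"."write"
WHERE time > now() - 1h
GROUP BY time(1m)
```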

Did you tweak any cache, WAL, or shard duration settings in your config?

Yeah I’ll be busy too as soon as my kids wake up; thanks for the help! I didn’t bother censoring anything below…it’s not really worth it at this point. :)
Edit: They’re awake. Send help.

I log two things with Influx:

  1. Calls to my own API
  2. Calls to external APIs

Calls to my own API

  • Ingestion rate: 1.7/s
  • Currently 13M rows
  • Retention policy: autogen/none
  • Measurements:
    api_log (13M rows)
    api_log_cq_date (42 rows)
    api_log_cq_hour (10k rows)
  • The main measurement is large and performant; the CQs are not significant

show tag keys from api_log

name: api_log
tagKey
------
exception
from_cache
request_api_user_id

show field keys from api_log

name: api_log
fieldKey             fieldType
--------             ---------
request_method       string
request_resource     string
response_error_code  integer
response_query_count integer
response_query_time  float
response_time        float
user_id              integer

select * from api_log order by time desc limit 5;

name: api_log
time                exception from_cache request_api_user_id request_method request_resource response_error_code response_query_count response_query_time response_time user_id
----                --------- ---------- ------------------- -------------- ---------------- ------------------- -------------------- ------------------- ------------- -------
1566037805878745000 0         0          1                   read_id        ecobee_sensor                        1                    0.0018              0.0019        4900
1566037805617531000 0         0          1                   read_id        ecobee_sensor                        1                    0.0003              0.0003        3849
1566037802126947000 0         0          5                   sync           thermostat                           34                   3.7502              4.2417        1
1566037794617212000 0         0          1                   read_id        ecobee_sensor                        1                    0.0003              0.0003        3849
1566037783664022000 0         0          1                   read_id        ecobee_sensor                        1                    0.0019              0.002         3831

Calls to external APIs

  • Ingestion rate: 1.9/s
  • Currently 1M rows
  • Retention policy: 30d (has only been running for 7 days)
  • Measurements:
    ecobee_api_log (1M rows)
    mailchimp_api_log (4k rows)
    patreon_api_log (25k rows)
    smarty_streets_api_log (5k rows)
  • All measurements use the exact same schema.
  • As you can see, only one of these measurements is significant
  • This data is not especially important

show tag keys from ecobee_api_log

name: ecobee_api_log
tagKey
------
api_user_id
exception
user_id

show field keys from "30d".ecobee_api_log

name: ecobee_api_log
fieldKey     fieldType
--------     ---------
connect_time float
http_code    integer

select * from "30d".ecobee_api_log order by time desc limit 5;

name: ecobee_api_log
time                api_user_id connect_time exception http_code user_id
----                ----------- ------------ --------- --------- -------
1566037833448908000 5           0.0138       0         200       3803
1566037833284941000 5           0.0138       0         200       3803
1566037833106300000 5           0.0137       0         500       3803
1566037832831375000 5           0.0141       0         500       3702
1566037832702936000 5           0.0137       0         500       3702
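
Since `user_id`, `api_user_id`, and `exception` are all tags in this measurement, each distinct combination of their values creates a separate series. One way to see how many distinct values each tag holds is sketched below; add `ON "<database>"` if no database is selected in the console.

```sql
-- Count distinct values per tag key to see which tag contributes most to series cardinality.
SHOW TAG VALUES EXACT CARDINALITY FROM "ecobee_api_log" WITH KEY = "user_id"
SHOW TAG VALUES EXACT CARDINALITY FROM "ecobee_api_log" WITH KEY = "api_user_id"
SHOW TAG VALUES EXACT CARDINALITY FROM "ecobee_api_log" WITH KEY = "exception"
```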

Also, I am using this image:

The only thing I changed was converting from inmem to TSI1 indexes.