Ever-increasing RAM usage with low series cardinality

iot
time-series
influxdb
#1

Hi,

I’m currently testing InfluxDB 1.3.5 for storing a small number (~30-300) of very long integer series. Worst case per device: 86400 * (365 * 12) [sec/day * days/year * 12 years * 1 device] = 378,432,000 points.

For 320 devices, for example, the total number of points would be: 86400 * (365 * 12) * 320 [sec/day * days/year * 12 years * 320 devices] = 121,098,240,000.
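The arithmetic above can be double-checked with a few lines of Python. This is only a sketch of the calculation as stated in the post; the constant names are made up for illustration.

```python
# Rough point-count estimate: one sample per second, per device, for 12 years.
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365   # leap days ignored, as in the post
YEARS = 12
DEVICES = 320

points_per_device = SECONDS_PER_DAY * DAYS_PER_YEAR * YEARS
total_points = points_per_device * DEVICES

print(f"{points_per_device:,}")  # 378,432,000
print(f"{total_points:,}")       # 121,098,240,000
```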

The series cardinality is low: it equals the number of devices. I’m using second-precision timestamps (that mode is enabled when I commit to InfluxDB via the PHP API).
Yes, I really need to keep all the samples, so downsampling is not an option.

I’m inserting the samples as point arrays of 86400 points per request, sorted from oldest to newest. The behaviour is the same (OOM in both cases) for the inmem and tsi1 indexing modes.

Despite all that, I’m not able to insert this number of points into the database without crashing it due to running out of memory. The host VM has 8GiB of RAM and 4GiB of swap, both of which fill up completely. I cannot find anything in the documentation about this setup being problematic, nor any notice indicating that it should result in high RAM usage at all…

Does anyone have a hint on what could be wrong here?

Thanks and all the best!
b-

#2

I found out what the issue most likely was:

I had a bug in my feeder that caused timestamps not to be updated, so lots of points with distinct values were written over and over again to the same timestamp/tag combination.

If you experience something similar, try double-checking each step in the pipeline for a time-related error.

Edit: unfortunately, this was not the issue. The RAM usage still rises when importing more points than before.
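For anyone who wants to rule out the feeder bug described in this post, a check along these lines could be run on each batch before it is written. It is a minimal sketch: the dict layout with `series`, `time` and `value` keys and the function name `find_duplicate_keys` are made up for illustration and are not part of the PHP client.

```python
from collections import Counter

def find_duplicate_keys(points):
    """Return the (series, timestamp) pairs that occur more than once in a batch."""
    counts = Counter((p["series"], p["time"]) for p in points)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical batch: the first two points share the same series and timestamp,
# so the second would overwrite the first instead of adding a new point.
batch = [
    {"series": "device=1", "time": 1500000000, "value": 10},
    {"series": "device=1", "time": 1500000000, "value": 11},
    {"series": "device=1", "time": 1500000001, "value": 12},
]

duplicates = find_duplicate_keys(batch)
print(duplicates)  # [('device=1', 1500000000)]
```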

#3

So far, this has worked best for me and brought InfluxDB down to a moderate memory usage of ~3GiB:

  1. Lower cache-snapshot-write-cold-duration to 10s during backfilling.
  2. Create the default retention policy with a long shard duration, e.g. CREATE DATABASE sensors WITH DURATION INF SHARD DURATION 5200w NAME longterm

This essentially limits the number of shard groups being created to one (assuming a date range of less than 100 years). InfluxDB then manages the TSM files within that group itself, so you should not lose performance; it’s designed to work this way.
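To get a feel for why the long shard duration matters, here is a rough estimate of how many shard groups a 12-year backfill would touch. It is a sketch under simplifying assumptions: it just divides the data span by the shard duration and ignores how InfluxDB aligns shard group boundaries, so the real count can be one higher. The function name is made up, and the 1w value is only there for comparison.

```python
import math
from datetime import timedelta

def estimated_shard_groups(data_span, shard_duration):
    """Approximate shard group count: span divided by shard duration, rounded up."""
    return max(1, math.ceil(data_span / shard_duration))

data_span = timedelta(days=12 * 365)  # 12 years of samples, as in the first post

one_week = estimated_shard_groups(data_span, timedelta(weeks=1))
long_term = estimated_shard_groups(data_span, timedelta(weeks=5200))

print(one_week)   # 626
print(long_term)  # 1
```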