InfluxDB evaluation scenario - memory/performance issue

qootec · July 13, 2020, 1:04pm

Does my message belong here or on Slack? I wasn’t sure, so I decided to post it here.

Problem:

I’m facing performance and memory management issues in the evaluation scenario described below. Should we add memory, change the configuration (TSM->TSI), or change the sharding? I can’t find the right answer.
Community to the rescue…

Environment:

InfluxDB 1.7.10 default Docker image
Hosted on an Ubuntu 18.04.4 LTS server
4-vCPU VM with 16GB of memory
Config: TSM index, retention 2w, shard duration 1d

Scenario:

We use the Python SDK to send data to Influx.
Each dataframe is written with this command:
dataframeclient.write_points(currentDataFrame,
                             measurement,
                             tags,
                             time_precision='ms',
                             batch_size=5000,
                             protocol='line')
Always the same measurement and the same (single) tag.
We are processing historic files that each carry 5 minutes of data values.
We split each file into separate dataframes, one per data field name.
Each such dataframe contains:
- DateTimeIndex
- One float64 column with a given name: A, B, C etc
  The field name is one of roughly 200 names our data values can have.
  So, we could be writing a block of 10000 rows for field A.
  Next, a block of 3000 rows for field B.
  Next, 2 rows for field C.
  Etc.
  Timestamps might overlap e.g. if a B-sample was taken on the same ms as an A-sample.
About 100,000 samples are written every minute.
The result is a sparse dataset.
- Some rows will contain only one field with a value.
- Some rows may contain many fields that have a value set.
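
To illustrate how these per-field blocks end up as sparse rows: InfluxDB combines points that share a measurement, tag set, and timestamp into one row whose field set is the union of the written fields. The sketch below mimics that merge in plain Python and renders the result as line protocol. The measurement name `evaluation`, the tag `source=plant1`, and the sample values are hypothetical, and the helpers do not escape special characters. Whether pre-merging blocks like this before writing would help in this scenario has not been tested here.

```python
from collections import defaultdict


def merge_field_blocks(blocks):
    """Combine per-field blocks into rows keyed by timestamp (ms).

    blocks: iterable of (field_name, [(timestamp_ms, value), ...]).
    Returns {timestamp_ms: {field_name: value}}.
    """
    rows = defaultdict(dict)
    for field_name, samples in blocks:
        for ts, value in samples:
            rows[ts][field_name] = value
    return dict(rows)


def to_line_protocol(measurement, tags, rows):
    """Render merged rows as line protocol, one line per timestamp.

    Timestamps are in milliseconds, so a writer would have to declare
    ms precision (as time_precision='ms' does in the post above).
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    lines = []
    for ts in sorted(rows):
        fields = sorted(rows[ts].items())
        field_part = ",".join(f"{k}={float(v)}" for k, v in fields)
        lines.append(f"{measurement},{tag_part} {field_part} {ts}")
    return lines


# Hypothetical blocks: fields A and B share the timestamp 1001.
blocks = [
    ("A", [(1000, 1.5), (1001, 1.6)]),
    ("B", [(1001, 7.0)]),
    ("C", [(1005, 0.25)]),
]
rows = merge_field_blocks(blocks)
lines = to_line_protocol("evaluation", {"source": "plant1"}, rows)
```

Here the row at timestamp 1001 carries both A and B, while the other rows carry a single field each, which is the sparse shape described above.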

Problems:

As indicated, we are uncertain how to find an optimal balance between memory / performance.
Currently we see a lot of memory usage.
With less than 8GB of memory assigned, InfluxDB hangs after some time with all memory consumed.
I guess that is because of the in-memory nature of TSM.
Is TSI expected to be a solution? We only have 1 measurement and one tag (and about 200 fields in a sparse dataset).
Is InfluxDB not the best database for this kind of data structure?
Should we restructure our data or method of writing?

Thanks for your suggestions,
Johan

philjb · August 6, 2020, 9:41pm

@Johan_Scheepers

I appreciate your question and the information you have provided!

TSI and TSM files work together and are both required to store your data and query it. There is an option to have TSI be an entirely in-memory index, but you are more likely to run into memory exhaustion with this approach. Which are you using?

Both TSI and TSM files are mapped into memory when they are accessed. This memory usage will show up in the overall memory usage stats for InfluxDB. The actual Go heap usage is much smaller. The OS manages the memory-mapped files and will use as much memory as it can, keeping files in memory for as long as possible to improve performance. How much data is stored in your database?
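
One way to compare the Go heap with the overall process memory is to read the runtime statistics that InfluxDB 1.x exposes as JSON at `/debug/vars`. The following is a minimal sketch, assuming the instance listens on `http://localhost:8086` without authentication; the helper names and the sample numbers are made up for illustration and are not measurements from this system.

```python
import json
from urllib.request import urlopen


def fetch_debug_vars(base_url="http://localhost:8086"):
    """Fetch the expvar JSON from an InfluxDB 1.x instance.

    The base URL is an assumption; adjust it to the actual host and port.
    """
    with urlopen(f"{base_url}/debug/vars", timeout=10) as resp:
        return json.load(resp)


def heap_summary_mib(debug_vars):
    """Extract Go runtime memory figures (in MiB) from the 'memstats' section."""
    mem = debug_vars["memstats"]
    keys = ("HeapAlloc", "HeapInuse", "HeapSys", "Sys")
    return {k: round(mem[k] / 2**20, 1) for k in keys}


# Made-up sample in the shape of the 'memstats' section, so the helper
# can be tried without a running server.
sample = {
    "memstats": {
        "HeapAlloc": 512 * 2**20,
        "HeapInuse": 600 * 2**20,
        "HeapSys": 1024 * 2**20,
        "Sys": 1200 * 2**20,
    }
}
summary = heap_summary_mib(sample)
```

Against a live server, `heap_summary_mib(fetch_debug_vars())` could be compared with the container's memory as reported by `docker stats`; a large gap between the two would be consistent with memory-mapped files accounting for much of the usage.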

One measurement and one tag with 200 fields is quite reasonable, as is 100k points per minute. What performance issues are you seeing? How is InfluxDB “hanging”? What query are you running when it hangs? Please share the query that is causing problems.
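
To put the write rate in perspective, here is a rough back-of-the-envelope calculation using the figures from the original post (100,000 samples per minute, 1-day shards, 2-week retention). It assumes the rate is sustained around the clock, which may not hold for a historic import, and it counts field values rather than rows, since several fields can share a timestamp.

```python
SAMPLES_PER_MINUTE = 100_000   # from the original post
MINUTES_PER_DAY = 60 * 24
SHARD_DAYS = 1                 # shard duration 1d
RETENTION_DAYS = 14            # retention 2w

# Average write rate in field values per second.
per_second = SAMPLES_PER_MINUTE / 60

# Field values landing in one shard, and in the whole retention window.
per_shard = SAMPLES_PER_MINUTE * MINUTES_PER_DAY * SHARD_DAYS
retained = per_shard * (RETENTION_DAYS // SHARD_DAYS)

print(f"{per_second:.0f} values/s, {per_shard:,} per shard, {retained:,} retained")
```

Under these assumptions that is about 1,667 values per second, 144 million values per daily shard, and roughly 2 billion values across the retention window.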

qootec · September 16, 2020, 3:31pm

Thanks for your reply. I had put this on hold for some time (holidays and other experiments).

The system has changed a bit since the original post.
There are now two measurements:

First: 1 tag (single value) and 1300 fields
Second: 1 tag (single value) and 15 fields
The remaining properties are the same (sparse float matrix).

To point you to some of the issues we see:

Do you have ideas?

Thanks,
Johan (but not Scheepers, to whom you replied)

philjb · September 21, 2020, 4:28pm

Johan -

Let’s close this thread in favor of your other questions.
