How to estimate the storage capacity for wal and data directories?

tohlzhu · August 16, 2022, 7:26am

I’m testing InfluxDB 2.x. I can see the wal and data directories in /var/lib/influxdb/engine.

Do I need to point the wal and data directories to different disks to improve performance when using influxdb2, like for v1.x? The configuration file of influxdb2 only provides parameters for modifying the engine-path.

Also, what factors influence the size of the wal directory, and how do I estimate the disk capacity needed for the wal and data directories?
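A practical starting point for an estimate is to measure what the two directories occupy today under a known write load. The following is only a minimal sketch: it assumes the default engine path mentioned above, and the helper names `dir_size_bytes` and `engine_usage` are made up for illustration, not part of InfluxDB.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all files below path (0 if the path does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file disappeared between listing and stat,
                # which can happen while the engine is running.
                pass
    return total


def engine_usage(engine_path="/var/lib/influxdb/engine"):
    """Return the current size in bytes of the wal and data subdirectories."""
    return {
        sub: dir_size_bytes(os.path.join(engine_path, sub))
        for sub in ("wal", "data")
    }


if __name__ == "__main__":
    for name, size in engine_usage().items():
        print(f"{name}: {size / 1024**2:.1f} MiB")
```

Running this periodically while the test workload is writing should show how each directory grows relative to the ingest rate.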

I can’t find any docs about this.

Thanks for help!

Anaisdg · August 16, 2022, 5:37pm

Hello @tohlzhu,
These HW guidelines roughly apply to 2.x as well I believe:

Something here might be useful (although I didn’t find anything, still worth knowing about):

Factors that influence the storage size of the wal:

high series cardinality
high ingest rate
lack of downsampling or automatically expiring old data
storing logs
excessively long measurement or tag names (this is a rare situation).

There’s this tool in 1.8:

Perhaps we should create a feature request to make this available in 2.x.

I’ll also ask around. Thank you for your patience.

tohlzhu · August 17, 2022, 3:29am

Thanks for replying, this is helpful!

I’ve read the “Hardware sizing guidelines-> Bytes and compression” in docs for v1.8.
Can I assume that the size of the data directory can be calculated mainly from the number of points, and that the size of the wal directory depends only on write speed? In other words, is the wal directory independent of the volume of data in the “data” directory, and will “old” data in the “wal” directory be deleted automatically?
So, for a 4vCPU and 32GB memory instance, I can give wal a fixed small disk size, for example a 32GB disk and give a 4TB disk for data directory. Is this reasonable?
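As a rough illustration of the point-count approach, here is a back-of-the-envelope sketch. It assumes the figure of roughly 3 bytes per non-string field value from the v1.8 “Bytes and compression” section, and it ignores string fields, the index, and the wal. The function name and all parameters are hypothetical, and real compression varies with the data.

```python
def estimate_data_bytes(series, fields_per_point, write_interval_s,
                        retention_days, bytes_per_value=3.0):
    """Rough size of the data directory for non-string field values.

    bytes_per_value defaults to ~3, an assumption taken from the v1.8
    sizing guidelines; measure your own data to refine it.
    """
    # Points written per series over the whole retention period.
    points_per_series = retention_days * 86_400 / write_interval_s
    # Every point stores one value per field.
    values = series * fields_per_point * points_per_series
    return values * bytes_per_value


if __name__ == "__main__":
    # Example workload: 10,000 series, 1 field each, one point every 10 s,
    # kept for one year.
    size = estimate_data_bytes(10_000, 1, 10, 365)
    print(f"{size / 1e9:.1f} GB")
```

Comparing an estimate like this with the size actually measured on disk for a test bucket is probably the most reliable way to calibrate `bytes_per_value`.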

The doc also suggests storing wal and data on separate storage devices for heavy write loads. Although I’m not facing this kind of scenario, I still want to know how to set separate wal and data paths on v2.x. Can you give me any suggestions?

Anaisdg · August 17, 2022, 7:47pm

Hello @tohlzhu,
Yes if your series cardinality isn’t increasing, then you can.
If it is, then your WAL will increase as well (as I understand it).

Hmm I’m not sure how to separate wal and data paths on v2.x, let me ask around.

Anaisdg · August 17, 2022, 7:48pm

@tohlzhu jk the documentation for wal and data path and how to change it is here:

Blessings to the docs team <3
Thanks @scott and team

tohlzhu · August 18, 2022, 3:54am

Hi Anaisdg, I’ve read the “InfluxDB file system layout” page. It says to “use the engine-path configuration option” to change the engine directory, which is the parent directory of wal and data. This is why I said “The configuration file of influxdb2 only provides parameters for modifying the engine-path.” With that option, wal and data are still on the same disk and are not separated.

I suspect that v2.x cannot separate the wal directory and the data directory through configuration. Is the documentation incomplete, or does v2.x not have the ability to do this?

scott · August 18, 2022, 2:48pm

@tohlzhu InfluxDB 2.x doesn’t currently expose separate configuration options for data and wal paths, but you’re welcome to submit a feature request.

tohlzhu · August 19, 2022, 1:05am

@scott Thanks, anyway, that’s a solution.
