How does InfluxDB work with its files (tsm, db, wal)?

Hi all,
I have a question about the files that InfluxDB creates under /var/lib/influxdb/ (tsm, db, wal):

- How does InfluxDB manage the size of these files?
- When does InfluxDB decide to create a new TSM file?
- Does InfluxDB use a threshold value to decide when to create a new TSM file?

and so on…

There are two main root directories: one for the WAL and one for data. The files in the WAL (Write-Ahead-Log) are appended to as new writes and deletes arrive. This data is essentially write optimized: it is fast to write, but not to query. TSM files are read optimized, which means they can be queried efficiently but not updated easily.

The series data for writes to the WAL is also stored in an in-memory cache until the cache is snapshotted. A snapshot is triggered based on cache-snapshot-memory-size and cache-snapshot-write-cold-duration. When it happens, a new level 1 TSM file is written and the WAL segments associated with that snapshot are removed. Each WAL segment is 10MB and rolls over to a new segment as it fills up.
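
As a rough illustration of how those two settings interact, here is a minimal Python sketch. It is not the engine's actual code: the names SnapshotPolicy and should_snapshot are made up for this example, and the numbers are placeholder values, so check your own influxdb.conf for the real settings.

```python
from dataclasses import dataclass


@dataclass
class SnapshotPolicy:
    # Hypothetical stand-ins for the two config options mentioned above.
    memory_size_bytes: int      # plays the role of cache-snapshot-memory-size
    write_cold_seconds: float   # plays the role of cache-snapshot-write-cold-duration


def should_snapshot(policy, cache_bytes, seconds_since_last_write):
    """Return True when the in-memory cache should be written out as a level 1 TSM file."""
    if cache_bytes == 0:
        return False  # nothing cached, so nothing to snapshot
    if cache_bytes >= policy.memory_size_bytes:
        return True   # the cache has grown past the size threshold
    # Otherwise snapshot only if the shard has not seen a write for long enough.
    return seconds_since_last_write >= policy.write_cold_seconds


# Placeholder values for illustration only.
policy = SnapshotPolicy(memory_size_bytes=25 * 1024 * 1024, write_cold_seconds=600)

print(should_snapshot(policy, 30 * 1024 * 1024, 1))  # True: over the size threshold
print(should_snapshot(policy, 1024, 900))            # True: no writes for a while
print(should_snapshot(policy, 1024, 5))              # False: small and still busy
```

Either condition on its own is enough, which is why a busy shard snapshots on size while a quiet one snapshots on time.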

TSM files are periodically compacted into more compressed forms to improve query performance. Once a certain number of TSM files exist at a particular level, they are compacted to the next higher level. When a file reaches 2GB in size, it rolls over into additional TSM files.
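
To make the level idea concrete, here is a toy planner in Python. It is only a sketch under assumptions: the real planner is more involved, the group size of 4 is an arbitrary number chosen for the example, and the file names are hypothetical.

```python
from collections import defaultdict


def plan_level_compactions(files, min_files=4):
    """Group TSM files by level and plan compactions into the next level.

    files: list of (file_name, level) pairs.
    min_files: assumed number of files at one level that triggers a compaction.
    Returns a list of (file_names, target_level) pairs.
    """
    by_level = defaultdict(list)
    for name, level in files:
        by_level[level].append(name)

    plans = []
    for level in sorted(by_level):
        names = sorted(by_level[level])
        # Only full groups are compacted; leftovers wait for more files to arrive.
        for start in range(0, len(names) - min_files + 1, min_files):
            plans.append((names[start:start + min_files], level + 1))
    return plans


files = [("a.tsm", 1), ("b.tsm", 1), ("c.tsm", 1), ("d.tsm", 1),
         ("e.tsm", 2), ("f.tsm", 2)]
print(plan_level_compactions(files))
# [(['a.tsm', 'b.tsm', 'c.tsm', 'd.tsm'], 2)]
```

In this example the four level 1 files are merged into level 2, while the two existing level 2 files are left alone until enough of them accumulate.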

When a shard goes cold, as defined by compact-full-write-cold-duration, a full compaction runs and compacts all the files in the shard into an optimal layout for querying. If there are no more changes to the shard, these files are not compacted again.

There are more details about the storage engine in the docs.

Thank you @jason.
1- Would you please clarify this part of your answer: “The files in the WAL (Write-Ahead-Log) are appended to as new writes and deletes arrive”?
2- What is the relation between the WAL and TSM? Does InfluxDB write to both of them? When does it write to each?
3- Is it possible to configure InfluxDB to use only the WAL or only TSM, for a special scenario that requires faster reads or queries?
4- As you mentioned, a TSM file is split when it reaches 2GB, but I see many TSM files that were split at a much smaller size, such as 40MB or 30MB!
5- Is there any technical document that describes this clearly, or do I need enterprise support to access more documentation?
6- What are the tombstone files? Are they temporary files?

Thanks,

1- Would you please clarify this part of your answer: “The files in the WAL (Write-Ahead-Log) are appended to as new writes and deletes arrive”?

Within the WAL dir, there is a dir per shard. When writes are received, they are written to disk in the WAL. If you look in a shard's WAL dir, you will see files such as _00001.wal, _00002.wal, etc. These are WAL segments. Each time a write comes in, it is appended to the current segment, which is the file with the largest number (_00002.wal in this example). When a segment hits the max segment size (10MB), it is closed and a new segment is opened.
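
The append-and-roll-over behaviour can be sketched in a few lines of Python. This is an in-memory toy, not the real WAL format: the class name ToyWal is made up, and the tiny 10-byte limit stands in for the 10MB segment size so the rollover is easy to see.

```python
class ToyWal:
    """In-memory model of one shard's WAL dir: numbered segments, append-only."""

    def __init__(self, max_segment_bytes):
        self.max_segment_bytes = max_segment_bytes
        self.segments = {1: bytearray()}  # segment number -> segment contents

    def append(self, entry: bytes) -> str:
        """Append an entry to the current segment and return that segment's file name."""
        current = max(self.segments)  # the current segment has the largest number
        if len(self.segments[current]) >= self.max_segment_bytes:
            # The current segment hit the size limit: close it and open a new one.
            current += 1
            self.segments[current] = bytearray()
        self.segments[current] += entry
        return f"_{current:05d}.wal"


wal = ToyWal(max_segment_bytes=10)
print(wal.append(b"write1"))  # _00001.wal
print(wal.append(b"write2"))  # _00001.wal (segment is now 12 bytes, over the limit)
print(wal.append(b"write3"))  # _00002.wal
```

Older segments are never modified once a newer one exists; they just wait until the snapshot that covers them has been written.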

2- What is the relation between the WAL and TSM? Does InfluxDB write to both of them? When does it write to each?

The WAL is where incoming writes hit initially. As I mentioned before, this is a “write optimized” file structure that allows writes to be appended to the file. These writes are also maintained in an in-memory cache to support querying. When a snapshot compaction occurs, the values in the cache are written to a new TSM file and the associated WAL segments are removed.

TSM files are continually compacted into larger, denser files. Once they are written, they are immutable and never updated. Compactions combine multiple TSM files into new ones.
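
Putting the pieces together, the write path can be modelled roughly like this in Python. This is a simplified sketch, not how the engine is implemented: ToyShard and its methods are hypothetical names, and the "TSM files" here are just immutable tuples.

```python
class ToyShard:
    """Toy model of one shard: WAL + in-memory cache + immutable TSM files."""

    def __init__(self):
        self.wal = []        # appended entries, in arrival order
        self.cache = {}      # series key -> list of (timestamp, value)
        self.tsm_files = []  # each file is an immutable tuple of (key, points)

    def write(self, key, timestamp, value):
        # Every write goes to the WAL (durability) and to the cache (queryability).
        self.wal.append((key, timestamp, value))
        self.cache.setdefault(key, []).append((timestamp, value))

    def snapshot(self):
        # Write the cache out as a new immutable file, then drop the WAL entries
        # and cached values that the new file now covers.
        if not self.cache:
            return
        new_file = tuple((k, tuple(sorted(v))) for k, v in sorted(self.cache.items()))
        self.tsm_files.append(new_file)
        self.cache.clear()
        self.wal.clear()

    def query(self, key):
        # Reads merge what is already in TSM files with what is still in the cache.
        points = []
        for tsm_file in self.tsm_files:
            for k, pts in tsm_file:
                if k == key:
                    points.extend(pts)
        points.extend(self.cache.get(key, []))
        return sorted(points)


shard = ToyShard()
shard.write("cpu", 1, 0.5)
shard.write("cpu", 2, 0.7)
shard.snapshot()            # the cache becomes a TSM file, the WAL is emptied
shard.write("cpu", 3, 0.9)  # a new write lands in the WAL and cache again
print(shard.query("cpu"))   # [(1, 0.5), (2, 0.7), (3, 0.9)]
```

So the answer to "which one is written to" is both, just at different times: the WAL on every write, a TSM file on every snapshot or compaction.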

3- Is it possible to configure InfluxDB to use only the WAL or only TSM, for a special scenario that requires faster reads or queries?

No. If there are no writes coming in, nothing will be written to the WAL and only TSM files will be used.

4- As you mentioned, a TSM file is split when it reaches 2GB, but I see many TSM files that were split at a much smaller size, such as 40MB or 30MB!

Yes, if an individual TSM file reaches 2GB in size, we split it. Not all TSM files are 2GB and some may never reach that size. Compactions combine small, less dense TSM files into larger, denser files.

5- Is there any technical document that describes this clearly, or do I need enterprise support to access more documentation?

See the docs I linked to earlier. There are also various docs (some outdated) and comments in the code.

6- What are the tombstone files? Are they temporary files?

Tombstones record deleted series keys/time ranges within TSM files. Since TSM files are immutable, we write a tombstone file for anything in that TSM file that is deleted. The next time that file is compacted, the deleted keys/time ranges are removed when writing the new TSM file.
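
Here is a small Python sketch of that idea. It is a conceptual model only: the data layout (a dict of points plus a list of key/time-range tombstones) and the function name compact_with_tombstones are assumptions made for the example, not the on-disk format.

```python
def compact_with_tombstones(tsm_points, tombstones):
    """Rewrite a TSM file's contents, dropping anything covered by a tombstone.

    tsm_points: dict of series key -> list of (timestamp, value).
    tombstones: list of (series key, min_timestamp, max_timestamp), inclusive.
    Returns a new dict; the input is left untouched, like an immutable file.
    """
    def is_deleted(key, timestamp):
        return any(k == key and lo <= timestamp <= hi for k, lo, hi in tombstones)

    compacted = {}
    for key, points in tsm_points.items():
        kept = [(ts, v) for ts, v in points if not is_deleted(key, ts)]
        if kept:  # a key with no surviving points disappears entirely
            compacted[key] = kept
    return compacted


old_file = {
    "cpu": [(1, 0.5), (2, 0.7), (3, 0.9)],
    "mem": [(1, 100), (2, 110)],
}
tombstones = [("cpu", 2, 3), ("mem", 1, 2)]

print(compact_with_tombstones(old_file, tombstones))
# {'cpu': [(1, 0.5)]}
```

Until that compaction happens, the original file still holds the deleted data on disk and the tombstone is what hides it.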

Hi @jason,
Can you please help with the question below?

  1. How do I change the path of the data .tsm files? In my case they are stored at “C:\Windows\System32\config\systemprofile.influxdb\data\telegraf\autogen” on Windows. I want to move this directory to a different path.

Answered here: Confused about where the data from databased is being stored

Thank you @dgnorton