How is data stored internally? How should I think about Line Protocol?

Hi all,

I am a new InfluxDB user, and I am especially interested in studying how its features work.
I am interested in how data is actually stored:

  • I know that data is physically stored in TSM files; what can I find in their data blocks? Is the data still in line protocol, or in some compressed form?
  • When writes come in to Influx, what actually gets written to the WAL files: just log information, or the actual data? If the latter, what role do the caches play?
  • How do TSM file indexes work? I mean the index block present in each TSM file.
  • In line protocol, the timestamp is optional for each record. If I omit it, how can I display the time series data in the correct order?
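As a rough mental model for the index question (a simplification, not the real on-disk byte layout): according to the InfluxDB storage engine documentation, each TSM file ends with an index whose entries are sorted by key (the series key plus field name), and each entry lists that key's data blocks with their minimum and maximum timestamps, byte offset, and size. The data blocks themselves hold compressed timestamps and values for a single series and field, not line protocol text. A reader can binary-search the sorted keys and then keep only the blocks whose time range overlaps the query. The toy Python model below sketches that lookup; the class names and the `|` separator inside the keys are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockEntry:
    """Where one compressed data block lives and which time range it covers."""
    min_time: int
    max_time: int
    offset: int  # byte offset of the block within the file
    size: int    # length of the block in bytes


@dataclass
class IndexEntry:
    """All blocks for one key (series key + field; '|' separator is illustrative)."""
    key: str
    blocks: List[BlockEntry] = field(default_factory=list)


class TSMIndexSketch:
    """Toy index: entries sorted by key so lookups can binary-search."""

    def __init__(self, entries: List[IndexEntry]):
        # A real TSM file already stores its index sorted; we sort here for the toy.
        self.entries = sorted(entries, key=lambda e: e.key)
        self.keys = [e.key for e in self.entries]

    def blocks_for(self, key: str, start: int, end: int) -> List[BlockEntry]:
        """Return the blocks of `key` whose [min_time, max_time] overlaps [start, end]."""
        i = bisect_left(self.keys, key)  # binary search over the sorted keys
        if i == len(self.keys) or self.keys[i] != key:
            return []  # key not present in this file
        return [b for b in self.entries[i].blocks
                if b.min_time <= end and b.max_time >= start]


if __name__ == "__main__":
    idx = TSMIndexSketch([
        IndexEntry("cpu,host=b|usage", [BlockEntry(0, 99, 5, 40)]),
        IndexEntry("cpu,host=a|usage", [BlockEntry(0, 99, 45, 60),
                                        BlockEntry(100, 199, 105, 58)]),
    ])
    print(idx.blocks_for("cpu,host=a|usage", 150, 160))
    # [BlockEntry(min_time=100, max_time=199, offset=105, size=58)]
```

Keeping per-block min/max times in the index is what lets a query skip whole blocks without decompressing them.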

By the way, my series data is stored in tabular CSV files. How can I easily convert them to line protocol? Do I have to write my own script, or is there some other way? I have installed the whole TICK stack on my machine.
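For the CSV question, a short script is one option (Telegraf, which is part of the TICK stack, also has a CSV input data format that may avoid a custom script; check its documentation for your version). Below is a minimal standard-library sketch, assuming a hypothetical CSV with a header row `time,host,region,usage`, where `time` is already an integer Unix timestamp in nanoseconds, `host` and `region` become tags, and `usage` is a float field; the measurement name `cpu` and these column roles are assumptions to adapt. It writes the timestamp explicitly on every line: if the timestamp is omitted, InfluxDB assigns the server's current time at write, so carrying over the CSV's own time column is what preserves the original ordering. Escaping is simplified to commas, spaces, and equals signs.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column roles; adjust them to match your own CSV.
MEASUREMENT = "cpu"
TAG_COLUMNS = ["host", "region"]
FIELD_COLUMNS = ["usage"]   # parsed as floats; every row must have a value
TIME_COLUMN = "time"        # assumed: integer nanoseconds since the Unix epoch


def escape_key(s: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def escape_measurement(s: str) -> str:
    """Escape commas and spaces in the measurement name."""
    return s.replace(",", r"\,").replace(" ", r"\ ")


def row_to_line(row: Dict[str, str]) -> str:
    """Build one line: measurement[,tags] fields timestamp."""
    tags = ",".join(
        f"{escape_key(k)}={escape_key(row[k])}"
        for k in TAG_COLUMNS
        if row.get(k)  # empty tag values are not allowed, so skip them
    )
    fields = ",".join(f"{escape_key(k)}={float(row[k])!r}" for k in FIELD_COLUMNS)
    head = escape_measurement(MEASUREMENT) + ("," + tags if tags else "")
    return f"{head} {fields} {int(row[TIME_COLUMN])}"


def csv_to_line_protocol(text: str) -> List[str]:
    """Convert CSV text (with a header row) into a list of line protocol lines."""
    return [row_to_line(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    sample = """time,host,region,usage
1465839830100400200,server01,us-west,0.64
1465839830100400300,server 02,us-west,0.5
"""
    for line in csv_to_line_protocol(sample):
        print(line)
    # cpu,host=server01,region=us-west usage=0.64 1465839830100400200
    # cpu,host=server\ 02,region=us-west usage=0.5 1465839830100400300
```

The resulting lines can then be written to InfluxDB in a batch with whatever client or tool you already use.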

Thanks,
Luca.

Hi!

I would start by watching this video explaining some of the internals. I'm not sure if it covers all of your questions, but I think it's a good start. 🙂

This series of blog posts on InfluxDB internals may also be interesting and probably summarizes what’s in the video linked by Katy.

If you’d like to go even deeper, check out the TSM design document and the code. I think there is also a TSI design document floating around somewhere, but I can’t find it right now. The code for TSI is here.

Thanks guys, the links you shared have been very helpful to me.