How does the TSM file index help with queries that use tagsets

salilsurendran · May 3, 2021, 6:17pm

I was wondering how the index described in In-memory indexing and the Time-Structured Merge Tree (TSM) | InfluxDB OSS 1.8 Documentation helps with a query like this: “select avg(temperature.internal) where data_center = ‘10’”. From what I read on that page, the index has a key made up of a measurement (temperature), a field (internal), and a tag set (data_center=10,cpu_id=1).

So I visualize a key as a string like this:
temperature.data_center=10,cpu_id=1.internal
and this points to the blocks that contain the data and the time intervals.

Now if a query like “select avg(temperature.internal) where data_center = ‘10’” is run, wouldn’t it have to scan all the index entries belonging to temperature and grep for the ones that have data_center=10? Wouldn’t this be slow?
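
To make the concern concrete, here is a rough sketch of the linear scan described above. It assumes the "measurement.tagset.field" key shape imagined in this post, which is only a mental model and not necessarily the real TSM key layout; `index_keys` and `scan_keys` are hypothetical names.

```python
# Hypothetical index keys in the "measurement.tagset.field" shape imagined above.
# This is an assumed layout for illustration, not InfluxDB's actual key format.
index_keys = [
    "temperature.data_center=10,cpu_id=1.internal",
    "temperature.data_center=10,cpu_id=2.internal",
    "temperature.data_center=20,cpu_id=1.internal",
    "humidity.data_center=10,cpu_id=1.external",
]


def scan_keys(keys, measurement, field, tag, value):
    """Linear scan: parse every key and keep the ones matching the filter."""
    matches = []
    for key in keys:
        m, tagset, f = key.split(".")
        tags = dict(pair.split("=") for pair in tagset.split(","))
        if m == measurement and f == field and tags.get(tag) == value:
            matches.append(key)
    return matches


matching = scan_keys(index_keys, "temperature", "internal", "data_center", "10")
print(matching)
```

The cost of this approach grows with the total number of keys, since every key is parsed even when only a few match.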

scott · May 3, 2021, 7:39pm

@salilsurendran The alternative is scanning the full contents of every shard/file block directly to see if they contain points that belong to the queried series. The index is a light-weight method for identifying which shards contain data relevant to the query and scanning data only from the relevant shards.
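
As a simplified illustration of why an index lookup can be cheaper than a scan, the sketch below maps each (measurement, tag key, tag value) to the set of series that carry it, so a tag filter becomes a dictionary lookup. The `TagIndex` class and its methods are hypothetical names for this example and are not InfluxDB's internal API.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: (measurement, tag key, tag value) -> series keys."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add_series(self, measurement, tags, series_key):
        # Register the series under every tag pair it carries.
        for tag_key, tag_value in tags.items():
            self._by_tag[(measurement, tag_key, tag_value)].add(series_key)

    def lookup(self, measurement, tag_key, tag_value):
        # One dictionary lookup instead of parsing every series key.
        return self._by_tag.get((measurement, tag_key, tag_value), set())


index = TagIndex()
index.add_series("temperature", {"data_center": "10", "cpu_id": "1"}, "series-a")
index.add_series("temperature", {"data_center": "10", "cpu_id": "2"}, "series-b")
index.add_series("temperature", {"data_center": "20", "cpu_id": "1"}, "series-c")

relevant = sorted(index.lookup("temperature", "data_center", "10"))
print(relevant)
```

Only the series returned by the lookup would then need their data blocks read, which is the general idea behind using an index to narrow the scan.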

In-mem indexing is somewhat antiquated. If you’re able, you should switch to TSI indexing. It stores indexes in memory, but also persists them to disk. It’ll improve your startup times and drastically increase your cardinality ceiling.
