How does the TSM file index help with queries that use tagsets?

I was wondering how the index described in In-memory indexing and the Time-Structured Merge Tree (TSM) | InfluxDB OSS 1.8 Documentation helps with a query like “select avg(temperature.internal) where data_center = ‘10’”. From what I read on that page, the index key consists of a measurement (temperature), a field (internal), and a tag set (data_center=10,cpu_id=1).

So I visualize a key as a string like this:
temperature.data_center=10,cpu_id=1.internal and this key points to the blocks that contain the data and their time intervals.

Now, if a query like “select avg(temperature.internal) where data_center = ‘10’” is run, wouldn’t it have to scan all the index entries belonging to temperature and filter for those that contain data_center=10? Wouldn’t this be slow?

@salilsurendran The alternative is scanning the full contents of every shard/file block directly to see whether they contain points that belong to the queried series. The index is a lightweight way to identify which shards contain data relevant to the query, so that data is scanned only from those shards.
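To make the contrast with a linear scan over keys concrete, here is a minimal Python sketch of a tag-to-series lookup. It is a toy model under stated assumptions, not InfluxDB's actual data structures: the key format `measurement,tagset#field`, the sample keys, and the function names `build_tag_index`, `scan_lookup`, and `index_lookup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical series keys in the form "measurement,tagset#field".
# This format is an assumption for illustration only.
SERIES_KEYS = [
    "temperature,cpu_id=1,data_center=10#internal",
    "temperature,cpu_id=2,data_center=10#internal",
    "temperature,cpu_id=1,data_center=20#internal",
    "humidity,cpu_id=1,data_center=10#internal",
]


def parse_key(key):
    """Split a key into (measurement, tags dict, field)."""
    series, field = key.split("#")
    measurement, *tag_pairs = series.split(",")
    tags = dict(pair.split("=") for pair in tag_pairs)
    return measurement, tags, field


def build_tag_index(keys):
    """Map (measurement, tag name, tag value) to the set of matching keys."""
    index = defaultdict(set)
    for key in keys:
        measurement, tags, _ = parse_key(key)
        for name, value in tags.items():
            index[(measurement, name, value)].add(key)
    return index


def scan_lookup(keys, measurement, tag_name, tag_value, field):
    """Linear scan: parse and test every key, as the question describes."""
    matches = []
    for key in keys:
        m, tags, f = parse_key(key)
        if m == measurement and f == field and tags.get(tag_name) == tag_value:
            matches.append(key)
    return sorted(matches)


def index_lookup(index, measurement, tag_name, tag_value, field):
    """Indexed lookup: one dictionary access, then a filter on the field."""
    candidates = index.get((measurement, tag_name, tag_value), set())
    return sorted(k for k in candidates if k.endswith("#" + field))


tag_index = build_tag_index(SERIES_KEYS)
result = index_lookup(tag_index, "temperature", "data_center", "10", "internal")
print(result)
```

Both lookups return the same series keys, but the indexed version touches only the keys already known to carry `data_center=10`, and only those keys would then need their data blocks read.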

In-memory (inmem) indexing is somewhat antiquated. If you’re able, you should switch to TSI (Time Series Index) indexing. It keeps indexes in memory but also persists them to disk. It’ll improve your startup times and drastically increase your cardinality ceiling.
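As a rough pointer for InfluxDB OSS 1.x, the index type is selected with the `index-version` setting in the `[data]` section of the configuration file. This fragment is a sketch, so check it against the documentation for your exact version:

```toml
[data]
  # "inmem" is the in-memory index; "tsi1" enables the disk-backed TSI index.
  index-version = "tsi1"
```

The setting applies to newly created shards. Existing shards keep their old index until they are converted, which the 1.x documentation describes doing with the `influx_inspect buildtsi` tool while the server is stopped.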