Import tool tags vs fields


#1

Hi all,

I have run into a strange issue, and I hope you can help me understand the reason for it.

I have created a huge dataset (500M+ data points) to import directly into InfluxDB (running on a remote server!) using the CLI's -import option.

This dataset has 4 tags and one field.

I inserted it successfully, reaching an overall speed of 224k PPS.

For testing purposes, I decided to reload the same dataset into another db, turning all tags into fields, i.e. changing the line protocol from measurement,tagkeys=tagvals fieldkey=fieldval timestamp to measurement fieldkeys=fieldvals timestamp (note that the comma after the measurement is gone after the change).

We all know that InfluxDB automatically indexes tags. So my intuition was that making all attributes fields would improve ingestion speed.

Instead, to my great surprise, I reached an overall speed of 194k PPS: a big drop, the opposite of the effect I wanted.
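For anyone who wants to reproduce the rewrite, here is a minimal sketch of the tag-to-field conversion. It assumes the line contains no escaped spaces or commas and that every former tag becomes a string field (hence the double quotes); drop the quotes if you want the values parsed as numbers. The function name and the sample measurement/keys are made up for illustration.

```python
def tags_to_fields(line: str) -> str:
    """Rewrite 'm,t1=a,t2=b f=1 ts' as 'm t1="a",t2="b",f=1 ts'.

    Assumption: no escaped spaces/commas and no spaces inside field values,
    so the line splits cleanly into exactly three space-separated parts.
    """
    head, fields, timestamp = line.split(" ")
    measurement, *tags = head.split(",")

    converted = []
    for tag in tags:
        key, value = tag.split("=", 1)
        # Former tag values are written as string fields, which need quotes.
        converted.append(f'{key}="{value}"')
    converted.append(fields)  # keep the original field(s) unchanged

    return f'{measurement} {",".join(converted)} {timestamp}'


# Hypothetical sample line, not taken from the real dataset.
sample = "cpu,host=a,region=eu usage=0.5 1500000000000000000"
converted_sample = tags_to_fields(sample)
```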
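To put the two figures side by side, a quick back-of-the-envelope calculation. It assumes exactly 500M points (the dataset is "500M+", so real import times are somewhat longer) and a constant rate over the whole import.

```python
POINTS = 500_000_000   # assumption: lower bound of the "500M+" dataset
PPS_WITH_TAGS = 224_000
PPS_FIELDS_ONLY = 194_000


def import_minutes(points: int, pps: int) -> float:
    """Total import time in minutes at a constant points-per-second rate."""
    return points / pps / 60


# Relative slowdown of the fields-only import: about 13.4%.
drop = (PPS_WITH_TAGS - PPS_FIELDS_ONLY) / PPS_WITH_TAGS

minutes_with_tags = import_minutes(POINTS, PPS_WITH_TAGS)      # about 37.2 min
minutes_fields_only = import_minutes(POINTS, PPS_FIELDS_ONLY)  # about 43.0 min
```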

What is to blame? Maybe network issues this afternoon skewed my tests? I will certainly repeat them when possible to settle this. In the meantime, does anybody have any clue about this behavior?

Thanks,
Luca

P.S.: test queries ran worse too. That is expected for tag-based queries (no tags/indexes anymore!), but time-range and field-based queries should have had similar runtimes, and they did not.


#2

I have run the same ingestion tests over a LAN, rather than my company network, so that the results are not heavily affected by network issues.

The results were the same: ingestion is much faster when using tags than when turning all tags into fields (i.e. not indexing them).

I have noticed that the db1 size (where db1 is the db used for the no tag insertions) is HALF the size of db0 (where db0 is the db used for the first ingestion, i.e. using tags).

Could this strange behavior be due to compression alone?

I also tried inserting only string fields, and the speed is lower than with normal fields.


#3

You might get slightly different results if you switch from the in-memory index (inmem) to the disk-based TSI index (tsi1).

Tags are always stored as strings. You might get different timings if the tags you changed to fields are interpreted as int/float.

I don’t know how the internals work, but maybe it stores each tag value only once and stores a reference to that value afterward.


#4

Hi @samaust,

It’s exactly as you said. Each tag value is stored only once, and then referenced for each point.

The odd thing is that I discovered this crucial detail in the ‘hardware guides’ section of the docs.
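To make the "stored once, then referenced" idea concrete, here is a toy illustration. It is not InfluxDB's actual storage format, just a sketch of the general technique; the function and variable names are hypothetical.

```python
def intern_series(points):
    """Store each distinct tag set once and give every point a reference to it."""
    series_index = {}  # tag set string -> small integer id (stored once)
    references = []    # one id per point instead of the full tag strings

    for tag_set, _value in points:
        # setdefault returns the existing id, or registers the next free one.
        series_id = series_index.setdefault(tag_set, len(series_index))
        references.append(series_id)

    return series_index, references


# Hypothetical sample points: (tag set, field value).
points = [
    ("host=a,region=eu", 0.5),
    ("host=b,region=eu", 0.7),
    ("host=a,region=eu", 0.6),
]
series_index, references = intern_series(points)
```

With tags, repeated tag sets cost one small reference per point. When the same values are written as string fields, each point carries its own copy of the strings, which is one plausible reason the fields-only import did more work.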