Import tool tags vs fields

luca15f · February 6, 2019, 10:25pm

Hi all,

I have run into a strange issue, and I hope you can help me understand the reason for it.

I have created a huge dataset (500M+ data points) to import directly into InfluxDB (running on a remote server!) using the CLI's -import option.

This dataset has 4 tags and a field.

I successfully inserted it, reaching an overall speed of 224k points per second (PPS).

For testing purposes, I decided to reload the same dataset into another database with all tags turned into fields, i.e. changing the line protocol from "measurement,tagkeys=tagvals fieldkey=fieldval timestamp" to "measurement fieldkeys=fieldvals timestamp" (note that the comma after the measurement is gone after the change).
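To make the rewrite concrete, here is a minimal sketch of that conversion in Python. The function name `tags_to_fields` is hypothetical, and the sketch assumes every line has a timestamp and contains no escaped commas, spaces, or equals signs. It also assumes the former tag values are written as quoted string fields; writing them as numbers instead would be a different test.

```python
def tags_to_fields(line: str) -> str:
    """Rewrite 'm,t1=a,t2=b f=1 ts' as 'm t1="a",t2="b",f=1 ts'.

    Assumption: no escaped characters and no spaces inside values,
    so a plain split on spaces yields exactly three parts.
    """
    head, fields, timestamp = line.split(" ")
    measurement, *tags = head.split(",")

    converted = []
    for tag in tags:
        key, value = tag.split("=", 1)
        # Former tag values become quoted string fields.
        converted.append(f'{key}="{value}"')

    all_fields = ",".join(converted + [fields])
    return f"{measurement} {all_fields} {timestamp}"


if __name__ == "__main__":
    print(tags_to_fields("cpu,host=a,region=eu value=0.5 1549491900000000000"))
```

Note that every point now carries the full string values of the former tags as field data.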

We all know that InfluxDB automatically indexes tags. So my intuition was that turning all attributes into fields would speed up ingestion.

Instead, to my great surprise, I reached an overall speed of 194k PPS: a significant drop, the opposite of the effect I wanted.

What is to blame? Maybe network issues this afternoon skewed my tests? I will certainly repeat them when possible to clear this up. In the meantime, does anybody have a clue about this behavior?
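For scale, the two reported rates can be compared with a few lines of arithmetic. This is only a rough sketch: it assumes exactly 500M points (the post says "500M+") and treats the quoted rates as constant over the whole import.

```python
# Figures taken from the post; the point count is rounded down to 500M.
points = 500_000_000
rate_tags = 224_000    # points per second with 4 tags + 1 field
rate_fields = 194_000  # points per second with everything as fields

# Relative slowdown of the all-fields import.
drop = (rate_tags - rate_fields) / rate_tags
print(f"relative drop: {drop:.1%}")  # about 13.4%

# Extra wall-clock time for the whole dataset, under the assumptions above.
extra_seconds = points / rate_fields - points / rate_tags
print(f"extra import time: {extra_seconds / 60:.1f} minutes")
```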

Thanks,
Luca

P.S.: test queries ran slower too. That is expected for the formerly tag-based queries (no tags or indexes anymore!), but time-range and field-based queries should have had similar runtimes, and they did not.

luca15f · February 8, 2019, 2:32pm

I have run the same ingestion tests over a LAN rather than my company network, so that the results are not heavily affected by network issues.

Results were the same: ingestion speed is considerably higher when using tags than when turning all tags into fields (i.e. not indexing them).

I have noticed that db1 (the database used for the tag-free insertions) is HALF the size of db0 (the database used for the first ingestion, i.e. with tags).

Could this strange behavior be due to compression alone?

I also tried inserting only string fields, and the speed is lower than with normal fields.

samaust · February 13, 2019, 5:25am

You might get slightly different results if you switch from the in-memory index to the disk-based TSI index.

Tags are always interpreted as strings. You might get different timings if the tags you changed to fields are interpreted as int/float.

I don’t know how the internals work, but maybe it stores each tag value only once and stores a reference to that value afterward.

luca15f · February 13, 2019, 6:49am

Hi @samaust,

It’s exactly as you said. Each tag value is stored only once and then referenced by every point that uses it.

Oddly, I found this crucial detail in the ‘hardware guides’ section of the docs.
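The "store once, reference afterward" idea can be illustrated with a toy model. This is not InfluxDB's actual implementation; `ToySeriesStore` and its methods are hypothetical names, and the sketch only shows how a tag set can be recorded a single time while each point keeps just a small reference to it.

```python
from collections import defaultdict


class ToySeriesStore:
    """Toy model: each distinct (measurement, tag set) is stored once."""

    def __init__(self):
        self.series_ids = {}             # series key -> integer id
        self.points = defaultdict(list)  # id -> [(timestamp, value), ...]

    def write(self, measurement, tags, timestamp, value):
        # The tag strings live only in the series key.
        key = (measurement, tuple(sorted(tags.items())))
        # Reuse the existing id, or assign the next one for a new series.
        series_id = self.series_ids.setdefault(key, len(self.series_ids))
        # Each point stores only its timestamp and value under that id.
        self.points[series_id].append((timestamp, value))
        return series_id


if __name__ == "__main__":
    store = ToySeriesStore()
    store.write("cpu", {"host": "a"}, 1, 0.5)
    store.write("cpu", {"host": "a"}, 2, 0.7)
    store.write("cpu", {"host": "b"}, 1, 0.1)
    print(len(store.series_ids), "series for 3 points")
```

When the same values are written as string fields instead, every point has to carry them again.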
