Hi,
We're using InfluxDB version 1.3.6, running on an AWS EC2 r4.4xlarge.
We have a high-cardinality use case, so we want to use the "tsi1" engine instead of "inmem".
If we use the old engine and run:
show tag values from "store" with key=rankType
The output is
name: store
key value
--- -----
rankType TOP_FREE
rankType TOP_GROSSING
rankType TOP_PAID
Which is expected.
If we use the new engine (by adding index-version = "tsi1" to the configuration and restarting InfluxDB), create a new database, and insert the data again, the output is the following:
name: store
key value
--- -----
rankType TOP_FREE
rankType TOP_GROSSING
rankType TOP_FREE
rankType TOP_PAID
rankType TOP_FREE
rankType TOP_GROSSING
rankType TOP_FREE
This is just an example tag key with a small number of values from our db, chosen for the sake of readability. We have other tags with thousands of values.
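For the tags with thousands of values, the duplicates are hard to spot by eye. Below is a minimal sketch of how they could be counted from a SHOW TAG VALUES response. It assumes the JSON shape returned by the 1.x HTTP /query endpoint (results, then series, then values as [key, value] pairs); the function name duplicate_tag_values is made up for illustration, and the sample payload is built by hand from the tsi1 output above rather than fetched from a server.

```python
import json
from collections import Counter


def duplicate_tag_values(response_text):
    """Return {(measurement, tag_key, tag_value): count} for rows seen more than once.

    Assumes the 1.x /query JSON layout:
    {"results": [{"series": [{"name": ..., "values": [[key, value], ...]}]}]}
    """
    payload = json.loads(response_text)
    counts = Counter()
    for result in payload.get("results", []):
        for series in result.get("series", []):
            for key, value in series.get("values", []):
                counts[(series.get("name"), key, value)] += 1
    return {row: n for row, n in counts.items() if n > 1}


# Hand-built sample mirroring the tsi1 output shown above (not a live response).
sample = json.dumps({
    "results": [{
        "series": [{
            "name": "store",
            "columns": ["key", "value"],
            "values": [
                ["rankType", "TOP_FREE"],
                ["rankType", "TOP_GROSSING"],
                ["rankType", "TOP_FREE"],
                ["rankType", "TOP_PAID"],
                ["rankType", "TOP_FREE"],
                ["rankType", "TOP_GROSSING"],
                ["rankType", "TOP_FREE"],
            ],
        }]
    }]
})

for (name, key, value), n in sorted(duplicate_tag_values(sample).items()):
    print(f"{name} {key}={value} returned {n} times")
```

Running the same function against the response from the "inmem" setup should give an empty dict, which makes the two configurations easy to compare.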
I haven't been able to create a small reproducer for this, but it happens consistently once we start inserting our data. I thought someone might have hit something similar, so I decided to ask here.
I'm aware of issue #8443 in influxdata/influxdb on GitHub ("TSI branch has duplicate tag values"), which looks similar, but from what I understand the fix for it was already released in 1.3.0.
UPDATE:
I tried a few different versions: the issue doesn't happen on 1.3.0 and was introduced in 1.3.2. I'm guessing it is related to this PR: Improve performance of SHOW TAG VALUES.
I'm currently trying to set up an InfluxDB dev environment on my machine so that I can debug the issue.
Thanks in advance