Performance Tuning for High Cardinality

wwarby · January 11, 2021, 2:45pm

I’m looking to use InfluxDB for a use case that will have very high cardinality. I don’t think there is any way around that in terms of schema design - the scenario calls for queries that look something like this (written a SQL because it’s what I’m familiar with):

SELECT SUM(TotalActivities)
FROM Table
WHERE ClientId = "Some Client ID"
    AND Time >= "2020-12-01"
    AND Time <= "2020-12-31"
GROUP BY CustomerId

The database will have hundreds of millions of records, and there will be millions of unique CustomerId values and hundreds of ClientId values.

Initial experiments even with small amounts of data have shown that performance is unacceptable on this type of query with a few tens of thousands of CustomerId values when CustomerId is configured as a tag. If I configure CustomerId as a field and group by some other tag, performance is good, but that doesn’t meet my requirements - it just shows that high cardinality is the source of my performance issues.

I’ve tested the same scenario in KDB+ which performs roughly 40x better for the same query when the schema is properly optimised, but I know KDB+ quite well and an completely new to Influx so it might not be a fair comparison.

I’m using InfluxDB 2, latest version as of yesterday. We’re about to give up and look at alternative solutions, but before I do I wanted to ask if we might have missed anything simple that could be configured to improve performance on this kind of query. Below is the query I wrote in flux, which took about 20 seconds to execute with around a 30,000 “points” in the database that would resolve to something like 20,000 unique combinations of tags.

from(bucket: "activity")
  |> range(start: 2020-12-01, stop: 2020-12-31)
  |> filter(fn: (r) =>
    r._measurement == "AggregatedActivities" and
    r.ClientId == "some_guid" and
    (r._field =~ /Total/)
  )
  |> sum()
  |> group(columns: ["CustomerId"])
  |> pivot(rowKey:["_start"], columnKey: ["_field"], valueColumn: "_value")
  |> yield()

This query yields the correct results though I can’t attest to it’s efficiency.

Note: I realise this scenario is not really using Influx for what it’s designed to be efficient at. Most of what we want to use it for actually is fairly typical time-series queries but we do also need to produce headline stats from our data and if we can’t make this query run performantly there is no point in us continuing.

Anaisdg · January 11, 2021, 5:25pm

Hello @wwarby,
One way you could increase efficiency on that query is to use a task to assign a new tag to that unique 20,000 tag combinations.
After doing that, you could quickly query that subset of data by filtering for this new tag that identifies that data subset.

scott · January 11, 2021, 5:26pm

@wwarby Are you using InfluxDB 2.0 OSS or InfluxDB Cloud?

Anaisdg · January 11, 2021, 5:29pm

@wwarby also cardinality will be a thing of the past…I encourage you to take a look at the future of InfluxDB

wwarby · January 11, 2021, 5:47pm

I’m using the open source version with a view to licensing the product in the longer term once we get to the point that we really need HA/ replication etc.

wwarby · January 11, 2021, 6:06pm

Ah I see - that makes sense but unfortunately the example I gave is just one of many - there would actually be hundreds of permutations of group/filter tag combinations in the real world so I don’t think that approach will work for me. I had a read of that article you posted which sounds really positive. I guess it’ll be quite a while before it’s generally available but InfluxDB might be a better fit for my use case at that point.

Topic		Replies	Views
Tags with high cardinality	8	5667	October 31, 2020
High cardinality & boolean values	3	2501	April 30, 2017
Optimize InfluxDB for 2.8 billion series and more InfluxDB 2 influxdb	6	941	April 27, 2021
Best practices for storing high-velocity data InfluxDB 2 influxdb	2	1337	January 26, 2021
Cardinality and Data Series Store influxdb	3	540	November 12, 2019

Performance Tuning for High Cardinality

Related topics