Path to 1 Billion Time Series: InfluxDB High Cardinality Indexing Ready for Testing


#1

Originally published at: https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/

One of the long-standing requests we’ve had for InfluxDB is to support a large number of time series. That is, a very high cardinality in the number of unique time series that the database stores. While we currently have customers with tens of millions of time series, we’re looking to expand to hundreds of millions and eventually billions. Today we’ve released the first alpha build for testing of our new time series indexing engine. With this new engine, users should be able to have millions of unique time series (our target goal when this work is done is 1 billion). The number of series should be unbounded by the amount of memory on the server hardware. Further, the number of series that exist in the database will have a negligible impact on database startup time. This work has been in the making since last August and represents the most significant technical advancement in the database since we released the Time Series Merge Tree storage engine last year. Read on for details on how to enable the new index engine and what kinds of problems this will open up InfluxDB to solve.

Before we get into the details we’ll need to cover a little bit of background. InfluxDB actually looks like two databases in one: a time series data store and an inverted index for the measurement, tag, and field metadata. The TSM engine that we built in 2015 and 2016 was an effort to solve the first part of this problem: getting maximum throughput, compression, and query speed for the raw time series data. Up until now the inverted index was an in-memory data structure that was built during startup of the database based on the data in TSM. This meant that for every measurement, tag key/value pair, and field name there was a lookup table in memory to map those bits of metadata to an underlying time series. For users with a high number ephemeral series they’d see their memory utilization go up and up as new time series got created. Further, startup times would increase as all that data would have to be loaded onto the heap at start time.

The new time series index or TSI moves the index to files on disk that we memory map. This means that we let the operating system handle being the LRU. Much like the TSM engine for raw time series data we have a write-ahead log with an in-memory structure that gets merged at query time with the memory-mapped index. Background routines run constantly to compact the index into larger and larger files to avoid having to do too many index merges at query time. Under the covers, we’re using techniques like Robin Hood Hashing to do fast index lookups and HyperLogLog++ to keep sketches of cardinality estimates. The latter will give us the ability to add things to the query languages like SHOW CARDINALITY queries.

Problems TSI solves and doesn't solve

The biggest problem the current TSI work is meant to address is that of ephemeral time series. We see this most often with use cases that want to track per process metrics or per container metrics by putting their identifiers in tags. For example, the Heapster project for Kubernetes does this. For those series that are no longer hot for writes or queries, they won't take up space in memory.

The issue that this doesn’t yet address is limiting the scope of data returned by the SHOW queries. We’ll have updates to the query language in the future to limit those results by time. We also don’t solve the problem of having all these series hot for reads and writes. For that problem scale-out clustering is the solution. We’ll have to continue to optimize the query language and engine to work with large sets of series. The biggest thing to address in the near term is that queries that hit all series in the DB could potentially blow out the memory usage. We’ll need to add guard rails and limits into the language and eventually spill-to-disk query processing. That work will be ongoing in every release.

Enabling the new time series index

First, you'll have to download the alpha build of 1.3. You can find those packages in the nightly builds:
https://dl.influxdata.com/influxdb/artifacts/influxdb_1.3.0-alpha1_amd64.deb
https://dl.influxdata.com/influxdb/artifacts/influxdb-1.3.0-alpha1_linux_amd64.tar.gz
The new time series indexing engine is disabled by default. To enable it you’ll need to edit your config file. Under the data section add a new setting:
index-version = "tsi1"
Then restart the database. If you already have data, all old shards will continue to use the in-memory index. New shards will use the new disk based time series index. For testing purposes it'll be best to start with a fresh database. To verify that you're using disk based indexing, do a few writes and look at your data/<database>/<retention policy>/<shard id> directory. You should see a subdirectory called index.

Conclusion

This work is still early. There is more work we have to do to tune the compaction process so that it requires less memory along with much testing and bug fixing (watch the tsi label to keep track). Users that enable this should do so only on testing infrastructure. It is not meant for production use, and we could update the underlying format of the TSI files in the coming few months, which would require users to blow away that data. However, we're very excited about the new possibilities that will be enabled with the database by this work. Users will be able to track ephemeral time series like per process or per container metrics, or data across a very large array of sensors.

I’ll be giving talks at PerconaLive on April 26th and DataEngConf on April 27th that go deep into the details of TSM and TSI.


#2

The new TSI concept which has been mentioned under this topic, I assume it will be either memory or disk based? Or is there a plan to have a mix of both e.g. keep a Cache of x amount of entries in memory and rest on disk (or even based on measurements/databases)


#3

@sbains It will keep reciently written to series in memory and the rest on disk. We will add additional index control as users request those features.


#4

I have personally done some testing on my current setup (High-cardinality) and it’s drastically has reduced the memory usage, I can’t wait to see this fully released it’s going to be a game changer!

Is there any ETA on this feature?


#5

@syepes Happy to hear that! We should be cutting the first release candidate for 1.3 sometime in the next week. We are aiming for a release mid to late June.

How was your experience with it? Did you run into any issues? What is your usecase where you have high cardinality?


#6

For the moment I have just tested the same ingestion workload that I currently have on 1.2.x and in term of memory it’s night and day.

The usecase: Two retention policies raw (12h) and agg (3y), data is ingested in the raw rp with a high cardinality as it contains UUID’s then several CQ’s downsample and aggregate the data on the agg rp.


#7

Is this already implemented/tested?