Read for every write with high cardinality data

influxdb


I’m new to InfluxDB and have been looking at using it for the following application:

  • Events are generated for users, of which there can be millions (let’s say up to 30 million)
  • We’d like to filter the events so that only some of them are actually sent on to the user as an alert. The most typical filter is a maximum/minimum threshold on the number of events per day or per hour.
  • The number of events per user per day would vary: the vast majority of users would generate just a few events (say, fewer than 30), whereas a few users might generate hundreds or even thousands of events per day.
  • This requires a search (extremely simple, no regex) for every event written to the DB, to check whether “this” event should be sent on to the user. That decision would typically be based on the number of events matching the query, so most searches would be “count” aggregations (with a few that would require a select).
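The per-write check described above can be sketched in memory to pin down the logic. This is a minimal sketch, assuming a hypothetical rule that forwards an event only while the user's event count inside a sliding window lies within a [min, max] band; in the real setup the count would come from a query against InfluxDB rather than a Python deque, and `Rule`, `EventFilter`, and `record_and_check` are illustrative names only.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical threshold rule: forward an event only while the number of
    events for that user inside the window lies within [min_count, max_count]."""
    window_seconds: int
    min_count: int
    max_count: int


class EventFilter:
    """In-memory stand-in for the 'write, then count' pattern."""

    def __init__(self, rule: Rule) -> None:
        self.rule = rule
        # username -> timestamps (seconds) of that user's events, oldest first
        self._events: dict[str, deque[int]] = defaultdict(deque)

    def record_and_check(self, username: str, ts: int) -> bool:
        """Store the event, then count the user's events in (ts - window, ts]."""
        events = self._events[username]
        events.append(ts)
        # Drop events that fell out of the window (assumes ts is non-decreasing per user).
        cutoff = ts - self.rule.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        count = len(events)
        return self.rule.min_count <= count <= self.rule.max_count


f = EventFilter(Rule(window_seconds=3600, min_count=1, max_count=3))
print([f.record_and_check("alice", t) for t in (0, 10, 20, 30)])
```

With a one-hour window and a maximum of three, the fourth event for "alice" within the same hour is suppressed. Memory per user grows with that user's events in the window, which mirrors how the work of a count query grows with the number of matching points: trivial for most users, heavier for the few who generate thousands of events.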

There are a few potential issues with using InfluxDB for this scenario that I’d like to get some opinions on:

  • Since username is unique and forms the basis for every search, we’d definitely want to make it a tag. Everything else would be a field, which means that in the worst case a search would have to look through a couple of thousand events. I understand that the new TSI (Time Series Index) feature will help greatly here.
  • Is InfluxDB capable of handling a (simple) search for every write? I’m not sure whether a workload like that is part of its design.
  • Is there any way to shard based on the tag (username)? That would greatly help with scalability.
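The schema and sharding questions above can be made concrete with a small sketch. It assumes a hypothetical measurement named `events` with `username` as the only tag, builds the InfluxDB line-protocol record for one write and the InfluxQL `COUNT` query that would run after it, and routes each username to one of several independent InfluxDB instances by hashing. The function names, field names, and routing scheme are illustrative assumptions; the sketch does not assume that any particular InfluxDB edition supports tag-based sharding natively.

```python
from __future__ import annotations

import hashlib


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces in a line-protocol tag value."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, username: str, fields: dict[str, object], ts_ns: int) -> str:
    """Build one line-protocol record: username is the tag, everything else a field.
    Assumes the measurement name and field keys need no escaping."""
    parts = []
    for key, val in fields.items():
        if isinstance(val, bool):  # check bool before int: bool is a subclass of int
            parts.append(f"{key}={'true' if val else 'false'}")
        elif isinstance(val, int):
            parts.append(f"{key}={val}i")  # integer fields carry an 'i' suffix
        elif isinstance(val, float):
            parts.append(f"{key}={val}")
        else:
            s = str(val).replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{s}"')  # string fields are double-quoted
    return f"{measurement},username={escape_tag(username)} {','.join(parts)} {ts_ns}"


def count_query(measurement: str, username: str, field: str, window: str) -> str:
    """InfluxQL count of one user's events in the last `window` (e.g. '1h')."""
    user = username.replace("'", "\\'")  # escape single quotes in the string literal
    return (f"SELECT COUNT(\"{field}\") FROM \"{measurement}\" "
            f"WHERE \"username\" = '{user}' AND time > now() - {window}")


def shard_for(username: str, num_nodes: int) -> int:
    """Hypothetical application-side routing: hash the username to a node index."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


print(to_line("events", "jane doe", {"type": "login", "value": 3}, 1700000000000000000))
print(count_query("events", "jane doe", "value", "1h"))
print(shard_for("jane doe", 4))
```

Because every query filters on a single username, routing all of a user's writes and reads to the same node keeps each count local to one instance. Plain modulo hashing moves most users when the number of nodes changes; consistent hashing or an explicit lookup table limits that movement at the cost of extra bookkeeping.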

Any thoughts or comments would be appreciated!