At work we’re migrating some of our logging to InfluxDB 2.0. We have multiple machines, each logging many values.
Per machine we log:
~80 fields at a rate of 500Hz or 1000Hz
~50 fields at a rate of 50Hz
~50 fields at a rate of 0.2Hz
We need the high granularity on those 80 fields, so downsampling is not an option.
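To put very rough numbers on the write volume (taking the 1000Hz worst case for the fast fields; nothing here beyond the rates above is measured):

```python
# Rough per-machine write volume, assuming the 1000 Hz worst case for the fast fields.
fast   = 80 * 1000   # 80,000 field values per second
medium = 50 * 50     #  2,500 field values per second
slow   = 50 * 0.2    #     10 field values per second

print(f"~{fast + medium + slow:,.0f} field values/s per machine")  # ~82,510
```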
The ‘schema’ I’m currently testing has all data in a single bucket, with the machine name as the measurement and 4 tags on each record. One of those tags is a ‘job id’, of which each machine generates a few per day.
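To make that layout concrete, here is a sketch of a single point of the fast stream using the official influxdb-client Python library. The connection details, the tag keys other than the job id, and the field names are placeholders, not our real ones:

```python
from influxdb_client import InfluxDBClient, Point, WriteOptions

# Placeholder connection details
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Batched writes for the 500-1000 Hz stream; batch_size/flush_interval are illustrative
write_api = client.write_api(write_options=WriteOptions(batch_size=5000, flush_interval=1000))

# One point: machine name as measurement, four tags (job id plus three others), ~80 fields
p = (
    Point("machine-07")                # measurement = machine name (placeholder)
    .tag("job_id", "2021-06-14-0003")  # a few new values per machine per day
    .tag("line", "A")                  # placeholder tag
    .tag("operator", "x")              # placeholder tag
    .tag("firmware", "1.2.3")          # placeholder tag
    .field("torque", 12.7)             # ... ~80 numeric fields in the real data
    .field("rpm", 1480.0)
)

write_api.write(bucket="machines", record=p)
```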
Pushing data into the database is fine and doesn’t require many resources, but querying can be problematic from a performance/memory standpoint. Some queries cause such high memory usage that the OOM killer occasionally springs into action. I suspect the high series cardinality might be the cause, but I’m not sure of the best way to avoid it.
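For reference, my back-of-the-envelope cardinality estimate looks like the sketch below. The machine count and tag-value counts are guesses just to show the shape of the problem, and I’m going by my understanding that series cardinality grows with the number of unique measurement + tag set (+ field key) combinations:

```python
# Back-of-the-envelope series cardinality for the current schema.
# All counts below are placeholder assumptions, not measured values.
machines          = 10            # = measurements (one per machine)
jobs_per_machine  = 3 * 365       # 'job id' tag: a few per day, kept for a year
other_tag_combos  = 5             # combinations of the remaining 3 tags (assumed low)
field_keys        = 80 + 50 + 50  # fields per machine across the three rates

series = machines * jobs_per_machine * other_tag_combos * field_keys
print(f"~{series:,} series")  # ~9,855,000 with these assumptions
```

Even if the individual guesses are off, the ‘job id’ tag picks up new values every day, so that product only grows over time.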
What would be a better schema for storing this kind of high-velocity data?
Would it be advisable to use that ‘job id’ tag as the measurement instead? Or to create one bucket per machine, or even one per job?
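To illustrate what I mean by the first option, the ‘job id as measurement’ variant would look something like this (placeholder names again), with the machine name demoted to a tag:

```python
from influxdb_client import Point

# Variant: job id as the measurement, machine name becomes a tag
p_alt = (
    Point("2021-06-14-0003")        # measurement = job id (placeholder value)
    .tag("machine", "machine-07")   # machine name moves into a tag
    .field("torque", 12.7)          # fields unchanged
)

# The per-machine (or per-job) bucket variant would instead change the target bucket:
# write_api.write(bucket="machine-07", record=p_alt)
```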
Any help would be greatly appreciated!