I work with a client that ingests roughly 1 billion events monthly from IoT devices.
These can be GPS positions, sensor readings, etc.
Currently everything is streamed through Kafka, but we are exploring the possibility of doing time-series analytics on this data as well.
After our initial spike, there is some confusion about how to properly store all of this in InfluxDB.
There are currently ~200,000 devices, each with its own ID, and that number is expected to double over the next year.
To get started, we tried to store just GPS positions:

Measurement: position
- Longitude: float (field)
- Latitude: float (field)
- DeviceID: string (tag)
- Time: timestamp
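
For reference, a minimal sketch of the kind of write we are doing, using the influxdb-python client against InfluxDB 1.x; the host, database name, timestamp, and device ID below are placeholders:

```python
from influxdb import InfluxDBClient

# Placeholders: adjust host/port/database to your setup
client = InfluxDBClient(host="localhost", port=8086, database="foo")

client.write_points([
    {
        "measurement": "position",
        "tags": {"DeviceID": "device-000001"},  # one tag value per device
        "time": "2023-01-01T00:00:00Z",         # placeholder timestamp
        "fields": {"Longitude": 10.75, "Latitude": 59.91},
    }
])
```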
While trying to ingest this, InfluxDB shuts down after a few hundred thousand records, complaining that the number of values per tag is too high:
warn max-values-per-tag limit may be exceeded soon
{
"log_id": "0PkVo5A0000",
"service": "store",
"perc": "100%",
"n": 100096,
"max": 100000,
"db_instance": "foo",
"measurement": "position",
"tag": "simicc"
}
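
As far as we can tell, this warning comes from the max-values-per-tag setting (default 100000, in the [data] section of influxdb.conf), which we hit because every device ID becomes a distinct tag value. A cardinality check along these lines (same placeholder connection as above; requires InfluxDB 1.4+) should show how close we are to the limit:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="foo")

# Estimated number of distinct series in the database
print(client.query("SHOW SERIES CARDINALITY"))

# Exact count of distinct values for the device tag
print(client.query('SHOW TAG VALUES EXACT CARDINALITY WITH KEY = "DeviceID"'))
```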
Are we going about this in the wrong way?
Are tags supposed to be a finite, smaller set of things like colors, device types, etc.?
I also read this on the documentation page:
The measurement acts as a container for tags, fields, and the time column, and the measurement name is the description of the data that are stored in the associated fields. Measurement names are strings, and, for any SQL users out there, a measurement is conceptually similar to a table. The only measurement in the sample data is census. The name census tells us that the field values record the number of butterflies and honeybees - not their size, direction, or some sort of happiness index.
So are we trying to do old-school tabular data here, where we shouldn't?
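
Would the right move be to demote DeviceID from a tag to a field? Fields are not indexed and do not count toward tag-value or series cardinality, so a sketch of that (same placeholder setup as above) would look like this:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="foo")

# Same data, but DeviceID stored as a field instead of a tag, so it no
# longer contributes to tag-value/series cardinality. The trade-off is
# that WHERE clauses on DeviceID become unindexed scans.
client.write_points([
    {
        "measurement": "position",
        "time": "2023-01-01T00:00:00Z",  # placeholder timestamp
        "fields": {
            "DeviceID": "device-000001",  # placeholder ID
            "Longitude": 10.75,
            "Latitude": 59.91,
        },
    }
])
```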
TL;DR: are we doing it wrong, or is InfluxDB simply the wrong tool for this?