How to store sparse device values with not-always-recent timestamps

M_M_mTitle · April 27, 2018, 12:00pm

Hello, I was wondering how I can design an InfluxDB schema around my use case in order to avoid bad surprises in the future.

My requirements:

I have a large number of devices (about 500), each with 500 datapoints.
Every datapoint has a timestamp independent of the others on the same device (the device’s datapoints are not sent together, so they cannot be grouped per device into a single database entry), and a new value is created at most once per second.
The datapoints will not be saved in strict chronological order; sometimes they will be a few minutes in the past or a few minutes in the future.
Occasionally (as an exception) some devices might disconnect from the network; on reconnection they will backfill the values they failed to send to the server (the past data will not go back more than a month).

Here is an example of the measurements:

+-----------+--------------+-------------+-------+
| device_id | datapoint_id | timestamp   | value |
+-----------+--------------+-------------+-------+
| 1         | a            |  1429185600 | 5     | <== near present
+-----------+--------------+-------------+-------+
| 1         | b            |  1429185601 | 6     |
+-----------+--------------+-------------+-------+
| 1         | c            |  1429185602 | on    |
+-----------+--------------+-------------+-------+
| 2         | a            |  1429185601 | 9     |  <== near present but before some other already saved timestamps
+-----------+--------------+-------------+-------+
| 2         | b            |  1429185605 | 125   |
+-----------+--------------+-------------+-------+
| 2         | e            |  1429185500 | 70    |
+-----------+--------------+-------------+-------+
| 2         | e            |  1428074605 | 90    |  <== several days in the past
+-----------+--------------+-------------+-------+

The schema I want to use in InfluxDB has a single measurement (the equivalent of a table) and uses the device and datapoint IDs as tags (device_id, datapoint_id).
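
To make the schema concrete, here is a minimal sketch of how a few rows from the table above might be formatted as InfluxDB line protocol. The measurement name `readings`, the field keys `value_num` and `value_str`, and the helper `to_line` are hypothetical names chosen for illustration. Numeric and text values go to separate field keys because the sample data mixes numbers with a string such as `on`. The sketch also assumes the IDs contain no commas, spaces, or equals signs (which would need escaping) and that the points are written with second precision.

```python
def to_line(measurement, device_id, datapoint_id, timestamp_s, value):
    """Format one reading as a line-protocol string (illustrative helper)."""
    # Tags identify the series: one series per (device_id, datapoint_id) pair.
    tags = f"device_id={device_id},datapoint_id={datapoint_id}"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Numeric readings are stored as floats under a hypothetical field key.
        field = f"value_num={float(value)}"
    else:
        # Text readings get their own field key; quotes and backslashes are escaped.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        field = f'value_str="{escaped}"'
    # Timestamp is in seconds, so the write must use second precision.
    return f"{measurement},{tags} {field} {timestamp_s}"


# Rows taken from the example table: (device_id, datapoint_id, timestamp, value).
rows = [
    (1, "a", 1429185600, 5),
    (1, "c", 1429185602, "on"),
    (2, "e", 1428074605, 90),
]
lines = [to_line("readings", *row) for row in rows]
print("\n".join(lines))
```

Each line carries its own timestamp, so out-of-order and backfilled points use exactly the same format as near-present ones.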

My concerns are:

Is my design safe against high cardinality, or can it be optimized in some way?
Should I be concerned about mixing future and near-past timestamps on insertion?
Can datapoints inserted up to a month in the past be a problem performance-wise?
To avoid problems from mixing past and future datapoints, one approach could be one “table” (measurement) per datapoint. Is this approach viable?
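
For reference on the cardinality question, here is a rough back-of-the-envelope estimate derived from the numbers above. It assumes every device really reports all 500 datapoints and that each (device_id, datapoint_id) pair forms one series; the variable names are only illustrative.

```python
# Figures taken from the requirements above.
devices = 500
datapoints_per_device = 500
seconds_per_day = 24 * 60 * 60

# One series per unique (device_id, datapoint_id) tag combination.
series = devices * datapoints_per_device

# Upper bound: every series writes one point every second.
max_points_per_second = series
max_points_per_day = max_points_per_second * seconds_per_day

print(f"series: {series:,}")
print(f"max points per second: {max_points_per_second:,}")
print(f"max points per day: {max_points_per_day:,}")
```

These are worst-case figures; the real write rate should be lower, since the values are sparse and "once per second" is only the maximum rate.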

Thank you!
