How to store sparse device values with not-always-recent timestamps

influxdb

Hello, I was wondering how I can design an InfluxDB schema around my use case in order to avoid bad surprises in the future.

My requirements:

  • I have a large number of devices (about 500), each with 500 datapoints.
  • Every datapoint has a timestamp independent of the others on the same device (the device’s datapoints are not sent together, so they cannot be grouped per device into a single database entry), and a new value is created at most once per second.
  • The datapoints will not be saved in strict chronological order: their timestamps will sometimes be a few minutes in the past or a few minutes in the future.
  • Occasionally (as an exception) some devices might disconnect from the network; on reconnection they will backfill the values they failed to send to the server (the backfilled data will not go back more than a month).
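
To put rough numbers on these requirements, here is a back-of-the-envelope sketch. It assumes that "at most every second" applies to each datapoint individually and that "a month" means 30 days; both are my own reading, so treat the results as upper bounds rather than measured figures.

```python
# Rough sizing of the workload described above (worst case, not measured).
DEVICES = 500
DATAPOINTS_PER_DEVICE = 500
MAX_RATE_HZ = 1      # assumption: each datapoint updates at most once per second
BACKFILL_DAYS = 30   # assumption: "a month" taken as 30 days

# Each (device, datapoint) pair is an independent stream of values.
streams = DEVICES * DATAPOINTS_PER_DEVICE

# Upper bound on the write rate if every stream updated every second.
peak_points_per_second = streams * MAX_RATE_HZ

# How far back (in seconds) a reconnecting device may write.
backfill_window_seconds = BACKFILL_DAYS * 24 * 60 * 60

print(f"independent streams:      {streams}")
print(f"peak points per second:   {peak_points_per_second}")
print(f"backfill window (s):      {backfill_window_seconds}")
```

So the worst case is about 250,000 independent streams, each of which can receive out-of-order writes within a window of roughly 2.6 million seconds.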

Here is an example of the measurements:

+-----------+--------------+-------------+-------+
| device_id | datapoint_id | timestamp   | value |
+-----------+--------------+-------------+-------+
| 1         | a            |  1429185600 | 5     | <== near present
+-----------+--------------+-------------+-------+
| 1         | b            |  1429185601 | 6     |
+-----------+--------------+-------------+-------+
| 1         | c            |  1429185602 | on    |
+-----------+--------------+-------------+-------+
| 2         | a            |  1429185601 | 9     |  <== near present but before some other already saved timestamps
+-----------+--------------+-------------+-------+
| 2         | b            |  1429185605 | 125   |
+-----------+--------------+-------------+-------+
| 2         | e            |  1429185500 | 70    |
+-----------+--------------+-------------+-------+
| 2         | e            |  1428074605 | 90    |  <== several days in the past
+-----------+--------------+-------------+-------+

The schema I want to use in InfluxDB has a single measurement (like a table) and uses the device and datapoint IDs as tags (device_id, datapoint_id).
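
To make the schema concrete, this is roughly how I imagine the rows above being turned into InfluxDB line protocol. It is only a sketch: the measurement name `device_values` and the field names `value` and `state` are placeholders I made up, and splitting numeric and text values (such as `on`) into two fields is my own assumption to keep each field to a single type. The timestamps are in seconds, so the write would need its precision set to seconds.

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special inside a line protocol tag value."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(device_id, datapoint_id, timestamp_s, value,
            measurement="device_values"):  # hypothetical measurement name
    """Build one line protocol record: measurement,tags fields timestamp."""
    tags = (
        f"device_id={escape_tag(str(device_id))},"
        f"datapoint_id={escape_tag(str(datapoint_id))}"
    )
    if isinstance(value, str):
        # Text values go into a separate string field (assumed name: state).
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        field = f'state="{escaped}"'
    else:
        # Numeric values are written as floats (assumed name: value).
        field = f"value={float(value)}"
    return f"{measurement},{tags} {field} {timestamp_s}"


rows = [
    (1, "a", 1429185600, 5),
    (1, "c", 1429185602, "on"),
    (2, "e", 1428074605, 90),  # backfilled point, several days in the past
]
lines = [to_line(*row) for row in rows]
print("\n".join(lines))
```

With this layout every distinct (device_id, datapoint_id) combination becomes its own series inside the one measurement.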

My concerns are:

  1. Is my design high-cardinality proof, or can it be optimized in some way?
  2. Should I be concerned about mixing inserts with future and near-past timestamps?
  3. Can datapoints inserted with timestamps up to a month in the past be a problem performance-wise?
  4. To avoid problems with mixing past and future datapoints, one approach could be one “table” (measurement) per datapoint. Is this approach viable?

Thank you!