Storing multidimensional data

juha-ylikoski · May 30, 2024, 1:52pm

Hi,

We have a sound-like signal that we transform with an FFT into an array of values. For us, each array is logically a single data point / measurement.

How would you recommend storing this data in InfluxDB (or would you recommend it at all)? The data is time series data, and the time at which it was taken is important. We are going to use InfluxDB anyway to store single-dimensional KPI values calculated from the same original data.

As far as I know, the line protocol does not directly support this (Line protocol | InfluxDB Cloud (TSM) Documentation).

I can see two ways of doing this:

Storing the data as a comma-separated string (I think this is an awful way of doing it)
Storing the data in x separate fields (the array will have several hundred values, and in some cases up to 100,000)

(The values inside the arrays are integers, if that affects anything.)

Would it simply be better to use e.g. a document database or blob storage to store these values?

Thanks
Juha

scott · May 30, 2024, 8:09pm

Correct, InfluxDB does not support array field types. In InfluxDB v2, the following field types are supported:

string
float
integer
unsigned integer
boolean

These are the primary two methods I could think of as well, but they both have downsides.

Storing data as a comma-separated string: This would be cumbersome and wouldn’t compress well, so, depending on the retention period for this data, storage will get more expensive over time. However, with Flux, you can convert the comma-separated string into an array at query time and operate on it as an array.
Storing the data within x fields: This would be very cumbersome to query and would result in very high cardinality. In InfluxDB Cloud, the field key is part of the series key, so the more fields you have, the higher the cardinality of your data. InfluxDB Cloud limits cardinality based on your subscription plan:
- Free plan: 10k series cardinality
- Usage-based plan: 1M series cardinality
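
To make the two options above concrete, here is a minimal Python sketch of what each would look like as line protocol. The measurement name `fft` and the field names `spectrum` and `bin_N` are made up for illustration, and the functions only build the text of a point; they do not talk to InfluxDB.

```python
def as_csv_string_point(measurement, bins, ts_ns):
    """Option 1: the whole array in one string field (hypothetical field 'spectrum')."""
    payload = ",".join(str(v) for v in bins)
    # String field values are double-quoted in line protocol.
    return f'{measurement} spectrum="{payload}" {ts_ns}'


def as_many_fields_point(measurement, bins, ts_ns):
    """Option 2: one integer field per array element (hypothetical keys bin_0, bin_1, ...)."""
    # Integer field values carry an 'i' suffix in line protocol.
    fields = ",".join(f"bin_{i}={v}i" for i, v in enumerate(bins))
    return f"{measurement} {fields} {ts_ns}"


def parse_csv_string(payload):
    """Client-side equivalent of turning the string back into an array at query time."""
    return [int(v) for v in payload.split(",")]


bins = [3, 14, 15]
ts_ns = 1717077120000000000  # nanosecond timestamp, chosen arbitrarily

print(as_csv_string_point("fft", bins, ts_ns))
print(as_many_fields_point("fft", bins, ts_ns))
```

With option 2, every array element becomes its own field key, so a 100,000-value array means 100,000 field keys on a single point, which is where the cardinality limits above start to matter.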

The only other option I can think of is storing these arrays externally and making them accessible through an API. That way, you could store a reference key as a field in InfluxDB and map the arrays in at query time using that key. The downside is that this will add a lot of latency to your queries, since the values are mapped in row by row.
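
A rough Python sketch of that reference-key idea, under some loud assumptions: the `InMemoryBlobStore` class is a hypothetical stand-in for whatever blob storage or document database you pick, the field name `spectrum_ref` is made up, and the values are assumed to fit in signed 32-bit integers.

```python
import hashlib
import struct


class InMemoryBlobStore:
    """Hypothetical stand-in for an external blob store or document database."""

    def __init__(self):
        self._blobs = {}

    def put(self, values):
        # Pack as little-endian signed 32-bit integers (assumption about value range).
        payload = struct.pack(f"<{len(values)}i", *values)
        # Use a content hash as the reference key; any unique key would do.
        key = hashlib.sha256(payload).hexdigest()
        self._blobs[key] = payload
        return key

    def get(self, key):
        payload = self._blobs[key]
        count = len(payload) // 4  # 4 bytes per packed integer
        return list(struct.unpack(f"<{count}i", payload))


def reference_point(measurement, ref_key, ts_ns):
    """Line protocol text that stores only the reference key as a string field."""
    return f'{measurement} spectrum_ref="{ref_key}" {ts_ns}'


store = InMemoryBlobStore()
key = store.put([1, -2, 3])
print(reference_point("fft", key, 1717077120000000000))

# At query time: read spectrum_ref from InfluxDB, then resolve it externally.
print(store.get(key))
```

Each `get` is a separate lookup per row, which is the latency cost mentioned above.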

juha-ylikoski · May 31, 2024, 6:04am

Thank you for confirming my original thoughts. We will think about this internally and are probably going to store it externally.
