Searching for libraries in influxdb2 to apply large data processing techniques


I generated synthetic IoT data and inserted it into an influxdb2 bucket.
Each record contains a “timestamp” (collection time), “coordinates” (collection location), and some other details about the IoT device. (All of these values are generated.)

In detail:

  • I specify _field attribute like this → “Detail_info”: “”
  • and among the time attributes (_start, _stop, _time), I specify _time as .
  • Finally, because timestamps can be duplicated, I store latitude and longitude as tags so that points sharing a timestamp are not overwritten.
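
To make the schema above concrete, here is a rough sketch of how one of my points would look in InfluxDB line protocol. The measurement name `iot`, the sample coordinates, and the helper names are placeholders for illustration only, not my real values.

```python
def escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces
    # to be backslash-escaped inside tag values.
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def escape_field_string(value: str) -> str:
    # String field values are double-quoted; backslashes and
    # double quotes inside them must be escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_line_protocol(measurement, lat, lon, detail_info, time_ns):
    # Hypothetical helper: latitude/longitude become tags,
    # Detail_info becomes the single string field.
    tags = f"latitude={escape_tag(str(lat))},longitude={escape_tag(str(lon))}"
    field = f'Detail_info="{escape_field_string(detail_info)}"'
    return f"{measurement},{tags} {field} {time_ns}"


# Two points with the same timestamp but different coordinates.
ts = 1700000000000000000  # nanoseconds since the epoch (made-up value)
a = to_line_protocol("iot", 37.5665, 126.978, "device A", ts)
b = to_line_protocol("iot", 35.1796, 129.0756, "device B", ts)
print(a)
print(b)
```

Because the tag sets differ, the two lines belong to different series, so the second point does not overwrite the first even though the timestamps are identical. The side effect is that every distinct coordinate pair creates its own series.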

In addition, all the points are stored in a single measurement in a single influxdb2 bucket.

After the insertion, I read the points in the influxdb2 UI, as shown below.

However, as shown in the image above, I realized that the result is split into separate tables, one group per tag combination, and this hurts read performance.
So I used the group() function to improve read performance a little, but I also realized that applying the built-in functions described in the influxdb2 documentation makes reads slower.
This is why I thought it would be nice to serve the client the same data that is stored in the actual bucket, with the built-in function or a specific query already applied, from temporary storage (in memory, not on SSD or HDD).
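
For reference, the query I am describing looks roughly like the one built below. This is only a sketch: the bucket name `iot-bucket`, the measurement name `iot`, the time range, and the helper `build_flux_query` are placeholders I made up for this post.

```python
def build_flux_query(bucket: str, measurement: str,
                     start: str = "-1h", ungroup: bool = True) -> str:
    # Assemble a Flux query as plain text (hypothetical helper).
    lines = [
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {start})",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
    ]
    if ungroup:
        # group() with no arguments clears the group key, so all
        # rows end up in one table instead of one table per series.
        lines.append("  |> group()")
    return "\n".join(lines)


query = build_flux_query("iot-bucket", "iot")
print(query)
```

The resulting string can be pasted into the UI's script editor or passed to a client library.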

So my questions are:

  1. If there is a library or package that implements the temporary in-memory storage described above, I would like to know what it is.
  2. If there is no such feature, I would like advice on implementing it with a specific language tool (e.g. pandas in Python).
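
To make question 2 more concrete, this is a rough sketch of the kind of in-memory layer I have in mind. It is not an existing library; `QueryCache` and `fake_loader` are names I invented, and the loader stands in for whatever function actually runs the query against influxdb2 (for example, one that returns a pandas DataFrame).

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Keep query results in memory for a limited time (sketch only)."""

    def __init__(self, loader: Callable[[str], Any],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader      # function that really runs the query
        self._ttl = ttl_seconds    # how long a cached result stays valid
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str) -> Any:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # served from memory, no database read
        result = self._loader(query)
        self._entries[query] = (now, result)
        return result


calls = []


def fake_loader(query: str):
    # Stand-in for a real database call; records each invocation.
    calls.append(query)
    return [{"_time": 1, "Detail_info": "x"}]


cache = QueryCache(fake_loader, ttl_seconds=60.0)
first = cache.get("q")
second = cache.get("q")  # second call is answered from the cache
print(len(calls))
```

With about 170 million points, the whole result would probably not fit in memory this way, so I assume only already aggregated or filtered results would be cached.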

I realize that this differs from the original purpose of influxdb. However, I am running this experiment to see whether it can be used as a time series database for large-scale data processing (I will use about 170 million data points as experimental data).