Schema recommendation for data "grid"

Hi, I want to store information about several geographic areas, each of which I've split into a grid of 1m x 1m cells. The information stored for each cell changes over time (hence the time-series database). I have five grids, each measuring about 150m x 200m, so each grid has about 30,000 cells, for a total of about 150,000 cells.

I’m trying to figure out the best schema for this kind of data.

Initially, I was thinking of using 3 tags: SiteName, Cell-x, Cell-y, with two fields for each measurement.
(Where “Cell-x” is the integer cell number on the grid’s X axis, and “Cell-y” is the integer cell number on the grid’s Y axis.)

But then I’m wondering if it would be better to use 2 tags: SiteName and CellX_CellY (or something like that, combining the X and Y grid locations into one tag).

With these two methods, I would have the same total number of unique tag combinations (150,000), but they are split across either two or three tags. I don’t think I’m ever likely to want to search by column or by row, so I don’t mind losing that ability by combining the X/Y tags into one. I will, however, want to be able to search by site.
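As a quick sanity check of that count, here is a minimal Python sketch that enumerates the tag combinations for both layouts. The site names are hypothetical placeholders, and the grid is assumed to be exactly 150 x 200 cells (the post only says "about").

```python
from itertools import product

# Hypothetical site names; the thread only says there are five grids.
SITES = [f"site{i}" for i in range(1, 6)]
WIDTH, HEIGHT = 150, 200  # assumed exact grid size in 1m cells


def series_three_tags():
    """Unique (SiteName, Cell-x, Cell-y) tag combinations."""
    return set(product(SITES, range(WIDTH), range(HEIGHT)))


def series_two_tags():
    """Unique (SiteName, CellX_CellY) tag combinations."""
    return {
        (site, f"{x}_{y}")
        for site, x, y in product(SITES, range(WIDTH), range(HEIGHT))
    }


if __name__ == "__main__":
    print(len(series_three_tags()))  # 5 * 150 * 200
    print(len(series_two_tags()))
```

Both layouts give the same number of series. What differs is how the values are spread: the three-tag layout has at most 150 and 200 distinct values per cell tag, while the combined tag has 30,000 distinct values.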

Or is there a better way that what I’ve already considered?

Thanks
M

Hello @mjlee,
I recommend using InfluxDB v3.
See the InfluxData post “InfluxDB 3.0 is up to 45x Faster for Recent Data Compared to InfluxDB Open Source”.

The dataset in that benchmark had a cardinality of 160,000, and we were able to write >4M points/second.

Also you don’t differentiate between tags and fields in v3.
OSS should be available later this year :)

Otherwise, if you’re using a previous version, I recommend reserving tags for what you want to filter by and making the rest fields.
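For the pre-v3 case, here is a rough sketch of what one point could look like in line protocol, built in Python. The measurement name `grid`, the tag key `Cell`, and the field names are all hypothetical. The sketch assumes the cell ID stays a tag alongside SiteName: in v1/v2 a point is identified by measurement, tag set, and timestamp, so two cells written to the same series with the same timestamp would overwrite each other.

```python
def escape_tag(value: str) -> str:
    """Escape commas, equals signs, and spaces in a tag key or value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line(site: str, x: int, y: int, fields: dict, ts_ns: int) -> str:
    """Build one line-protocol point with SiteName and a combined cell tag.

    Assumes all field values are floats; integers would need an 'i' suffix
    and strings would need double quotes.
    """
    tags = f"Cell={x}_{y},SiteName={escape_tag(site)}"
    field_set = ",".join(f"{escape_tag(k)}={float(v)}" for k, v in fields.items())
    return f"grid,{tags} {field_set} {ts_ns}"


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    print(to_line("north", 12, 34, {"temp": 1.5, "moisture": 0.25},
                  1700000000000000000))
```

Filtering by site then only needs the SiteName tag, while the measurements themselves stay as fields.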

Sounds good. Thanks for the advice.