Move data from old to new sensors

I’m running InfluxDB 1.8.10 on a RPi and feeding sensor data to it from Node Red. I have been collecting data for around 1 year. When I originally constructed the database, I used a tutorial that suggested each sensor data type should be stored in a measurement resulting in quite a few measurements for just a small handful of sensors. I have just finished revamping my NR flows such that the data is now saved with a measurement containing multiple fields and some with a tag defining an ID or location.

Before I go down a deep rabbit hole and learn there is no solution, I was hoping to learn from the experts if there is a path for running a query to copy the data from the old measurements into the new ones?

old:

rtl_433/Acurite-5n1/A/battery_ok
rtl_433/Acurite-5n1/A/humidity
rtl_433/Acurite-5n1/A/rain_in
rtl_433/Acurite-5n1/A/rain_increment_in
rtl_433/Acurite-5n1/A/rain_increment_in_test
rtl_433/Acurite-5n1/A/temperature_F
rtl_433/Acurite-5n1/A/wind_dir_deg
rtl_433/Acurite-5n1/A/wind_mph
rtl_433/Blueline_PowerCost_Monitor/Cost
rtl_433/Blueline_PowerCost_Monitor/battery_ok
rtl_433/Blueline_PowerCost_Monitor/gap
rtl_433/Blueline_PowerCost_Monitor/impulses
rtl_433/Blueline_PowerCost_Monitor/impulses_increment
rtl_433/Blueline_PowerCost_Monitor/kW
rtl_433/Blueline_PowerCost_Monitor/kWh
rtl_433/Blueline_PowerCost_Monitor/kWh_increment
rtl_433/Blueline_PowerCost_Monitor/temperature_F

new

Acurite-5n1
Acurite-Tower
Blueline-PowerCost

The new sensors have the same fields as the original (plus a few more) but are grouped into the sensor rather than split as can be seen.

> select * from "Acurite-5n1" limit 10
name: Acurite-5n1
time                           battery_ok humidity id   moisture rain_in rain_increment_in temperature_F wind_dir_deg wind_mph
----                           ---------- -------- --   -------- ------- ----------------- ------------- ------------ --------
2022-04-07T20:55:07.272667488Z 0          29       2825                                    83.7                       4.736339775709854
2022-04-07T20:55:25.640486573Z 0                   2825          96.35   0                               135          4.736339775709854
2022-04-07T20:55:43.432422666Z 0          30       2825                                    83.3                       4.736339775709854
2022-04-07T20:56:01.828863057Z 0                   2825          96.35   0                               67.5         4.736339775709854
2022-04-07T20:56:19.635236878Z 0          30       2825                                    83.1                       1.650113338105464
2022-04-07T20:56:37.99026959Z  0                   2825          96.35   0                               112.5        3.193226556907659
> select * from "Blueline-PowerCost" limit 10
name: Blueline-PowerCost
time                           Cost Rate battery_ok gap   id    impulses impulses_increment kW                 kWh      kWh_increment temperature_F
----                           ---- ---- ---------- ---   --    -------- ------------------ --                 ---      ------------- -------------
2022-04-08T16:42:03.522951172Z                      65533 10752                             1.3184197274655516
2022-04-08T16:42:34.974036277Z           1                10752                                                                       48.7688
2022-04-08T16:43:06.986781313Z                            10752 58243                                          1397.832
2022-04-08T16:43:07.473713977Z                      65533 10752                             1.3184197274655516
2022-04-08T16:43:38.932063387Z                      65533 10752                             1.3184197274655516
2022-04-08T16:44:10.953883312Z                      65521 10752                             1.3186611925947407
2022-04-08T16:44:42.899141013Z                      35889 10752                             2.4074228872356436
2022-04-08T16:45:14.991075231Z                      28233 10752                             3.0602486452024227
2022-04-08T16:45:14.993503166Z                      28233 10752                             3.0602486452024227
2022-04-08T16:45:46.919445123Z           1                10752                                                                       48.7688

A nudge in the right direction and perhaps a search term or two would be sufficient for now.

If all you need is a plain copy then there should be no issue, you just need to use INTO

It should look like:
SELECT _field_ AS "_FieldAlias_" INTO "Acurite-5n1" FROM "rtl_433/Acurite-5n1/A/battery_ok" WHERE _WhateverFilter_ GROUP BY *

Note that:

  • any field/tag in the select list will be interpreted as a field (a plain SELECT, without GROUP BY will only generate fields
  • to migrate/maintain tags you must specify them in the GROUP BY
  • you cannot “build” tags on the fly if you need to add a tag that does not exist the best approach is to export-edit-import the data (via CLI or client tools)
  • Use aliasing to give the old fields proper names (to match the new/destination field)
  • You might be able to rename tags (I’ve never tried), but it will at least require a subquery
    1. Plain SELECT with tag/field alias (inner query)
    2. SELECT INTO as in the sample, by explicitly grouping by the tags

I just switched to using tags. I don’t think they will help here so I may forgo that for my historic data. Thanks for the primer!

Out of curiosity, I assume this will enter the data from multiple sensors into the new database based on the time of the value such that instead of having one entry for temperature, humidity, wind, etc in a record, it will have several occurring fractions of a second apart (based on how long it took NR to process the MQTT and submit to Influxdb). I don’t think it matters for how the data is used, but does it have a significant difference in terms of storage space for the database? If not, I won’t worry about it. If so, I could export the database for preprocessing/linking to the same time prior to reinjecting via bullet 3.

That’s a good question and I’m not 100% sure about it, but I think it shouldn’t weigh more, the timestamp is part of the point key, but managing a huge amount of points efficiently is exactly what TSDBs are about, so I wouldn’t worry about it.
If you care about standardizing the timestamps (which is not a bad idea) that’s simple enough, you can just GROUP BY time(1s) or whatever in order to have all the data in the same moment. You will also need to use a groping function in the select list mean()/last()/whatever.

That’s a good thought. The data is transmitted in ~30s intervals, so 1s should be good enough to link all of the data in the same transmission.

About what’s the best interval, I suggest you try it out with some simple selects, worst case scenario with 1sec is to have a single point splitted in N, ie at 29,30,31 secs of a minute because of whatever delay in the processing.
That said it shouldn’t be an issue as the problem will always exist for points that are near the lower/upper window boundaries.
If the window is too narrow points might not be merged perfectly, if it’s too big you will lose/misplace some.

Just a note, the generated points will be placed at the beginning of their time window, in case of 30s points between 00 and 29.99 will be saved as mm:00, while the ones between 30 and 59.99 will be placed at mm:30

This is data from a weather station and power meter. Since values should not change significantly between readings I’m not worried about losing a point or two. At this time, my main interest is retaining the data from the last year. This might actually be a good time to consider how to handle reducing fidelity for longer term storage. I don’t really need to know what happened every 30s during a period from 10 years prior. I really only care about daily or hourly type results. I think I can use your suggestions to down sample to a lower fidelity.

1 Like