I’m running InfluxDB 1.8.10 on a RPi and feeding sensor data to it from Node Red. I have been collecting data for around 1 year. When I originally constructed the database, I used a tutorial that suggested each sensor data type should be stored in a measurement resulting in quite a few measurements for just a small handful of sensors. I have just finished revamping my NR flows such that the data is now saved with a measurement containing multiple fields and some with a tag defining an ID or location.
Before I go down a deep rabbit hole and learn there is no solution, I was hoping to learn from the experts if there is a path for running a query to copy the data from the old measurements into the new ones?
The new sensors have the same fields as the original (plus a few more) but are grouped into the sensor rather than split as can be seen.
> select * from "Acurite-5n1" limit 10
time battery_ok humidity id moisture rain_in rain_increment_in temperature_F wind_dir_deg wind_mph
---- ---------- -------- -- -------- ------- ----------------- ------------- ------------ --------
2022-04-07T20:55:07.272667488Z 0 29 2825 83.7 4.736339775709854
2022-04-07T20:55:25.640486573Z 0 2825 96.35 0 135 4.736339775709854
2022-04-07T20:55:43.432422666Z 0 30 2825 83.3 4.736339775709854
2022-04-07T20:56:01.828863057Z 0 2825 96.35 0 67.5 4.736339775709854
2022-04-07T20:56:19.635236878Z 0 30 2825 83.1 1.650113338105464
2022-04-07T20:56:37.99026959Z 0 2825 96.35 0 112.5 3.193226556907659
> select * from "Blueline-PowerCost" limit 10
time Cost Rate battery_ok gap id impulses impulses_increment kW kWh kWh_increment temperature_F
---- ---- ---- ---------- --- -- -------- ------------------ -- --- ------------- -------------
2022-04-08T16:42:03.522951172Z 65533 10752 1.3184197274655516
2022-04-08T16:42:34.974036277Z 1 10752 48.7688
2022-04-08T16:43:06.986781313Z 10752 58243 1397.832
2022-04-08T16:43:07.473713977Z 65533 10752 1.3184197274655516
2022-04-08T16:43:38.932063387Z 65533 10752 1.3184197274655516
2022-04-08T16:44:10.953883312Z 65521 10752 1.3186611925947407
2022-04-08T16:44:42.899141013Z 35889 10752 2.4074228872356436
2022-04-08T16:45:14.991075231Z 28233 10752 3.0602486452024227
2022-04-08T16:45:14.993503166Z 28233 10752 3.0602486452024227
2022-04-08T16:45:46.919445123Z 1 10752 48.7688
A nudge in the right direction and perhaps a search term or two would be sufficient for now.
If all you need is a plain copy then there should be no issue, you just need to use INTO
It should look like:
SELECT _field_ AS "_FieldAlias_" INTO "Acurite-5n1" FROM "rtl_433/Acurite-5n1/A/battery_ok" WHERE _WhateverFilter_ GROUP BY *
- any field/tag in the select list will be interpreted as a field (a plain SELECT, without GROUP BY will only generate fields
- to migrate/maintain tags you must specify them in the GROUP BY
- you cannot “build” tags on the fly if you need to add a tag that does not exist the best approach is to export-edit-import the data (via CLI or client tools)
- Use aliasing to give the old fields proper names (to match the new/destination field)
- You might be able to rename tags (I’ve never tried), but it will at least require a subquery
- Plain SELECT with tag/field alias (inner query)
- SELECT INTO as in the sample, by explicitly grouping by the tags
I just switched to using tags. I don’t think they will help here so I may forgo that for my historic data. Thanks for the primer!
Out of curiosity, I assume this will enter the data from multiple sensors into the new database based on the time of the value such that instead of having one entry for temperature, humidity, wind, etc in a record, it will have several occurring fractions of a second apart (based on how long it took NR to process the MQTT and submit to Influxdb). I don’t think it matters for how the data is used, but does it have a significant difference in terms of storage space for the database? If not, I won’t worry about it. If so, I could export the database for preprocessing/linking to the same time prior to reinjecting via bullet 3.
That’s a good question and I’m not 100% sure about it, but I think it shouldn’t weigh more, the timestamp is part of the point key, but managing a huge amount of points efficiently is exactly what TSDBs are about, so I wouldn’t worry about it.
If you care about standardizing the timestamps (which is not a bad idea) that’s simple enough, you can just GROUP BY time(1s) or whatever in order to have all the data in the same moment. You will also need to use a groping function in the select list mean()/last()/whatever.
That’s a good thought. The data is transmitted in ~30s intervals, so 1s should be good enough to link all of the data in the same transmission.
About what’s the best interval, I suggest you try it out with some simple selects, worst case scenario with 1sec is to have a single point splitted in N, ie at 29,30,31 secs of a minute because of whatever delay in the processing.
That said it shouldn’t be an issue as the problem will always exist for points that are near the lower/upper window boundaries.
If the window is too narrow points might not be merged perfectly, if it’s too big you will lose/misplace some.
Just a note, the generated points will be placed at the beginning of their time window, in case of 30s points between 00 and 29.99 will be saved as mm:00, while the ones between 30 and 59.99 will be placed at mm:30
This is data from a weather station and power meter. Since values should not change significantly between readings I’m not worried about losing a point or two. At this time, my main interest is retaining the data from the last year. This might actually be a good time to consider how to handle reducing fidelity for longer term storage. I don’t really need to know what happened every 30s during a period from 10 years prior. I really only care about daily or hourly type results. I think I can use your suggestions to down sample to a lower fidelity.