Help fixing in schema/data mess

Hi,

We have a running application in which we are storing values for heating controllers and their connected energy meters in InfluxCloud (v2). One of the fields we are tracking is the serial of the energy meter.

It turns out that either by our mistake or by the mistake on how the data is reported by the device (we have no way of knowing), we have cases of devices in which the serial was saved as a string and now it is being reported as numeric, and therefore, not written anymore.
We also have cases in which the serial was reported as numeric and still being reported as numeric and the field is still being written.

We would like to have all serials written as strings instead of numbers, so we need to update the data we have gathered so far and “fill the gaps” (write the string serial for the points that are missing the field because was/is being reported as numeric) for those serials that are not written anymore.

What is the best/clever/less painful way to fix this schema-data issue?

On top of this I am getting a hard time reproducing the case in a hosted v2 test instance, as the point that contains the type-mismatched data is not written altogether, whereas in the cloud instance, we experience that the mismatched field is not written, but the rest of the fields are. What could be the cause for this different behavior?

Thanks in advance

This is one of those problems that’s hard to fix…
I know what you can do in InfluxDB1, and as far as I know, it should be valid also for InflxDB2, I’ve never worked with InflxDB Cloud, therefore I don’t know if there are additional limitations

I’ll put here some references and a quick summary.

here is how InfluxDB behaves, and why you have different data types:

  • mixed data types for fields - FAQ link
  • changing data type - FAQ link, it also explains how to fin the time intervals with messed up data types

here is a post that had the very same problem (kind of unsolved as far as I can see).

The process is just one, Export - Delete - Insert back

  • Export the measurement data, the one you need to change if you can identify them
  • delete the “wrong” data points
  • Insert the data you previously exported (after fixing them ofc)

you may not need to actually export/download the data, you can write a query to select the data and manage/cast them to the proper data type, and write them to a temp/support measurement. (making a partial or full copy of the data)
Up to what you’ve chosen to do, you can drop the whole “old” measurement or just the wrong data points.
Insert the data form your temp/support measurement to the old one (if it has been deleted, it will just be re-created).

hope it helps

edit:
I forgot to add that the only way to avoid this kind of issue is casting/enforcing data types in the inputs (as InfluxDB is not strongly typed) if you are using Telegraf you can use the converter processor, otherwise you will need to serialize the data correctly before sending them to the DB.

Thanks a lot for your input.

I will read the links you provided, see if anything of them helps an come back.

Cheers

That was a long break in the conversation. We ended up migrating all of our data and fixing in the process:

  1. copy copy all data to new bucket: the one that was correct, copy as is; the one that was wrong (had the wrong type), change the type of the values
  2. point our app to the new bucket
  3. delete old bucket

It was annoying, but now we will be more careful with the types of what we write.

Thanks a lot for your help

1 Like