I am attempting to migrate and ingest a large, compressed Line Protocol file (mydata.lp.gz) into an InfluxDB 3.0 Core instance. The file is a backup originally exported from a server running InfluxDB 1.7.4. During this process I have hit persistent UTF-8 validation errors from the influxdb3 write CLI command, which have prevented any successful ingestion into the new environment, a self-hosted Core instance running on a Linux VM.
A strict character validation error occurred, indicating invalid encoding within the file. The CLI returned the following message:
Error: Write command failed: error reading file: stream did not contain valid UTF-8
This suggests that the original backup data from InfluxDB 1.7.4 was not strictly UTF-8 encoded.
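To double-check that, a quick sketch like the following (filename assumed; point it at the decompressed file) can report roughly where the first invalid byte sits:

```python
import codecs

# Scan the decompressed file in chunks and report the approximate
# offset of the first byte sequence that is not valid UTF-8.
# Note: running this against the .gz file itself will fail almost
# immediately, since gzip bytes are binary, not UTF-8.
decoder = codecs.getincrementaldecoder("utf-8")()
offset = 0
with open("mydata.lp", "rb") as f:  # assumed decompressed filename
    for chunk in iter(lambda: f.read(1 << 20), b""):
        try:
            decoder.decode(chunk)
        except UnicodeDecodeError as e:
            print(f"invalid UTF-8 near byte offset {offset + e.start}")
            break
        offset += len(chunk)
    else:
        print("file decodes cleanly as UTF-8")
```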
Action 1: I attempted to fix the encoding explicitly with iconv, using the -c flag to drop invalid characters (iconv -f utf-8 -t utf-8 -c mydata_clean.lp > mydata_final.lp). Unfortunately, the UTF-8 validation error persisted even after this step.
Action 2 (current state): As a final attempt to resolve both the memory pressure and the encoding problem, I wrote a streaming Python script that reads the file line by line, drops non-UTF-8 characters, and writes a new, clean file. Despite this, ingestion still fails at the same UTF-8 validation step.
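For reference, the script does essentially the following (simplified sketch; filenames assumed to match the ones above):

```python
import gzip

SRC = "mydata.lp.gz"     # gzipped 1.7.4 backup (assumed name)
DST = "mydata_clean.lp"  # cleaned, plain-text output (assumed name)

# Stream the gzipped file as text, silently dropping any bytes
# that do not decode as UTF-8, and write out a clean copy.
with gzip.open(SRC, "rt", encoding="utf-8", errors="ignore") as src, \
        open(DST, "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line)
```

I then point influxdb3 write at the cleaned output file, and it still reports the same "stream did not contain valid UTF-8" error.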
I would appreciate help narrowing this down: I am unsure whether the corruption was introduced while copying the backup, or whether this is something I should expect when migrating from 1.7.4 to 3.0.
Any leads or feedback on this would be much appreciated.
