Hello everyone,
I’m currently struggling to replicate InfluxDB data to a data lake.
I’m using InfluxDB v2 OSS on-premises and have written a fault-tolerant Python script to replicate its raw data. It works, but it has become a headache: given the volume of data, each run takes far too long.
My goal is to achieve full data replication to Parquet files stored in a MinIO instance. However, I haven’t found a tool that fits my needs. The Airflow operator isn’t sufficient, Telegraf and Kapacitor don’t seem to solve the problem, and Quix Streams would require me to set up Kafka, which I’d prefer to avoid since I’m not dealing with streaming right now.
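In case it helps frame the discussion, here is a minimal sketch of the batching approach I’d expect such a tool (or a reworked script) to use: split the replication range into fixed time windows, so each window maps to one Parquet object that can be exported, retried, or parallelized independently. The window size and object-key layout below are just illustrative assumptions, not anything InfluxDB-specific.

```python
from datetime import datetime, timedelta, timezone

def export_windows(start, stop, window=timedelta(hours=6)):
    """Split a replication range into fixed time windows.

    Each window maps to one Parquet object key, so a failed window
    can be retried on its own and runs can be parallelized.
    (Window size and key layout are illustrative assumptions.)
    """
    windows = []
    cursor = start
    while cursor < stop:
        end = min(cursor + window, stop)
        key = (f"influxdb/raw/"
               f"{cursor:%Y/%m/%d}/{cursor:%H%M}-{end:%H%M}.parquet")
        windows.append((cursor, end, key))
        cursor = end
    return windows

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
stop = datetime(2024, 1, 2, tzinfo=timezone.utc)
for w_start, w_stop, key in export_windows(start, stop):
    # For each window, the real job would: query InfluxDB for
    # [w_start, w_stop), convert the result to a Parquet table,
    # and upload it to MinIO under `key`.
    print(key)
```

The per-window query, Parquet conversion, and MinIO upload are left as comments because they depend on the client libraries chosen; the point is the independent, retryable units of work.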
Is there any tool that can handle this, similar to how Airbyte manages batch replication for Postgres and other databases?
Thanks in advance!