Tool for InfluxDB batch ETL

Hello everyone,

I’m currently facing challenges with replicating InfluxDB data to a data lake.

I’m using InfluxDB v2 OSS on-premises and have developed a fault-tolerant Python script to replicate its raw data. While functional, the script has become a headache: with the volume of data involved, it simply takes too long to run.

My goal is to achieve full data replication to Parquet files stored in a MinIO instance. However, I haven’t found a tool that fits my needs. The Airflow operator isn’t sufficient, Telegraf and Kapacitor don’t seem to solve the problem, and Quix Streams would require me to set up Kafka, which I’d prefer to avoid since I’m not dealing with streaming right now.
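For reference, the core of my script looks roughly like the sketch below (simplified; the URL, token, org, bucket names, and time window are placeholders, and it assumes the influxdb-client, pandas, pyarrow, and s3fs packages):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs
from influxdb_client import InfluxDBClient

# --- placeholders: adjust to your environment ---
INFLUX_URL = "http://localhost:8086"
INFLUX_TOKEN = "my-token"
INFLUX_ORG = "my-org"
INFLUX_BUCKET = "my-bucket"

# MinIO speaks the S3 API, so s3fs just needs a custom endpoint
fs = s3fs.S3FileSystem(
    key="minio-access-key",
    secret="minio-secret-key",
    client_kwargs={"endpoint_url": "http://minio:9000"},
)

with InfluxDBClient(url=INFLUX_URL, token=INFLUX_TOKEN, org=INFLUX_ORG) as client:
    # export one bounded time window per run, so a failed window can be retried
    flux = f'''
        from(bucket: "{INFLUX_BUCKET}")
          |> range(start: 2024-01-01T00:00:00Z, stop: 2024-01-02T00:00:00Z)
    '''
    frames = client.query_api().query_data_frame(flux)
    # the client may return one DataFrame per Flux table
    if isinstance(frames, list):
        frames = pd.concat(frames, ignore_index=True)

    pq.write_table(
        pa.Table.from_pandas(frames),
        "datalake/influx/2024-01-01.parquet",  # bucket/key on MinIO
        filesystem=fs,
    )
```

Running windows like this one by one over the full history is exactly what’s taking too long.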

Is there any tool that can handle this, similar to how Airbyte manages batch replication for Postgres and other databases?

Thanks in advance!

Hello @ArthurKretzer,
Might I interest you in InfluxDB v3? It has a Python processing engine embedded in it, and someone in the org built a Parquet exporter on top of it that exports InfluxDB v3 data as Parquet into Iceberg, so you can read the Iceberg tables from tools like DuckDB or Snowflake. Unfortunately it's not in a public repo yet, but I'm working to move it there ASAP. Either way, the Python processing engine in InfluxDB 3 Core and Enterprise might be worth a look for your use case.
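To give you a feel for the processing engine, a plugin is just a Python file with a known entry point that the database calls on a trigger. Here's a rough sketch of a scheduled plugin that dumps recent rows from a table to a local Parquet file. To be clear, this is *not* the exporter I mentioned: the table name, query, and output path are made up, and it assumes pyarrow is installed into the plugin environment:

```python
# Sketch of a scheduled plugin for the InfluxDB 3 processing engine
# (illustrative only -- "sensors", the window, and /exports are made up)
import pyarrow as pa
import pyarrow.parquet as pq


def process_scheduled_call(influxdb3_local, call_time, args=None):
    # pull the last hour of a hypothetical "sensors" table via SQL
    rows = influxdb3_local.query(
        "SELECT * FROM sensors WHERE time >= now() - INTERVAL '1 hour'"
    )
    if not rows:
        influxdb3_local.info("no rows in this window, skipping export")
        return

    # rows come back as a list of dicts, which pyarrow can ingest directly
    table = pa.Table.from_pylist(rows)
    pq.write_table(table, f"/exports/sensors_{call_time}.parquet")
    influxdb3_local.info(f"exported {table.num_rows} rows")
```

You'd register something like this with a scheduled trigger (e.g. run every hour), and from there shipping the files to MinIO is a small step.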

Oooh, also FYI: Quix doesn't require you to set up Kafka yourself; it handles all the Kafka under the hood for you, so it might be easier than you think. Its whole selling point is relieving you of exactly the pain you just mentioned.