I need to export a few days of data from InfluxDB to files (csv and excel).
Can you suggest the right way to proceed?
I tried these solutions:
curl
curl -G 'http://my.server.com:8086/query?db=mydb' --data-urlencode "q=select * from sensor where time > now() - 2d" > output_file
pros: only 6 seconds
cons: hard to format the output as csv and excel
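For reference, the JSON that /query returns can be flattened with a few lines of pandas; below is a minimal sketch, assuming a single statement returning a single series and that the response was saved to output_file (file names are placeholders):

import json
import pandas as pd

# Load the JSON body that curl wrote to output_file
with open("output_file") as f:
    body = json.load(f)

# /query nests the rows as results -> series -> columns/values
series = body["results"][0]["series"][0]
df = pd.DataFrame(series["values"], columns=series["columns"])

df.to_csv("sensor.csv", index=False)
df.to_excel("sensor.xlsx", sheet_name="Sheet1", index=False)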
python InfluxDBClient
...
client = influxdb.InfluxDBClient( ... )
points = client.query(query, chunked=True, chunk_size=10000).get_points()
dfs = pd.DataFrame(points)
...
for d in dfs:
    d.to_excel(writer, "Sheet")
...
pros: easy to format the output with the pandas lib; not bound by the Influx max-row-limit (default = 10000)
cons: it takes 2 minutes and 24 seconds
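For context, a complete, runnable version of this approach might look like the following sketch (host, database, query and file names are placeholders, not the original code):

import influxdb
import pandas as pd

# Placeholder connection details
client = influxdb.InfluxDBClient(host="my.server.com", port=8086, database="mydb")
query = "select * from sensor where time > now() - 2d"

# chunked=True streams the result from the server in 10000-point chunks,
# so the client is not limited by max-row-limit
points = client.query(query, chunked=True, chunk_size=10000).get_points()
df = pd.DataFrame(points)

df.to_csv("sensor.csv", index=False)
with pd.ExcelWriter("sensor.xlsx") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)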
python DataFrameClient
...
client = influxdb.DataFrameClient( ... )
dfs_dict = client.query(query, chunked=True)
ret = dfs_dict["sensor"]
...
for d in ret:
    d.to_excel(writer, "Sheet")
...
pros: easy to format the output with the pandas lib; it takes 10 seconds
cons: subject to the Influx max-row-limit (default = 10000)
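For context, a runnable sketch of the DataFrameClient variant without chunking (connection details and file names are placeholders; the timezone handling assumes the client returns a tz-aware time index, which Excel cannot store):

import influxdb

# Placeholder connection details
client = influxdb.DataFrameClient(host="my.server.com", port=8086, database="mydb")
query = "select * from sensor where time > now() - 2d"

# DataFrameClient returns a dict keyed by measurement name,
# with a time-indexed pandas DataFrame as the value
dfs_dict = client.query(query)
df = dfs_dict["sensor"]

df.to_csv("sensor.csv")

# Drop the timezone info before writing Excel (Excel has no tz-aware timestamps)
df.index = df.index.tz_localize(None)
df.to_excel("sensor.xlsx", sheet_name="Sheet1")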
The cons of each solution are unacceptable for me.
Where am I going wrong?
I can't wait that long, I can't change max-row-limit in influxdb.conf, and I need to split the output and write it in csv + excel format.
I hope you can help.
-format 'json|csv|column'
Format specifies the format of the server responses: json, csv, or column.
So, using the influx CLI instead of curl, your command becomes:
influx -database 'mydb' -execute 'select * from sensor where time > now() - 2d' -format csv > output_file
Hi guys,
What's the impact of this kind of query on the InfluxDB server?
We're experiencing the same thing (trying to back up every day into a csv file) and we saw the RAM saturate and then continuous swapping to disk (the query takes a really long time).
We have a single-instance node with 4 cores and 8-10 GB of RAM available (the total RAM is 16 GB, but we have other processes running on it).
We are inserting 2,400,000 points per hour and we query like this: select * from telemetry where time >= '2017-09-01 00:00:00' and time <= '2017-09-01 23:59:59'
We also have the same problem as resteeee, and at the moment we "cut" one query into many:
select * from sensor where time >= 0 and time < 10
select * from sensor where time >= 10 and time < 20
select * from sensor where time >= 20 and time < 30
and so on …
but how can I know in advance whether I need to use many queries instead of one? And how many?
And what should I do if Influx returns me only [max-row-limit] rows?
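One way to automate that kind of splitting is to walk the time range in fixed windows and append each window's result to the output file; a rough sketch, where the window size, connection details and file name are assumptions to adapt:

from datetime import datetime, timedelta
import influxdb
import pandas as pd

# Placeholder connection details
client = influxdb.InfluxDBClient(host="my.server.com", port=8086, database="mydb")

start = datetime(2017, 9, 1)
end = datetime(2017, 9, 2)
window = timedelta(hours=1)   # assumed window size; tune it so each query stays small

cursor = start
first = True
while cursor < end:
    q = ("select * from sensor where time >= '{}' and time < '{}'"
         .format(cursor.strftime("%Y-%m-%d %H:%M:%S"),
                 (cursor + window).strftime("%Y-%m-%d %H:%M:%S")))
    points = list(client.query(q).get_points())
    if points:
        # Append each window to one CSV, writing the header only once
        pd.DataFrame(points).to_csv("sensor.csv", mode="a", header=first, index=False)
        first = False
    cursor += window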
Result: saturation of 8 GB of RAM + 8 GB of swap.
Possible solution: cut my query into 10000 smaller queries.
InfluxData team, is this the right and fastest way to proceed?
Which client is the most suitable for this purpose?
Very interested in a solution. We're facing the same problem; currently, our data is safely stored in InfluxDB with no way to get it out for analysis.
The problem is not the format, but that requesting a data dump via query-and-save is unusably slow. The InfluxDB fetcher would have the same problem, as it's the data production on the InfluxDB end that's slow, not the saving.
I'm using the latest Influx Python client (GitHub - influxdata/influxdb-client-python: InfluxDB 2.0 python client) and would like to use the chunked query method. Unfortunately, the method described in the first post no longer works with this version, and I don't understand how to achieve the same thing with the new library. Does anyone know?
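For what it's worth, the 2.x client can stream query results through its query API instead of loading everything at once; a minimal sketch, assuming an InfluxDB 2.0 setup where url, token, org, bucket and file name are placeholders and the Flux query mirrors the original InfluxQL one:

from influxdb_client import InfluxDBClient

# Placeholder connection details for an InfluxDB 2.0 instance
client = InfluxDBClient(url="http://my.server.com:8086", token="my-token", org="my-org")

flux = '''
from(bucket: "mydb")
  |> range(start: -2d)
  |> filter(fn: (r) => r._measurement == "sensor")
'''

query_api = client.query_api()

# query_data_frame_stream() yields DataFrames piece by piece
first = True
for df in query_api.query_data_frame_stream(flux):
    df.to_csv("sensor.csv", mode="a", header=first, index=False)
    first = False

client.close()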
I also have to export data from InfluxDB to externally readable files, preferably csv.
I'm personally running InfluxDB 1.8.4 for Windows and am having trouble using the CLI query. The query I pass with -execute returns an error with text along the lines of "couldn't find the file", and I get this error even if I just try to print the result to the console rather than exporting it to a file. I also use InfluxDB on Linux, where this approach works, so my conclusion is that the InfluxDB release for Windows has an issue here.
I was forced to look for alternatives on Windows, and found the HTTP query approach with curl. You write that with curl you get a file that is hard to transform to csv. I managed to overcome this by adding -H "Accept: application/csv" to the curl command. This actually causes the output file to be a comma-separated csv.
Your whole command would then look like this: curl -H "Accept: application/csv" -G 'http://my.server.com:8086/query?db=mydb' --data-urlencode "q=select * from sensor where time > now() - 2d" > output_file
I'm currently also looking for an alternative analogous to -precision, but haven't found anything yet.