My InfluxDB 1.8 instance has been running well for about a year (with Grafana on top). I haven't set any retention policy (other than the default autogen = infinite). I inject the data from CSV files on a daily basis.
But now my hard drive is getting full. The weird thing is that InfluxDB's folder has become huge, something like 9x the size of the raw data (CSV files). Why is that?
Now the HD is quite full, at 88%, and I really need to do something! Sure, I could add some retention policies, but things should actually be fine as they are if InfluxDB's data size were roughly equal to the raw data size.
Why is InfluxDB's data size so big compared to the raw data files? Is there some kind of versioning or something similar? How can I prevent InfluxDB from growing like crazy?
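For reference, adding a retention policy could look like the sketch below, using the influxdb-python client; the policy name, duration, and connection details are illustrative assumptions, not settings from this thread.

```python
# A minimal sketch, assuming the influxdb-python package (`influxdb`);
# host, port, policy name, and duration are placeholders.
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='CSV_DB')

# Keep one year of data and make this the default policy for CSV_DB.
client.create_retention_policy('one_year', '52w', '1',
                               database='CSV_DB', default=True)
```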
Thank you for your reply!
I inject 2 CSV files with InfluxDB's Python client. Each one has a 1 s resolution and represents one day (one file per day). One is ~80 MB and the other ~150 MB, so I inject ~230 MB every night at 5 AM.
The whole thing is really weird. Here's a graph with the CPU load and the HD used space:
At 5 AM, the CPU is busy with the injection, but the HD usage doesn't change (yellow circles).
At 9 AM (why?), the CPU is busy again (why?), and the HD usage jumps by 55 GB (!) (red circles).
And this happens every day, so I'll run out of space within a couple of days.
This is how I inject the data: client.write_points(df, 'my_measurement', protocol='line', batch_size=3000)
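For context, here is a fuller sketch of that injection, assuming the influxdb-python DataFrameClient; the file path and connection details are placeholders.

```python
# A minimal sketch of the nightly injection, assuming influxdb-python's
# DataFrameClient; 'day.csv' and the connection details are placeholders.
import pandas as pd
from influxdb import DataFrameClient

client = DataFrameClient(host='localhost', port=8086, database='CSV_DB')

# The CSV is parsed into a time-indexed DataFrame: write_points takes
# the point timestamps from the DataFrame's index.
df = pd.read_csv('day.csv', index_col=0, parse_dates=True)

client.write_points(df, 'my_measurement', protocol='line', batch_size=3000)
```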
Hello @vince,
Does your DataFrame have timestamps? If it doesn’t, it’s possible you’re writing duplicate points.
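As a quick sanity check, something like this sketch can verify the index (a hypothetical helper, not from this thread):

```python
# Sketch: write_points takes timestamps from the DataFrame index, and
# points with identical timestamps and tag sets overwrite each other.
import pandas as pd

def check_time_index(df: pd.DataFrame) -> None:
    assert isinstance(df.index, pd.DatetimeIndex), 'index is not timestamps'
    assert df.index.is_unique, 'duplicate timestamps in index'
```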
Have you loaded data into InfluxDB from sources other than the CSV?
If your DataFrame has timestamps and the only data you've loaded into InfluxDB is from those CSVs, then I'd suspect something in InfluxDB isn't working right: a cache never being cleared, temporary files not being deleted, or maybe even something odd in the TSM engine causing it to use much bigger files than needed.
What does your schema look like from that CSV? How many tags and fields?
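One way to inspect that, as a sketch with influxdb-python (the database and measurement names are the ones used elsewhere in this thread):

```python
# Sketch: list the tag keys and field keys of the measurement.
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='CSV_DB')

tags = client.query('SHOW TAG KEYS FROM "my_measurement"')
fields = client.query('SHOW FIELD KEYS FROM "my_measurement"')
print(list(tags.get_points()))
print(list(fields.get_points()))
```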
Yes, my CSV files have timestamps.
I do collect some more data with Telegraf, but I inject it into another InfluxDB database, and that DB is absolutely fine. Only CSV_DB is growing like crazy.
So I would guess the same as you: the cache or the temporary files. I had a look at InfluxDB's file structure, and in data/CSV_DB/autogen/&lt;number&gt;/ I can see A LOT of '.tmp' folders. It looks similar inside each numbered folder. Here's a pic:
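To quantify those leftovers, a small sketch like this could tally the '.tmp' folders per shard (the data directory below is the default Linux path and is an assumption):

```python
# Sketch: sum the bytes held by '.tmp' folders in each shard directory.
import os

data_dir = '/var/lib/influxdb/data/CSV_DB/autogen'  # adjust to your install

for shard in sorted(os.listdir(data_dir)):
    shard_path = os.path.join(data_dir, shard)
    if not os.path.isdir(shard_path):
        continue
    tmp_dirs = [d for d in os.listdir(shard_path) if d.endswith('.tmp')]
    total = 0
    for d in tmp_dirs:
        for root, _, files in os.walk(os.path.join(shard_path, d)):
            total += sum(os.path.getsize(os.path.join(root, f)) for f in files)
    print(f'shard {shard}: {len(tmp_dirs)} .tmp folders, {total / 1e9:.2f} GB')
```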
FYI 3: Yesterday I stopped the automatic (nightly) injection of the CSV files. Nothing happened overnight; the database didn't change size. I also triggered the injection manually at 1 PM and, 4.5 h later, the HD usage had grown by 55 GB (exactly the same behaviour as at night). So the CSV injection is definitely related to the problem.