InfluxDB disk usage

influxdb

#1

Hi Guys,

I’m very new to InfluxDB and have a question about the disk usage of my InfluxDB installation.
First, let me describe my scenario:
I have a bunch of articles in my MySQL database, and all of them have a price and an available quantity. I would like to know about the price and quantity changes. Since I want to store the data every hour (I lack the possibility to do it on price / quantity change, and it’s nothing I can do in the near future), I decided to use InfluxDB.

I wrote a simple importer that stores all ~1 million records in the database, each with 5 columns:
The timestamp, a productId (tag, integer), articleId (tag, integer), the quantity (field, integer) and the price (field, float).
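For illustration, one record with this schema could be rendered in InfluxDB line protocol roughly as sketched below. The measurement name `articles` and the helper `to_line` are assumptions made for the example and are not taken from the actual importer. Note that integer fields carry an `i` suffix and that tag values are always stored as strings, even when they look like integers.

```python
def to_line(product_id: int, article_id: int, quantity: int,
            price: float, ts_ns: int) -> str:
    """Render one record as a line protocol string.

    'articles' is a hypothetical measurement name. Tags are written in
    sorted key order; the timestamp is in nanoseconds since the epoch.
    """
    tags = f"articleId={article_id},productId={product_id}"
    # The trailing 'i' marks quantity as an integer field; price stays a float.
    fields = f"price={price},quantity={quantity}i"
    return f"articles,{tags} {fields} {ts_ns}"


if __name__ == "__main__":
    print(to_line(42, 1001, 5, 9.99, 1500000000000000000))
```

Every distinct combination of `productId` and `articleId` becomes its own series, which matters for the index size discussed further down.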

Why I’m writing this post:
The original MySQL table with 53 columns(!!) is about 250 MB in size. Every import of that data into InfluxDB costs me around 550 MB of disk space. I really don’t understand why it takes that much and why the database is that big.

I changed parts of the default configuration, e.g. to reduce memory usage. What I changed:

  • Set index-version to tsi1
  • Set max-series-per-database to 0 because I don’t yet know how many series I’ll need
  • Set max-values-per-tag to 0 for the same reason

All the other configuration options are set to their default values (besides http, of course). Is there any hint as to what I could do to reduce the size? I’m a bit confused, because I read a lot of articles about the compression possibilities InfluxDB offers. Do I have to enable them somehow? Is it just because of the 2 tags (productId, articleId), which (afaik) increase the cardinality? Or am I mixing things up?
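As a rough way to reason about the cardinality question: within one measurement, each unique combination of tag values is a separate series. A minimal sketch for estimating that number from the source rows follows; the row layout (dicts with `productId` and `articleId` keys) and the function name are assumptions for the example.

```python
from typing import Iterable, Mapping


def series_cardinality(rows: Iterable[Mapping[str, int]]) -> int:
    """Count unique (productId, articleId) tag combinations.

    For a single measurement, this equals the number of series the
    rows would create.
    """
    return len({(row["productId"], row["articleId"]) for row in rows})


if __name__ == "__main__":
    sample = [
        {"productId": 1, "articleId": 10},
        {"productId": 1, "articleId": 11},
        {"productId": 1, "articleId": 10},  # same series as the first row
    ]
    print(series_cardinality(sample))
```

If nearly every row has its own productId/articleId pair, one import of ~1 million rows would create on the order of a million series, each holding only one new point per hourly import.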

Let me know if there is anything missing or in case you need any further information.

Best Regards
Chris


#2

I’m not an expert, but I had the same situation at first, which was due to the default shard duration being set to 7 days. In my case I was bulk loading data with a time span of about 40 years. InfluxDB ended up creating a large number of (small) shards, and each one has some overhead. Once I was able to set the shard duration more appropriately, disk usage, memory, etc. improved dramatically. This may or may not apply to your situation.
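To put numbers on that, here is a back-of-the-envelope sketch of how many shard groups a bulk load would span. The function name is made up for the example, and the result is approximate because InfluxDB aligns shard group boundaries to fixed time windows, so the real count can differ by one.

```python
import math
from datetime import timedelta


def approx_shard_groups(span: timedelta, shard_duration: timedelta) -> int:
    """Approximate number of shard groups needed to cover a time span."""
    return math.ceil(span / shard_duration)


if __name__ == "__main__":
    forty_years = timedelta(days=14610)  # 40 * 365.25 days
    print(approx_shard_groups(forty_years, timedelta(days=7)))
    print(approx_shard_groups(forty_years, timedelta(weeks=52)))
```

With a 7-day duration, 40 years of data spreads over roughly two thousand shard groups, whereas a duration of about a year needs only a few dozen. In InfluxDB 1.x the shard group duration belongs to the retention policy and can be changed with an `ALTER RETENTION POLICY ... SHARD DURATION` statement.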


#3

Thanks for your reply.
Indeed, the database size has decreased in the meantime by roughly 1 GB.

Best, Chris