[Solved] HDD batch size to write a specified number of series

influxdb

#1

Hello everyone,

For our project, we need a time series database. We have evaluated several kinds of databases through proofs of concept (POCs), and we are now testing InfluxDB, which should suit our needs.

However, we saw that InfluxDB is optimized for SSD storage. Because our application will be deployed at customer sites, we are very constrained on the hardware side: HDDs only (we cannot “force” clients to buy a bunch of SSDs, especially once RAID setups are taken into account).

We also estimated that, in the worst case, our database, with 20 years of historical data, could hold up to 260 billion rows.

For our tests and benchmarks, we use a bash script that dynamically generates .txt files in Line Protocol syntax and loads them through the Influx CLI with the -import parameter. This works really well, but something bothers us:

Even when we import data from files, how does the write process work on an HDD? By default, does InfluxDB hold, say, 10,000 series in memory and THEN write those 10,000 series in one shot (like a transaction in relational databases)?

So far we have ingested a database of 10 billion rows, but we are afraid that if InfluxDB writes every single series to disk individually (I assume not; even on an HDD, that would make no sense for a DBMS), it could wear out our HDDs prematurely. In that case, we could not use InfluxDB as a long-term data storage system.
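
For reference, here is a minimal Python sketch of the kind of import file described above (our real generator is a bash script; this is only an illustration). The database name `bench`, the measurement `measurements`, the tag `sensor`, the field `value`, and the function name are all made up for the example. The `# DDL` / `# DML` / `# CONTEXT-DATABASE` header is the layout the Influx CLI expects for -import files.

```python
import random

def make_import_file(path, database, n_points, start_ts=1_500_000_000, seed=0):
    """Write an `influx -import` style file containing n_points Line Protocol rows.

    Measurement, tag, and field names are hypothetical placeholders.
    Timestamps are in seconds, so the import needs second precision.
    """
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    with open(path, "w", encoding="utf-8") as f:
        # Header section: create the database, then select it for the points.
        f.write("# DDL\n")
        f.write(f"CREATE DATABASE {database}\n")
        f.write("# DML\n")
        f.write(f"# CONTEXT-DATABASE: {database}\n")
        for i in range(n_points):
            sensor = f"s{i % 10}"                   # 10 distinct series
            value = round(rng.uniform(0, 100), 3)   # one float field
            f.write(f"measurements,sensor={sensor} value={value} {start_ts + i}\n")
    return path
```

A file produced this way would then be loaded with something like `influx -import -path=points.txt -precision=s`.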

I hope my explanations and questions are clear.

Kind regards.

Benjamin.


#2

Well, I got an answer from someone on the InfluxDB staff. Indeed, by default the InfluxDB CLI automatically writes in batches of 5,000 lines. BUT we can change this behavior by setting the wal-fsync-delay parameter to 100ms: a single fsync then flushes everything that was written to the WAL during that interval, which could be, for example, 15,000 lines. So instead of the default three disk writes of 5,000 lines each, the 15,000 lines are written in one fsync.
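
To make the arithmetic concrete, here is a toy Python model of that grouping. It is only a sketch of the idea, not InfluxDB's actual WAL code: the function name, the arrival times (0, 30 and 60 ms) and the rule "a write that arrives while a delayed fsync is pending shares that fsync" are my own simplifying assumptions. In InfluxDB 1.x the real setting lives in the `[data]` section of influxdb.conf, e.g. `wal-fsync-delay = "100ms"`.

```python
def count_fsyncs(arrivals, fsync_delay_ms):
    """Return the number of lines flushed by each fsync in a simplified model.

    arrivals: list of (arrival_time_ms, n_lines) tuples, one per write batch.
    fsync_delay_ms: 0 means every batch is fsynced on its own; a positive
    value lets batches arriving inside the delay window share one fsync.
    """
    fsyncs = []        # lines covered by each fsync, in order
    window_end = None  # end of the currently pending fsync window
    for t, n_lines in sorted(arrivals):
        if fsync_delay_ms > 0 and window_end is not None and t < window_end:
            fsyncs[-1] += n_lines            # joins the pending fsync
        else:
            fsyncs.append(n_lines)           # starts a new fsync
            window_end = t + fsync_delay_ms
    return fsyncs

# Three CLI batches of 5,000 lines, arriving 30 ms apart (assumed timing).
batches = [(0, 5000), (30, 5000), (60, 5000)]
print(count_fsyncs(batches, 0))    # [5000, 5000, 5000]
print(count_fsyncs(batches, 100))  # [15000]
```

With no delay, the model performs three separate flushes. With a 100 ms window, the same 15,000 lines go to disk in one flush, which is what matters on an HDD.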