hi, I am inserting a bunch of CSV files into influx… the number of records and the time taken to insert them are below. The numbers don't seem consistent. What might the reason be, and how can I improve it? Thanks.
saved 3528794   in seconds 169
saved 7292384   in seconds 7152
saved 10655566  in seconds 6446
saved 14054265  in seconds 6665
saved 16820240  in seconds 5568
saved 17426402  in seconds 1245
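If the "saved" counts are cumulative and each count pairs with the time listed in the same position (my assumption; the post doesn't say explicitly), then the per-interval write rate swings from roughly 20k points/s down to a few hundred, which is the inconsistency in question. A quick sketch of that arithmetic:

```java
public class Throughput {
    public static void main(String[] args) {
        // Cumulative record counts and elapsed seconds per interval, from the post.
        long[] saved = {3528794, 7292384, 10655566, 14054265, 16820240, 17426402};
        long[] secs  = {169, 7152, 6446, 6665, 5568, 1245};
        long prev = 0;
        for (int i = 0; i < saved.length; i++) {
            long records = saved[i] - prev;           // records written in this interval
            double rate = (double) records / secs[i]; // points per second
            System.out.printf("interval %d: %d records in %d s = %.0f points/s%n",
                    i + 1, records, secs[i], rate);
            prev = saved[i];
        }
    }
}
```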
@aozoren Can you share some sample data you are inserting, the specs of the machine you are running the insert on and the exact method you used to insert the data?
*** data looks like this
064F_XU0300416S002101.317309:10:35.00010
064X30YVADE02101.317309:10:35.00010
064F_XU0300416S003101.2759309:10:35.00010
064X30YVADE03101.2759309:10:35.00010
this is stock market data. I parse the type ('064') and symbol ('F_XU0300416S0'), use them as tags, and store the whole row as a field value.
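For reference, a minimal sketch of that tag extraction, assuming fixed offsets guessed from the first sample row (type in the first 3 characters, symbol in the next 13; the actual format may vary per row, as the second sample suggests):

```java
public class TickParser {
    // Offsets guessed from the first sample row; adjust for the real record layout.
    static final int TYPE_LEN = 3;
    static final int SYMBOL_LEN = 13;

    static String[] parse(String row) {
        String type = row.substring(0, TYPE_LEN);                       // e.g. "064"
        String symbol = row.substring(TYPE_LEN, TYPE_LEN + SYMBOL_LEN); // e.g. "F_XU0300416S0"
        return new String[] { type, symbol };
    }

    public static void main(String[] args) {
        String row = "064F_XU0300416S002101.317309:10:35.00010";
        String[] parsed = parse(row);
        System.out.println(parsed[0] + " / " + parsed[1]);
    }
}
```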
*** box specs
Dell PowerEdge R410
2 x Intel® Xeon® E5620 (8 Core, 2.40 GHz)
64 GB Ram
2 x 960 GB SSD RAID
the box is divided up by VMware; this virtual machine is using 12 cores and 55+ GB according to htop. Influx is running inside a Docker container on this virtual machine.
*** data written to db with java library as such
Point p = Point.measurement( "tick" )
    .time( time.getTime(), TimeUnit.MILLISECONDS )
    .addField( "raw", value ) // whole line unparsed
    .addField( "multiplier", multiplier )
    .tag( "diff", "" + DIFF )
    .tag( "type", type )
    .tag( "symbol", symbol )
    .build();
Import.influxDB.write( Import.DB_NAME, "autogen", p );
I can provide remote access to the virtual machine over TeamViewer + SSH.
Thanks.
@aozoren You need to batch writes. The optimal number of field values per batch is 5k-10k. You should see significantly improved write performance.
ok. i’ll try and post the result. thanks.
I checked the code; I'm currently using batch writes with 1k values / 10 ms. I'll increase those values.
influxDB.enableBatch(1000, 10, TimeUnit.MILLISECONDS);
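To get closer to the suggested 5k-10k field values per batch, the same influxdb-java call can be tuned, for example like this (the numbers are only a starting point, not measured optima; batch size and flush interval should be benchmarked against your actual workload):

```java
// Batch up to 5000 points, or flush every 100 ms, whichever comes first.
influxDB.enableBatch(5000, 100, TimeUnit.MILLISECONDS);
```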