hi, I am inserting a bunch of CSV files into influx… the number of records and the time taken to insert them are below. The numbers don't seem consistent. What might the reason be, and how can I improve it? Thanks.
saved 3528794   in seconds 169
saved 7292384   in seconds 7152
saved 10655566  in seconds 6446
saved 14054265  in seconds 6665
saved 16820240  in seconds 5568
saved 17426402  in seconds 1245
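If the "saved" counts are cumulative and each count pairs with the time listed in the same position (my assumption; the post doesn't say explicitly), then the per-interval write rate swings from roughly 20k points/s down to a few hundred, which is the inconsistency in question. A quick sketch of that arithmetic:

```java
public class Throughput {
    public static void main(String[] args) {
        // Cumulative record counts and elapsed seconds per interval, from the post.
        long[] saved = {3528794, 7292384, 10655566, 14054265, 16820240, 17426402};
        long[] secs  = {169, 7152, 6446, 6665, 5568, 1245};
        long prev = 0;
        for (int i = 0; i < saved.length; i++) {
            long records = saved[i] - prev;           // records written in this interval
            double rate = (double) records / secs[i]; // points per second
            System.out.printf("interval %d: %d records in %d s = %.0f points/s%n",
                    i + 1, records, secs[i], rate);
            prev = saved[i];
        }
    }
}
```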
@aozoren Can you share some sample data you are inserting, the specs of the machine you are running the insert on and the exact method you used to insert the data?
*** data looks like this
064F_XU0300416S002101.317309:10:35.00010
064X30YVADE02101.317309:10:35.00010
064F_XU0300416S003101.2759309:10:35.00010
064X30YVADE03101.2759309:10:35.00010
this is stock market data. I parse the type ('064') and symbol ('F_XU0300416S0'), use them as tags, and store the whole row as a field value.
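For reference, a minimal sketch of that tag extraction, assuming fixed offsets guessed from the first sample row (type in the first 3 characters, symbol in the next 13; the actual format may vary per row, as the second sample suggests):

```java
public class TickParser {
    // Offsets guessed from the first sample row; adjust for the real record layout.
    static final int TYPE_LEN = 3;
    static final int SYMBOL_LEN = 13;

    static String[] parse(String row) {
        String type = row.substring(0, TYPE_LEN);                       // e.g. "064"
        String symbol = row.substring(TYPE_LEN, TYPE_LEN + SYMBOL_LEN); // e.g. "F_XU0300416S0"
        return new String[] { type, symbol };
    }

    public static void main(String[] args) {
        String row = "064F_XU0300416S002101.317309:10:35.00010";
        String[] parsed = parse(row);
        System.out.println(parsed[0] + " / " + parsed[1]);
    }
}
```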
*** box specs
Dell PowerEdge R410
2 x Intel® Xeon® E5620 (8 Core, 2.40 GHz)
64 GB Ram
2 x 960 GB SSD RAID
the box is divided up by VMware; this virtual machine is using 12 cores and 55+ GB according to htop. Influx is running inside a Docker container on this virtual machine.
*** data written to db with java library as such
Point p = Point.measurement( "tick" )
    .time( time.getTime(), TimeUnit.MILLISECONDS )
    .addField( "raw", value ) // whole line unparsed
    .addField( "multiplier", multiplier )
    .tag( "diff", "" + DIFF )
    .tag( "type", type )
    .tag( "symbol", symbol )
    .build();
Import.influxDB.write( Import.DB_NAME, "autogen", p );
I can provide remote access to the virtual machine over TeamViewer + SSH.
Thanks.
@aozoren You need to batch writes. The optimal number of field values per batch is 5k-10k. You should see significantly improved write performance.
ok. i’ll try and post the result. thanks.
I checked the code; I'm currently using batch writes with 1k values / 10 ms. I'll increase those values.
influxDB.enableBatch(1000, 10, TimeUnit.MILLISECONDS);
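To get closer to the suggested 5k-10k field values per batch, the same influxdb-java call can be tuned, for example like this (the numbers are only a starting point, not measured optima; batch size and flush interval should be benchmarked against your actual workload):

```java
// Batch up to 5000 points, or flush every 100 ms, whichever comes first.
influxDB.enableBatch(5000, 100, TimeUnit.MILLISECONDS);
```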