How to boost write speed in InfluxDB using Python


#1

Hello,
I’m using a Python script to write data into InfluxDB over UDP. My current write speed is about 1L (1 lakh, i.e. 100,000) data points per second, and I want to increase it by 10x. Are there any suggestions?
I’ve upgraded the hardware:
16-core CPU
32 GB RAM
Xeon processor

So please suggest a way to write data points faster.


#2

I achieved much higher speeds than with the Python interface by writing a C++ program that generates the line protocol lines directly and sends them over HTTP. I also used libcurl to run async requests to dump the data to influx. If your influx node can keep up (it looks pretty beefy), then it’s a good way to get data into the DB.

If you want to test whether your client library or your influx node is the bottleneck, you can use the influx_stress program that is included with influx to see how much data your influx instance can ingest. influx_stress is written in Go I think, so it’s unlikely to be the bottleneck.

If you are set on using python, you could do the same thing in python, or you could use async requests or threading to post the data to influx db. But neither will be as fast as doing the same thing in C++ or C. Looking at the python library code, it didn’t look speed optimized to me, so I wrote my own client.
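
A minimal Python sketch of that approach, under stated assumptions: it builds line protocol strings by hand and posts them in batches from a thread pool using only the standard library. The helper names (`to_line`, `chunked`, `post_batch`, `write_concurrently`), the batch size, and the worker count are illustrative and not taken from any client library. The URL is assumed to be an InfluxDB 1.x HTTP write endpoint such as `http://localhost:8086/write?db=webyug_test`; adjust it to your setup.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def escape_measurement(name):
    """Escape commas and spaces in a measurement name."""
    return name.replace(",", "\\,").replace(" ", "\\ ")


def escape_key(key):
    """Escape commas, equals signs, and spaces in a field key."""
    return key.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field_value(value):
    """Render a Python value as a line protocol field value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # everything else is written as a quoted string


def to_line(measurement, fields, timestamp_ns):
    """Build one line: <measurement> <field>=<value>,... <timestamp>."""
    field_part = ",".join(
        f"{escape_key(k)}={format_field_value(v)}" for k, v in fields.items()
    )
    return f"{escape_measurement(measurement)} {field_part} {timestamp_ns}"


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_batch(lines, url):
    """POST one newline-separated batch and return the HTTP status code."""
    body = "\n".join(lines).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def write_concurrently(lines, url, batch_size=5000, workers=4):
    """Send batches in parallel; threads suit this because the work is I/O bound."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda batch: post_batch(batch, url),
                             chunked(lines, batch_size)))
```

Typical use would be to build `lines` with `to_line(...)` for every row and then call `write_concurrently(lines, url)`. Batch size and worker count are worth tuning against your own node rather than taking these defaults as recommendations.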

Edit: heads up that influx_stress included with influxdb is different than influx-stress, also written and maintained by influxdata. The influx-stress tool has more features.


#3

@fluffynukeit… Thank you so much… I don’t have deep knowledge about c++, so I’ll follow your suggestions about async request and multi threading using python.


#4

If you want any more info it would be good to see the Python script.


#5

import logging
import time

import dask.dataframe as dd
from influxdb import InfluxDBClient


def write_operation():
    client = InfluxDBClient(host='0.0.0.0', database='webyug_test', use_udp=True, udp_port=8089)
    # client.create_database('webyug_test')
    client.switch_database('webyug_test')
    start_time = time.time()
    # Load the whole CSV into memory as a pandas DataFrame
    data = dd.read_csv("file.csv").compute()
    print(time.time() - start_time)
    rowdict = data.to_dict(orient='records')
    print(type(rowdict))
    # One write_points call per row
    for value in rowdict:
        print(value)
        json_body = [{
            "measurement": "test_table_189_7",
            "fields": value,
            "time": value.get('VIMSDate_Time')
        }]
        client.write_points(json_body)
    print("--- %s seconds ---" % (time.time() - start_time))


logging.basicConfig(filename='time.log', level=logging.INFO)
write_operation()
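
One likely cost in the script above is that it calls `write_points` once per row and prints every row. The following is a minimal sketch of grouping rows into batches instead. It assumes `client` is any object exposing `write_points(points)`, as influxdb-python's `InfluxDBClient` does; `build_points`, `write_in_batches`, and the batch size of 1000 are illustrative names and values, not part of the library.

```python
def build_points(records, measurement, time_key):
    """Turn row dicts into point dicts of the same shape the script above uses."""
    return [
        {"measurement": measurement, "fields": record, "time": record.get(time_key)}
        for record in records
    ]


def write_in_batches(client, points, batch_size=1000):
    """Call client.write_points once per batch and return the number of calls made."""
    calls = 0
    for start in range(0, len(points), batch_size):
        client.write_points(points[start:start + batch_size])
        calls += 1
    return calls
```

In the script this would replace the `for` loop, for example `write_in_batches(client, build_points(rowdict, "test_table_189_7", "VIMSDate_Time"))`. Over UDP a batch probably has to fit in a single datagram, so the batch size may need to be much smaller there than over HTTP.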


#6

Here is my Python script for writing data into InfluxDB. Please suggest something to increase the write speed.