InfluxDB ingestion rate chokes after just over 1.5 million records (database size ~2 GB)

Avin · July 30, 2018, 12:31pm

Overview:
InfluxDB (v1.5.2) runs in Azure, and I am inserting records from Windows/CentOS clients. I have written a Python script that reads data from a CSV file and inserts it into InfluxDB. The ingestion rate is quite good on a newly created database, but once the number of records crosses the 1.5 million mark it deteriorates to the point that not all records get ingested. If I insert the same CSV file into a newly created database, more of its records are stored. The data loss varies between 40% and 80%.

Configuration Details:
The Azure instance we signed up for uses Premium SSD managed disks.
P4 flavor: disk size = 32 GB, IOPS per disk = 120, RAM = 8 GB, throughput per disk = 25 MB/sec.
No problem occurred until the record count exceeded 1.5 million. The ingestion rate began to choke after that point.

Azure disk details:
SDA1 >> Size= 976MB;Used=46MB (5%);Available=879MB
SDA2 >> Size= 29GB;Used=9.2GB (35%);Available=18GB

InfluxDB(v1.5.2) instance size allocated= ~30GB.
My InfluxDB database size = ~2.0G

Influx Query related details:
Number of measurements = 1
Number of tags = 9 (8 + 1 unique tag)
Number of fields = 3
Number of series = 999952

One CSV file with at most 10K rows (it can have fewer, but never more) is ingested every 5 minutes (288 files per day, so up to 2,880,000 records per day). A simple query is used, with no regular expressions.
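For readers reproducing this setup, here is a minimal sketch of turning CSV rows into InfluxDB line protocol and grouping them into batches. The original script and CSV layout were not posted, so the measurement name `metrics`, the column names (`ts`, `host`, `region`, `value`) and the batch size are assumptions for illustration only.

```python
import csv
import io


def escape_tag(value):
    # Line protocol requires commas, equals signs and spaces in tag keys
    # and tag values to be backslash-escaped.
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def row_to_line(measurement, row, tag_keys, field_keys, time_key):
    # Assumed layout: every field is numeric, the timestamp column holds an
    # integer epoch value, and the measurement name needs no escaping.
    tags = ",".join(f"{escape_tag(k)}={escape_tag(row[k])}" for k in sorted(tag_keys))
    fields = ",".join(f"{escape_tag(k)}={float(row[k])}" for k in field_keys)
    return f"{measurement},{tags} {fields} {int(row[time_key])}"


def batches(lines, size):
    # Group lines into newline-joined payloads of at most `size` points.
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)


# Made-up sample data standing in for one 5-minute CSV file.
sample = "ts,host,region,value\n1532953860,web 1,eu,21.5\n1532953860,web 2,eu,19\n"
rows = csv.DictReader(io.StringIO(sample))
lines = [row_to_line("metrics", r, ["host", "region"], ["value"], "ts") for r in rows]
payloads = list(batches(lines, size=5000))
print(payloads[0])
```

Each payload would then be sent to the `/write` endpoint with a `precision` matching the timestamp unit (seconds in this sample). InfluxDB 1.x answers a successful write with HTTP 204, so logging any other status code together with the response body is one way to notice points that were not stored.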

For every 10K records ingested:
Time taken for ingestion = 5-10 seconds (depending on the size of the database)
Field writes = around 100 per second
Total unique series = 10000
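
Because each row in a file can carry its own series, it may help to measure cardinality before writing. The sketch below counts distinct series (measurement plus full tag set, which is how InfluxDB 1.x identifies a series) and distinct values per tag in a CSV file. The column names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def cardinality(rows, measurement, tag_keys):
    series = set()
    values_per_tag = defaultdict(set)
    for row in rows:
        # A series is identified by the measurement and its full tag set.
        tag_set = tuple((k, row[k]) for k in sorted(tag_keys))
        series.add((measurement, tag_set))
        for key, value in tag_set:
            values_per_tag[key].add(value)
    return len(series), {k: len(v) for k, v in values_per_tag.items()}


# Made-up rows; `request_id` plays the role of a tag that is unique per row.
sample = (
    "ts,host,request_id,value\n"
    "1532953860,web1,a1,21.5\n"
    "1532953860,web1,a2,19.0\n"
    "1532954160,web2,a3,20.1\n"
)
n_series, per_tag = cardinality(
    csv.DictReader(io.StringIO(sample)), "metrics", ["host", "request_id"]
)
print(n_series, per_tag)
```

A tag whose value count grows with every row drives the series count up just as fast, which is the pattern to look for when a file of 10K rows yields 10K series.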

Any suggestions are welcome!

Thanks in advance.

jangaraj · August 2, 2018, 5:42pm

I guess your problem is high-cardinality data. You will need more memory for the heap.
Ref: The Effect of Cardinality on Data Ingest — Part 1 | InfluxData

Avin · August 3, 2018, 6:55am

Thanks @jangaraj for the reply.
Once the ingestion rate started choking, we increased the memory (8 CPUs and 64 GB of RAM), but that still didn't improve the ingestion rate. However, I noticed in one of the Influx topics (What is the highest-performance method of getting data in/out of InfluxDB - #6 by JeremySTX) that issues are bound to happen if more than 90% of the series are under the same measurement. In my case there was only one measurement, so obviously 100% of the series were under it. I then tweaked the settings in the conf file, which brought the ingestion rate back to normal.
The utility you linked (written by Edd Robinson and other contributors) should be handy for benchmarks going forward.

-Avin

Avin · August 16, 2018, 1:02pm

I was able to overcome this issue by making a couple of config changes (max-series-per-database, max-values-per-tag) in the /etc/influxdb/influxdb.conf file.
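
For context, the series count reported earlier in the thread (999952) sits just below 1,000,000, which is the InfluxDB 1.x default for max-series-per-database (the default for max-values-per-tag is 100,000, and setting either to 0 disables that limit). The sketch below is a simplified model of such a cap, not InfluxDB's actual code. The function names are made up, and treating all 10,000 series in a file as new is a worst-case assumption chosen for illustration.

```python
# InfluxDB 1.x default for max-series-per-database; 0 disables the limit.
DEFAULT_MAX_SERIES_PER_DATABASE = 1_000_000


def room_for_new_series(current_series, max_series=DEFAULT_MAX_SERIES_PER_DATABASE):
    # Returns how many more series fit under the cap, or None if unlimited.
    if max_series == 0:
        return None
    return max(max_series - current_series, 0)


def rejected_new_series(current_series, new_series_in_batch,
                        max_series=DEFAULT_MAX_SERIES_PER_DATABASE):
    # Number of new series in a batch that would not fit under the cap.
    room = room_for_new_series(current_series, max_series)
    if room is None:
        return 0
    return max(new_series_in_batch - room, 0)


print(room_for_new_series(999_952))                        # 48
print(rejected_new_series(999_952, 10_000))                # 9952
print(rejected_new_series(999_952, 10_000, max_series=0))  # 0
```

Under this model, points that belong to already existing series are unaffected, so the share of a file that is lost depends on how many of its series are new. The values the poster actually set were not shared.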
