InfluxDB ingestion rate chokes after reaching just over 1.5 million records (size ~2 GB)

Overview:
InfluxDB (v1.5.2) is hosted in Azure, and I am inserting records from Windows/CentOS clients. I have written a Python script that reads data from a CSV file and inserts it into InfluxDB. The ingestion rate is quite good on a newly created database, but once the number of records crosses the 1.5 million mark it deteriorates to the point that not all records get ingested. If I insert the same CSV file into a newly created database, more of the records are ingested. The data loss varies between 40% and 80%.
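For reference, a minimal sketch of the CSV-to-InfluxDB step described above, using only the Python standard library. The measurement name `metrics`, the column names, and the batch size are hypothetical and not taken from the actual script; the sketch only builds line-protocol batches and leaves the HTTP write itself out.

```python
import csv
import io

def escape(value):
    """Escape commas, equals signs and spaces, as line protocol requires
    for tag keys, tag values and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def to_line(measurement, row, tag_keys, field_keys, time_key):
    """Turn one CSV row (a dict) into one line-protocol line with float fields."""
    tags = ",".join(f"{escape(k)}={escape(row[k])}" for k in sorted(tag_keys))
    fields = ",".join(f"{escape(k)}={float(row[k])}" for k in field_keys)
    return f"{measurement},{tags} {fields} {int(row[time_key])}"

def batches(lines, size=5000):
    """Group lines into newline-joined payloads of at most `size` points."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)

# Hypothetical sample data; timestamps are in seconds, so the write
# request would need precision=s.
SAMPLE = "time,host,region,value\n1525000000,a1,eu,0.5\n1525000300,a 2,us,1.5\n"
rows = csv.DictReader(io.StringIO(SAMPLE))
lines = [to_line("metrics", r, ["host", "region"], ["value"], "time") for r in rows]

for payload in batches(lines, size=5000):
    print(payload)
```

Each payload would be sent as the body of one write request. It is worth checking the response of every request: the InfluxDB 1.x `/write` endpoint answers 204 on success, so any other status means some or all points in that batch were rejected rather than silently accepted.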

Configuration Details:
The Azure instance we signed up for uses Premium SSD managed disks.
P4 tier: disk size = 32 GB, IOPS per disk = 120, throughput per disk = 25 MB/sec, RAM = 8 GB.
No problems occurred until the record count passed 1.5 million; the ingestion rate began to choke after that point.

Azure disk details:
SDA1 >> Size= 976MB;Used=46MB (5%);Available=879MB
SDA2 >> Size= 29GB;Used=9.2GB (35%);Available=18GB

InfluxDB(v1.5.2) instance size allocated= ~30GB.
My InfluxDB database size = ~2.0G

Influx Query related details:
Number of measurements = 1
Number of tags = 9 (8 + 1 unique tag)
Number of fields = 3
Number of series = 999952

One CSV file with at most 10K rows (it can have fewer, but never more) is ingested every 5 minutes (288000 records per day). A simple query is used, with no regular expressions.

For every 10K records ingested:
Time taken for ingestion = 5-10 secs (depending on the size of the database)
Field writes = around 100 per second
Total unique series= 10000

Any suggestions are welcome!

Thanks in advance.

I guess your problem is high-cardinality data. You will need more memory for the heap.
Ref: https://www.influxdata.com/blog/the-effect-of-cardinality-on-data-ingest-part-1/
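To make the cardinality point concrete, here is a small arithmetic sketch using the numbers from the question: up to 10,000 unique series per file and one file every 5 minutes. It assumes the InfluxDB 1.x default `max-series-per-database` of 1,000,000; the function name is made up for illustration. The reported series count (999952) sits just under that default, which may be why writes that create new series started failing.

```python
import math

def files_until_series_limit(new_series_per_file, series_limit, existing_series=0):
    """Number of files that can be ingested before the series limit is reached,
    assuming every file adds `new_series_per_file` previously unseen series."""
    remaining = max(series_limit - existing_series, 0)
    return math.ceil(remaining / new_series_per_file)

# Assumed default limit for InfluxDB 1.x; 10,000 series per file is from the question.
DEFAULT_LIMIT = 1_000_000

files = files_until_series_limit(10_000, DEFAULT_LIMIT)
minutes = files * 5  # one file every 5 minutes
print(files, minutes)

# Starting from the series count reported in the question:
print(files_until_series_limit(10_000, DEFAULT_LIMIT, existing_series=999_952))
```

In the worst case, where every row is a new series, the limit is reached after 100 files, or a little over 8 hours of ingestion. Real data with repeated series would take longer, which is consistent with the limit being hit only after about 1.5 million records.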

Thanks @jangaraj for the reply.
Once the ingestion rate started choking, we doubled up the memory size (8 CPUs and 64 GB RAM), but that still didn't improve the ingestion rate. However, I noticed in one of the Influx topics (What is the highest-performance method of getting data in/out of InfluxDB) that if over 90% of the series are under the same measurement, then issues are bound to happen. In my case there was only one measurement, so obviously 100% of the series were under it. I then tweaked the settings in the conf file, which brought the ingestion rate back to normal.
I guess the utility link you shared (written by Edd Robinson and other contributors) will be handy for benchmarks going forward.

-Avin

I was able to overcome this issue by making a couple of config changes (max-series-per-database, max-values-per-tag) in the /etc/influxdb/influxdb.conf file.
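For anyone landing here later, a sketch of what that change might look like in the `[data]` section of the config file. The values below are illustrative assumptions, not the ones actually used; in InfluxDB 1.x the defaults are 1000000 and 100000, and 0 disables the limit.

```toml
[data]
  # Maximum number of series allowed per database.
  # Default is 1000000; 0 disables the limit. (Illustrative value.)
  max-series-per-database = 0

  # Maximum number of distinct values allowed per tag key.
  # Default is 100000; 0 disables the limit. (Illustrative value.)
  max-values-per-tag = 0
```

The service needs a restart for the change to take effect. Raising or disabling these limits lets writes through, but the series index still grows with cardinality, so memory use is worth watching afterwards.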