Failed CLI import inserts do not show up when I break down the file into smaller parts


#1

Hi everyone,

I am currently working on NASDAQ data parsing and insertion into the influx database. I have taken care of all the data insertion rules (escaping special characters and organizing the according to the format : <measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]).

Below is a sample of my data:
apatel17@*****:~/output$ head S051018-v50-U.csv
# DDL
CREATE DATABASE NASDAQData
# DML
# CONTEXT-DATABASE:NASDAQData
U,StockLoc=6445,OrigOrderRef=22159,NewOrderRef=46667 TrackingNum=0,Shares=200,Price=73.7000 1525942800343419608
U,StockLoc=6445,OrigOrderRef=20491,NewOrderRef=46671 TrackingNum=0,Shares=200,Price=73.7800 1525942800344047668
U,StockLoc=952,OrigOrderRef=65253,NewOrderRef=75009 TrackingNum=0,Shares=400,Price=45.8200 1525942800792553625
U,StockLoc=7092,OrigOrderRef=51344,NewOrderRef=80292 TrackingNum=0,Shares=100,Price=38.2500 1525942803130310652
U,StockLoc=7092,OrigOrderRef=80292,NewOrderRef=80300 TrackingNum=0,Shares=100,Price=38.1600 1525942803130395217
U,StockLoc=7092,OrigOrderRef=82000,NewOrderRef=82004 TrackingNum=0,Shares=300,Price=37.1900 1525942803232492698

I have also created the database: NASDAQData inside influx.

The problem I am facing is this:
The file has approximately 13 million rows (12,861,906 to be exact). I am trying to insert this data using the CLI import command as below:
influx -import -path=S051118-v50-U.csv -precision=ns -database=NASDAQData

I usually get upto 5,000,000 lines before I start getting the error for insertion. I have run this code multiple times and sometimes I get the error at 3,000,000 lines as well. To figure out this error, I am running the same code on a part of the file. I divide the data into 500,000 lines each and the code successfully ran for all the smaller files. (all 26 files of 500,000 rows)

Has this happened to somebody else or does somebody know a fix for this problem wherein a huge file shows errors during data insert but if broken down and worked with smaller data size, the import works perfectly.

Any help is appreciated. Thanks


#2

Hi @ptladit. Import works best with files that are batched into 5000-10,000 points. From the docs: https://docs.influxdata.com/influxdb/v1.7/tools/shell/#import-data-from-a-file-with-import

" * If your data file has more than 5,000 points, it may be necessary to split that file into several files in order to write your data in batches to InfluxDB. We recommend writing points in batches of 5,000 to 10,000 points. Smaller batches, and more HTTP requests, will result in sub-optimal performance. By default, the HTTP request times out after five seconds. InfluxDB will still attempt to write the points after that time out but there will be no confirmation that they were successfully written."