Error while writing large line protocol file

Yogesh · July 7, 2019, 5:56pm

I’m getting a parsing error when I try to write a 1.3 GB text file with data points in line protocol format.
If I pick a few hundred lines from the file and upload them as a separate file, the write completes without any error.
The data file is written by a test Java program that writes uniform data, so it’s unlikely that there is a formatting mistake in the file.
Error :
Command: influx write -b TestBucket -o INFA -p ms @/data/cloudqa/influx/data30.txt
Error: Failed to write data: http/handleWrite: unable to parse ‘82’: missing fields
unable to parse ‘process_count,tenant_id=Tenant2,process_name=Proc30,status=Faulted pcount=130 1559928204082process_count,tenant_id=Tenant1,process_name=Proc87,status=Faulted pcount=187 1559928204082’: bad timestamp
unable to parse ‘1559928204082’: missing fields
unable to parse ‘process_count,tenant_id=Tenant6,process_name=Proc64,status=Suspended pcount=164process_count,tenant_id=Tenant6,process_name=Proc22,status=Suspended pcount=122 1559928204082’: invalid number
unable to parse ‘process_count,tenant_process_count,tenant_id=Tenant8,process_name=Proc82,status=Faulted pcount=182 1559928204082’: missing tag value
unable to parse ‘pcount=169 1559928204082’: invalid field format
unable to parse ‘process_count,tenant_id=Tenant9,process_name=Proc70,status=Completeprocess_count,tenant_id=Tenant9,process_name=Proc27,status=Faulted pcount=127 1559928204082’: duplicate tags

MarcV · July 8, 2019, 9:24am

Hi @Yogesh, welcome!

How many lines does the file contain?
Can you extract the first few lines that produce an error from the text file and post them?

Best regards,

Yogesh · July 8, 2019, 2:21pm

Hi Marc,

Thanks for responding.

When the file upload did not work, I looked for other options and found the Java client. It’s a bit slow but seems to work, so I can probably live with the file upload issue for now.

To answer your question, the issue cropped up at about line 130. Below are a few lines from the file:
process_count,tenant_id=Tenant1,process_name=Proc1,status=Faulted pcount=101 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc2,status=Suspended pcount=102 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc3,status=Suspended pcount=103 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc4,status=Faulted pcount=104 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc5,status=Completed pcount=105 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc6,status=Suspended pcount=106 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc7,status=Faulted pcount=107 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc8,status=Suspended pcount=108 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc9,status=Completed pcount=109 1559928504688
process_count,tenant_id=Tenant1,process_name=Proc10,status=Completed pcount=110 1559928504688

MarcV · July 8, 2019, 3:20pm

@Yogesh, maybe there are stray newline characters in the file?

I get the same kind of errors when I try …

> insert 151151515125151
ERR: {"error":"unable to parse '151151515125151': missing fields"}

> insert pcount=169 1559928204082
ERR: {"error":"unable to parse 'pcount=169 1559928204082': invalid field format"}

> insert process_count,tenant_id=Tenant9,process_name=Proc70,status=Completeprocess_count,tenant_id=Tenant9,process_name=Proc27,status=Faulted pcount=127 1559928204082
ERR: {"error":"unable to parse 'process_count,tenant_id=Tenant9,process_name=Proc70,status=Completeprocess_count,tenant_id=Tenant9,process_name=Proc27,status=Faulted pcount=127 1559928204082': duplicate tags"}
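The newline hypothesis above can be checked on the file itself before uploading. The following is a minimal Python sketch, not part of the influx tooling. The function name `find_suspect_lines` is made up for illustration, and the check is a heuristic that assumes simple points like the ones in this thread (no escaped spaces, no quoted string fields).

```python
def find_suspect_lines(data: bytes):
    """Return (line_number, reason) pairs for lines that look malformed.

    Heuristic only: assumes tags and fields contain no escaped spaces and
    no quoted strings, so a valid line has 2 or 3 space-separated parts.
    """
    suspects = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        if b"\r" in raw:
            suspects.append((lineno, "carriage return in line"))
            continue
        if not raw.strip():
            continue  # blank lines are ignored in this sketch
        parts = raw.split(b" ")
        if len(parts) not in (2, 3):
            suspects.append((lineno, "expected 2 or 3 space-separated parts"))
        elif b"=" not in parts[1]:
            suspects.append((lineno, "field set has no '='"))
        elif len(parts) == 3 and not parts[2].isdigit():
            suspects.append((lineno, "timestamp is not an unsigned integer"))
    return suspects


if __name__ == "__main__":
    # In-memory sample: one good line, then two lines fused by a missing newline.
    good = b"process_count,tenant_id=Tenant1,process_name=Proc1,status=Faulted pcount=101 1559928504688"
    sample = good + b"\n" + good + good + b"\n"
    for lineno, reason in find_suspect_lines(sample):
        print(lineno, reason)
```

If the file on disk passes a check like this but the upload still reports fused or truncated lines, the input file itself is probably not the cause.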

Matthias · December 23, 2019, 2:07pm

Hello everyone,

While investigating InfluxDB 2.0, I believe I am running into the same problem as shown above, on influxdb_2.0.0-alpha.21_linux_amd64. My file has 200 rows, with a total size of 53305 bytes. Checking the file against Line protocol | InfluxDB OSS 2.0 Documentation, I feel reasonably sure the file is correctly formatted: it was written using Python 2.7, it has exactly 200 newline / LF characters (hex 0x0a), and 400 spaces, separating the tags from the fields and the fields from the timestamp, respectively. String field values are in double quotes ("), and integer field values carry the i suffix.

When I call the influx command for each row individually on the command line, no errors are shown. This makes me think the formatting is correct.

When sending the full file using the “@” symbol, all kinds of errors show up, like missing tag value, invalid field format, bad timestamp, duplicate tags, etc.

Looking at the text string quoted in the error messages, i.e. in

Error: Failed to write data: unable to parse ‘some text string shown here’: missing tag value

the quoted text string seems to be corrupt. It can be anything from just two or three bytes long to a span covering several input rows. It can start in the middle of a tag or field or in the middle of a timestamp. Oddly enough, it can show non-contiguous parts of the input file, i.e. part of one input row that abruptly changes to part of another row, in the middle of a tag or somewhere else.

Assuming my newline / LF character 0x0a is okay on Linux, it looks to me like an issue with the string quoted in the error message or, if the quoted string truly reflects what the parser is attempting to process, an issue with reading the rows of the input file or handing them over to the parser.

Thanks a lot, kind regards & happy holidays, Matthias
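The byte-level check described in this post (counting LF characters and spaces) is easy to reproduce. Below is a small Python 3 sketch. The function name `byte_stats` and the extra counters for carriage returns and non-ASCII bytes are additions for illustration, not something from the post.

```python
def byte_stats(data: bytes) -> dict:
    """Count the bytes that matter for line protocol framing."""
    return {
        "size": len(data),
        "lf": data.count(b"\n"),       # line terminators (0x0a)
        "cr": data.count(b"\r"),       # should be 0 for a Unix-style file
        "spaces": data.count(b" "),    # 2 per line if every point has a timestamp
        "non_ascii": sum(1 for b in data if b > 127),
        "ends_with_lf": data.endswith(b"\n"),
    }


if __name__ == "__main__":
    # Two tiny made-up points, just to show the shape of the result.
    sample = b"m,t=a f=1i 1\nm,t=b f=2i 2\n"
    print(byte_stats(sample))
```

For the file described above, one would expect `lf` to be 200, `spaces` to be 400 and `cr` to be 0. Note that the space count only equals twice the line count when no tag or string field value contains a space.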

mivola · January 10, 2024, 2:36pm

Even though this thread seems to be quite old, I want to add my findings since I have experienced similar problems. When I try to import larger line protocol files, I get strange error messages (as with test0and1.line.txt). The error disappears if I split the file into two files (test0.line.txt, test1.line.txt) and import both files independently.

I use this command to import the files:

influx write --bucket test2 --file "test0and1.line.txt" --rate-limit "1MB/s" --debug --skipRowOnError --errors-file errors.txt --format=lp

Actually I wanted to upload 3 example files but as a new user I’m not allowed to do it. Therefore I created 3 pastebins:

test0.line.txt: test0.line.txt - Pastebin.com
test1.line.txt: test1.line.txt - Pastebin.com
test0and1.line.txt: https://pastebin.com/DcjBzGA5

What might be interesting is that the smaller files (which work well) are all no larger than 4098 bytes. If I add just one more character/digit to the file, I get an error like this:

2024/01/10 15:28:11 invalid point on line 77: unable to parse '0': missing fields

0 is the first character after the 4098th byte; if I change that to 1, the error message changes accordingly! So to me it seems that all files with 4099 bytes or more are just not parsed properly, and the error is completely independent of the actual content of the file.
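One way to verify this observation on another file is to look up which line and column a given byte offset falls on, and compare that with the line number in the error message. This is a hedged Python sketch; `locate_offset` is a hypothetical helper name, and the offset 4098 (0-based, i.e. the byte right after the first 4098 bytes) is taken from the observation above, not from any documented limit.

```python
def locate_offset(data: bytes, offset: int):
    """Return (line_number, column, line) for a 0-based byte offset.

    line_number is 1-based, column is 0-based, line excludes the newline.
    """
    if not 0 <= offset < len(data):
        raise ValueError("offset outside of data")
    # Newlines before the offset tell us which line we are on.
    line_number = data.count(b"\n", 0, offset) + 1
    line_start = data.rfind(b"\n", 0, offset) + 1  # 0 when no newline precedes
    line_end = data.find(b"\n", offset)
    if line_end == -1:
        line_end = len(data)
    return line_number, offset - line_start, data[line_start:line_end]


if __name__ == "__main__":
    # Made-up data: 200 identical points, long enough to contain offset 4098.
    sample = b"m,t=a f=1i 1559928504688\n" * 200
    line_number, column, line = locate_offset(sample, 4098)
    print(line_number, column, line[column:])
```

If the fragment quoted in the error message matches `line[column:]`, that supports the idea that the input is being cut at a fixed byte boundary rather than at a newline.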

Can anybody confirm this? How can we proceed with this issue?

Thanks a lot!
Best Regards,
Michael
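Since splitting the file by hand made the error disappear in the post above, the same workaround can be automated by cutting the file into chunks that always end on a line boundary. The sketch below is in Python. `split_on_lines` is a made-up name, the 4098-byte limit is only the value observed in this thread, and whether this avoids the problem on a given influx version is untested.

```python
def split_on_lines(data: bytes, max_bytes: int):
    """Yield chunks made of whole lines, each at most max_bytes long.

    A single line longer than max_bytes is still yielded as its own chunk.
    """
    chunk = []
    size = 0
    for line in data.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit.
        if chunk and size + len(line) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield b"".join(chunk)


if __name__ == "__main__":
    # Made-up data standing in for a larger line protocol file.
    sample = b"m,t=a f=1i 1559928504688\n" * 500
    chunks = list(split_on_lines(sample, 4098))
    print(len(chunks), max(len(c) for c in chunks))
```

Each chunk could then be written to its own file and imported with the same `influx write --file` command used above.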
