Telegraf → InfluxDB - Error 400 bad request - single line longer than the maximum of 65536 bytes

I’ve got two Telegraf processes gathering data using the SQL Server Input.
I get a few hundred points every 15sec.
Sometimes, some of those points seem to be invalid, and I get the following error

2020-02-14T09:53:18Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8186]: 400 Bad Request: http: bad request
2020-02-14T09:53:18Z E! [agent] Error writing to outputs.influxdb: could not write any address

Since I got invalid points for several hours, I would like to count the number of points for each series in InfluxDB (filtered by time), in order to see which one has no data points for that interval and then debug further.

Now I’m using queries like this:

SELECT count("read_bytes") AS COUNT FROM "monitor"."short"."sqlserver_database_io" WHERE time > :dashboardTime: AND time < :upperDashboardTime: GROUP BY *

Which must be created for each measurement, Is it possible to count the number of points by series in a different way?

Could you get this from the _Internal database?

No, the internal db can tell you how many series there are in each database, but not how many points there are in a series

I know you can select count(*) in the CLI, not sure about from Chronograf. I’m curious though, is Telegraf writing directly to InfluxDB or are you using a proxy, the error isn’t quite what I would expect.

I’m not using a proxy, but I’m using a “Telegraf Gateway”. I have several Telegraf instances that gathers data and send it to another telegraf instance (the gateway), which uses the “influxdb listener” input plugin and finally sends the data to the database (InfluxDB).

I see the error in the telegraf logs of the “gatherers” instances.

An update on this issue:
Counting the number of points in the end helped me… every measurement had the same number of points for the given interval except one, which is the “sqlserver_requests” measurement.
It captures the query running on the database, so it does not have a constant amount of points by time interval, that said, every other measurement (which have a constant amount of point per interval) had the same amount of points, therefore none of them lost points. or at least this is what I think.

The “sqlserver_requests” measurement stores the query text as a tag (statement_text). My guess is that some texts cause this issue even if I have no proof of that yet.

I will set the log to debug mode, and also output the values to a file, to see if that captures the problematic points. Any other suggestion about how to debug this issue is appreciated.

I’ve discovered the source of the issue.
What didn’t work
Enabling more verbose logging (debug = true) and outputting the data to a file (outputs.file) did not report any error.

What did work
Enabling verbose logging on the Telegraf “gateway” process reported the following error

D! [inputs.influxdb_listener] Http_listener received a single line longer than the maximum of 65536 bytes
2020-02-17T10:49:23Z D! [inputs.influxdb_listener] Http_listener received a single line longer than the maximum of 65536 bytes
2020-02-17T10:49:23Z D! [inputs.influxdb_listener] Http_listener received a single line longer than the maximum of 65536 bytes
2020-02-17T10:49:23Z D! [outputs.influxdb] Wrote batch of 5000 metrics in 95.2228ms
2020-02-17T10:49:23Z D! [outputs.influxdb] Buffer fullness: 11833 / 50000 metrics

The Issue
The issue is that the query text is too big to be stored, strings have a limit of 64kb (docs here)

From this old post, looks like the limit only applies to tag values, and not to field values.

@daniel can you confirm that the limit does not apply to field values?

At the moment a solution could be the parameter “max_line_size” in the influx listener plugin used by the Telegraf “gateway”

[[inputs.influxdb_listener]]
  {...}
  ## Maximum line size allowed to be sent in bytes.
  ## 0 means to use the default of 65536 bytes (64 kibibytes)
  max_line_size = 0
  {...}

Update:
changing the “max_line_size” configuration helps and I get fewer errors, but another problem has appered…

2020-02-17T11:49:00Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8086]: received error partial write: __HugeLineProtocolString__  : max key length exceeded: 85962 > 65535 dropped=0; discarding points

The point key max size is 64kb, and the string alone weights 84kb, and I think for this there is no workaround if not opening an issue to change this column from tag to field

Sounds like we have a couple issues to sort out. As an aside, we are actually making some changes to the influxdb_listener plugin right now, and we are actually planning to remove the line limiter and use only the max body limit in 1.14.

A quick test seems to show that the string fields can be larger than 65KiB, I’ll ask around on if there is a maximum. I’ll also look into adding the 64KiB limit to the line protocol parser, or perhaps only when we are serializing the data the limit should be applied.

There has been some discussion about this particular query, and I’m beginning to think we should move this item to a field. Check out #6976 and #6678. In the meantime, you can use the converter processor to convert it to a string field.