Data is not written into influx


#1

Hi,

We have installed the InfluxDB 1.5.3 quickstart (standalone) on an AWS r4.4xlarge instance.

We are trying to write ~25000 data points/sec with JMeter, which creates 3619202 measurements in a minute. After a while, InfluxDB stops writing the data. From the stats, PointsWrittenDropped is 164000 and PointsWrittenFailed is 602670.

I am using influxdb-java-2.8.jar to write into influx.

Is there any limit on cardinality in the standalone version? Please help us solve this issue.

Logs:
The logs show a timeout:
[httpd] 10.0.9.230 - ceon [23/Jun/2018:17:53:18 +0000] "POST /write?consistency=one&db=ceondb&p=%5BREDACTED%5D&precision=n&rp=autogen&u=ceon HTTP/1.1" 500 20 "-" "okhttp/3.9.1" 4fa0573b-770e-11e8-8251-000000000000 10017160
ts=2018-06-23T17:53:28.412792Z lvl=error msg="[500] - \"timeout\"" log_id=08sR_btl000 service=httpd

show measurement cardinality
cardinality estimation
----------------------
3619202

Show stats:
name: write
pointReq pointReqLocal req subWriteDrop subWriteOk writeDrop writeError writeOk writeTimeout
-------- ------------- --- ------------ ---------- --------- ---------- ------- ------------
787314   787314        215 0            215        0         0          210     22

name: subscriber
createFailures pointsWritten writeFailures
-------------- ------------- -------------
0              0             0

name: cq
queryFail queryOk
--------- -------
0         0

name: httpd
tags: bind=:8086
authFail              0
clientError           0
pingReq               2
pointsWrittenDropped  0
pointsWrittenFail     164000
pointsWrittenOK       602670
promReadReq           0
promWriteReq          0
queryReq              2
queryReqDurationNs    573286
queryRespBytes        120
recoveredPanics       0
req                   196
reqActive             5
reqDurationNs         821578321435
serverError           22
statusReq             0
writeReq              192
writeReqActive        4
writeReqBytes         73694579
writeReqDurationNs    821559547622
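Because the stats are pasted as bare header and value rows, it can help to pair them by position before drawing conclusions. Read that way, the httpd row gives pointsWrittenDropped = 0, pointsWrittenFail = 164000 and pointsWrittenOK = 602670, so the two figures quoted above appear to sit one column to the right of the names given for them. The rough Python sketch below does the pairing and derives a few ratios. It assumes that `req` and `pointReq` in the write module count write batches and points respectively; that reading is an interpretation, not something stated in this thread.

```python
# Header and value rows copied from the `show stats` output above.
httpd_header = (
    "authFail clientError pingReq pointsWrittenDropped pointsWrittenFail "
    "pointsWrittenOK promReadReq promWriteReq queryReq queryReqDurationNs "
    "queryRespBytes recoveredPanics req reqActive reqDurationNs serverError "
    "statusReq writeReq writeReqActive writeReqBytes writeReqDurationNs"
)
httpd_values = (
    "0 0 2 0 164000 602670 0 0 2 573286 120 0 196 5 821578321435 22 0 "
    "192 4 73694579 821559547622"
)
write_header = (
    "pointReq pointReqLocal req subWriteDrop subWriteOk "
    "writeDrop writeError writeOk writeTimeout"
)
write_values = "787314 787314 215 0 215 0 0 210 22"


def pair(header: str, values: str) -> dict:
    """Pair whitespace-separated column names with their values by position."""
    names = header.split()
    numbers = [int(v) for v in values.split()]
    if len(names) != len(numbers):
        raise ValueError("header and value rows have different lengths")
    return dict(zip(names, numbers))


httpd = pair(httpd_header, httpd_values)
write = pair(write_header, write_values)

# Share of points that failed, out of those that either failed or succeeded.
fail_ratio = httpd["pointsWrittenFail"] / (
    httpd["pointsWrittenFail"] + httpd["pointsWrittenOK"]
)
# Mean time spent per HTTP write request, in seconds.
avg_write_seconds = httpd["writeReqDurationNs"] / httpd["writeReq"] / 1e9
# Mean points per write batch (assumes `req` counts batches, see lead-in).
avg_points_per_batch = write["pointReq"] / write["req"]

print(f"failed share of points: {fail_ratio:.1%}")
print(f"mean write request time: {avg_write_seconds:.2f} s")
print(f"mean points per batch: {avg_points_per_batch:.0f}")
```

On these numbers roughly a fifth of the points failed, and serverError (22) matches writeTimeout (22) in the write module, which is consistent with the timeouts in the log.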


#2

@soundari
Were you able to figure this out? I am using a Particle IoT device to send sensor data through Telegraf into my InfluxDB database, but when I run show stats I get the same error:

name: httpd
tags: bind=:8086
authFail clientError pingReq pointsWrittenDropped pointsWrittenFail pointsWrittenOK promReadReq promWriteReq queryReq queryReqDurationNs queryRespBytes recoveredPanics req... 

I am not sending as much data as you, but I was wondering why it would do this. I can’t see any of my data in influx as a result.

Please let me know. Thanks.


#3

The HTTP error code suggests that the server is under too much pressure. Perhaps memory usage or disk performance is causing the timeout?


#4

What batch sizes are you sending?

In the 1.6.x release, the database now has some HTTP settings to apply back pressure to writes.

That said, this is usually caused by, or addressed through, a handful of things:

  1. The number of HTTP connections created and destroyed. This is typically addressed by using larger batches.
  2. Check the disk throughput and IOPS limits on AWS. gp2 SSDs are limited to 160 MB/s and io1 volumes reach up to 320 MB/s, but you should also check the limits for your machine class. For example, r4.4xlarge is limited to 437.5 MB/s and 18,750 IOPS. If you use lower-class machines, you’ll hit the limits faster.
  3. What is your shard duration, and what time ranges do your typical queries cover? If you have a long shard duration (1 week, for example, is the default when you have an infinite retention period), writes, compactions, and queries can compete for the same shard. So, if you have a large amount of data arriving, shorten the shard duration. Ideally, the majority of your queries access a single shard. For example, we are ingesting stats from 1800 hosts across 6-10 Telegraf input plugins. The data is captured at 10-second intervals and reported at 1-minute frequency. The majority of our queries only look at the data from the last 1-2 hours, and usually only 1 day matters. We set our shard duration to 8 hours, which gives us 3 shards per day, and when we run longer-duration queries (which are less typical than the 1-2 hour ones), everything performs quite well.
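To make the shard arithmetic in point 3 concrete, here is a small Python sketch that counts shards per day and how many shards a query's time range overlaps. It assumes shard boundaries fall on multiples of the shard duration counted from the Unix epoch; that is a simplification for illustration, not something confirmed in this thread, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shards_per_day(shard_duration: timedelta) -> float:
    """Number of shards needed to cover 24 hours of data."""
    return timedelta(days=1) / shard_duration


def shards_touched(start: datetime, end: datetime,
                   shard_duration: timedelta) -> int:
    """Count the shards overlapped by the half-open range [start, end).

    Assumption: shard boundaries are multiples of shard_duration
    measured from the Unix epoch.
    """
    if end <= start:
        raise ValueError("end must be after start")
    first = (start - EPOCH) // shard_duration
    # Step back one microsecond so an end exactly on a boundary
    # does not count the following shard.
    last = (end - EPOCH - timedelta(microseconds=1)) // shard_duration
    return last - first + 1


eight_hours = timedelta(hours=8)
print(shards_per_day(eight_hours))

# A 2-hour query that stays inside one 8-hour shard.
inside = datetime(2018, 6, 23, 9, 0, tzinfo=timezone.utc)
print(shards_touched(inside, inside + timedelta(hours=2), eight_hours))

# A 2-hour query that straddles the 08:00 boundary.
straddle = datetime(2018, 6, 23, 7, 0, tzinfo=timezone.utc)
print(shards_touched(straddle, straddle + timedelta(hours=2), eight_hours))

# A full-day query.
day = datetime(2018, 6, 23, tzinfo=timezone.utc)
print(shards_touched(day, day + timedelta(days=1), eight_hours))
```

Under that assumption, a 1-2 hour query touches one shard, or two when it straddles a boundary, while a full-day query touches all three.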

I would suggest using the latest Java client, 2.12. See the enhancements here:
https://www.influxdata.com/blog/influxdb-java-client-performance-and-functionality-improvements/
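The batching advice in point 1 applies whichever client version is used. As a minimal, client-independent sketch in Python (this is not the influxdb-java API), the helper below groups line-protocol lines into newline-joined payloads so that each HTTP write carries many points. The measurement name `cpu`, the tag `host`, the timestamps and the batch size of 5000 are illustrative assumptions, and actually sending the payloads is left out.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(lines: Iterable[str], batch_size: int) -> Iterator[str]:
    """Yield newline-joined payloads of at most batch_size lines each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield "\n".join(chunk)


# Hypothetical points in line protocol: measurement,tag field timestamp(ns).
points = (
    f"cpu,host=h{i % 10} value={i} {1529776398000000000 + i}"
    for i in range(25000)
)

# One second's worth of points at the rate from the first post,
# grouped into 5 write payloads instead of 25000 single-point writes.
payloads = list(batches(points, 5000))
print(len(payloads))
print(payloads[0].splitlines()[0])
```

Each payload would then be the body of one POST to /write, which keeps the number of HTTP requests and connections low.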