We use AWS Lambda functions (Python) to pipe our data into InfluxDB 2.0 Cloud, writing synchronously with
write_api.write()
We write about once every 2 sec from two or three simultaneously running Lambda clients.
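For context, the write path is roughly the following (a simplified sketch only; the URL, org, bucket, and measurement names are placeholders for our real ones):

```python
import os
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Client is created outside the handler so warm Lambda invocations reuse it.
client = InfluxDBClient(
    url="https://us-west-2-1.aws.cloud2.influxdata.com",  # placeholder Cloud URL
    token=os.environ["INFLUX_TOKEN"],
    org="my-org",
)
write_api = client.write_api(write_options=SYNCHRONOUS)

def lambda_handler(event, context):
    point = (
        Point("machine_data")                  # placeholder measurement
        .tag("machine", event["machine_id"])
        .field("value", float(event["value"]))
    )
    # This is the call that intermittently fails with ApiException(503).
    write_api.write(bucket="my-bucket", record=point)
    return {"statusCode": 200}
```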
Every ~1,000 successful calls (approximately 50 times / day), the write() call fails with
ApiException: (503) Reason: Service Unavailable; upstream connect error or disconnect/reset before headers. reset reason: connection failure
Because of the way AWS Lambda works, we can never be sure how many Lambda functions are actually running. As load increases, more Lambda functions will be started and may be running / writing simultaneously.
I was under the impression that InfluxDB 2.0 Cloud could handle millions of writes / sec from multiple clients. So I'm puzzled how 2 or 3 concurrent Python clients calling write_api.write() are causing problems.
Are there any limitations I'm overstepping that I should know about?
Thanks @Anaisdg
Can I ask what happens if two Python clients try to write simultaneously to the same bucket, i.e. one client calls client.write() and, before it has returned (200-300 ms later), the other client calls client.write()?
The code is pretty mature and has been running for over a year. As we connect more machines we are seeing higher data rates, but this code is only executed once every 2 seconds or so. However, AWS does create 2-3 copies of the Lambda function that run simultaneously.
What would really help is to test whether the problem goes away when using plain HTTPS requests instead of the client library.
I have not been able to find a Python example of sending data to InfluxDB over plain HTTPS. If you have something, I can try an HTTPS version.
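Roughly, I imagine the plain-HTTPS version would be something along these lines (only a sketch, not verified; the URL, org, bucket, and token are placeholders):

```python
import os
import requests

INFLUX_URL = "https://us-west-2-1.aws.cloud2.influxdata.com"  # placeholder Cloud URL

# One line-protocol record: measurement,tag=... field=value
line = "machine_data,machine=m1 value=42.0"

resp = requests.post(
    f"{INFLUX_URL}/api/v2/write",
    params={"org": "my-org", "bucket": "my-bucket", "precision": "ns"},
    headers={
        "Authorization": f"Token {os.environ['INFLUX_TOKEN']}",
        "Content-Type": "text/plain; charset=utf-8",
    },
    data=line,
    timeout=10,
)
resp.raise_for_status()  # the write endpoint returns 204 No Content on success
```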
I've spent the day trying to implement an HTTPS solution but I'm not getting anywhere. I'm sure that a failure to write should cause retries, no? Not an exception. I don't really understand the InfluxDB Python client or the underlying HTTP / socket connection.
Is that what is happening? The socket gets closed and the client tries to write on a closed / dead socket? In that case, how would I …
Ahaaa. Yes that looks right.
Regarding the use of Python requests: do you think this will make any difference to the disconnections? Because doesn't
influxdb_client.client.write_api
use HTTPS under the hood anyway?
And did you have any thoughts about the effects of two clients calling write_api.write() at the same time?
Hi all, sorry for the late response, I was off the grid.
There isn't a known limitation for this.
If the first request fails, perform a second request to the InfluxDB server.
The upstream connect error or disconnect/reset before headers error comes from the AWS Lambda network stack. The "core" error is connection failure. I am not an expert on AWS Lambda, but it could be a network issue on that side.
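A minimal sketch of that "second request" idea, catching the client's ApiException and retrying the synchronous write (the attempt count and delay are just example values):

```python
import time
from influxdb_client.rest import ApiException

def write_with_retry(write_api, bucket, record, attempts=3, delay=1.0):
    """Retry write_api.write() a few times before giving up."""
    for attempt in range(attempts):
        try:
            write_api.write(bucket=bucket, record=record)
            return
        except ApiException:
            if attempt == attempts - 1:
                raise          # out of attempts, surface the error
            time.sleep(delay)  # brief pause before the next try
```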
upstream connect error or disconnect/reset before headers.
is a Lambda / AWS one. However, I have seen the same error when I'm in InfluxDB 2.0 Cloud exploring data. Are we sure that it's not an availability issue that happens from time to time?
@bednar, I appreciate the suggestion of a solution (sending a second request if the first fails).
But to return to my question for a moment … are we saying that InfluxDB can cope with simultaneous requests, or are we saying that it can't (i.e. one of them will fail, and therefore it will always be necessary to anticipate the need for retries)?
Because I'd like it to try 10 times over the next few seconds, but raise an exception if it still fails.
Can I ask why we would ever need redirect>0?
Is there any legitimate case in which an attempt to write to InfluxDB Cloud 2.0 would get redirected?
I know my API key gets stripped on a redirect, but isn't it better security to say there are no redirects at all?
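For reference, this is the kind of configuration I have in mind, assuming the client's retries argument accepts a urllib3 Retry (the numbers are only an example):

```python
from urllib3 import Retry
from influxdb_client import InfluxDBClient

retries = Retry(
    total=10,            # give up after 10 attempts in total
    connect=10,          # retry failed connection attempts
    read=2,
    redirect=0,          # fail immediately on any redirect
    backoff_factor=0.25, # roughly 0.25 s, 0.5 s, 1 s, ... between attempts
)

client = InfluxDBClient(
    url="https://us-west-2-1.aws.cloud2.influxdata.com",  # placeholder Cloud URL
    token="my-token",
    org="my-org",
    retries=retries,
)
```

(Whether a 503 response itself gets retried also depends on Retry's status handling; by default urllib3 does not retry POSTs on status codes, so an application-level retry may still be needed on top of this.)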
I am also testing an alternative function that uses only urllib3 but doesn't use retries (a sketch follows after the two points below).
So far it works but has two negative side effects:
it takes roughly 60% longer to write than the Python InfluxDB client library (200 ms rather than 125 ms). With millions of writes per month, and AWS billing Lambda by the millisecond, this is significant.
it screws up the use of the AWS boto3 library (some conflict between import requests and import boto3).
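The urllib3-only function is roughly this shape (a sketch only; the URL, org, bucket, and token are placeholders):

```python
import os
import urllib3

http = urllib3.PoolManager()  # created once per Lambda container

def write_line(line: str) -> None:
    """POST one line-protocol record to the v2 write endpoint, no retries."""
    resp = http.request(
        "POST",
        "https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/write"
        "?org=my-org&bucket=my-bucket&precision=ns",
        body=line.encode("utf-8"),
        headers={
            "Authorization": f"Token {os.environ['INFLUX_TOKEN']}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        retries=False,
    )
    if resp.status != 204:
        raise RuntimeError(f"write failed: {resp.status} {resp.data[:200]!r}")
```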
I will report back with my findings on the best fix once I have fully tested it (for others to find).
Cheers
Andrew