Write to InfluxDB cloud fails with ApiException: (503) Reason: Service Unavailable; upstream connect error or disconnect/reset before headers. reset reason: connection failure

We use AWS to pipe our data via Lambda functions (Python) to write synchronously to InfluxDB 2.0 Cloud using

write_api.write()

We write about once every 2 sec from two or three simultaneously running Lambda clients.

Roughly once every 1000 successful calls (approximately 50 times / day) the write() call fails with

ApiException: (503) Reason: Service Unavailable; upstream connect error or disconnect/reset before headers. reset reason: connection failure

Because of the way AWS Lambda functions work, we can never be sure how many Lambda functions are actually running. As load increases, more Lambda functions will be started and may be running / writing simultaneously.

I was under the impression that InfluxDB 2.0 Cloud could handle millions of writes / sec from multiple clients. So I'm puzzled how 2 or 3 concurrent Python clients calling write_api.write() are causing problems.

Are there any limitations I'm overstepping that I should know about?

@asmith,
Can you please share your script?
@bednar do you know of any limitations?


Thanks @Anaisdg
Can I ask what happens if two Python clients try to write simultaneously to the same bucket, i.e. one client calls write_api.write() and, before it has returned (i.e. 200-300 ms later), the other client calls write_api.write()?

import json
import boto3
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
import re
import dateutil.parser
from datetime import datetime, timezone
import os
import sys, traceback
import logging

logger = logging.getLogger()
logger.setLevel(logging.WARNING)

#SET UP THE THINGS THAT NEED ONLY CALLING ONCE

# SET UP THE INFLUX DB CONNECTION
InfluxBucket = os.environ['InfluxBucket']
influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'])
write_api = influx_client.write_api(write_options=SYNCHRONOUS)
epoch = dateutil.parser.parse('1970-01-01T00:00:00.000Z')
Influx_TAGS_shadowname = os.environ['TAGS_shadowname']
shadow_client = boto3.client('iot-data', region_name='eu-central-1')
 
code_link='https://eu-central-1.console.aws.amazon.com/lambda/home?region=eu-central-1#/functions/MQTT_InfluxDB?tab=code'

def get_JSON_logger(message, logLevel, do_trace, event, context, err_no):
    logmessage = {}
    logmessage["logLevel"] = logLevel
    logmessage["Error Number"] = err_no
    logmessage["message"] = message
    if do_trace:
        logmessage["traceback"] = traceback.format_exc(limit=1, chain=True)
    else:
        logmessage["traceback"] = None
    logmessage["event"] = event
    logmessage["function"] = context.function_name
    logmessage["code_link"] = code_link
    return logmessage
        
def lambda_handler(event, context):

    Influx_lines = []
    influx_returns = None
    # I've removed the code that prepares Influx_lines
    # ...

    try:
        if len(Influx_lines) > 0:
            influx_returns = write_api.write(InfluxBucket, "xxxxxxxxxxxxxxxxx", Influx_lines, 'ms')

        else:
            logmessage = get_JSON_logger('received event, but no InfluxDB data could be constructed from it', 'ERROR', True, event, context, 2001)
            logmessage['influx_returns'] = influx_returns
            logmessage['Influx_lines'] = Influx_lines
            logger.error(json.dumps(logmessage, indent=5))
            return { 'Lambda FAIL': 400 }

    except Exception:
        logmessage = get_JSON_logger('exception', 'ERROR', True, event, context, 2000)
        logmessage['influx_returns'] = influx_returns
        logmessage['Influx_lines'] = Influx_lines
        # add other elements here
        logger.error(json.dumps(logmessage, indent=5))
        return { 'Lambda FAIL': 400 }

    ###########
    return {
        'SUCCESS': 200
    }

The error returned is caught in the except block:

"traceback": "Traceback (most recent call last):
File "/var/task/lambda_function.py", line 109, in lambda_handler\n influx_returns = write_api.write(InfluxBucket, "xxxxxxxxxxxxxxxx", Influx_lines, 'ms')
influxdb_client.rest.ApiException: (503)
Reason: Service Unavailable
HTTP response headers: HTTPHeaderDict({'Date': 'Thu, 02 Sep 2021 01:10:44 GMT', 'Content-Type': 'text/plain', 'Content-Length': '91', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains'})
HTTP response body: upstream connect error or disconnect/reset before headers. reset reason: connection failure"

The code is pretty mature and has been running for over a year. As we connect more machines we are seeing higher data rates, but this code is only being executed once every 2 sec or so. However, AWS does create 2-3 copies of the Lambda function which run simultaneously.

What would really help is to test whether the problem goes away when using plain HTTPS requests.
I have not been able to find a Python example of sending data to InfluxDB over raw HTTPS. If you have something, I can try an HTTPS version.

@Anaisdg and @bednar
Thanks for engaging…

I've spent the day trying to implement an HTTPS solution but I'm not getting anywhere. I'm sure that a failure to write should cause retries, no? Not an exception. I don't really understand the InfluxDB Python client or the underlying HTTP / sockets connection.

Is that what is happening? The socket is getting closed and the client tries to write on a closed / dead socket? In that case, how would I

  • detect this and
  • prevent the exception and
  • make sure my data got written reliably

Should I be using the retries functionality in order to prevent this?
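
To make the question concrete, this is roughly the kind of manual fallback I have in mind (an untested sketch; the retried status codes, attempt count and delays are just guesses, and it reuses write_api / InfluxBucket from the Lambda module above):

import time
from influxdb_client.rest import ApiException

def write_with_retry(lines, max_attempts=5, backoff=0.2):
    # Hypothetical wrapper: retry transient 5xx failures a few times before giving up
    for attempt in range(1, max_attempts + 1):
        try:
            return write_api.write(InfluxBucket, "xxxxxxxxxxxxxxxxx", lines, 'ms')
        except ApiException as e:
            # Only retry server-side / transient errors; re-raise anything else immediately
            if e.status not in (502, 503, 504) or attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)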

Hello @asmith,
Here are some examples using the requests library with Python:

I'm not sure about the rest. Yes, I would try to use retry options.
@bednar can you help here?

I'm not sure how often he's on the forums. You might want to just submit an issue in the client library repo.

Thanks @Anaisdg,
I've seen similar code, but what is always missing is the actual place where we put the Influx data:

measures,host="myhost" data1=34,data2=2.5 <timestamp>

Do you have any idea where this goes, since the "payload" above seems taken up with other information?

Thanks

Hello @asmith,
It should look something like this:

proto="http://"
domain="us-west-2-1.aws.cloud2.influxdata.com"
api_path="/api/v2/write/"
query_param = "write?org=YOUR_ORG&bucket=YOUR_BUCKET&precision=ns"
url = proto + domain + api_path + query_param

headers = {"Authorization": "Token " + my_token}
raw_data = 
 "
mem,host=host1 used_percent=23.43234543 1556896326
mem,host=host2 used_percent=26.81522361 1556896326
mem,host=host1 used_percent=22.52984738 1556896336
mem,host=host2 used_percent=27.18294630 1556896336
"
r = requests.post(url, headers=headers, data=raw_data)
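
A successful write to the /api/v2/write endpoint returns HTTP 204 (No Content), so it is worth checking the response, for example:

# 204 No Content means the line protocol was accepted
if r.status_code != 204:
    print("Write failed:", r.status_code, r.text)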

Ahaaa. Yes that looks right.
Regarding the use of Python requests
Do you think that this will make any difference to disconnections?
Because doesn't the

influxdb_client.client.write_api

use HTTPS under the hood anyway?

And did you have any thoughts about the effects of two clients calling write_api.write() at the same time?

The client supports synchronous writes.

I don't think it should, but it can't hurt to test it, I suppose.

Hi All, sorry for the late response, I was off the grid.

There isn't a known limitation for this.

Perform a second request to the InfluxDB server.

The "upstream connect error or disconnect/reset before headers" error comes from the AWS Lambda network stack. The "core" error is connection failure. I am not an AWS Lambda expert, but it could be the same kind of network issue.

Yes, you have to configure retries to handle this.

Regards


Thanks @bednar,
You mentioned that the error

upstream connect error or disconnect/reset before headers.

is a Lambda / AWS one. However, I have seen the same error when I'm on InfluxDB 2.0 Cloud and exploring data. Are we sure that it's not an availability issue that happens from time to time?

@bednar , I appreciate the suggestion of a solution (sending a second request if the first fails).
But to return to my question for a moment… are we saying that InfluxDB can cope with simultaneous requests, or that it can't (i.e. one of them will fail, and therefore it will always be necessary to anticipate the need for retries)?

It can.

As a solution, use a retry strategy. You can configure retries as described here: GitHub - influxdata/influxdb-client-python: InfluxDB 2.0 python client

Regards

@bednar that link is BRILLIANT!
Should be in the docs!
I can also recommend Utilities - urllib3 2.0.0a3 documentation for details of the parameters to use in retries.

I have used

retries = Retry(total=10, redirect=0, backoff_factor=0.1, raise_on_redirect=True, raise_on_status=True)

Because I'd like it to try 10 times over the next few seconds but raise an exception if it fails.
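
For anyone landing here later, this is roughly how I am passing that Retry object into the client (a sketch based on the client README; the influx_url and token environment variables are the same ones used in my Lambda code above):

import os
from urllib3 import Retry
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

# Retry transient failures up to 10 times with a short backoff,
# then raise so my except block still catches hard failures
retries = Retry(total=10, redirect=0, backoff_factor=0.1,
                raise_on_redirect=True, raise_on_status=True)

influx_client = InfluxDBClient(url=os.environ['influx_url'],
                               token=os.environ['token'],
                               retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)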

Can I ask why we would ever need redirect>0?
Is there any legitimate case in which an attempt to write to InfluxDB Cloud 2.0 would get redirected?
I know my API key gets stripped, but isn't it better security to say there are no redirects?

I am also testing an alternative function that uses only the requests library (no InfluxDB client) and doesn't use retries.

import requests
influx_returns = requests.post(url, data=Influx_lines, headers=headers)

So far it works but has two negative side effects:

  1. it takes 50% longer to write than the Python InfluxDB library (200 ms rather than 125 ms). With millions of writes / month and AWS billing Lambda code by the millisecond, this is significant.
  2. it screws up the use of the AWS boto3 library (some conflict between import requests and import boto3).

I will report back with my findings on the best fix once I have fully tested it (for others to find).
Cheers
Andrew

It is useful when you use an HTTP proxy.

I don't think so.

Check out this doc to disable stripping your key: GitHub - influxdata/influxdb-client-python: InfluxDB 2.0 python client