Telegraf inputs.http: i/o timeout and context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I’m unsure if this is a bug or if I’m overlooking something obvious. I’ve examined the issue in detail and have searched GitHub issues, this forum, and the wider internet.

I should not be seeing this in the logs when I can connect to and get data from the API by other means. I’m nearly certain InfluxDB is not the issue here, as other plugins are writing to it successfully.

** Anticipating a question: have you tried a different API? Yes, several. This one was chosen for the demo because it’s completely open (no key required) and has very interesting content. All other APIs used this way have undergone the same rigorous testing shown below, in exactly the same way.

This is my input definition: a request to an API (api.artic.edu), expecting JSON, at a 35-second interval:

[[inputs.http]]

urls = [ "https://api.artic.edu/api/v1/artworks/search?q=cats" ]
interval = "35s"
use_system_proxy = true
timeout = "15s"
data_format = "json"

I see this in my logs regularly and can reproduce it consistently. The errors alternate between dial tcp and context deadline exceeded:

Dec 10 12:05:10 ingest001 telegraf[10718]: 2023-12-10T20:05:10Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": dial tcp 108.138.85.28:443: i/o timeout (Client.Timeout exceeded while awaiting headers)
Dec 10 12:05:45 ingest001 telegraf[10718]: 2023-12-10T20:05:45Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Dec 10 12:06:20 ingest001 telegraf[10718]: 2023-12-10T20:06:20Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": dial tcp 108.138.85.25:443: i/o timeout (Client.Timeout exceeded while awaiting headers)
Dec 10 12:06:55 ingest001 telegraf[10718]: 2023-12-10T20:06:55Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

From the same system (ingest001) I can get data from the API:

# curl -v 'https://api.artic.edu/api/v1/artworks/search?q=cats'
* Uses proxy env variable no_proxy == 'localhost, .somewhere.io'
* Uses proxy env variable https_proxy == 'http://proxy.somewhere.io:8118'
*   Trying 192.168.1.30:8118...
* Connected to (nil) (192.168.1.30) port 8118 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to api.artic.edu:443
> CONNECT api.artic.edu:443 HTTP/1.1
> Host: api.artic.edu:443
> User-Agent: curl/7.81.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< date: Sun, 10 Dec 2023 20:40:51 GMT
< transfer-encoding: chunked
* Ignoring Transfer-Encoding in CONNECT 200 response
< set-cookie: SERVERID=proxy01; path=/
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
[ ... ]

< content-type: application/json
< server: Apache/2.4.58 () OpenSSL/1.0.2k-fips
< access-control-allow-origin: *
< date: Sun, 10 Dec 2023 20:40:52 GMT
< cache-control: no-cache, private
< etag: "02c1253400c6505230e952333558acaf"
< vary: Accept-Encoding
< x-cache: RefreshHit from cloudfront
< via: 1.1 55545918b0c914bb8f5282930649df4c.cloudfront.net (CloudFront)
< x-amz-cf-pop: IAD12-P2
< x-amz-cf-id: JH2v7mX2QhjDsAqf-bvLMBzG93kyFojE17eoYpBpAr6WBds3W47ULg==
<
{"preference":null,"pagination":{"total":7346,"limit":10,"offset":0,"total_pages":735,"current_page":1},"data":[{"_score":135.74696,"thumbnail":{"alt_text":"A bronze lion, deep green and muscular, looks out in the distance from its pedestal in front of the Art Institute of Chicago." [ ... ]

I can run a telegraf --test and get data:

# telegraf --debug --test --config testing_http_temp_trash.conf
2023-12-10T20:16:42Z I! Loading config: testing_http_temp_trash.conf
2023-12-10T20:16:42Z I! Starting Telegraf 1.28.5 brought to you by InfluxData the makers of InfluxDB
2023-12-10T20:16:42Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores
2023-12-10T20:16:42Z I! Loaded inputs: http
2023-12-10T20:16:42Z I! Loaded aggregators:
2023-12-10T20:16:42Z I! Loaded processors:
2023-12-10T20:16:42Z I! Loaded secretstores:
2023-12-10T20:16:42Z W! Outputs are not used in testing mode!
2023-12-10T20:16:42Z I! Tags enabled: host=ingest001
2023-12-10T20:16:42Z D! [agent] Initializing plugins
2023-12-10T20:16:42Z D! [agent] Starting service inputs
2023-12-10T20:16:42Z D! [agent] Stopping service inputs
2023-12-10T20:16:42Z D! [agent] Input channel closed
2023-12-10T20:16:42Z D! [agent] Stopped Successfully
> http,host=ingest001,url=https://api.artic.edu/api/v1/artworks/search?q\=cats data_0__score=135.76062,data_0_id=656,data_0_thumbnail_height=5620,data_0_thumbnail_width=8430,data_1__score=119.26675,data_1_id=117241,data_1_thumbnail_height=7661,data_1_thumbnail_width=6486,data_2__score=115.99253,data_2_id=45259,data_2_thumbnail_ [ ... ]

testing_http_temp_trash.conf

[[inputs.http]]

urls = [ "https://api.artic.edu/api/v1/artworks/search?q=cats" ]
interval = "35s"
use_system_proxy = true
timeout = "15s"
data_format = "json"

telegraf.conf

[global_tags]
[agent]
    interval = "10s"
    debug = false
    hostname = "ingest001.somewhere.io"
    round_interval = true
    flush_interval = "10s"
    flush_jitter = "0s"
    collection_jitter = "0s"
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    quiet = false
    logfile = ""
    omit_hostname = false
[[outputs.influxdb]]
    urls = ["https://influxdb.somewhere.io:8086"]
    username = "influx_telegraf"
    password = "hellothere!"
    insecure_skip_verify = true
    database = "telegraf"

Client.Timeout exceeded while awaiting headers

From the FAQ:

This is a generic error received from Go’s HTTP client. It is generally the result of a network blip or hiccup as a result of a DNS, proxy, firewall, and/or other network issue.

The error should be temporary and Telegraf will recover shortly after without the loss of data.

use_system_proxy = true

I see you are using a proxy. It could be that your proxy is stopping your traffic or trying to cache something along the way.

Thanks for responding. However, what’s really concerning is that this message is repeated at every single interval. Also, InfluxDB is not receiving ANY data from this input.

As shown in the troubleshooting above, the proxy connection is fine and I’m getting data. There is also no issue when this input is executed on its own.

The scenario here is that every alternate means of doing what Telegraf should do with this input works. However, when Telegraf itself makes the attempt, it fails.

Again, I may be overlooking something obvious, but all of the evidence I’ve gathered here contradicts both “acceptable functionality” and “occasional blip”.

However, when telegraf itself makes an attempt, it fails.

Well, no: you showed that it works when your user runs it with --test. So what is different between running it with --test and how you normally run it? Is Telegraf running as a service? Does that service have network access, and does it have the proxy set as well?

what’s really concerning is this message is repeated at every single interval. Also, influx is not receiving ANY data from this input.

This means that something is preventing the HTTP request from ever getting there: no data is received, so no metrics are generated and there is nothing to send to InfluxDB.

Yes, yes, and… my apologies. I took time to construct the scenario above and may have been unclear or have overlooked some critical details:

  • Telegraf can run the input shown with --test and succeeds; 100% reproducible.
  • When Telegraf runs that same input as a service (à la systemd), it prints the error shown; 100% reproducible.
  • The --test execution, Telegraf running as a service, and the curl test shown above all run on the same host, with 100% access to the proxy and 100% access to InfluxDB (‘other plugins are writing successfully’).

I’ve been known to make bad assumptions, so correct me if I’m wrong: we expect Telegraf to get data from the API. Once data is retrieved, it sends it to InfluxDB. If no data is retrieved, it doesn’t bother writing to InfluxDB at all. Right?

With this assumption I’m focusing on the first problem of getting the data before wondering why it’s not being written to influx.

Correct. So my theory is that your service is unable to access the proxy correctly, and hence your HTTP input plugin is unable to gather data in the first place. I see you have use_system_proxy set; are you using the https_proxy or http_proxy environment variables? Does the service see them?
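One way to answer that question for the running service, rather than for a fresh shell, is to read the process environment from /proc. A minimal sketch, assuming a Linux host (the example blob below is hypothetical; against the live host you would pass the real telegraf PID):

```python
from pathlib import Path

def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse NUL-separated KEY=VALUE pairs, the format of /proc/<pid>/environ."""
    entries = (e.partition(b"=") for e in raw.split(b"\x00") if e)
    return {k.decode(): v.decode() for k, _, v in entries}

def service_env(pid: int) -> dict[str, str]:
    """Environment of a running process (Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())

# Against the live host you would call service_env(<telegraf pid>) and
# inspect the *_proxy keys; here the parser runs on an environ-style blob:
env = parse_environ(b"https_proxy=http://proxy.somewhere.io:8118\x00no_proxy=localhost\x00")
print(env["https_proxy"])
```

Worth noting: systemd services do not inherit a login shell’s environment, so the variables a shell (or `sudo -u telegraf env`) reports are not necessarily the ones the service was started with.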

Again, yes 100% of the time:

From the same system (ingest001) I can get data from the API

# curl -v 'https://api.artic.edu/api/v1/artworks/search?q=cats'
* Uses proxy env variable no_proxy == 'localhost, .somewhere.io'
* Uses proxy env variable https_proxy == 'http://proxy.somewhere.io:8118'
*   Trying 192.168.1.30:8118...
...
200
...
{data!}

Does the service see these variables

Great question. By ‘service’ I assume you mean Telegraf. I have to believe that yes, Telegraf does acknowledge the system-defined http_proxy (and https_proxy, no_proxy). That is, while telegraf --debug doesn’t print the environment, I can see every single request in the proxy log. Also, names don’t always mean much, but use_system_proxy tells me it’s going to use the system-defined proxy.

What I’ll do for further verification is use http_proxy_url in place of (or in addition to) use_system_proxy.

With http_proxy_url defined:

# telegraf --debug --test --config testing_http_temp_trash.conf
2023-12-11T17:59:58Z I! Loading config: testing_http_temp_trash.conf
2023-12-11T17:59:58Z I! Starting Telegraf 1.28.5 brought to you by InfluxData the makers of InfluxDB
2023-12-11T17:59:58Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores
2023-12-11T17:59:58Z I! Loaded inputs: http
2023-12-11T17:59:58Z I! Loaded aggregators:
2023-12-11T17:59:58Z I! Loaded processors:
2023-12-11T17:59:58Z I! Loaded secretstores:
2023-12-11T17:59:58Z W! Outputs are not used in testing mode!
2023-12-11T17:59:58Z I! Tags enabled: host=ingest001
2023-12-11T17:59:58Z D! [agent] Initializing plugins
2023-12-11T17:59:58Z D! [agent] Starting service inputs
2023-12-11T17:59:59Z D! [agent] Stopping service inputs
2023-12-11T17:59:59Z D! [agent] Input channel closed
2023-12-11T17:59:59Z D! [agent] Stopped Successfully
> http,host=ingest001,url=https://api.artic.edu/api/v1/artworks/search?q\=cats data_0__score=135.60323
[ ... ]

testing_http_temp_trash.conf:

[[inputs.http]]

urls = [ "https://api.artic.edu/api/v1/artworks/search?q=cats" ]
interval = "35s"
# use_system_proxy = true
http_proxy_url = "http://proxy.somewhere.io:8118"
timeout = "15s"
data_format = "json"

proxy log (the HTTP response code is hidden in the middle: ...1225 200 8969...):

Dec 11 09:59:59 ingest001 haproxy[4072503]: 192.168.1.30:49274 [11/Dec/2023:09:59:58.025] http_front http_back/proxy01 0/0/0/75/1225 200 8969 - - --NI 17/17/16/8/0 0/0 "CONNECT api.artic.edu:443 HTTP/1.1"

So this is the same as shown above. Let’s mess with Telegraf and see how it responds:

http_proxy_url = "http://proxyfoo.somewhere.io:8118"

2023-12-11T18:09:14Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": proxyconnect tcp: dial tcp: lookup proxyfoo.somewhere.io on 192.168.1.25:53: no such host

…and with the other parameter:

# env | grep proxy
https_proxy=http://proxybar.somewhere.io:8818
http_proxy=http://proxybar.somewhere.io:8818

use_system_proxy = true

2023-12-11T18:12:59Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": proxyconnect tcp: dial tcp: lookup proxybar.somewhere.io on 192.168.1.25:53: no such host

So Telegraf is definitely using the system proxy or the explicitly defined proxy.
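As an aside on what use_system_proxy amounts to: a “system proxy” client builds its proxy map from these environment variables at request time. Python’s standard library does the same thing, which makes a quick cross-check possible (the values below mirror this thread’s hosts and are illustrative only; Go’s handling differs in detail):

```python
import os
import urllib.request

# Hypothetical values mirroring the host's environment.
os.environ["http_proxy"] = "http://proxy.somewhere.io:8118"
os.environ["https_proxy"] = "http://proxy.somewhere.io:8118"

# getproxies() reads http_proxy/https_proxy and returns the scheme->proxy
# map that a "system proxy" client consults for each request.
proxies = urllib.request.getproxies()
print(proxies["https"])
```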

I have a solution/workaround.

Using the explicit proxy parameter works: http_proxy_url = "http://proxy.somewhere.io:8118" The data is collected, sent to influx, and no error appears in the log.

This is as opposed to using: use_system_proxy = true

Telegraf DOES have the http_proxy variable set (see below), but what I overlooked (and didn’t share above for the sake of brevity) is the no_proxy negation list. That is, 192.168.1.0/24 does not go through the proxy, so Telegraf was trying to make the call directly. I should have caught this sooner: the address in the error below is not the proxy’s IP, it’s the API’s public IP, and direct outbound 443 is blocked:

Dec 10 12:05:10 ingest001 telegraf[10718]: 2023-12-10T20:05:10Z E! [inputs.http] Error in plugin: [url=https://api.artic.edu/api/v1/artworks/search?q=cats]: Get "https://api.artic.edu/api/v1/artworks/search?q=cats": dial tcp 108.138.85.28:443: i/o timeout (Client.Timeout exceeded while awaiting headers)

# sudo -u telegraf env
MAIL=/var/mail/telegraf
LOGNAME=telegraf
USER=telegraf
HOME=/etc/telegraf
SHELL=/bin/false
SUDO_COMMAND=/usr/bin/env
SUDO_USER=root
SUDO_UID=0
SUDO_GID=0
http_proxy=http://proxy.somewhere.io:8118
https_proxy=http://proxy.somewhere.io:8118
ftp_proxy=ftp://proxy.somewhere.io:8118
no_proxy=localhost, .somewhere.io, `echo 192.168.1.{1..255} | sed 's/ /,/g'`, `echo 192.168.20.{1..255} | sed 's/ /,/g'`, `echo 192.168.25.{1..255} | sed 's/ /,/g'`
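To make the no_proxy effect concrete, here is a rough sketch of the usual matching rules (exact host, or domain-suffix for leading-dot entries). This is an illustration only, not Go’s exact algorithm; a host that matches connects directly, skipping the proxy, and if direct outbound 443 is blocked that surfaces as exactly the i/o timeout shown above:

```python
def bypass_proxy(host: str, no_proxy: str) -> bool:
    """True if `host` should skip the proxy according to a no_proxy list.
    Illustrative matching only: exact host, or suffix for dotted entries."""
    host = host.lower()
    for entry in (e.strip().lower() for e in no_proxy.split(",")):
        if not entry:
            continue
        if entry == "*":
            return True
        bare = entry.lstrip(".")
        if host == bare or host.endswith("." + bare):
            return True
    return False

# An IP enumerated in no_proxy connects directly; api.artic.edu does not
# match any entry, so it should still go through the proxy.
print(bypass_proxy("192.168.1.30", "localhost, .somewhere.io, 192.168.1.30"))  # True
print(bypass_proxy("api.artic.edu", "localhost, .somewhere.io"))               # False
```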

With recent changes to our environments, similar issues have surfaced. It’s a bit of a shit show, and this analysis helps: We need to talk: Can we standardize NO_PROXY?