Context deadline exceeded (Client.Timeout exceeded while awaiting headers) in InfluxDB 2.0

I am running one InfluxDB 2.0 container and one Telegraf container. Telegraf reads data from AWS Kinesis and pushes it into InfluxDB. Everything worked fine while I was on InfluxDB 1.8, including the data coming in through Kinesis, but I recently upgraded from InfluxDB 1.8 to InfluxDB 2.0 and now the Telegraf output is no longer working. I also tried configuring Telegraf manually from InfluxDB 2.0, and it still gives errors.
Telegraf Configuration

[[outputs.influxdb_v2]]
  ## urls exp: http://127.0.0.1:8086
  ## urls = ["http://influxdb:8086"]
  urls = ["http://14.206.177.28:8086"]
  token = "$INFLUX_TOKEN"
  organization = "airvana"
  bucket = "onecell_logs"

[[inputs.kinesis_consumer]]
  region = "ap-south-1"
  access_key = "AKIAZWANSIFMDYO"
  secret_key = "mWEBI4UbErGwI9GCDE8RrFU+GPfpgyuL/yMd+8"
  profile = "arn:aws:iam::666268854852:instance-profile/ec2_to_aws_admin"
  streamname = "atlas_cu_om"
  shard_iterator_type = "TRIM_HORIZON"
  data_format = "csv"
  csv_header_row_count = 1
  csv_tag_columns = ["node","operatorName","type"]
  csv_timestamp_column = "datetime"
  csv_timestamp_format = "2006-01-02T15:04:05"

For urls, I am not sure whether we have to pass the IP address and port or influxdb:port. I tried both, but I am still getting errors.

thanks

Hello @Ravikant_Gautam,
Are you using an all access token?
Can you please include debug=true in the agent portion of your telegraf config and share the telegraf logs?
I also like to print to stdout with telegraf to help me debug/make sure the line protocol is as expected.

And what was the solution in the end?

Hello @costa299,
I’m not sure. I guess the user marked it as solved because they either a) needed an all-access token or b) were able to debug by setting debug=true?
What problem are you having specifically?

Hi @Ravikant_Gautam - Can you please confirm what the issue was? I’m trying to troubleshoot the same issue, but haven’t been able to figure out the solution. If you can share what worked for you that would be much appreciated. Thanks!

I am using Telegraf 1.18 and InfluxDB 2.0 as Docker containers on my EC2 machine.
Earlier I was getting this error because I hadn’t defined the token in my telegraf.conf file. After defining the token in telegraf.conf the error still persists, but Telegraf is able to write data to InfluxDB.
I think it is more of a warning: when Telegraf starts it tries to communicate with InfluxDB, but InfluxDB is not ready at that time, so Telegraf starts reporting this error.
As you can see below, I am getting the same error, but Telegraf is still working fine and writing data to InfluxDB.


You can see it reporting a connection refused error, but I just pushed data from the Kinesis stream to InfluxDB through Telegraf and got the correct data in the proper format. So for me everything works despite this error, and after some time the error goes away on its own, which is why I referred to it as a warning.

I was facing the same issue with InfluxDB 1.8, and at that time everything was working fine even with this error.
If you are not getting any data in InfluxDB, then it is most likely a token error; try adding the token to the telegraf.conf file.
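
If the errors really are limited to the window in which InfluxDB is still starting, a small readiness check can help confirm it. The sketch below is only an illustration: it polls the InfluxDB 2.x /health endpoint using Python's standard library. The helper names (influx_is_healthy, wait_until), the retry count and delay, and the http://influxdb:8086 URL in the usage comment are assumptions, not values from the setup described here.

```python
import json
import time
import urllib.request


def influx_is_healthy(base_url, timeout=5.0):
    """Return True if GET <base_url>/health answers with status "pass"."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON answer: not ready.
        return False
    return isinstance(body, dict) and body.get("status") == "pass"


def wait_until(check, attempts=10, delay=3.0, sleep=time.sleep):
    """Call check() up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return 0


# Usage (hypothetical URL; use the address from `urls` in telegraf.conf):
#   attempt = wait_until(lambda: influx_is_healthy("http://influxdb:8086"))
#   print("ready on attempt", attempt) if attempt else print("never became ready")
```

If the check only fails for the first attempt or two after a container restart, the startup-timing explanation fits. If it keeps failing, the problem is more likely the network path or the server itself.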

Can you tell me which input and output plugin you are using so I can help you better?

That’s an interesting observation. I had likewise defined my token via an environment variable and tried to define the token in the configuration, but am still facing the same error unfortunately.

Here’s what I’m seeing in the log file:

2021-04-12T05:29:43Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1:8086]: Post "http://hdc-gfappd1:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-12T05:29:43Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://hdc-gfappd1:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-12T05:29:51Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1:8086]: Post "http://hdc-gfappd1:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": dial tcp 10.10.100.107:8086: i/o timeout (Client.Timeout exceeded while awaiting headers)
2021-04-12T05:29:51Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://hdc-gfappd1:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": dial tcp 10.10.100.107:8086: i/o timeout (Client.Timeout exceeded while awaiting headers)
2021-04-12T05:33:10Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1:8086]: Post "http://hdc-gfappd1:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-12T05:33:10Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://hdc-gfappd1:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

My Telegraf config is pretty vanilla - just getting some machine stats from a Raspberry Pi 3.

Here’s the config:

[agent]
   # Batch size of values that Telegraf sends to output plugins.
   metric_batch_size = 1000
   # Default data collection interval for inputs.
   interval = "30s"
   # Added degree of randomness in the collection interval.
   collection_jitter = "5s"
   # Send output every 5 seconds
   flush_interval = "5s"
   # Buffer size for failed writes.
   metric_buffer_limit = 10000
   # Run in quiet mode, i.e don't display anything on the console.
   quiet = true
   debug = true
   # Specify the log file name. The empty string means to log to stderr.
   logfile = "/var/log/telegraf/telegraf.log"

# Read metrics about cpu usage
[[inputs.cpu]]
   ## Whether to report per-cpu stats or not
   percpu = false
   ## Whether to report total system cpu stats or not
   totalcpu = true
   ## If true, collect raw CPU time metrics.
   collect_cpu_time = false
   ## If true, compute and report the sum of all non-idle CPU states.
   report_active = true

[[inputs.disk]]
  ## By default stats will be gathered for all mount points.
  ## Set mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]

  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs", "cifs"]

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ## urls exp: http://127.0.0.1:8086
  urls = ["http://hdc-gfappd1:8086"]
  #urls = ["http://10.10.100.107:8086"]

  ## Token for authentication.
  #token = "$INFLUX_TOKEN"
  token = "<token removed for posting>"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "wiltsegroup"

  ## Destination bucket to write into.
  bucket = "telegraf/two_months"

With the exception of defining the api token directly in the config file, I’m using the exact same configuration on a separate computer (Raspberry Pi Zero W) and Telegraf is working fine there.

Any thoughts on what else to check?

Thanks much for the help. Let me know if you’d like to know any other details and I’ll be happy to share them.

Edit: Fixed Telegraf Config

Try debugging with the output plugin below, and comment out the [[outputs.influxdb_v2]] section while you do:

[[outputs.file]]
      ## Files to write to, "stdout" is a specially handled file.
      files = ["stdout"] # give any path where you can store the temporary data e.g /tmp/metrics.out 

      ## Data format to output.
      ## Each data format has its own unique set of configuration options, read
      ## more about them here:
      data_format = "influx"

If Telegraf itself is working fine and the problem is only in writing to InfluxDB, it will write the data to the path given above. If Telegraf is not working, you won’t get any output in the file either, which means the problem is at the source.
Try this step and let me know the result, then I can help better.

Thanks for the tip! I tried that and from what I can tell Telegraf is working fine. It successfully wrote the expected data both to stdout and to my temp file. Sample output:

disk,device=mmcblk0p7,fstype=ext4,host=pihole.wiltsegroup.local,mode=rw,path=/ inodes_used=152944i,total=29743366144i,free=16544137216i,used=11664719872i,used_percent=41.35126721231876,inodes_total=1855952i,inodes_free=1703008i 1618373824000000000
disk,device=mmcblk0p6,fstype=vfat,host=pihole.wiltsegroup.local,mode=rw,path=/boot inodes_free=0i,inodes_used=0i,total=68124672i,free=44721152i,used=23403520i,used_percent=34.35395622895623,inodes_total=0i 1618373824000000000
cpu,cpu=cpu-total,host=pihole.wiltsegroup.local usage_system=0.15183146707140577,usage_nice=0,usage_iowait=1.0343518694249227,usage_softirq=0,usage_steal=0,usage_user=0.22774720060710868,usage_idle=98.58606946290509,usage_irq=0,usage_guest=0,usage_guest_nice=0,usage_active=1.413930537$
disk,device=mmcblk0p7,fstype=ext4,host=pihole.wiltsegroup.local,mode=rw,path=/ total=29743366144i,free=16544149504i,used=11664707584i,used_percent=41.351223651532294,inodes_total=1855952i,inodes_free=1703009i,inodes_used=152943i 1618373853000000000
disk,device=mmcblk0p6,fstype=vfat,host=pihole.wiltsegroup.local,mode=rw,path=/boot free=44721152i,used=23403520i,used_percent=34.35395622895623,inodes_total=0i,inodes_free=0i,inodes_used=0i,total=68124672i 1618373853000000000
cpu,cpu=cpu-total,host=pihole.wiltsegroup.local usage_user=0.18833933835573777,usage_nice=0,usage_irq=0,usage_guest=0,usage_active=0.6059613494846995,usage_system=0.18833933835573777,usage_idle=99.3940386505153,usage_iowait=0.22928267278085768,usage_softirq=0,usage_steal=0,usage_guest$
disk,device=mmcblk0p7,fstype=ext4,host=pihole.wiltsegroup.local,mode=rw,path=/ total=29743366144i,free=16544141312i,used=11664715776i,used_percent=41.351252692056605,inodes_total=1855952i,inodes_free=1703009i,inodes_used=152943i 1618373883000000000
disk,device=mmcblk0p6,fstype=vfat,host=pihole.wiltsegroup.local,mode=rw,path=/boot total=68124672i,free=44721152i,used=23403520i,used_percent=34.35395622895623,inodes_total=0i,inodes_free=0i,inodes_used=0i 1618373883000000000
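
Each complete line above ends with a nanosecond Unix timestamp, which is handy for confirming that points carry the time you expect. As a rough illustration, the Python sketch below converts that trailing field to UTC. The helper name line_timestamp_utc is hypothetical, the sample line is shortened from the output above, and the function ignores line protocol escaping rules, so treat it as a quick sanity check rather than a parser.

```python
from datetime import datetime, timezone


def line_timestamp_utc(line):
    """Return the trailing nanosecond timestamp of a line protocol line as a UTC datetime.

    Returns None when the last space-separated field is not an integer,
    for example when the line was cut off by the terminal.
    """
    parts = line.rstrip().rsplit(" ", 1)
    if len(parts) != 2 or not parts[1].isdigit():
        return None
    seconds, nanos = divmod(int(parts[1]), 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only has microsecond resolution, so the last three digits are dropped.
    return moment.replace(microsecond=nanos // 1000)


# Shortened version of one of the disk lines above.
sample = (
    "disk,device=mmcblk0p6,fstype=vfat,host=pihole.wiltsegroup.local,mode=rw,path=/boot "
    "inodes_free=0i,total=68124672i 1618373824000000000"
)
print(line_timestamp_utc(sample).isoformat())  # 2021-04-14T04:17:04+00:00
```

Lines that were truncated in the terminal (the ones ending in `$`) have no readable timestamp, so the helper returns None for them.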

I’m running Telegraf 1.18.1 and Influx 2.0.4 for reference.

Thanks again for the help. Any other ideas of things I could try testing?

Can you try writing your data to a different bucket, and make sure you choose the proper time range when querying the data?

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ## urls exp: http://127.0.0.1:8086
  urls = ["http://hdc-gfappd1:8086"]
  #urls = ["http://10.10.100.107:8086"]

  ## Token for authentication.
  #token = "$INFLUX_TOKEN"
  token = "<token removed for posting>"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "wiltsegroup"

  ## Destination bucket to write into.
  bucket = "telegraf/two_months" # e.g.  bucket = "temp"

Also check that your token has both read and write permissions. If it does not, create a different token or use the admin token for reading and writing data in InfluxDB.

If this is also not working, you need to test the connection to your InfluxDB and check the InfluxDB logs.
To test InfluxDB, use the Python library and try to establish a connection using the same token and credentials.
I am sure you will figure out the problem after these steps.
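
One way to run such a connection test without Telegraf in the loop is sketched below. It uses only Python's standard library rather than the InfluxDB client library, and prepares a single line protocol point for the /api/v2/write endpoint that appears in the Telegraf logs. The function names (build_write_request, send), the connection_test measurement, and the placeholder token are assumptions for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_write_request(base_url, org, bucket, token, line):
    """Build a POST to /api/v2/write carrying one line protocol record."""
    query = urllib.parse.urlencode({"bucket": bucket, "org": org, "precision": "ns"})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=line.encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request.

    Returns (HTTP status, body text), or (None, error text) if InfluxDB
    never answered (connection refused, DNS failure, timeout).
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # InfluxDB answered but rejected the write (for example a bad token).
        return err.code, err.read().decode("utf-8", "replace")
    except OSError as err:
        return None, str(err)


# URL, org and bucket are taken from the configuration above; the token is a placeholder.
req = build_write_request(
    "http://hdc-gfappd1:8086", "wiltsegroup", "telegraf/two_months",
    "<token>", "connection_test value=1i",
)
print(req.full_url)
# To actually send it:
#   status, body = send(req)
```

A 204 status means the write was accepted. A 401 or 403 typically points at the token or its permissions, and a status of None means the request never got an answer, which matches the timeout Telegraf reports.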

Check the logs as soon as Telegraf restarts.

Sure - I’ll give that a shot. I’ll check the influx logs as well. I’m less familiar with those, but I’m sure I can figure out how to dig through them.

I just restarted Telegraf to get a fresh set of logs and I see this:

pi@pihole:~ $ telegraf -config /etc/telegraf/telegraf.conf
2021-04-14T04:53:41Z I! Starting Telegraf 1.18.1
2021-04-14T04:53:41Z E! Unable to open /var/log/telegraf/telegraf.log (open /var/log/telegraf/telegraf.log: permission denied), using stderr
2021-04-14T04:53:41Z I! Loaded inputs: cpu disk
2021-04-14T04:53:41Z I! Loaded aggregators:
2021-04-14T04:53:41Z I! Loaded processors:
2021-04-14T04:53:41Z I! Loaded outputs: influxdb_v2
2021-04-14T04:53:41Z I! Tags enabled: host=pihole.wiltsegroup.local
2021-04-14T04:53:41Z I! [agent] Config: Interval:30s, Quiet:false, Hostname:"pihole.wiltsegroup.local", Flush Interval:5s
2021-04-14T04:53:41Z D! [agent] Initializing plugins
2021-04-14T04:53:41Z D! [agent] Connecting outputs
2021-04-14T04:53:41Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2021-04-14T04:53:41Z D! [agent] Successfully connected to outputs.influxdb_v2
2021-04-14T04:53:41Z D! [agent] Starting service inputs
2021-04-14T04:53:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2021-04-14T04:53:51Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2021-04-14T04:53:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2021-04-14T04:54:06Z W! [agent] ["outputs.influxdb_v2"] did not complete within its flush interval
2021-04-14T04:54:06Z D! [outputs.influxdb_v2] Buffer fullness: 3 / 10000 metrics
2021-04-14T04:54:06Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1.wiltsegroup.local:8086]: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-14T04:54:06Z D! [outputs.influxdb_v2] Buffer fullness: 3 / 10000 metrics
2021-04-14T04:54:06Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-14T04:54:16Z W! [agent] ["outputs.influxdb_v2"] did not complete within its flush interval
2021-04-14T04:54:16Z D! [outputs.influxdb_v2] Buffer fullness: 3 / 10000 metrics
2021-04-14T04:54:16Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1.wiltsegroup.local:8086]: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-14T04:54:16Z D! [outputs.influxdb_v2] Buffer fullness: 3 / 10000 metrics
2021-04-14T04:54:16Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-14T04:54:26Z W! [agent] ["outputs.influxdb_v2"] did not complete within its flush interval
2021-04-14T04:54:26Z D! [outputs.influxdb_v2] Buffer fullness: 3 / 10000 metrics
2021-04-14T04:54:26Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1.wiltsegroup.local:8086]: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-14T04:54:26Z D! [outputs.influxdb_v2] Buffer fullness: 3 / 10000 metrics
2021-04-14T04:54:26Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-04-14T04:54:36Z E! [outputs.influxdb_v2] When writing to [http://hdc-gfappd1.wiltsegroup.local:8086]: Post "http://hdc-gfappd1.wiltsegroup.local:8086/api/v2/write?bucket=telegraf%2Ftwo_months&org=wiltsegroup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Which looks to me like it makes a successful connection to Influx initially, but then fails later on for some reason.

At any rate, I’ll try what you suggested and will report back. Thanks again for the help.

What strikes me:

  1. You have permission issues: Telegraf cannot open /var/log/telegraf/telegraf.log (permission denied).
  2. You still haven’t solved the connection problems you wrote about in another thread. Solve that first. There is no point in adding complexity while basic errors are not fixed…

I saw this problem on my EC2 instance in AWS. The CPU credits were exhausted, so AWS throttled the server down to its baseline performance. As a result, InfluxDB 2 was not able to accept data from Telegraf in time, and Telegraf logged the error “Client.Timeout exceeded…”.
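
A throttled server like this tends to show up as slow responses rather than outright refusals. As a rough way to check for that, the Python sketch below times a few requests to the InfluxDB /health endpoint and sorts them into fine, slow, or failed. The helper names, the 5-second timeout (assumed here to match the timeout setting of outputs.influxdb_v2), the "half the timeout" threshold, and the URL in the usage comment are all assumptions chosen for illustration.

```python
import time
import urllib.request


def time_request(url, timeout=5.0):
    """Return the seconds a GET to `url` took, or None if it failed or timed out."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except OSError:
        return None
    return time.monotonic() - start


def summarize(latencies, timeout=5.0):
    """Count probes that were fine, close to the timeout, or failed outright."""
    summary = {"ok": 0, "slow": 0, "failed": 0}
    for latency in latencies:
        if latency is None:
            summary["failed"] += 1
        elif latency >= timeout / 2:
            summary["slow"] += 1
        else:
            summary["ok"] += 1
    return summary


# Example with made-up latencies in seconds; None stands for a timed-out probe.
print(summarize([0.04, 3.2, None, 0.05]))  # {'ok': 2, 'slow': 1, 'failed': 1}
# Against a real server (hypothetical URL):
#   print(summarize([time_request("http://influxdb:8086/health") for _ in range(5)]))
```

If most probes land in "slow" or "failed" while the instance's CPU credits are depleted, and recover once the credits are back, that supports the throttling explanation.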