Error getting docker stats: context deadline exceeded

devth · May 17, 2017, 4:46pm

I’m using the telegraf-ds chart 1.3 running on Kubernetes 1.6.2 (GKE) via the tick-charts repo.

Logs show continuous errors like:

2017-05-17T16:44:44Z E! Error in plugin [inputs.docker]: E! Error gathering container [/k8s_chronograf_chronograf-prod-chronograf-604308575-ldmn6_infra_8963a62f-3a53-11e7-9643-42010a8001a5_0] stats: Error getting docker stats: context deadline exceeded

The above is the log for the chronograf pod, but this error is logged for 18 out of my 45 pods.

I tried increasing the docker input timeout to 10s and then to 20s with the same result.
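
To see which pods are hit, the Telegraf log can be tallied by pod. Below is a minimal sketch, assuming the container names follow the `k8s_<container>_<pod>_<namespace>_<uid>_<attempt>` layout that the log line above appears to use; the function names are hypothetical and not part of Telegraf.

```python
import re
from collections import Counter

# Captures the container name and the error text from a Telegraf
# "Error gathering container [...] stats: ..." log line.
CONTAINER_RE = re.compile(r"Error gathering container \[/?([^\]]+)\] stats: (.*)$")


def parse_error_line(line):
    """Return (namespace, pod, container, error), or None if the line does not match."""
    m = CONTAINER_RE.search(line)
    if m is None:
        return None
    name, error = m.group(1), m.group(2)
    parts = name.split("_")
    # Assumed layout: k8s_<container>_<pod>_<namespace>_<uid>_<attempt>
    if len(parts) != 6 or parts[0] != "k8s":
        return None
    _, container, pod, namespace, _uid, _attempt = parts
    return namespace, pod, container, error


def affected_pods(lines):
    """Count error lines per (namespace, pod)."""
    counts = Counter()
    for line in lines:
        parsed = parse_error_line(line)
        if parsed:
            namespace, pod, _container, _error = parsed
            counts[(namespace, pod)] += 1
    return counts


SAMPLE = (
    "2017-05-17T16:44:44Z E! Error in plugin [inputs.docker]: E! Error gathering container "
    "[/k8s_chronograf_chronograf-prod-chronograf-604308575-ldmn6_infra_"
    "8963a62f-3a53-11e7-9643-42010a8001a5_0] stats: "
    "Error getting docker stats: context deadline exceeded"
)

if __name__ == "__main__":
    for (namespace, pod), n in affected_pods([SAMPLE]).items():
        print(namespace, pod, n)
```

Feeding it the full Telegraf log (one line per list item) gives a per-pod error count, which makes it easier to check whether the same pods fail on every collection interval.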

jackzampolin · May 17, 2017, 5:31pm

@devth I’ve seen this issue on every cluster I’ve spun up, but it seems intermittent, and which pods are affected looks fairly random. I initially thought it was isolated to the kube-system namespace, but it appears it is not. The Docker logs on the host repeat the following errors:

May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: time="2017-05-17T02:06:02.551558050Z" level=error msg="collecting stats for 5ac5af5db592267d33ea53d3f3eb53757bf
May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: time="2017-05-17T02:06:02.556143500Z" level=error msg="collecting stats for fc54907131a326049c1e0e6ef498e31d426
May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: time="2017-05-17T02:06:02.557884430Z" level=error msg="collecting stats for b4325f3fdf93ac516fa1b80bd62e4e1592e
May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: time="2017-05-17T02:06:02.561983229Z" level=error msg="collecting stats for 86290e7ba07c3d146848a5795f6cdb4436a
May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: time="2017-05-17T02:06:02.563776749Z" level=error msg="collecting stats for c51d92e5dcb0ec3f271e28f00c004b3bf02
May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: time="2017-05-17T02:06:02.565495838Z" level=error msg="collecting stats for 8feadbb8ddacfc7fb37d99065c0e3c01632
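
The daemon log lines above are cut off after the container ID, but the ID prefix is enough to count how many distinct containers are failing. A rough sketch, assuming the errors always name the container by its hexadecimal ID (it will not match lines that use a container name instead); `failing_container_ids` is a hypothetical helper:

```python
import re

# Matches the hex container ID (or a truncated prefix of it) in a
# Docker daemon "collecting stats for ..." error line.
STATS_ERR_RE = re.compile(r'level=error msg="collecting stats for ([0-9a-f]{12,64})')


def failing_container_ids(journal_lines):
    """Return the sorted, de-duplicated 12-character short IDs seen in stats errors."""
    short_ids = set()
    for line in journal_lines:
        m = STATS_ERR_RE.search(line)
        if m:
            # The first 12 characters are the short ID that `docker ps` displays.
            short_ids.add(m.group(1)[:12])
    return sorted(short_ids)


# Two of the (truncated) lines from the host log, copied as-is.
SAMPLE_LINES = [
    'May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: '
    'time="2017-05-17T02:06:02.551558050Z" level=error '
    'msg="collecting stats for 5ac5af5db592267d33ea53d3f3eb53757bf',
    'May 17 02:06:02 gke-influx-kube-default-pool-1202464b-3g6m docker[1290]: '
    'time="2017-05-17T02:06:02.556143500Z" level=error '
    'msg="collecting stats for fc54907131a326049c1e0e6ef498e31d426',
]

if __name__ == "__main__":
    for short_id in failing_container_ids(SAMPLE_LINES):
        print(short_id)
```

The resulting short IDs can be compared with the `CONTAINER ID` column of `docker ps` on the same host to see which containers the daemon itself cannot collect stats for.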

jackzampolin · May 17, 2017, 5:32pm

Docker version information on the GKE hosts:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.7.4
 Git commit:   4dc5990
 Built:        
 OS/Arch:      linux/amd64
Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.7.4
 Git commit:   4dc5990
 Built:        
 OS/Arch:      linux/amd64

And related Docker issues:

github.com/moby/moby

Loose stat after reboot on live-restore mode

opened 10:44AM - 19 Jan 17 UTC

sebglon

area/kernel area/networking

**Description**

It seems related to the option "live-restore": true. When I restart the Docker daemon, containers are not restarted, and when I then run docker stats on one of my running containers, there are no metrics:

    CONTAINER               CPU %   MEM USAGE / LIMIT      MEM %   NET I/O              BLOCK I/O             PIDS
    infra_consul_static_1   --      -- / --                --      -- / --              -- / --               --

After restarting the container I got my metrics back:

    CONTAINER               CPU %   MEM USAGE / LIMIT      MEM %   NET I/O              BLOCK I/O             PIDS
    infra_consul_static_1   0.00%   10.81 MiB / 3.86 GiB   0.27%   17.8 kB / 8.624 kB   7.401 MB / 16.38 kB   0

**Steps to reproduce the issue:**

1. Add live-restore=on to daemon.json
2. systemctl start docker
3. docker run --name httpd -d httpd && docker stats httpd => all is OK
4. systemctl restart docker
5. docker stats httpd => No data received

In the Docker log I see lots of lines like this one on 1.12.5 on CentOS 7 (and 1.12.6):

    docker: time="2017-01-18T16:04:28.671968138+01:00" level=error msg="collecting stats for httpd: sandbox httpd not found"

github.com/moby/moby

Docker giving out duplicate IP address post upgrade, with --fixed-cidr enabled.

opened 08:57PM - 05 Jan 17 UTC

sakserv

area/networking version/1.12

**Description**

Recently, docker was upgraded from Docker 1.12.2 to Docker 1.12.5 on a set of Centos 7.2 servers. The docker configuration includes live-restore and fixed-cidr. Two containers were running prior to upgrade. After the upgrade, a new container was launched. Docker inspect shows the MAC and IP address given to the new container matches that of one of the containers that was started prior to the upgrade.

Potentially similar to: #10096

**Steps to reproduce the issue:**

1. Start a container with fixed-cidr enabled. Capture the IP address and MAC from docker inspect.
2. Upgrade from 1.12.2 to 1.12.5
3. Start a new container and compare the IP address and MAC.

**Describe the results you received:**

The IP address and MAC address of the new container is the same as the container started prior to upgrade.

**Describe the results you expected:**

IP addresses should not be reused as it impacts existing containers.

**Additional information you deem important (e.g. issue happens only occasionally):**

The upgrade was performed via yum upgrade using the yum.dockerproject.org yum repo. I have validated this behavior on 3 different nodes.

**Output of `docker version`:**

```
Client:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:23:59 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:23:59 2016
 OS/Arch:      linux/amd64
```

**Output of `docker info`:**

```
Containers: 3
 Running: 2
 Paused: 0
 Stopped: 1
Images: 29
Server Version: 1.12.5
Storage Driver: devicemapper
 Pool Name: vg01-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 268.4 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 39.92 GB
 Data Space Total: 5.63 TB
 Data Space Available: 5.59 TB
 Metadata Space Used: 15.8 MB
 Metadata Space Total: 16.98 GB
 Metadata Space Available: 16.96 GB
 Thin Pool Minimum Free Space: 563 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.13.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 251.6 GiB
Name: server.foo.bar
ID: RFOH:6J7S:HDHR:ICD7:TS7O:ATUH:YAD4:UI3R:2HLX:QYBA:IOCE:TMMA
Docker Root Dir: /docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 26
 Goroutines: 31
 System Time: 2017-01-05T20:54:45.674747462Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8
```

**Additional environment details (AWS, VirtualBox, physical, etc.):**

Physical, CentOS 7.2
