Skip to content

[docker] container & volume metrics (wraps #2740)#3077

Merged
hkaj merged 4 commits intomasterfrom
parkr-docker-daemon-track-containers-volumes
Feb 7, 2017
Merged

[docker] container & volume metrics (wraps #2740)#3077
hkaj merged 4 commits intomasterfrom
parkr-docker-daemon-track-containers-volumes

Conversation

@hkaj
Copy link
Member

@hkaj hkaj commented Dec 12, 2016

Based on the great work of @parkr in #2740

and

  • fix tests & conflicts
  • change how we name and tag the metrics a little
  • kill some dead code

@hkaj hkaj force-pushed the parkr-docker-daemon-track-containers-volumes branch 2 times, most recently from c424a15 to 839d020 Compare December 20, 2016 16:11
@hkaj hkaj changed the title wip [docker] container & volume metrics (wraps #2740) Dec 20, 2016
@hkaj
Copy link
Member Author

hkaj commented Dec 20, 2016

test failure unrelated

Copy link
Contributor

@parkr parkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for picking this up.

If there is no way to test this on Travis, were you able to get tests for the daemon working locally?

@hkaj
Copy link
Member Author

hkaj commented Dec 20, 2016

@parkr yes i tested it locally. Testing it on travis would require sudo and docker, but to speed up tests we're using their container infrastructure which doesn't support sudo and DinD (wise decision, but painful here).

@parkr
Copy link
Contributor

parkr commented Jan 9, 2017

@hkaj Hey! Anything I can do to help get this past the finish line?

@parkr
Copy link
Contributor

parkr commented Jan 9, 2017

Tried this on a host with 5.10.1 installed:

2017-01-09 10:50:12 PST | ERROR | dd.collector | checks.docker_daemon(__init__.py:784) | Check 'docker_daemon' instance #0 failed
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/checks/__init__.py", line 767, in run
    self.check(copy.deepcopy(instance))
  File "/etc/dd-agent/checks.d/docker_daemon.py", line 255, in check
    self._report_performance_metrics(containers_by_id)
  File "/etc/dd-agent/checks.d/docker_daemon.py", line 580, in _report_performance_metrics
    self._report_cgroup_metrics(container, tags)
  File "/etc/dd-agent/checks.d/docker_daemon.py", line 598, in _report_cgroup_metrics
    stat_file = self._get_cgroup_from_proc(cgroup["cgroup"], container['_pid'], cgroup['file'])
  File "/etc/dd-agent/checks.d/docker_daemon.py", line 861, in _get_cgroup_from_proc
    return DockerUtil.find_cgroup_from_proc(self._mountpoints, pid, cgroup, self.docker_util._docker_root) % (params)
  File "/opt/datadog-agent/agent/utils/dockerutil.py", line 311, in find_cgroup_from_proc
    with open(proc_path, 'r') as fp:
IOError: [Errno 2] No such file or directory: '/proc/24207/cgroup'

@therc
Copy link
Contributor

therc commented Jan 9, 2017

@parkr: did you set docker_root: /host in the init_config section of your docker_daemon.yaml? I just wasted a whole hour on this because I had started from a more recent template and erroneously assumed that it was the default setting.

@parkr
Copy link
Contributor

parkr commented Jan 10, 2017

@parkr: did you set docker_root: /host in the init_config section of your docker_daemon.yaml? I just wasted a whole hour on this because I had started from a more recent template and erroneously assumed that it was the default setting.

No, I never set docker_root. In general, /proc/$PID/cgroup is the correct place to look, it's just that it doesn't always contain this file (depending on the process). Sometimes the proc quits before the cgroup info can be read. I think it's just a race condition.

@hkaj hkaj added this to the 5.12.0 milestone Jan 25, 2017
parkr added 3 commits February 7, 2017 13:29
- collect_exited_container_count: gauge of how many containers have State=Exited
- collect_dead_container_count: gauge of how many containers have State=Dead
- collect_container_count: gauge of how many containers there are
- collect_volume_count: gauge of how many volumes there are
- collect_dangling_volume_count: gauge of how many dangling volumes there are
@hkaj hkaj force-pushed the parkr-docker-daemon-track-containers-volumes branch from 839d020 to 6604186 Compare February 7, 2017 14:05
@hkaj hkaj requested a review from hush-hush February 7, 2017 14:06
@hkaj
Copy link
Member Author

hkaj commented Feb 7, 2017

@hush-hush i rebased it, can you have a look please?

if 'Size' in image:
self.gauge('docker.image.size', image['Size'], tags=tags)
except:
import ipdb;ipdb.set_trace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be on 2 line (that why the CI flake8 tests are falling). End I'm not sure we want to keep it, looks a lot like some debug lines.

['Metadata Space Available', '9 MB'],
['Metadata Space Total', '10 MB'],
['Metadata Space Used', '11 MB'],
['Metadata Space Available', '0 MB'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we set values to 0 ? Doesn't that broke test_devicemapper_invalid_values test ?

@hkaj hkaj force-pushed the parkr-docker-daemon-track-containers-volumes branch from 6604186 to 6923524 Compare February 7, 2017 14:47
@hkaj hkaj merged commit 5b90a21 into master Feb 7, 2017
@hkaj hkaj deleted the parkr-docker-daemon-track-containers-volumes branch February 7, 2017 17:16
@parkr
Copy link
Contributor

parkr commented Feb 7, 2017

🎉 Thank you, all!

@hkaj
Copy link
Member Author

hkaj commented Feb 7, 2017

Thank you @parkr 🍰

@parkr
Copy link
Contributor

parkr commented Feb 7, 2017

@hkaj Thank you for following up and writing the tests. And for ensuring it got reviewed and merged eventually. 😄 Will this make it into a release in the coming week(s)?

@hkaj
Copy link
Member Author

hkaj commented Feb 7, 2017

5.11 is dedicated to releasing the new APM agent so we won't release any change from dd-agent then, but 5.12 will follow soon after.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants