
Commit d0f4ca8

Merge branch 'master' into quentin/windows-omnibus-5.0
* master: (53 commits)
  [nginx] Update example config
  [service_discovery] Add a Zookeeper service discovery implementation.
  [aggregator] if sample rate is bad, fix it but still parse tags. (#3073)
  [yarn] whitelist authorized application_tags
  Alex poe/update jmx with refresh beans (#3068)
  [config] Fix `_is_affirmative` when passed argument is `None` (#3063)
  Send all configured tags with process checks. (#2976)
  fix flake8 errors
  [flare] ignore whitespace before proxy credentials
  [core] add a switch to disable profiling, but still use developer mode (#2898)
  [tests] allow tests to use the additional_checksd parameter (#3056)
  [service_discovery][jmx] trying to pick-up JMX changes with SD. (#3010)
  [install_script] Make `dd-agent` group of `datadog.conf` (#3036)
  [postgres] Allow disable postgresql.database_size (#3035)
  [core] Fixes IndexError for process lookup (#3043)
  remove warning message leaking password strings (#3053)
  trap psutil.NoSuchProcess exception (#3052)
  Fix grammar and casing in exception text (#3050)
  allow override of kubelet host with KUBERNETES_KUBELET_HOST env var
  [service discovery] properly handle config reload for removed containers
  ...
2 parents bc82757 + 2a98280 commit d0f4ca8


70 files changed, +4562 -762 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -36,3 +36,6 @@ embedded/*
 dump.rdb
 tests/core/fixtures/flare/dd*
 .python-version
+.ropeproject
+.bundle
+tags

.rubocop.yml

Lines changed: 3 additions & 0 deletions
@@ -23,3 +23,6 @@ Style/Documentation:
 # Configuration parameters: Methods.
 Style/SingleLineBlockParams:
   Enabled: false
+
+BlockLength:
+  Max: 110

CHANGELOG.md

Lines changed: 40 additions & 0 deletions
@@ -1,6 +1,33 @@
 Changes
 =======

+# 5.10.1 / 11-21-2016
+**Linux, Windows, Docker and Source Install**
+
+### Details
+https://github.com/DataDog/dd-agent/compare/5.10.0...5.10.1
+
+### Updated Integrations
+
+* RiakCS
+* Mongo
+
+### Changes
+* [IMPROVEMENT] Core/Forwarder: stop flushing after 10s. See [#3018][].
+* [IMPROVEMENT] Core: isolate system checks. See [#3011][].
+* [IMPROVEMENT] RiakCS: support Riak CS 2.1+ stats format. See [#2920][]. (Thanks [@millerdev][])
+* [IMPROVEMENT] Status: Silence requests exception. See [#3023][].
+* [IMPROVEMENT] SpooledTemporaryFile for subprocess output. See [#3002][].
+
+* [BUGFIX] Core: fix unintended subprocess_output empty output errors. See [#3024][].
+* [BUGFIX] Core/Multiple Checks: Only set `psutil.PROCFS_PATH` once in the collector. See [#3013][].
+* [BUGFIX] Core: use proxy for API key status check in info page. See [#3012][]. (Thanks [@2rs2ts][])
+* [BUGFIX] Mongo: use db.current_op instead of manually querying. See [#3016][] (Thanks [@ebroder][])
+* [BUGFIX] Mongo: use `currentOp` for monodb 3.1+. See [#3015][] (Thanks [@lattwood][])
+
+* [DEPRECATE] Process: `procfs_path` is now deprecated, should be set in `datadog.conf`. See [#3013][].
+
+
 # 5.10.0 / 11-09-2016
 **Linux, Windows, Docker and Source Install**

@@ -3609,6 +3636,7 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
 [#2908]: https://github.com/DataDog/dd-agent/issues/2908
 [#2910]: https://github.com/DataDog/dd-agent/issues/2910
 [#2915]: https://github.com/DataDog/dd-agent/issues/2915
+[#2920]: https://github.com/DataDog/dd-agent/issues/2920
 [#2921]: https://github.com/DataDog/dd-agent/issues/2921
 [#2926]: https://github.com/DataDog/dd-agent/issues/2926
 [#2928]: https://github.com/DataDog/dd-agent/issues/2928
@@ -3644,8 +3672,18 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
 [#2984]: https://github.com/DataDog/dd-agent/issues/2984
 [#2989]: https://github.com/DataDog/dd-agent/issues/2989
 [#2991]: https://github.com/DataDog/dd-agent/issues/2991
+[#3002]: https://github.com/DataDog/dd-agent/issues/3002
 [#3006]: https://github.com/DataDog/dd-agent/issues/3006
+[#3011]: https://github.com/DataDog/dd-agent/issues/3011
+[#3012]: https://github.com/DataDog/dd-agent/issues/3012
+[#3013]: https://github.com/DataDog/dd-agent/issues/3013
+[#3015]: https://github.com/DataDog/dd-agent/issues/3015
+[#3016]: https://github.com/DataDog/dd-agent/issues/3016
+[#3018]: https://github.com/DataDog/dd-agent/issues/3018
+[#3023]: https://github.com/DataDog/dd-agent/issues/3023
+[#3024]: https://github.com/DataDog/dd-agent/issues/3024
 [#3399]: https://github.com/DataDog/dd-agent/issues/3399
+[@2rs2ts]: https://github.com/2rs2ts
 [@AirbornePorcine]: https://github.com/AirbornePorcine
 [@AntoCard]: https://github.com/AntoCard
 [@CaptTofu]: https://github.com/CaptTofu
@@ -3736,6 +3774,7 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
 [@jslatts]: https://github.com/jslatts
 [@jzoldak]: https://github.com/jzoldak
 [@kzw]: https://github.com/kzw
+[@lattwood]: https://github.com/lattwood
 [@leifwalsh]: https://github.com/leifwalsh
 [@leucos]: https://github.com/leucos
 [@loris]: https://github.com/loris
@@ -3751,6 +3790,7 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
 [@micktwomey]: https://github.com/micktwomey
 [@mike-lerch]: https://github.com/mike-lerch
 [@mikekap]: https://github.com/mikekap
+[@millerdev]: https://github.com/millerdev
 [@mms-gianni]: https://github.com/mms-gianni
 [@mooney6023]: https://github.com/mooney6023
 [@morskoyzmey]: https://github.com/morskoyzmey

agent.py

Lines changed: 103 additions & 3 deletions
@@ -19,6 +19,8 @@
 import signal
 import sys
 import time
+import supervisor.xmlrpc
+import xmlrpclib
 from copy import copy

 # For pickle & PID files, see issue 293
@@ -29,13 +31,19 @@
 from checks.collector import Collector
 from config import (
     get_config,
+    get_jmx_pipe_path,
     get_parsed_args,
     get_system_stats,
     load_check_directory,
-    load_check
+    load_check,
+    generate_jmx_configs,
+    _is_affirmative,
+    SD_PIPE_NAME
+
 )
 from daemon import AgentSupervisor, Daemon
 from emitter import http_emitter
+from utils.platform import Platform

 # utils
 from utils.cloud_metadata import EC2
@@ -51,11 +59,18 @@
 from utils.watchdog import new_watchdog

 # Constants
+from jmxfetch import JMX_CHECKS
 PID_NAME = "dd-agent"
 PID_DIR = None
 WATCHDOG_MULTIPLIER = 10
 RESTART_INTERVAL = 4 * 24 * 60 * 60 # Defaults to 4 days

+JMX_SUPERVISOR_ENTRY = 'datadog-agent:jmxfetch'
+JMX_GRACE_SECS = 2
+SERVICE_DISCOVERY_PREFIX = 'SD-'
+SD_CONFIG_SEP = "#### SERVICE-DISCOVERY ####\n"
+
+DEFAULT_SUPERVISOR_SOCKET = '/opt/datadog-agent/run/datadog-supervisor.sock'
 DEFAULT_COLLECTOR_PROFILE_INTERVAL = 20

 # Globals
@@ -80,6 +95,9 @@ def __init__(self, pidfile, autorestart, start_event=True, in_developer_mode=Fal
         # this flag can be set to True, False, or a list of checks (for partial reload)
         self.reload_configs_flag = False
         self.sd_backend = None
+        self.supervisor_proxy = None
+        self.sd_pipe = None
+

     def _handle_sigterm(self, signum, frame):
         """Handles SIGTERM and SIGINT, which gracefully stops the agent."""
@@ -105,6 +123,7 @@ def reload_configs(self, checks_to_reload=set()):
         Can also reload only an explicit set of checks."""
         log.info("Attempting a configuration reload...")
         hostname = get_hostname(self._agentConfig)
+        jmx_sd_configs = None

         # if no check was given, reload them all
         if not checks_to_reload:
@@ -114,13 +133,23 @@ def reload_configs(self, checks_to_reload=set()):
                 check.stop()

             self._checksd = load_check_directory(self._agentConfig, hostname)
+            if self._jmx_service_discovery_enabled:
+                jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname)
         else:
             new_checksd = copy(self._checksd)

-            self.refresh_specific_checks(hostname, new_checksd, checks_to_reload)
+            jmx_checks = [check for check in checks_to_reload if check in JMX_CHECKS]
+            py_checks = set(checks_to_reload) - set(jmx_checks)
+            self.refresh_specific_checks(hostname, new_checksd, py_checks)
+            if self._jmx_service_discovery_enabled:
+                jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname, jmx_checks)
+
             # once the reload is done, replace existing checks with the new ones
             self._checksd = new_checksd

+        if jmx_sd_configs:
+            self._submit_jmx_service_discovery(jmx_sd_configs)
+
         # Logging
         num_checks = len(self._checksd['initialized_checks'])
         if num_checks > 0:
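The partial-reload branch above now routes each requested check either to jmxfetch or to `refresh_specific_checks`. A minimal sketch of that split as a standalone function, assuming an illustrative `JMX_CHECKS` set (the real list is imported from `jmxfetch` and its contents are not shown in this diff); `split_checks` is a hypothetical helper name, not part of the agent:

```python
# Illustrative stand-in for jmxfetch.JMX_CHECKS; the real contents are not shown in this diff.
JMX_CHECKS = frozenset(["jmx", "tomcat", "cassandra", "kafka"])


def split_checks(checks_to_reload, jmx_checks=JMX_CHECKS):
    """Hypothetical helper: partition check names the way reload_configs does.

    Returns (jmx, py): a list of JMX check names in input order, and a set
    of the remaining names, which would go to refresh_specific_checks.
    """
    jmx = [name for name in checks_to_reload if name in jmx_checks]
    py = set(checks_to_reload) - set(jmx)
    return jmx, py


if __name__ == "__main__":
    jmx, py = split_checks(["tomcat", "nginx", "redisdb", "kafka"])
    print(jmx)         # ['tomcat', 'kafka']
    print(sorted(py))  # ['nginx', 'redisdb']
```

When only Python checks are requested, `jmx` comes back empty; in the diff, `generate_jmx_configs` is then called with an empty list, and `_submit_jmx_service_discovery` runs only if that call returns something.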
@@ -218,9 +247,32 @@ def run(self, config=None):
         if self._agentConfig.get('service_discovery'):
             self.sd_backend = get_sd_backend(self._agentConfig)

+        if _is_affirmative(self._agentConfig.get('sd_jmx_enable')):
+            pipe_path = get_jmx_pipe_path()
+            if Platform.is_windows():
+                pipe_name = pipe_path.format(pipename=SD_PIPE_NAME)
+            else:
+                pipe_name = os.path.join(pipe_path, SD_PIPE_NAME)
+
+            if os.access(pipe_path, os.W_OK):
+                if not os.path.exists(pipe_name):
+                    os.mkfifo(pipe_name)
+                self.sd_pipe = os.open(pipe_name, os.O_RDWR) # RW to avoid blocking (will only W)
+
+                # Initialize Supervisor proxy
+                self.supervisor_proxy = self._get_supervisor_socket(self._agentConfig)
+            else:
+                log.debug('Unable to create pipe in temporary directory. JMX service discovery disabled.')
+
         # Load the checks.d checks
         self._checksd = load_check_directory(self._agentConfig, hostname)

+        # Load JMX configs if available
+        if self._jmx_service_discovery_enabled:
+            jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname)
+            if jmx_sd_configs:
+                self._submit_jmx_service_discovery(jmx_sd_configs)
+
         # Initialize the Collector
         self.collector = Collector(self._agentConfig, emitters, systemStats, hostname)
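On POSIX systems the `run()` hunk above creates a named pipe and opens it with `os.O_RDWR` rather than `os.O_WRONLY`. Opening a FIFO write-only blocks until some reader opens the other end; opening it read-write returns immediately on Linux because the agent itself then counts as a reader (POSIX leaves `O_RDWR` on a FIFO undefined, so this relies on platform behavior). A minimal sketch of that step, assuming a POSIX host; `open_sd_pipe` is a hypothetical helper and `PIPE_NAME` is a placeholder standing in for `SD_PIPE_NAME`, whose real value is not shown here:

```python
import os
import tempfile

# Placeholder; the real SD_PIPE_NAME constant comes from config.py and is not shown here.
PIPE_NAME = "dd-sd-example"


def open_sd_pipe(pipe_dir, pipe_name=PIPE_NAME):
    """Hypothetical helper: create the FIFO if needed and open it without blocking.

    Returns a file descriptor, or None when pipe_dir is not writable
    (the agent logs a debug message and leaves JMX service discovery off then).
    """
    if not os.access(pipe_dir, os.W_OK):
        return None
    path = os.path.join(pipe_dir, pipe_name)
    if not os.path.exists(path):
        os.mkfifo(path)  # POSIX only; os.mkfifo is not available on Windows
    # O_RDWR: open() returns at once instead of waiting for a reader to appear.
    return os.open(path, os.O_RDWR)


if __name__ == "__main__":
    fd = open_sd_pipe(tempfile.mkdtemp())
    os.write(fd, b"hello\n")  # succeeds although no other process has the pipe open
    print(os.read(fd, 6))     # b'hello\n' (the same descriptor can read it back)
    os.close(fd)
```

One consequence of the read-write trick: if jmxfetch never drains the pipe, writes succeed only until the pipe buffer fills (64 KiB by default on typical Linux systems), after which `os.write` on this blocking descriptor would stall the caller.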

@@ -241,13 +293,15 @@ def run(self, config=None):
         self.restart_interval = int(self._agentConfig.get('restart_interval', RESTART_INTERVAL))
         self.agent_start = time.time()

+        self.allow_profiling = self._agentConfig.get('allow_profiling', True)
+
         profiled = False
         collector_profiled_runs = 0

         # Run the main loop.
         while self.run_forever:
             # Setup profiling if necessary
-            if self.in_developer_mode and not profiled:
+            if self.allow_profiling and self.in_developer_mode and not profiled:
                 try:
                     profiler = AgentProfiler()
                     profiler.enable_profiling()
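The new `allow_profiling` switch is read with a plain `get('allow_profiling', True)` and then used directly as a boolean, whereas `sd_jmx_enable` goes through `_is_affirmative`. If the value ever reached this point as a raw string from `datadog.conf` (this diff does not show how `config.py` parses it), any non-empty string, including `"no"`, would be truthy. A minimal sketch of the gate with an explicit normalization step; `is_affirmative` and `should_profile` are hypothetical helpers written for illustration, not the agent's `_is_affirmative`:

```python
def is_affirmative(value):
    """Hypothetical helper: interpret common config spellings of 'true'."""
    if value is None:
        return False
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("yes", "true", "1", "on")


def should_profile(agent_config, in_developer_mode, already_profiled):
    """Mirror the patched loop condition: profile only when all three hold."""
    allow = is_affirmative(agent_config.get("allow_profiling", True))
    return allow and in_developer_mode and not already_profiled


if __name__ == "__main__":
    print(should_profile({}, True, False))                          # True
    print(should_profile({"allow_profiling": "no"}, True, False))   # False
    print(should_profile({"allow_profiling": "no"}, False, False))  # False
```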
@@ -344,6 +398,52 @@ def _set_agent_config_hostname(self, agentConfig):
                 log.info('Not running on EC2, using hostname to identify this server')
         return agentConfig

+    def _get_supervisor_socket(self, agentConfig):
+        if Platform.is_windows():
+            return None
+
+        sockfile = agentConfig.get('supervisor_socket', DEFAULT_SUPERVISOR_SOCKET)
+        supervisor_proxy = xmlrpclib.ServerProxy(
+            'http://127.0.0.1',
+            transport=supervisor.xmlrpc.SupervisorTransport(
+                None, None, serverurl="unix://{socket}".format(socket=sockfile))
+        )
+
+        return supervisor_proxy
+
+    @property
+    def _jmx_service_discovery_enabled(self):
+        return self.sd_pipe is not None
+
+    def _submit_jmx_service_discovery(self, jmx_sd_configs):
+
+        if not jmx_sd_configs or not self.sd_pipe:
+            return
+
+        if self.supervisor_proxy is not None:
+            jmx_state = self.supervisor_proxy.supervisor.getProcessInfo(JMX_SUPERVISOR_ENTRY)
+            log.debug("Current JMX check state: %s", jmx_state['statename'])
+            # restart jmx if stopped
+            if jmx_state['statename'] in ['STOPPED', 'EXITED', 'FATAL'] and self._agentConfig.get('sd_jmx_enable'):
+                self.supervisor_proxy.supervisor.startProcess(JMX_SUPERVISOR_ENTRY)
+                time.sleep(JMX_GRACE_SECS)
+        else:
+            log.debug("Unable to automatically start jmxfetch on Windows via supervisor.")
+
+        buffer = ""
+        for name, yaml in jmx_sd_configs.iteritems():
+            try:
+                buffer += SD_CONFIG_SEP
+                buffer += "# {}\n".format(name)
+                buffer += yaml
+            except Exception as e:
+                log.exception("unable to submit YAML via RPC: %s", e)
+            else:
+                log.info("JMX SD Config via named pip %s successfully.", name)
+
+        if buffer:
+            os.write(self.sd_pipe, buffer)
+
     def _should_restart(self):
         if time.time() - self.agent_start > self.restart_interval:
             return True
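`_submit_jmx_service_discovery` sends every generated config through the pipe in one write, each preceded by `SD_CONFIG_SEP` and a `# <name>` comment line. A minimal sketch of that framing together with a matching reader, assuming the consumer splits on the separator string; jmxfetch's actual parser is not part of this diff and may work differently, and `build_sd_buffer`, `parse_sd_buffer` and `submit` are hypothetical names. Unlike the original method, the sketch appends a newline to any YAML body that lacks one, so the next separator always starts on its own line, and it sorts config names so the output is deterministic:

```python
import os

SD_CONFIG_SEP = "#### SERVICE-DISCOVERY ####\n"  # same value as the constant in the diff


def build_sd_buffer(jmx_sd_configs):
    """Frame {name: yaml_text} as SEP + '# name' + YAML, one block per config."""
    parts = []
    for name, yaml_text in sorted(jmx_sd_configs.items()):
        if not yaml_text.endswith("\n"):
            yaml_text += "\n"  # defensive: keep the next separator on its own line
        parts.append(SD_CONFIG_SEP + "# {}\n".format(name) + yaml_text)
    return "".join(parts)


def parse_sd_buffer(buf):
    """Hypothetical reader: assumes no YAML body contains the separator string."""
    configs = {}
    for chunk in buf.split(SD_CONFIG_SEP):
        if not chunk:
            continue  # text before the first separator is empty
        header, _, body = chunk.partition("\n")
        configs[header[2:]] = body  # drop the leading "# "
    return configs


def submit(fd, jmx_sd_configs):
    """Write all framed configs with one os.write call, as the diff does."""
    buf = build_sd_buffer(jmx_sd_configs)
    if buf:
        # Python 3 needs bytes here; the Python 2 original writes the str directly.
        os.write(fd, buf.encode("utf-8"))
```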

aggregator.py

Lines changed: 4 additions & 3 deletions
@@ -492,13 +492,14 @@ def parse_metric_packet(self, packet):
                     # Parse the sample rate
                     if m[0] == '@':
                         sample_rate = float(m[1:])
-                        assert 0 <= sample_rate <= 1
+                        # in case it's in a bad state
+                        sample_rate = 1 if sample_rate < 0 or sample_rate > 1 else sample_rate
                     elif m[0] == '#':
                         tags = tuple(sorted(m[1:].split(',')))
-            except (IndexError, AssertionError):
+            except IndexError:
                 log.warning(u'Incorrect metric metadata: metric_name:%s, metadata:%s',
                             name, u' '.join(value_and_metadata[2:]))
-                sample_rate = 1 # In case it's in a bad state
+

             parsed_packets.append((name, value, metric_type, tags, sample_rate))

         return parsed_packets
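Before this change, an out-of-range rate such as `@2` raised `AssertionError`, and because the `try` wraps the whole loop over metadata fields, any `#tags` field after it was never parsed; the merged commit "[aggregator] if sample rate is bad, fix it but still parse tags. (#3073)" describes exactly this fix. A minimal standalone sketch of the patched loop, with a hypothetical `parse_metadata` function that takes the metadata fields of one value (the real method also splits the packet and logs a warning on `IndexError`):

```python
def parse_metadata(fields):
    """Hypothetical helper mirroring the patched metadata loop.

    fields: DogStatsD-style metadata fields such as ['@0.5', '#env:prod,role:db'].
    Returns (sample_rate, tags).
    """
    sample_rate = 1
    tags = None
    try:
        for m in fields:
            if m[0] == '@':
                sample_rate = float(m[1:])
                # Out-of-range rates are reset to 1 instead of aborting the loop.
                sample_rate = 1 if sample_rate < 0 or sample_rate > 1 else sample_rate
            elif m[0] == '#':
                tags = tuple(sorted(m[1:].split(',')))
    except IndexError:
        # An empty field makes m[0] fail; the agent logs a warning at this point.
        pass
    return sample_rate, tags


# A bad rate no longer hides the tags that follow it.
print(parse_metadata(['@2', '#role:db,env:prod']))  # (1, ('env:prod', 'role:db'))
```

One edge case differs from the old assertion: `float('nan')` fails both `< 0` and `> 1`, so `@nan` now passes through as NaN, whereas `assert 0 <= sample_rate <= 1` used to reject it. Testing `not 0 <= sample_rate <= 1` instead would reset NaN to 1 as well.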
