Based on my recent fights with BGP IPv6 AF on Cisco IOS. Would love to hear your feedback @ssasso @DanPartelly @jbemmel |
I'd rather prefer having the debug "code" in a separate file (or even ansible include task - similar to what we do for device readiness check), to be called before initial configuration in case the debug hostvar is present.
> I'd rather prefer having the debug "code" in a separate file (or even ansible include task - similar to what we do for device readiness check)
Any particular reason for that? I'm already annoyed by the amount of noise Ansible produces 🤷♂️
I would suggest a Jinja include file - same Ansible task, but keeping the debug logic separate from the rest.
I could imagine that the debugging logic could become quite extensive (dozens of flags to be set)
> > I'd rather prefer having the debug "code" in a separate file (or even ansible include task - similar to what we do for device readiness check)
> Any particular reason for that? I'm already annoyed by the amount of noise Ansible produces 🤷♂️
Just logical separation, maybe easier to follow for my mind.
I do not think modularization and device abstractions are worth it here. While they would be cool to have, just imagine tens of debug flags, abstracted over N modules and M devices. So many lookup tables, so many git diffs, and then you end up in the situation where you forgot to abstract Y's favorite BGP debugging flag, then you have another pull request, and so on and so forth.
Who will do all this work? By contrast, the current method "just works" even if it's not abstracted or modular in nature.
I frankly like it as it is; it's really pragmatic.
|
I love this one. As it is. Very easy to set. Not very keen to see more Ansible tasks/roles/whatever |
This should work, but would it be an idea to have a 'debug' flag per module (bgp.debug, ospf.debug, etc.) and then have the debug flags to be enabled defined in the device-specific YAML files, under the features for that module (features.bgp.debug)?
Users could customize these in their topology if they need to.
The current implementation feels a bit like a quick hack (and I think I know what I'm talking about ;)). It lacks the Netlab signature device abstraction. When debugging, it makes most sense to enable the same debugging flags on all devices of a particular kind, as opposed to on each node individually.
Imagine the following: The user would set
bgp.debug: True
and BGP debugging would be automatically enabled on all devices in the lab, handling vendor specific nuances
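To make the proposal concrete, here is a minimal Python sketch of how a per-module `debug` flag could be expanded into device-specific commands. Everything in it is an assumption for illustration: the `DEVICE_DEBUG` table, the `debug_commands` function, and the node layout are hypothetical and are not part of netlab, and the listed commands are examples rather than a vetted set.

```python
from typing import Dict, List

# Hypothetical lookup table: device type -> module -> debug commands.
# In the proposal this data would live in device YAML files (features.bgp.debug).
DEVICE_DEBUG: Dict[str, Dict[str, List[str]]] = {
    "ios": {
        "bgp": ["debug ip bgp updates"],
        "ospf": ["debug ip ospf adj"],
    },
    "frr": {
        "bgp": ["debug bgp updates", "debug bgp neighbor-events"],
    },
}


def debug_commands(node: dict) -> List[str]:
    """Return the debug commands for every module with <module>.debug set."""
    table = DEVICE_DEBUG.get(node["device"], {})
    commands: List[str] = []
    for module in node.get("module", []):
        if node.get(module, {}).get("debug"):
            # Devices without an entry for this module contribute nothing
            commands.extend(table.get(module, []))
    return commands


r1 = {"device": "ios", "module": ["ospf", "bgp"], "bgp": {"debug": True}}
r2 = {"device": "frr", "module": ["bgp"], "bgp": {"debug": True}}
print(debug_commands(r1))
print(debug_commands(r2))
```

The sketch also shows the cost raised elsewhere in this thread: every module on every device needs its own table entry, and a single on/off flag cannot express which subset of debugging options the user actually wants.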
DanPartelly
left a comment
For me, pragmatism here beats abstraction and modularization.
|
Thanks a million for the feedback. Going through it in random order:
Correct. This is a hack that addresses a very specific need: enabling debugging early enough to capture all events, but immediately after the device boots in case some events might be uptime-specific. For anything else, we already have solutions (documented in the .md file that's part of this PR).
While that sounds great, I'm not ready to invest time into making it happen (ignoring for the moment that testing the correctness of the implementations would be... interesting). Nobody asked for it, and the debugging capabilities heavily depend on the implementations. Also, "debugging BGP" sounds great, but do go through the debug bgp CLI on any reasonable device and you'll see how many options there are. The options you want to use depend on what you're trying to troubleshoot. Enable too little and you won't see a thing. Enable too much and you'll be swamped. Oh, and BGP also has something called "address families" ;)
In theory, that's correct. In practice, it's a tiny for loop, and I don't expect it to be anything more any time soon (see above). In theory, we could make this part of the "normalize" (pre-initial) phase, but that's just moving the problem around (RFC 1925 rule 6). We could also add "debug" config module to
To wrap up: think of this as custom configuration templates, but executed very early on in the process. I don't want to have anything more than that at this stage, and I don't see (at the moment) the need for any other custom configuration executed early in the initial configuration process. However, if we get to the point where we have good reasons to have other pre-initial custom configuration templates, then this could be easily merged into that logic.
Obviously, we could also drop the whole thing (I think the problem I was trying to solve was not uptime-specific after all 😜) and just keep the "debugging network devices" documentation. |
Let's keep it as it is. |
This is a proof-of-concept (for Cisco IOS) of a capability that enables debugging at the very beginning of initial device configuration to ensure no relevant events are lost. It introduces a new node attribute (debug), an extra flag to the 'must_be_list' function that can split lines of a string value when netlab expects a list of values, and a sample initial config template. Also, I added a whole document explaining how one could do debugging based on my lovely recent experience with Cisco IOS and Aruba CX (more about those coming soon from the usual soapbox).
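As a rough illustration of the line-splitting behavior described above, the following Python sketch turns a multi-line string into a list and renders one command per entry. It is not the actual `must_be_list` code: the `as_list` helper, its `split_lines` parameter, and `ios_debug_commands` are hypothetical names, and prefixing each entry with `debug` is an assumption about how the template might use the values.

```python
from typing import Any, List


def as_list(value: Any, split_lines: bool = False) -> List[Any]:
    """Hypothetical stand-in for a 'must be a list' check with a line-split flag."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    if isinstance(value, str) and split_lines:
        # One list item per non-empty line of the string value
        return [line.strip() for line in value.splitlines() if line.strip()]
    return [value]


def ios_debug_commands(node: dict) -> List[str]:
    """Render one 'debug ...' line per entry of the node's debug attribute."""
    flags = as_list(node.get("debug"), split_lines=True)
    return [f"debug {flag}" for flag in flags]


node = {"name": "r1", "debug": "ip bgp updates\nbgp ipv6 unicast updates\n"}
print(ios_debug_commands(node))
```

Accepting a multi-line string lets a user paste a block of debugging options into the topology file without converting it into a YAML list first.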
|
This is harder than I expected. For example, EOS won't allow you to enable debugging for things that are not configured, and I would expect NX-OS to behave in a similar way (due to their use of features). Back to the drawing board. For the moment, it looks like I'll implement this as device-specific features (similar to eos.serialnumber). However, as IOS debugging applies to a whole range of devices, I have to add another tweak first. I'll be back ;) |
|
@ipspace Leaving aside debug flags for a second, this issue is somewhat related. Today I tried to build a lab where the IS-IS overload bit is set at startup with wait-for-bgp. My methodology was to deploy custom configs for IS-IS and ACLs that block BGP neighbor formation, so I'd have time to observe what is going on. I failed, although BGP neighbors were never formed, so at least the ACL part escaped the race. This will require further investigation and the use of other images besides IOL, like XRd or CSR. Anyway, one issue is that custom configs are always applied last. To eliminate races, some items should really come first (after initial). This is why I chose to post this here instead of opening a new issue. Order might matter. Your thoughts? |
You can solve that with a sequence of commands:
Not a good idea ;) Someone might have a similar issue, and now the discussion will be buried in some unrelated stuff (not to mention we're bloating this PR).
I don't want to open that can of worms. I need something before initial. You need something between initial and other modules. Someone will need something between IS-IS and BGP... The only sane way to solve these edge requests is to use a more complex lab startup sequence (bash FTW!). |
    node:
      config:
        type: list
        _subtype:
          file: str
          before: list
          after: list
        _alt_types: [ str ]

not the "only" sane way |
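A schema like the one above implies an ordering step: the `before` and `after` lists have to be resolved into a single deployment sequence. A minimal Python sketch of that step, using a topological sort from the standard library, could look like this. The `order_configs` function, the `built_in` module list, and the `block-bgp-acl` template name are hypothetical and only illustrate the idea; nothing here exists in netlab.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Union

ConfigItem = Union[str, Dict[str, object]]


def order_configs(items: List[ConfigItem], built_in: List[str]) -> List[str]:
    """Merge custom config templates into the built-in module sequence."""
    ts: TopologicalSorter = TopologicalSorter()
    # Built-in modules keep their relative order (each runs after the previous one)
    for prev, nxt in zip(built_in, built_in[1:]):
        ts.add(nxt, prev)
    if built_in:
        ts.add(built_in[0])
    for item in items:
        if isinstance(item, str):
            item = {"file": item}        # plain string: no ordering constraints
        name = item["file"]
        ts.add(name)
        for dep in item.get("after", []):
            ts.add(name, dep)            # name is deployed after dep
        for succ in item.get("before", []):
            ts.add(succ, name)           # succ is deployed after name
    # static_order() raises graphlib.CycleError on contradictory constraints
    return list(ts.static_order())


custom = [{"file": "block-bgp-acl", "after": ["initial"], "before": ["isis"]}]
print(order_configs(custom, ["initial", "isis", "bgp"]))
# → ['initial', 'block-bgp-acl', 'isis', 'bgp']
```

Computing the order is the easy part. Templates without constraints land in an arbitrary position, and deciding what should happen on each device between the steps is where most of the remaining work would be.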
Congratulations, you successfully defined the data schema. Now go and solve the remaining 95% of the problem, but do it somewhere else, not in an unrelated PR. |
|
Thanks again for all the feedback. Will replace this PR with a more focused one targeting IOS and FRR. |