Skip to content

Conversation

@nmoray
Copy link

@nmoray nmoray commented Oct 25, 2023

  • What I did
    Added a support of TACACS passkey encryption feature.
    Ref. : HLD
    This PR comprises the encryption logic.

  • How I did it
    Implemented the feature by following HLD

  • How to verify it
    `1. Configured TACACS passkey:
    root@sonic:/# config tacacs passkey

  1. Verified whether passkey is encrypted:
    root@sonic:/# show runningconfiguration all | grep passkey
    "passkey": "U2FsdGVkX19kFwDeP3IhgqbLJeed3pXtazJ73FtmD3I="

  2. Verified /etc/pam.d/common-auth-sonic file to validate if the passkey is decrypted correctly [Referred while ssh'ing into the device]
    root@sonic:~# cat /etc/pam.d/common-auth-sonic | grep secret
    auth [success=done new_authtok_reqd=done default=ignore auth_err=die] pam_tacplus.so server=:49 secret=<pass_in_plaintext> login=login timeout=5 try_first_pass
    auth [success=done new_authtok_reqd=done default=ignore auth_err=die] pam_tacplus.so server=:49 secret=<pass_in_plaintext> login=login timeout=5 try_first_pass

  3. Verified passkey is hidden in show tacacs output
    root@sonic:~# show tacacs
    TACPLUS global auth_type pap (default)
    TACPLUS global timeout 5 (default)
    TACPLUS global passkey configured Yes

  4. Verified user able to login into device with TACACS credentials`

@madhupalu
Copy link
Contributor

@nmoray please take care of unit test coverage for PR to be approved.

@liuh-80
Copy link
Contributor

liuh-80 commented Nov 2, 2023

@nmoray , please fix coverage issue by add new test case:

@azure-pipelines-wrapper
coverage.Azure.sonic-utilities.Python3 — Pull Request Coverage

Total: 44 lines
Missing: 39 lines
Coverage: 11%
Threshold: 80%

@nmoray
Copy link
Author

nmoray commented Nov 2, 2023

@nmoray , please fix coverage issue by add new test case:

@azure-pipelines-wrapper coverage.Azure.sonic-utilities.Python3 — Pull Request Coverage

Total: 44 lines Missing: 39 lines Coverage: 11% Threshold: 80%

@liuh-80 can you please give me some pointers for better understanding of this AUT infra. Like, how to write a new testcase.

@liuh-80
Copy link
Contributor

liuh-80 commented Nov 3, 2023

@nmoray , please fix coverage issue by add new test case:
@azure-pipelines-wrapper coverage.Azure.sonic-utilities.Python3 — Pull Request Coverage
Total: 44 lines Missing: 39 lines Coverage: 11% Threshold: 80%

@liuh-80 can you please give me some pointers for better understanding of this AUT infra. Like, how to write a new testcase.

You can find existing test case and add new test case here: https://github.com/sonic-net/sonic-utilities/blob/master/tests/aaa_test.py

@zhangyanzhao
Copy link
Collaborator

Reviewers, if you are ok with this PR, please help to approve it. Thanks.

Copy link
Contributor

@madhupalu madhupalu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with the changes

config/aaa.py Outdated
if not errs:
add_table_kv('TACPLUS', 'global', 'passkey', outsecret)
else:
click.echo('Passkey configuration failed' % errs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to add a string formatting operator here (%)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

config/aaa.py Outdated
except Exception as e:
click.echo('getpass aborted' % e)
return
add_table_kv('TACPLUS', 'global', 'key_encrypt', True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value must be "true" and not a literal True (and also not "True"), look at yang boolean value handling for details

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying it should be (t)rue? I just referred below.
add_table_kv('AAA', 'authentication', 'failthrough', True)

click.echo('Passkey configuration failed' % errs)
return
else:
add_table_kv('TACPLUS', 'global', 'key_encrypt', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, the value must be "false" and not a literal False. "False" will work but not as intended, should stick with lowercase "false"

config/aaa.py Outdated
else:
add_table_kv('TACPLUS', 'global', 'key_encrypt', False)
add_table_kv('TACPLUS', 'global', 'passkey', secret)
secure_cipher.del_cipher_pass()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will delete the entire file, are we sure that's what you want?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been resolved, as the underlying key manager no longer deletes the whole file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah.

@qiluo-msft
Copy link
Contributor

Please add testcases to cover new code, otherwise coverage checker will block this PR.

@nmoray
Copy link
Author

nmoray commented Apr 28, 2025

Please add testcases to cover new code, otherwise coverage checker will block this PR.

@qiluo-msft Sure I will add the testcases. But In my opinion, if sonic-net/sonic-buildimage#17201 PR gets merged first, it will be easy to focus on this dependent PR.

config/aaa.py Outdated
if encrypt:
try:
passwd = getpass.getpass()
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception

Please use specific exception type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to have following?

except (EOFError, KeyboardInterrupt):
    print("\nInput cancelled.")
    passwd = None
except Exception as e:
    print(f"Unexpected error: {e}")
    passwd = None

config/aaa.py Outdated
@click.option('-p', '--pri', help="Priority, default 1", type=click.IntRange(1, 64), default=1)
@click.option('-m', '--use-mgmt-vrf', help="Management vrf, default is no vrf", is_flag=True)
def add(address, timeout, key, auth_type, port, pri, use_mgmt_vrf):
@click.option('-e', '--encrypt', help='Enable passkey encryption feature', is_flag=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encrypt

This command line extra option is confusing, why you want to change TACPLUS/global/key_encrypt in 2 command lines? do you want to encrypt other passkey if the 2 commands are both used but with different encrypt option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the --key option allows the password to be set inline when first configuring tacacs (rather than as a separate passkey invocation. I think that the author is correct in adding something here to keep the behavior consistent, but it's for sure confusing to have two args related to the same thing.

Maybe instead of having --encrypt and having that modify the --key option, change the second option to be --encrypted-key and change the command to accept either option, but not both? so you can pass --key <some passkey> and keep the original behavior, or pass --encrypted-key and follow the steps to encrypt the key. The help strings should make the difference between the options clear, and they should probably be put one after the other in the arg list.

Copy link
Contributor

@anders-nexthop anders-nexthop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nmoray this PR has many open review comments, can you please update the PR and resolve the comments?

Copy link
Contributor

@anders-nexthop anders-nexthop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to rethink how the individual server passkeys will work with this feature.

RADIUS_MAXSERVERS = 8
RADIUS_PASSKEY_MAX_LEN = 65
VALID_CHARS_MSG = "Valid chars are ASCII printable except SPACE, '#', and ','"
TACACS_PASSKEY_MAX_LEN = 65
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sonic-net/sonic-buildimage#17201 changes this limit to 256, do you want to change it here as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the length set for actual secret which is in plaintext. IMO, 65 chars are good enough.

config/aaa.py Outdated
def passkey(ctx, secret, encrypt):
"""Specify TACACS+ server global passkey <STRING>"""
if ctx.obj == 'default':
del_table_key('TACPLUS', 'global', 'passkey')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to reset key_encrypt here? On the one hand I don't think it will matter, since we always set key_encrypt correctly when a new passkey is added. But on the other, we don't really want to leave stale configuration hanging around without cause.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the TACPLUS table itself is deleted, no need to worry about key_encrypt as it is part of the same table only.

config/aaa.py Outdated
click.echo('getpass aborted' % e)
return
add_table_kv('TACPLUS', 'global', 'key_encrypt', True)
outsecret, errs = secure_cipher.encrypt_passkey('TACPLUS', key, passwd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. Each TACPLUS server can have its own passkey, but we are only setting one global encryption password. If I configure two servers, the password for the second will override the first, and the first passkey will not be usable. Further, we are using the single global key_encrypt var for every server, so if a server is configured without it we will brick the others.

I can see two ways to resolve this. Either we can add a separate passkey for each server (which means extending the passkey manager to handle more than one TACACS key, and extend the yang model to have a key_encrypt value for each individual server), or we can enhance the logic of the cipherkey manager somewhat to re-use the same passkey for every TACACS server (we could make the 'passwd' arg of encrypt_passkey optional, and just use the existing passkey if it exists. we would then configure that only through the global cli command).

I guess a third option would be to make the passkey generation an explicit step, separate from the encryption. so instead of passing a passwd to encrypt_passkey() we would have a separate function add_cipher_password() which adds a password for a given feature, and then encrypt_passkey() would just use that configured password. We could add logic to call add_cipher_password() automatically the first time encrypt_passkey() is called for a feature, or maybe call it only when the global option changes.

Copy link
Author

@nmoray nmoray Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any real production usecase where the user need different passkeys for different TACACS servers? IMO, better to have a single passkey across the servers. If so, it can easily be handled by making use of is_key_encrypt_enabled() API while configuring the passkey.

If the passkey is already encrypted while configuring the global TACACS configurations, we will check if it is already configured via is_key_encrypt_enabled() in add() function while configuring TACACS server level configs and vice versa.

This way, we can have a single FEATURE id for TACACS, RADIUS and so on. Let me know your thoughts on the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how widespread the use case is, but since the ability exists to have different passkeys for different servers I don't think we would want to get rid of that. However, I do think we could share the same cipherpass between servers, which is I think what you are suggesting here. That would work, and i see that is_key_encrypt_enabled() returns the cipherpass, so we can use that to pass to encrypt_passkey() (we might want to rename is_key_encrypt_enabled() func at some point to make it clear that it returns the cipherpass). Would it be better to move the getpass() logic to security_cipher code itself? and only prompt the user for the encryption password the first time the FEATURE id is used? That probably makes more sense if we are allowing a FEATURE id to be used by more than one caller, and it would be re-usable in that case as well.

Also, what about removing the cipherpass? Right now we always remove it whenever the feature is disabled, but that won't work if more than one caller is using the same cipherpass. We could keep a reference count, or check in the caller for any server/global tables and only remove the cipherpass if none are found (or at least none that have key_encrypt enabled). That seems a bit messy but is maybe the right approach. How else would we ever know to remove the cipherpass? Do we even need to get rid of a cipherpass for a feature once it's added? We do want to be able to rotate them if needed. what do you thinK?

Copy link
Author

@nmoray nmoray Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to have following design changes, let me know your thoughts on the same.

  • Every module who wants to use security Cipher application has to register with it first. As a part of registration, it has to specify the Feature type and a callback function which will further be used at the time of password rotation. This way, the security Cipher module will be having list of callbacks that needs to be called when a user wants to rotate the password. It's individual module's / application's job to implement that callback. The implementation would be, just re-encrypt and update the encrypted key in the respective CONFIG_DB tables. This way, we can have only one password per feature type and multiple modules can make use of same password for generating their unique encrypted key.
  • New API to set the password which will add a new password only if it is not already set.
  • Rotate API which will only be allowed to modify the existing password as it is having the ability to update the passkey (with re-encryption) in all the registered modules
  • Change the cipher_pass text file to a json file
  • Extend the cipher_pass to hold the list of callbacks per feature along with a password
    Something like,
{
  "RADIUS": {
    "callbacks": [
      "rotate_radius_server1_passkey",
      "rotate_radius_server2_passkey"
    ],
    "password": "TEST1"
  },
  "TACPLUS": {
    "callbacks": [
      "rotate_tacplus_global_passkey"
    ],
    "password": "TEST2"
  }
}

Copy link
Author

@nmoray nmoray Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anders-nexthop Please refer to sonic-net/sonic-buildimage#22711 PR. Here I have updated the existing design to support both rotate feature as well as the support of reusability of same the password.

ryanzhu706 and others added 9 commits June 25, 2025 11:40
…onic-net#3333)

Description
Add a check for ensuring mirror session ACLs are programmed to ASIC

What is the issue?
This fix is to address an issue where an ACL is added to CONFIG_DB, but before it could be programmed to ASIC, Orchagent is paused.
This leads to APPLY_VIEW failure when base image OA could not process this ACL entry and target image's OA still creates it.
The issue has an image fix available at sonic-net/sonic-sairedis#1240
This issue is very rare, and has been caught by upgrade path tests only once in thousands of iterations.

What is this fix?
A new logic is added to check if mirror session ACLs for arp and nd are added to ASIC..
ACLs are looked into ASIC_DB and matched using SAI_ACL_ENTRY_ATTR_PRIORITY attribute.
SAI_ACL_ENTRY_ATTR_PRIORITY for arp ACL is 8888 and for nd is 8887
If one of the ACLs is found missing then warmboot is aborted.

Tested on physical testbed running 202311 and master
…et#3353)

There's a difference in behavior when an external command is run under
the default mode vs when it is run under the alias mode. In the default
mode, execution control returns to the caller unless the command had a
non-zero exit code. In the alias mode, regardless of exit code, the
Python script exits. This may result in some tasks not completing.

Fix this by not unconditionally exiting if running a command in the
alias mode. Note that there are other differences still present, but
this fixes at least this one.

Signed-off-by: Saikrishna Arcot <[email protected]>
…INTEGRITY_DROP,SAI_QUEUE_STAT_CREDIT_WD_DELETED_PACKETS) for Voq/Fabric switches (sonic-net#3322)

What I did
Added cli support to show SAI_SWITCH_STAT_PACKET_INTEGRITY_DROP counter in show dropcounter counts command and show SAI_QUEUE_STAT_CREDIT_WD_DELETED_PACKETS counters in show queue counter --voq command.

How I did it
 Modified the dropstat and queuestat cli commands to show these new counters 
How to verify it
Simulated the Packet integrity (CRC, RQP errors) and Credit Watchdog delete drops (disabled the TX for the ports and simulated the credit watchdog deletes) and verified that the show commands are showing the correct output from COUNTERS_DB.

Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)
1)show dropcounter counts
2)show queue counter --voq

Signed-off-by: saksarav <[email protected]>
* [wcmp]: Add WCMP CLI.

Signed-off-by: Nazarii Hnydyn <[email protected]>
**What I did?**
1. Bugfix for console CLI (This is introduced by [consutil] replace shell=True sonic-net#2725, * cannot be treated as wildcard correctly).
```
admin@sonic:~$ show line
ls: cannot access '/dev/C0-*': No such file or directory
```
2. Enhance UT to avoid regression mentioned in 1.
3. Fix incorrect statement in UT.
4. Fix critical Flake8 error.

**How to verify it**
1. Verified on Nokia-7215 MC0 device.
2. Verified by UT

Sign-Off By: Zhijian Li <[email protected]>
#### What I did

Add `config` `checkpoint`, `rollback`, `replace`, `list-checkpoints`, `delete-checkpoint` support of Multi ASIC

#### How I did it
Add namespace for each of operation to support Multi ASIC.

#### How to verify it
1. Single ASIC
```admin@str2-msn2700-spy-1:~/gcu$ sudo config checkpoint 20240522-xincun
Config Rollbacker: Config checkpoint starting.
Config Rollbacker: Checkpoint name: 20240522-xincun.
Config Rollbacker: Getting current config db.
Config Rollbacker: Getting checkpoint full-path.
Config Rollbacker: Ensuring checkpoint directory exist.
Config Rollbacker: Saving config db content to /etc/sonic/checkpoints/20240522-xincun.cp.json.
Config Rollbacker: Config checkpoint completed.
Checkpoint created successfully.

admin@str2-msn2700-spy-1:~/gcu$ sudo config list-checkpoints 
[
    "20240522-xincun"
]

admin@str2-msn2700-spy-1:~/gcu$ sudo config rollback 20240522-xincun
Config Rollbacker: Config rollbacking starting.
Config Rollbacker: Checkpoint name: 20240522-xincun.
Config Rollbacker: Verifying '20240522-xincun' exists.
Config Rollbacker: Loading checkpoint into memory.
Config Rollbacker: Replacing config using 'Config Replacer'.
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 71214.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: localhost: Patch application starting.
Patch Applier: localhost: Patch: []
Patch Applier: localhost getting current config db.
Patch Applier: localhost: simulating the target full config after applying the patch.
Patch Applier: localhost: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: localhost: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: localhost: sorting patch updates.
Patch Applier: The localhost patch was converted into 0 changes.
Patch Applier: localhost: applying 0 changes in order.
Patch Applier: localhost: verifying patch updates are reflected on ConfigDB.
Patch Applier: localhost patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
Config Rollbacker: Config rollbacking completed.
Config rolled back successfully.

admin@str2-msn2700-spy-1:~/gcu$ sudo config delete-checkpoint 20240522-xincun
Config Rollbacker: Deleting checkpoint starting.
Config Rollbacker: Checkpoint name: 20240522-xincun.
Config Rollbacker: Checking checkpoint exists.
Config Rollbacker: Deleting checkpoint.
Config Rollbacker: Deleting checkpoint completed.
Checkpoint deleted successfully.

admin@str2-msn2700-spy-1:~/gcu$ sudo config list-checkpoints 
[]
```

2. Multi ASIC
```
stli@str2-7250-2-lc01:~/gcu$ sudo config checkpoint 20240522-xincun
MultiASICConfigRollbacker: Config checkpoint starting.
MultiASICConfigRollbacker: Checkpoint name: 20240522-xincun.
MultiASICConfigRollbacker: Getting current  config db.
MultiASICConfigRollbacker: Getting current asic0 config db.
MultiASICConfigRollbacker: Getting current asic1 config db.
MultiASICConfigRollbacker: Getting checkpoint full-path.
MultiASICConfigRollbacker: Ensuring checkpoint directory exist.
MultiASICConfigRollbacker: Saving config db content to /etc/sonic/checkpoints/20240522-xincun.cp.json.
MultiASICConfigRollbacker: Config checkpoint completed.
Checkpoint created successfully.
stli@str2-7250-2-lc01:~/gcu$ sudo config list-checkpoints 
[
    "20240522-xincun"
]

stli@str2-7250-2-lc01:~/gcu$ sudo config rollback 20240522-xincun
MultiASICConfigRollbacker: Config rollbacking starting.
MultiASICConfigRollbacker: Checkpoint name: 20240522-xincun.
MultiASICConfigRollbacker: Verifying '20240522-xincun' exists.
MultiASICConfigRollbacker: Loading checkpoint '20240522-xincun' into memory.
MultiASICConfigRollbacker: Replacing config '20240522-xincun' using 'Config Replacer'.
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 38147.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: localhost: Patch application starting.
Patch Applier: localhost: Patch: []
Patch Applier: localhost getting current config db.
Patch Applier: localhost: simulating the target full config after applying the patch.
Patch Applier: localhost: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: localhost: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: localhost: sorting patch updates.
Patch Applier: The localhost patch was converted into 0 changes.
Patch Applier: localhost: applying 0 changes in order.
Patch Applier: localhost: verifying patch updates are reflected on ConfigDB.
Patch Applier: localhost patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 97546.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: asic0: Patch application starting.
Patch Applier: asic0: Patch: []
Patch Applier: asic0 getting current config db.
Patch Applier: asic0: simulating the target full config after applying the patch.
Patch Applier: asic0: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: asic0: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: asic0: sorting patch updates.
Patch Applier: The asic0 patch was converted into 0 changes.
Patch Applier: asic0: applying 0 changes in order.
Patch Applier: asic0: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic0 patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 97713.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: asic1: Patch application starting.
Patch Applier: asic1: Patch: []
Patch Applier: asic1 getting current config db.
Patch Applier: asic1: simulating the target full config after applying the patch.
Patch Applier: asic1: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: asic1: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: asic1: sorting patch updates.
Patch Applier: The asic1 patch was converted into 0 changes.
Patch Applier: asic1: applying 0 changes in order.
Patch Applier: asic1: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic1 patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
MultiASICConfigRollbacker: Config rollbacking completed.
Config rolled back successfully.

stli@str2-7250-2-lc01:~/gcu$ sudo config delete-checkpoint 20240522-xincun
MultiASICConfigRollbacker: Deleting checkpoint starting.
MultiASICConfigRollbacker: Checkpoint name: 20240522-xincun.
MultiASICConfigRollbacker: Checking checkpoint: 20240522-xincun exists.
MultiASICConfigRollbacker: Deleting checkpoint: 20240522-xincun.
MultiASICConfigRollbacker: Deleting checkpoint: 20240522-xincun completed.
Checkpoint deleted successfully.
stli@str2-7250-2-lc01:~/gcu$ sudo config list-checkpoints 
[]

stli@str2-7250-2-lc01:~/gcu$ sudo config replace 20240522-xincun.cp.json 
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 38147.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: localhost: Patch application starting.
Patch Applier: localhost: Patch: []
Patch Applier: localhost getting current config db.
Patch Applier: localhost: simulating the target full config after applying the patch.
Patch Applier: localhost: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: localhost: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: localhost: sorting patch updates.
Patch Applier: The localhost patch was converted into 0 changes.
Patch Applier: localhost: applying 0 changes in order.
Patch Applier: localhost: verifying patch updates are reflected on ConfigDB.
Patch Applier: localhost patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 97546.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: asic0: Patch application starting.
Patch Applier: asic0: Patch: []
Patch Applier: asic0 getting current config db.
Patch Applier: asic0: simulating the target full config after applying the patch.
Patch Applier: asic0: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: asic0: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: asic0: sorting patch updates.
Patch Applier: The asic0 patch was converted into 0 changes.
Patch Applier: asic0: applying 0 changes in order.
Patch Applier: asic0: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic0 patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
Config Replacer: Config replacement starting.
Config Replacer: Target config length: 97713.
Config Replacer: Getting current config db.
Config Replacer: Generating patch between target config and current config db.
Config Replacer: Applying patch using 'Patch Applier'.
Patch Applier: asic1: Patch application starting.
Patch Applier: asic1: Patch: []
Patch Applier: asic1 getting current config db.
Patch Applier: asic1: simulating the target full config after applying the patch.
Patch Applier: asic1: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: asic1: validating target config does not have empty tables,
                               since they do not show up in ConfigDb.
Patch Applier: asic1: sorting patch updates.
Patch Applier: The asic1 patch was converted into 0 changes.
Patch Applier: asic1: applying 0 changes in order.
Patch Applier: asic1: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic1 patch application completed.
Config Replacer: Verifying config replacement is reflected on ConfigDB.
Config Replacer: Config replacement completed.
Config replaced successfully.
```
* Improve load_mingraph to wait eth0 restart before exist
harjotsinghpawra and others added 23 commits June 25, 2025 11:40
…ic-net#3884)

New prober_type harware is being added to linkmgrd. This is config knob addition for the specific field in mUX_CABLE
show command show muxcable config edited to add prober_type field
What I did
Added show command to display icmp echo offload sessions

How I did it
added show utility icmp.py
How to verify it
added pytests to verify
…t#3755)

What I did
Currently all AAA config commands create a new ConfigDBConnector object for every operation. This doesn't interact well with the pytest mock infrastructure used in the unit tests. I changed the AAA config commands to accept a ConfigDBConnector from the top-level command, which follows the standard pattern used by other config commands (such as those in sonic-utilities/config/main.py).

I fixed up tests so that they actually test config functionality, instead of manually manipulating the mock DB to return valid show results.

Some expected output in tests was not in line with actual product output. I updated the expected output in these cases.

fixes sonic-net#3754

How I did it
Convert AAA config commands to use the pass_db decorator pattern, which is used by most other commands in sonic-utilities. This decorator allows the ConfigDBConnector to be passed in directly.

I went through sonic-utilities/tests/aaa_test.py and sonic-utilities/tests/radius_test.py, and removed all instances where manual database manipulation was being done unnecessarily (such as here, for example).

How to verify it
Removing the db modification from the AAA tests causes them to fail before my changes. After my changes, all AAA tests pass.
…c-net#3872)

* Fix warm-reboot script so it can be run via reboot DBus service.
Use whoami if logname failed.

* Add UT

* Fix review comments

* Update scripts/fast-reboot

---------
…onic-net#3860)

What I did
Enabling RADIUS statistics adds config entries which are not valid in YANG. The option adds the True/False boolean values instead of the 'true'/'false' booleans that are defined in the YANG model, which causes failures when trying to reload/replace the config.

Fixes: #22407

How I did it
Modified relevant config command to use the correct values.

How to verify it
Manually verified.

I added a test case for config radius statistics command. Note that this test follows the pattern in this file, which is not actually exercising the correct codepath (see sonic-net#3754 for details).
…erfaces names (sonic-net#3894)

What I did
Add support for --non_zero/-nz option for portstat (--nonzero for show interfaces counters) to display only meaningful counter records
Define a new sorting function that only looks at the digits in the interface names. So that when it prints the counter records of some devices that define interfaces aliases heterogeneously (e.g., Cisco packet chassis define interface aliases according to port speed), it will still sort the records based on interface indexes only. We use it when collecting counters from LC on supervisor
Refactor the implementation of portstat class to reduce duplicate code.
How I did it
Omit an interface's counter record if all fields have zero value, when "--non_zero" is present
Use regular expression to extract the digits in the interface names and sort the records based on the digits
Use cnstat_diff_print (which is more generic) to replace the cnstat_print

* add a new default mocked counters_db.json for portstat to replace the original global counters_db.json

* fix for missing rate data

* fix for UT failures

* fix syntax error

* fix incorrect import name

* fix incorrect expected output

* add debugging logs

* move debugging log

* fix incomplete setting and expected output

* remove debugging log

* add assert for debugging

* mock is_chassis

* fix a typo

* add more debugging assertions

* remove useless assertion and fix UT

* enable -d frontend for all platforms

* implement a new way to judge nonzero

* change a parameter  in UT

* fix formatting

* add missing trailing line

* add one missing statement

* fix typo

* add --nonzero for show interfaces counters fec-stat

* refactor the code to reduce redundancy

* fix for formatting

* fix inconsistent behavior

* fix UT output and setup

* adapt is_non_zero to string with commas

* fix dump copilot recommended code

* make sorting method consistent
…er-vrf basis (sonic-net#3866)

What I did
Modified the existing 'show ip bgp [summary/neighbors/network]' CLIs to display bgp information for specific vrfs. The new CLIs have similar outputs to their default VRF counterparts:

show ip bgp vrf {vrf_name} summary
show ip bgp vrf {vrf_name} network
show ip bgp vrf {vrf_name} neighbors
show ipv6 bgp vrf {vrf_name} summary
show ipv6 bgp vrf {vrf_name} network
show ipv6 bgp vrf {vrf_name} neighbors
How I did it
Modified bgp_frr_v4.py and bgp_frr_v6.py to add a subcommand 'vrf' that accepts a vrf_name as an argument

How to verify it
UT coverage in show_bgp_neighbor_test.py, show_bgp_neighbor_test.py and bgp_commands_test.py
…ion calls (sonic-net#3900)

* Change function names

* Test fix

* Fix test coverage

* Fix tests

* Fix test issues

* Fix test
What I did
Fixed migration error during upgrade from 202411_02.

How I did it
Added version_202411_02 function in db_migrator to support proper migration during upgrade
(called dynamically via getattr from the migrate function).
Updated db_migrator_test accordingly.
How to verify it
Manual:

Use canonical setup with 202411 image
Verify the version in config_db and Redis is version_202411_02
hget VERSIONS|DATABASE VERSION
vim /etc/sonic/config_db.json
Install master image and reboot
Check syslog for db_migrator errors or run config reload
Automatic:

Use canonical setup with 202411 image
Verify the version in config_db and Redis is version_202411_02
Run test_deploy-and-upgrade
Additional Option:

Verify the version in config_db and Redis is version_202411_02
Run the following command to migrate (to 202505_01):
/usr/local/bin/db_migrator.py -o migrate
…onic-net#3902)

What I did
Aligned the output format of intfstat with comma formatting.

How I did it
I used the format_number_with_comma function from netstat to print numeric output with commas where it was missing

How to verify it

I cleared the existing cache
I verified the output format of - "intfstat", "intfstat -i Ethernet64", "intfstat -p 5", "intfstat -j"
I created a new cache - "intfstat -c"
I verified again the output format of - "intfstat", "intfstat -i Ethernet64", "intfstat -p 5", "intfstat -j"
Additionally, I've updated intfstat_test.py to cover this behavior
What I did
Added sai.xml file from switch to sysdump

How I did it
Copied sai.xml file to sai_sdk_dump folder in generate_dump file.

How to verify it
Verify the sai.xml file in sai_sdk_dump folder after generating sysdump using "show techsupport" command
…t#3881)

Fixes sonic-net#3883

What I did
Enable BUFFER_PROFILE replace operation on dynamic_th field and
PFC_WD field operations on marvell-teralynx platform starting 20241100 Sonic version

How I did it
Added marvell-teralynx platform name to gcu_field_operation_validators.conf.json.

How to verify it
cat buffer_profile.json
[
  {
     "op": "replace",
     "path": "/BUFFER_PROFILE/pg_lossless_50000_5m_profile/dynamic_th",
     "value": "2"
  }
]
Verify above configuration on marvell-teralynx platform using below command:

config apply-patch buffer_profile.json
Patch application starting.
Patch: [{"op": "replace", "path": "/BUFFER_PROFILE/pg_lossless_50000_5m_profile/dynamic_th", "value": "2"}, {"op": "replace", "path": "/BUFFER_PROFILE/pg_lossless_50000_40m_profile/dynamic_th", "value": "2"}]
localhost getting current config db.
localhost: simulating the target full config after applying the patch.
localhost: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: localhost: validating target config does not have empty tables, since they do not show up in ConfigDb.
Patch Applier: localhost: sorting patch updates.
Patch Applier: The localhost patch was converted into 2 changes:
Patch Applier: localhost: applying 2 changes in order:
Patch Applier:   * [{"op": "replace", "path": "/BUFFER_PROFILE/pg_lossless_50000_5m_profile/dynamic_th", "value": "2"}]
Patch Applier: localhost: verifying patch updates are reflected on ConfigDB.
Patch Applier: localhost patch application completed.
Patch applied successfully.
cat pfcwd.json
[
  {
     "op": "add",
     "path": "/PFC_WD/GLOBAL/POLL_INTERVAL",
     "value": "390"
 }
]
config apply-patch pfcwd.json
Patch Applier: localhost: Patch application starting.
Patch Applier: localhost: Patch: [{"op": "add", "path": "/PFC_WD/GLOBAL/POLL_INTERVAL", "value": "390"}]
Patch Applier: localhost getting current config db.
Patch Applier: localhost: simulating the target full config after applying the patch.
Patch Applier: localhost: validating all JsonPatch operations are permitted on the specified fields
Patch Applier: localhost: validating target config does not have empty tables,
                            since they do not show up in ConfigDb.
Patch Applier: localhost: sorting patch updates.
Patch Applier: The localhost patch was converted into 1 change:
Patch Applier: localhost: applying 1 change in order:
Patch Applier:   * [{"op": "replace", "path": "/PFC_WD/GLOBAL/POLL_INTERVAL", "value": "390"}]
Patch Applier: localhost: verifying patch updates are reflected on ConfigDB.
Patch Applier: localhost patch application completed.
Patch applied successfully.
Related PTF test cases:

generic_config_updater/test_mmu_dynamic_threshold_config_update.py::test_dynamic_th_config_updates
generic_config_updater/test_pfcwd_interval.py::test_pfcwd_interval_config_updates
Log:
sonic_mgmt_test_success_log.txt
What I did
Add new reboot-cause: Kernel Panic - Out of memory

How I did it
Existing code already supports "Kernel Panic" as reboot cause. By checking dmesg for kernel panic, overwrite the reboot cause as "Kernel Panic - Out of memory" when possible.

How to verify it
Enable kdump: config kdump enable; config save; reboot
Trigger OOM: python3 -c "blocks=[]; [blocks.append(bytearray(50010241024)) for _ in iter(int, 1)]"
Check reboot cause: "show reboot-cause" outputs "Kernel Panic - Out of memory [Time: Tue Jun 10 05:33:16 PM UTC 2025]"
Previous command output (if the output of a command-line utility has changed)
show reboot-cause
Kernel Panic [Time: Tue Jun 10 03:40:28 PM UTC 2025]

New command output (if the output of a command-line utility has changed)
show reboot-cause
Kernel Panic - Out of memory [Time: Tue Jun 10 05:33:16 PM UTC 2025]
…ic-net#3911)

* [sfputil] Use host lane mask as part of rx-output enable/disable

Signed-off-by: Mihir Patel <[email protected]>

* Fixed pre-commit failures

* Fixed pre-commit failure

* Modified the state restore logic in case of failure

* Removed output restoration in case of an error

* Modified get_subport to return 0 if subport field does not exists

* Returning integer in get_subport

---------

Signed-off-by: Mihir Patel <[email protected]>
What I did:

Added CoPP show commands:

show copp configuration
show copp configuration detailed --trapid <trap_id>
show copp configuration detailed --group <group>
Added UT for the CLI commands.

Why I did it

Details in HLD: sonic-net/SONiC#1943

How to verify it

Verified with UT
… add check_sids (sonic-net#3919)

* skip static route in check_frr_pending_routes

* add check_sids in route_check.py

* add a way to combine results from check_routes and check_sids

* add UTs for new features

* fix the way to combine results

* fix test data

* add missing return statement

* correct the implementation of get_appdb_sids
…n_config.json files provided (sonic-net#3895)

What I did
Fix the issue introduced by Enable config reload yang validation for multiasic sonic-net#3825.
Fix the config reload flow in case when multiple golden_config.json files provided.

How I did it
Add check for multi-asic.

How to verify it
Manual verification by running the:

cisco@sonic-lc5:~$ sudo config reload -y -f -l /etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json
Run the whole sonic-utilities UT suite.

Previous command output (if the output of a command-line utility has changed)
cisco@sonic-lc5:~$ sudo config reload -y -f -l /etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json
Acquired lock on /etc/sonic/reload.lock
Released lock on /etc/sonic/reload.lock
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/config/main.py", line 158, in read_json_file
    with open(fileName) as f:
         ^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/config", line 8, in <module>
    sys.exit(config())
             ^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/decorators.py", line 64, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/utilities_common/flock.py", line 71, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/config/main.py", line 1884, in reload
    config_file_yang_validation(filename)
  File "/usr/local/lib/python3.11/dist-packages/config/main.py", line 1410, in config_file_yang_validation
    config = read_json_file(filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/config/main.py", line 161, in read_json_file
    raise Exception(str(e))
Exception: [Errno 2] No such file or directory: '/etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json'
New command output (if the output of a command-line utility has changed)
cisco@sonic-lc5:~$ sudo config reload -y -f -l /etc/sonic/running_golden_config.json,/etc/sonic/running_golden_config0.json,/etc/sonic/running_golden_config1.json,/etc/sonic/running_golden_config2.json
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -H -k Cisco-88-LC0-36FH-M-O36 --write-to-db
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/running_golden_config.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -H -k Cisco-88-LC0-36FH-M-O36 -n asic0 --write-to-db
Warning: Failed to get device ID from asic.conf file for asic0
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/running_golden_config0.json -n asic0 --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate -n asic0
Running command: /usr/local/bin/sonic-cfggen -H -k Cisco-88-LC0-36FH-M-O36 -n asic1 --write-to-db
Warning: Failed to get device ID from asic.conf file for asic1
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/running_golden_config1.json -n asic1 --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate -n asic1
Running command: /usr/local/bin/sonic-cfggen -H -k Cisco-88-LC0-36FH-M-O36 -n asic2 --write-to-db
Warning: Failed to get device ID from asic.conf file for asic2
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/running_golden_config2.json -n asic2 --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate -n asic2
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock

Signed-off-by: vhlushko <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

@linux-foundation-easycla
Copy link

CLA Not Signed

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@nmoray nmoray closed this Jun 25, 2025
@nmoray nmoray deleted the tacacs_encrypt branch June 25, 2025 12:03
@nmoray
Copy link
Author

nmoray commented Jun 25, 2025

@anders-nexthop @qiluo-msft This PR has been closed and created a new one with updated changes. #3936.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.