Mellanox SDK and PRM Sniffer Utility CLI Design for SONiC
| Rev | Date | Author | Change Description |
|---|---|---|---|
| 0.1 | | Liu Kebo | Initial version |
This document describes the Mellanox SDK/FW sniffer utilities and how to implement a CLI that exposes these utilities in a SONiC system.
The SDK sniffer records the RPC calls from the Mellanox SDK user API library to the sx_sdk task into a .pcap file. This .pcap file can be replayed afterward to bring the SDK and FW into exactly the same state, which helps to reproduce and investigate issues.
In some cases we want to inspect the interaction between the SDK and FW. The PRM sniffer can be enabled to record this communication to a human-readable log file, which the MLNX support team can then analyze to identify where the problem is.
These two sniffers are independent and can work simultaneously.
To enable these two sniffers, specific environment variables must be set and the SDK must be restarted.
The new CLI shall provide a user interface that exposes the above SDK and PRM sniffer debug utilities in a SONiC system.
Enabling and disabling the SDK and PRM sniffers is controlled by environment variables that are passed to the SDK task during startup.
In SONiC, the SDK task resides in the syncd container, so enabling or disabling the sniffers means setting or unsetting environment variables of the syncd container. One possible way is to remove the old container first and then pass the expected environment variables to a new container with the docker run command.
For convenience of debugging, the sniffer files shall be stored in the host file system instead of in the container. To achieve this, a volume will be used to bind a directory of the host file system to a directory of the container. This can also be done by adding a volume bind option to the docker run command.
A new folder will be created to store the sniffer files: "/var/log/mellanox/sniffer/"
For the SDK sniffer, the result is stored in a .pcap file whose name includes a timestamp of the start time, for example, "sx_sdk_sniffer_20180224081306.pcap".
The PRM sniffer result file name also contains a start timestamp, for example, "prm_recording_20180225111422.log".
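The naming scheme can be illustrated with a short sketch. The helper name `sniffer_file_name` and the `kind` argument are hypothetical. The timestamp layout (year, month, day, hour, minute, second) is inferred from the two example file names above and is not confirmed by the SDK documentation.

```python
from datetime import datetime

# Directory named in this document for storing sniffer files.
SNIFFER_DIR = "/var/log/mellanox/sniffer/"


def sniffer_file_name(kind, start=None):
    """Return a result file name for the given sniffer kind ("sdk" or "prm").

    Hypothetical helper: the YYYYMMDDHHMMSS layout is inferred from the
    example names in this document.
    """
    start = start or datetime.now()
    stamp = start.strftime("%Y%m%d%H%M%S")
    if kind == "sdk":
        return "sx_sdk_sniffer_{}.pcap".format(stamp)
    if kind == "prm":
        return "prm_recording_{}.log".format(stamp)
    raise ValueError("unknown sniffer kind: {!r}".format(kind))
```

Passing an explicit `start` value makes the name reproducible, which is convenient when the same name has to be shown to the user and handed to the container.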
The major work of this CLI is therefore a set of actions that manipulate the syncd docker container and restart the related services:
- Stop the old syncd container
- Remove the old syncd container
- Recreate the container, setting or resetting the desired environment variables and volume when running it
- Restart the swss service to reload all the related modules/services
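The four steps above can be written down as an ordered command plan. This is only a sketch: the function names are hypothetical, and the commands are the ones this document uses for each step. Nothing is executed here. The plan is plain data that a caller could print for a dry run or pass to a runner.

```python
def sniffer_toggle_plan():
    """Return the ordered commands for enabling or disabling a sniffer.

    Hypothetical helper; each entry is an argument list, in the order
    the steps are listed in this document.
    """
    return [
        ["docker", "stop", "syncd"],       # stop the old syncd container
        ["docker", "rm", "syncd"],         # remove the old syncd container
        ["/usr/bin/syncd.sh", "start"],    # recreate it with new env/volume
        ["service", "swss", "restart"],    # reload related modules/services
    ]


def format_plan(plan):
    """Render a plan as one shell-style command per line (for dry runs)."""
    return "\n".join(" ".join(cmd) for cmd in plan)
```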
Whenever a sniffer is enabled or disabled, the original syncd docker container must be stopped and removed first. This can be achieved with the following docker commands:
docker stop syncd
docker rm syncd
After running these two commands, the system is ready for a new container with the extra environment variables and volume.
If either of the previous two steps fails, the SWSS service needs to be restarted to restore the system to normal status.
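A minimal sketch of this stop/remove step with the recovery path is shown here. It assumes that a non-zero exit status means the step failed. The function name `remove_syncd` and the injectable `run` parameter are hypothetical and exist mainly so that the logic can be exercised without docker.

```python
import subprocess


def remove_syncd(run=subprocess.call):
    """Stop and remove the syncd container.

    Returns True when both steps succeed. If either step fails (assumed
    to mean a non-zero exit status), restart SWSS to restore the system
    and return False. `run` takes an argument list and returns the exit
    status, like subprocess.call.
    """
    for cmd in (["docker", "stop", "syncd"], ["docker", "rm", "syncd"]):
        if run(cmd) != 0:
            # Recovery path described in this document.
            run(["service", "swss", "restart"])
            return False
    return True
```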
The start and stop of the syncd container are controlled by the script "/usr/bin/syncd.sh". Inside this script we can find the original command that creates the syncd container:
docker run -d --net=host --privileged -t -v /host/machine.conf:/etc/machine.conf -v /etc/sonic:/etc/sonic:ro \
--log-opt max-size=2M --log-opt max-file=5 \
-v /var/run/redis:/var/run/redis:rw \
-v /usr/share/sonic/device/$PLATFORM:/usr/share/sonic/platform:ro \
-v /usr/share/sonic/device/$PLATFORM/$HWSKU:/usr/share/sonic/hwsku:ro \
--tmpfs /tmp \
--tmpfs /var/tmp \
--name=syncd docker-syncd-mlnx-rpc:latest
Sniffer files will be stored in the directory /var/log/mellanox/sniffer/ of the host. To set up the volume, one option needs to be added:
-v /var/log/mellanox/sniffer:/var/log/mellanox/sniffer:rw
To enable the SDK/PRM sniffers, 4 environment variables need to be passed to the syncd container so that the SDK starts with the sniffers enabled:
-e SX_SNIFFER_ENABLE
-e SX_SNIFFER_TARGET
-e PRM_SNIFFER
-e PRM_SNIFFER_FILE_PATH
The new options mentioned above will be added to "syncd.sh" for the Mellanox platform by modifying the build template.
To recreate the syncd container, set the expected environment variables first, then run the command:
/usr/bin/syncd.sh start
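The environment for this call could be prepared as in the following sketch. The variable names come from this document, but the values are assumptions: "1" as the enable flag and a full path under /var/log/mellanox/sniffer/ as the target are illustrative guesses, not values confirmed by the SDK. The helper name `build_sniffer_env` is hypothetical.

```python
import os

SNIFFER_DIR = "/var/log/mellanox/sniffer/"
SNIFFER_VARS = (
    "SX_SNIFFER_ENABLE",
    "SX_SNIFFER_TARGET",
    "PRM_SNIFFER",
    "PRM_SNIFFER_FILE_PATH",
)


def build_sniffer_env(sdk_file=None, prm_file=None, base_env=None):
    """Return an environment dict for running "/usr/bin/syncd.sh start".

    A sniffer is enabled when its file name is given. Variables of a
    sniffer that is not requested are removed, so it stays disabled.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in SNIFFER_VARS:
        env.pop(name, None)
    if sdk_file:
        env["SX_SNIFFER_ENABLE"] = "1"  # assumed value
        env["SX_SNIFFER_TARGET"] = SNIFFER_DIR + sdk_file  # assumed format
    if prm_file:
        env["PRM_SNIFFER"] = "1"  # assumed value
        env["PRM_SNIFFER_FILE_PATH"] = SNIFFER_DIR + prm_file  # assumed format
    return env
```

The resulting dict could then be handed to the script, for example with `subprocess.call(["/usr/bin/syncd.sh", "start"], env=env)`.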
For the whole system to work properly after the syncd container is recreated, some related modules and services also need to be restarted. Restarting the SWSS service guarantees that all the impacted modules and services are restarted in the proper sequence. The command is:
service swss restart
The sniffer CLI will be implemented to run the commands mentioned above to enable or disable the SDK sniffer, the PRM sniffer, or both.
SONiC# config platform mlnx sniffer ?
Usage: sniffer [OPTIONS] COMMAND [ARGS]...
SONiC command line - 'Sniffer' command
Options:
-?, -h, --help Show this message and exit.
Commands:
sdk sdk sniffer
prm prm sniffer
all all sniffers
SONiC# config platform mlnx sniffer sdk ?
Usage: sniffer sdk [OPTIONS] COMMAND [ARGS]...
SDK Sniffers
Options:
-?, -h, --help Show this message and exit.
Commands:
enable Enable SDK sniffer
disable Disable SDK sniffer
SONiC# config platform mlnx sniffer prm ?
Usage: sniffer prm [OPTIONS] COMMAND [ARGS]...
PRM Sniffers
Options:
-?, -h, --help Show this message and exit.
Commands:
enable Enable PRM sniffer
disable Disable PRM sniffer
SONiC# config platform mlnx sniffer all ?
Usage: sniffer all [OPTIONS] COMMAND [ARGS]...
SDK and PRM Sniffers
Options:
-?, -h, --help Show this message and exit.
Commands:
enable Enable SDK and PRM sniffer
disable Disable SDK and PRM sniffer
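The command tree shown in the help output above (three targets, each with enable and disable) can be sketched with the standard library. argparse is used here only to keep the example dependency-free. It is not necessarily the framework the real CLI uses, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser for: sniffer {sdk,prm,all} {enable,disable}."""
    parser = argparse.ArgumentParser(prog="sniffer")
    targets = parser.add_subparsers(dest="target", required=True)
    for name, help_text in (
        ("sdk", "sdk sniffer"),
        ("prm", "prm sniffer"),
        ("all", "all sniffers"),
    ):
        target = targets.add_parser(name, help=help_text)
        actions = target.add_subparsers(dest="action", required=True)
        actions.add_parser("enable", help="Enable {}".format(help_text))
        actions.add_parser("disable", help="Disable {}".format(help_text))
    return parser
```

Parsing `["prm", "enable"]` with this parser yields `target == "prm"` and `action == "enable"`, which a dispatcher can map to the container operations described earlier.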
When a sniffer enable/disable command is issued, a prompt for the SWSS service restart will be shown and the user needs to agree to proceed; otherwise the command will be canceled.
Sniffer file names will also be shown after issuing the command.
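The confirmation behavior might look like the following sketch. The prompt wording, the accepted answers, and the name `confirm_and_run` are all assumptions made for illustration. `action` stands for whatever function performs the enable or disable and returns the sniffer file name to display.

```python
def confirm_and_run(action, ask=input):
    """Ask before restarting SWSS; run `action` only if the user agrees.

    Returns the result of `action()` (for example the sniffer file name
    to show to the user), or None when the command is canceled.
    """
    # Prompt wording is hypothetical.
    answer = ask("SWSS service will be restarted, continue? [y/N]: ")
    if answer.strip().lower() not in ("y", "yes"):
        return None
    return action()
```

Defaulting to "no" on any other answer keeps an accidental key press from restarting the service.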
Will log rotation be required?
The PRM sniffer generates a log file by default. Splitting the log across multiple files should not impact the analysis, so log rotation could be considered for the PRM sniffer file.