-
Notifications
You must be signed in to change notification settings - Fork 26
[PERFSCALE-4024] [RoCE] ib_write_bw support #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
pkg/drivers/ibwritebw.go
Outdated
| config.Show(nc, i.driverName) | ||
|
|
||
| // ib_write_bw client command: "ib_write_bw -d mlx5_0 -x 3 -F $server_ip" | ||
| cmd := []string{"stdbuf", "-oL", "-eL", "ib_write_bw", "-d", "mlx5_0", "-x", "3", "-F", serverIP} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we take the -d out? or have the user provide that? Looking at the docs, we could simply remove it and it pick the first ib device.... I just worry about hard coding specifics...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
This is the new format --ib-write-bw=mlx5_0:3
|
@josecastillolema checking in here |
|
Sorry @jtaleric I was prioritizing other stuff. |
|
@josecastillolema rebase pls |
throughtput doesn't seem too high |
45bea8e to
d7e7a8a
Compare
@rsevilla87 I have double checked and for this particular test and packet size the data is correct, this comes from the output of ib_write_bw: |
|
Don't know what is going on with the linter, is complaining about files this PR does not modify, i.e.: |
|
@rsevilla87 @jtaleric can I have some eyes here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR has now many more content than initially, i.e: all the changes in kubernetes.go, can you move those code optimization changes to a new PR?
The changes of |
| log.Fatal("flags --udnl2 and --udnl3 are mutually exclusive; please set only one") | ||
| } | ||
| if ibWriteBwEnabled && (!privileged || !hostNetOnly) { | ||
| log.Fatalf("😭 ib_write_bw driver requires both --privileged and --hostNet flags") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than aborting, why don't you set these flags when ib_write_bw is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think aborting and requiring the proper command from the user is the proper way to go.
Otherwise we would have to deal with multiple invalid combinations that are valid without the privileged and hostNetwork flags, i.e.: --vm, --udn, etc.
|
|
||
| // If a specific driver is explicitly requested, disable the default netperf driver | ||
| if (iperf3 || uperf || ibWriteBwEnabled) && !cmd.Flags().Changed("netperf") { | ||
| netperf = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to support combining netperf with other drivers, the logic starting in line 270 has that intention
| rootCmd.Flags().BoolVar(&netperf, "netperf", true, "Use netperf as load driver") | ||
| rootCmd.Flags().BoolVar(&iperf3, "iperf", false, "Use iperf3 as load driver") | ||
| rootCmd.Flags().BoolVar(&uperf, "uperf", false, "Use uperf as load driver") | ||
| rootCmd.Flags().StringVar(&ibWriteBw, "ib-write-bw", "", "Use ib_write_bw as load driver, requires nic:gid format (e.g., mlx5_0:0, requires --privileged and --hostNet)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if at this point is better to use a StringSlice flag
rootCmd.Flags().StringSliceVar(&RequestedDrivers, "drivers", []string{"netperf"}, fmt.Sprintf("Comma separated list of metrics drivers to use, supported values: %s", supportedDrivers))so we can run k8s-netperf with
k8s-netperf run --drivers netperf,uperf,ib_write_bw --ib-write-bw=mlx5_0:0 --config bla.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this logic would save some lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will break local CI, Prow, etc.
Are we sure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, we can address that in a different PR
| return stdout, err | ||
| } | ||
| } else { | ||
| retry := 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the hardcoded retry logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is exactly the same approach (different number, iperf uses 10), as iperf implements: https://github.com/cloud-bulldozer/k8s-netperf/blob/main/pkg/drivers/iperf.go#L129
e298803 to
6f3966a
Compare
0583bc5 to
870a3dc
Compare
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
870a3dc to
f73648d
Compare
Updated ibwritebw driver to match the new ExtractUdnIp function signature that requires a network name parameter. Also added support for Cudn networks. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: Jose Castillo Lema <josecastillolema@gmail.com>
|
@rsevilla87 ready for review, I have addressed most of the comments |
jtaleric
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work @josecastillolema !!!
Type of change
Description
New
ib_write_bwdriver:It only supports
UDP_STREAMand it does not support services.It also adds some logic to be more selective in the containers that runs in the server pod (vs the old model where all the drivers were run independently of the test). This is because the
ib_write_bwneeds specific hw so it makes no sense to run in unconditionally.