Skip to content

Conversation

@ardo-nordic
Copy link
Contributor

Bsim harness in twister has been modified to be able to run tests. The main changes it entailed are as following:

  • For bsim tests, compile/build stage needs to be done before tests are run, that requires rearranging tasks in twister pipeline
  • Several bsim harness parameters added (optional), as well as option to specify common part of the harness config
  • To avoid unnecessary recompiling of tests, in some test suites there are build-only tests that prepare required binaries, and test only ('no_build' option) tests which rely on other build tests to prepare binaries
  • All shell scripts that were used to run the tests are no longer needed

Atm, only host is extensively using bsim harness, and thus is the only one affected.

@aescolar
Copy link
Member

aescolar commented Feb 11, 2025

I'm quite surprised by this PR. Not just because it appeared quite out of the blue for me, but also because there has been quite related work by others (CC @gchwier ) in these area this overlaps with:

We have had this want for very long, but so far we have not succeeded. So instead of doing the same we have done in previous PRs where we discussed some details and ended up spending time polishing something which was rejected on principal issues, I would suggest we have a talk about it between the more involved and relevant people, try to reach an agreement on the principal, and if we do, then look at the particular details.

@ardo-nordic ardo-nordic force-pushed the run_bsim_with_twister branch 2 times, most recently from 87f98dd to 8f1a143 Compare February 14, 2025 10:01
@aescolar aescolar added this to the v4.2.0 milestone Feb 14, 2025
@aescolar
Copy link
Member

I just added the 4.2 milestone, as freeze for 4.1 is today, and this is neither ready not urgent. So to make it clear this is not trying to get into this release.

@Thalley
Copy link
Contributor

Thalley commented Feb 14, 2025

@ardo-nordic next time consider opening such PRs as draft until they pass CI to avoid unnecessary noise for reviewers :)

@ardo-nordic ardo-nordic marked this pull request as draft February 14, 2025 10:46
@ardo-nordic ardo-nordic force-pushed the run_bsim_with_twister branch 2 times, most recently from f747a95 to dcd499b Compare February 18, 2025 13:12
@ardo-nordic ardo-nordic marked this pull request as ready for review February 19, 2025 16:56
@ardo-nordic ardo-nordic changed the title Run bsim tests with twister [RFC] Run bsim tests with twister Feb 20, 2025

def _generate_commands(self):
def rs():
return f'-rs={random.randint(0, 2**10 - 1)}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be against having a random seed in tests unless the test ask for it. And even then, it would be better if the random value was printed somehow so that the test can be reproduced easily.

We discussed it with @alwa-nordic and it may be better to use the device ID (same as -d) as the random seed.

Copy link
Member

@aescolar aescolar Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The harness should not set the random seed on its own, and less to an actual random value.

We discussed it with @alwa-nordic and it may be better to use the device ID (same as -d) as the random see

In the executable, the random seed already defaults to a value that depends on the device number. There is no need to do that again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the executable, the random seed already defaults to a value that depends on the device number. There is no need to do that again.

Okay, yeah I was pretty sure there was already something like that implemented but I wanted to test it today to confirm it ^^

Comment on lines 1128 to 1126
for file in [f for f in files if os.path.getctime(f) > self._start_time]:
os.remove(file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for file in [f for f in files if os.path.getctime(f) > self._start_time]:
os.remove(file)
for file in files:
if os.path.getctime(file) > self._start_time:
os.remove(file)

I think it's more readable like that.

But I am wondering why you are looking at the change time? What is the reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the files that were created by the test. Using generators with conditions is a standard practice in Python.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are right list comprehension are common practice, mixing the operations (filtering and the iteration) isn't a good practice.

You could do something like that if you want to keep the list comprehension:

Suggested change
for file in [f for f in files if os.path.getctime(f) > self._start_time]:
os.remove(file)
recent_files = [f for f in files if os.path.getctime(f) > self._start_time]
for file in recent_files:
os.remove(file)


def expand_args(dev_id):
try:
args = [str(eval(v)[dev_id]) if v.startswith('[') else v for v in extra_args]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard for me to understand why you are using eval() here. But this is something you should avoid if possible. It may execute arbitrary code.

@alwa-nordic is suggesting to use the str.format method instead.

Copy link
Contributor

@alwa-nordic alwa-nordic Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have misunderstood what this does when I gave my suggestion. I assumed it's some kind of variable interpolation. Now I see it's for defining separate bsim_options per device, right?

I suggest we transpose bsim_exe_name, bsim_test_ids, and bsim_options in the config, so it looks like a script. What do you think? Example below:

devices:
    - exe: foo.exe
      bsim_testid: central
      custom_args:
        - -flash=bar.nvm.bin
        - -argstest
        - 10
    - exe: foo.exe
      bsim_testid: peripheral
      custom_args:
        - -flash=baz.nvm.bin
        - -argstest
        - 10

self._start_time = time.time()

def _get_exe_path(self, index):
return self._exe_paths[index if index < len(self._exe_paths) else 0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if returning the first element when the index is out of range is safe to do. Is it necessary? Seems like a bad practice that is hiding bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serves the purpose of handling the case (quite common one) when the same executable runs on all simulated devices.


def wait_bsim_ready(self):
start_time = time.time()
while time.time() - start_time < Bsim.BSIM_READY_TIMEOUT_S:
Copy link
Member

@aescolar aescolar Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer to have something smarter than this. As an early stopgap it may be ok though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this code do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's here to ensure that executable file needed for the test is compiled and ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So twister does not necessarily complete all build jobs before running? Won't this potentially suffer from a deadlock, e.g. when -j1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you for example run a test suite with one build test and a few run-only tests, and have 8 threads for this, run-only tests need this wait mechanism because they'll be attempted to launch at the same time as build test. With one thread, there will be no waiting in this function because when it's going to be called all binaries will be available.


def _generate_commands(self):
def rs():
return f'-rs={random.randint(0, 2**10 - 1)}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way we can have this.
a) we do not want actual randomness.
b) we have known issues that are triggered or not depending on the random seed, so this would mean CI would start failing at random in some runs. This is not something we can have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove it.


def _generate_commands(self):
def rs():
return f'-rs={random.randint(0, 2**10 - 1)}'
Copy link
Member

@aescolar aescolar Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The harness should not set the random seed on its own, and less to an actual random value.

We discussed it with @alwa-nordic and it may be better to use the device ID (same as -d) as the random see

In the executable, the random seed already defaults to a value that depends on the device number. There is no need to do that again.

Comment on lines +16 to +13
bsim_options:
- -RealEncryption=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the - -?

Could RealEncryption have been it's own entry in yaml like real_encryption: true?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real_encryption: true

I would really not go that way. Which options you can pass to an executable is an unbound and unknown set. They depend on the test, configuration and target, and are not the problem of the twister harness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a pretty long list of arguments which will easily double the size of the schema if we add all of them.

Comment on lines 18 to 14
- -argstest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doens't -argstest describe that the next arguments are test arguments? Where are the arguments here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the common options are prepended to bsim_options below. I also think this is awkward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common section contains common options, allows to shorten the list when there is a lot of duplication so that it's possible to clearly see only the options which are different. However, that's not mandatory to use.

Comment on lines 23 to 38
bluetooth.host.adv.extended.build_scanner:
build_only: true
harness_config:
bsim_exe_name: tests_bsim_bluetooth_host_adv_extended_prj_scanner_conf
bsim_exe_name:
- tests_bsim_bluetooth_host_adv_extended_prj_scanner_conf
extra_args:
CONF_FILE=prj_scanner.conf
bluetooth.host.adv.extended.ext_adv:
no_build: true
harness_config:
bsim_exe_name:
- tests_bsim_bluetooth_host_adv_extended_prj_advertiser_conf
- tests_bsim_bluetooth_host_adv_extended_prj_scanner_conf
bsim_test_ids:
- ext_adv_advertiser
- ext_adv_scanner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's interesting that you have a test here that is a build-only and one that is a no-build.

What's preventing this from just being a test that is being built and run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each test here requires two different executables, one for scanner and one for advertising. Moving them to a separate build only test allows to build them only once and use the executables in subsequent tests without rebuilding.

Comment on lines 19 to 30
bluetooth.host.id.settings_dut2:
no_build: true
harness_config:
bsim_exe_name: tests_bsim_bluetooth_host_id_settings_prj_conf
bsim_test_ids:
- dut2
bsim_options:
- -flash_rm
bluetooth.host.id.settings_dut1:
no_build: true
harness_config:
bsim_test_ids:
- dut1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a dependency between these tests that's not expressed here. The dut1 test leaves behind a file that is needed by the dut2 test.

I discussed this with @theob-pro. The way of least resistance is to make harness_config be a list of simulations that share simulation id and working directory and execute in sequence.

  bluetooth.host.id.settings:
    no_build: true
    harness_config:
      - bsim_test_ids:
          - dut1
      - bsim_test_ids:
          - dut2
        bsim_options:
          - -flash_rm

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting one. That really goes against the grain of testing when you make a test that relies on the result of a previous test. Ideally, it shouldn't be. So best solution is to redesign this test in my opinion, and not do anything like that again. As it stands now, the test will pass if it's run as a bunch (pipeline queue is lifo, hence reverse order here in testcase.yaml). What I really don't want to do, is to add this very special case to the framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can say from the mesh perspective. The problem is that there is currently (unless I'm mistaken) no way to restart a bsim executable inside the test (basically the executable itself). For the test cases where we test that mesh correctly restores data from a persistent storage, we need to simulate a device reboot. And this is achieved by running several tests in a row using same simulation id:

# SKIP=(persistence_provisioning_data_save)
overlay=overlay_pst_conf
RunTestFlash mesh_pst_prov_data_check persistence_provisioning_data_save -flash_erase
# SKIP=(persistence_provisioning_data_load)
overlay=overlay_pst_conf
RunTestFlash mesh_pst_prov_data_check persistence_provisioning_data_load -flash_rm
overlay="overlay_pst_conf_overlay_ss_conf_overlay_psa_conf"
RunTestFlash mesh_pst_prov_data_check_psa persistence_provisioning_data_save -flash_erase
overlay="overlay_pst_conf_overlay_ss_conf_overlay_psa_conf"
RunTestFlash mesh_pst_prov_data_check_psa persistence_provisioning_data_load -flash_rm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That really goes against the grain of testing when you make a test that relies on the result of a previous test. Ideally, it shouldn't be. So best solution is to redesign this test in my opinion, and not do anything like that again

I think there is a bit of a misunderstanding here. These tests where something is run generating flash content, and then something else is run which uses that flash content, are not different tests. They are just sub-steps of the same test.
Meaning, many tests consists of just running a set of devices where all connect to one phy. But that is just the typical way, not the only way.
The tests @alwa-nordic refers to, run 2 or 3 sets of devices with phys in sequence (the whole sequence is one test). But we also have tests in tree running with the EDTT; and tests where only some of the devices executables connect to the Phy.
And that is in tree. Out of tree, we have tests that run postprocessing scripts on the simulation results and other programs to check what has happened.

One option would be to say that, by now, this twister harness does not support running those types of tests, but overall we need to keep supporting them.

files = glob.glob(os.path.join(self._bsim_out_path, '*.log'))
# files += glob.glob(os.path.join(self._bsim_out_path, '*.bin'))
try:
for file in [f for f in files if os.path.getctime(f) > self._start_time]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we are using ctime here. Won't this pick up files from parallel runs? Is there no better way to select files for cleaning?


for t in threads:
t.join(timeout=timeout)
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a naked except Exception: clause, you might end up catching exceptions other than the ones you expect to catch. This can hide bugs or make it harder to debug programs when unrelated errors are hidden.

https://pylint.readthedocs.io/en/v3.3.3/user_guide/messages/warning/broad-exception-caught.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understandable, but the purpose here is not to crash the whole session regardless of what happens, and use the status to just fail the test, and allow the session of move over to the next one.

[os.path.join(self._bsim_out_path, exe_name) for exe_name in exe_names]

def clean_exes(self):
self._set_start_time()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does clean_exes call _set_start_time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's essentially the first actual harness function that is executed, at least for tests that are compiling.

self._set_start_time()

try:
for exe_path in [self._get_exe_path(i) for i in range(len(self._exe_paths))]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for exe_path in [self._get_exe_path(i) for i in range(len(self._exe_paths))]:
for exe_path in self._exe_paths:

Comment on lines 183 to 213
"bsim_test_ids":
type: seq
required: false
sequence:
- type: str
"bsim_verbosity":
type: int
required: false
"bsim_sim_length":
type: float
required: false
"bsim_options":
type: seq
required: false
sequence:
- type: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate we cannot use the schema required field because the real type of harness_config is dependent on harness. What do you think about creating a bsim container field under harness_config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very fond of it as well, but that's the current design, and I don't want to bloat the scope further.

if new_exe_name:
new_exe_name = f'bs_{self.instance.platform.name}_{new_exe_name}'
else:
new_exe_name = f'bs_{self.instance.name}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between self.instance.name and self.instance.platform.name?

Comment on lines 978 to 979
if self._exe_path:
return self._exe_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard library has functools.cached_property that could be used here. I'll leave it up to you if you want to use it.

@nashif nashif assigned nashif and unassigned aescolar Mar 7, 2025
Bsim harness now can also be used to run tests.

Signed-off-by: Artur Dobrynin <[email protected]>
BSIM tests might be reusing the same binary files, so it makes
sense to be able to split tests into build-only, which will
only build and copy executable files, and run only, which will
run them whenever needed. This requires more sequential pipeline
to avoid race conitions.

Signed-off-by: Artur Dobrynin <[email protected]>
Adding functionality to split harness_config into common and test-case
specific part, which are joined together for any given test.

Signed-off-by: Artur Dobrynin <[email protected]>
Adding Bsim phy-specific options to harness config. Clean up leftover log
files. Adding option to expand extra args with individual values for
different simulated devices. Adding random seed value to every sim device.

Signed-off-by: Artur Dobrynin <[email protected]>
Changing bsim_exe_name values to a list due to respective schema changes.

Signed-off-by: Artur Dobrynin <[email protected]>
Adding necessary changes to testcase.yaml files in all Bsim host tests
to make them directly runnable by twister's bsim harness.

Signed-off-by: Artur Dobrynin <[email protected]>
Shell scripts are no longer needed since twister can be used to build
and run host bsim tests.

Signed-off-by: Artur Dobrynin <[email protected]>
Increasing test timeouts for better margins. Removing missed obsolete
shell running files.

Signed-off-by: Artur Dobrynin <[email protected]>
Chaning timeouts in testcase.yaml files to be the same as they were
in the shell running scripts.

Signed-off-by: Artur Dobrynin <[email protected]>
Fixing minor compliance errors and issues popping up after rebase.
Fixing unit tests.

Signed-off-by: Artur Dobrynin <[email protected]>
With added support of running bsim tests with Twister, separate
building command is no longer need as tests can be build and
run with single command.

Signed-off-by: Artur Dobrynin <[email protected]>
Changed bsim test options in a way that allows explicit configuration
of simulated devices, including names (test ids), options and exe
names, while still retaining common options and exe name. Removed
random seed generation for devices. Minor review updates.

Signed-off-by: Artur Dobrynin <[email protected]>
@ardo-nordic ardo-nordic force-pushed the run_bsim_with_twister branch from 0dd6198 to a2b3205 Compare March 10, 2025 13:01
Bsim options schema was updated with explicit individual device
configuration, thus updating all test config files in host.

Signed-off-by: Artur Dobrynin <[email protected]>
@ardo-nordic ardo-nordic force-pushed the run_bsim_with_twister branch from a2b3205 to 03cc943 Compare March 10, 2025 13:14
Updating bsim harness documentation after adding functionality to
run BabbleSim tests.

Signed-off-by: Artur Dobrynin <[email protected]>
@zephyrbot zephyrbot added the area: Testsuite Testsuite label Mar 10, 2025
New bsim test that was added upstream needed to be converted
to be run with Bsim harness.

Signed-off-by: Artur Dobrynin <[email protected]>
exe_names[exe_name] = replacer(f'bs_{self.instance.platform.name}_{exe_name}')

if not exe_names:
exe_names[None] = f'bs_{replacer(self.instance.name)}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the goal of doing that? Why None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be an empty string, doesn't matter. That's handling of a very common case when there is one binary file listed as 'bsim_exe_name' and doesn't have individual exes for bsim devices.

@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label May 14, 2025
@github-actions github-actions bot closed this May 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants