-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Fix for debugging when trying to load to ram. #87423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for debugging when trying to load to ram. #87423
Conversation
| if runner_cls.capabilities().reset and '--no-reset' not in final_argv: | ||
| if runner_cls.capabilities().reset and '--no-reset' not in final_argv and len(board_image_count): | ||
| if board_image_count is not None: | ||
| reset = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm your PR #69748 made the initial change here with respect to board_image_count and the forced addition of reset. What we found is that the test here for not None is always true because board_image_count is instantiated but doesn't have any elements in it... so not None. This results in the runner always adding a reset to the gdb commands. I suspect, from your comment block above, that this wasn't your intention.
We didn't notice this change because in the normal debug situation, you wouldn't care if the gdb did reset, load, reset if the image was being put in flash. But if you are targeting an image to a RAM space, the reset, load, reset causes the bootrom to load the flash image preventing you from debugging the image you created for RAM loading/debugging.
In parallel to this PR being added, I just saw this other PR [#87138] (@VynDragon) that just merged to force the larger logic to NOT ignore the no-reset runner option. We could have done the same thing now that I see this but I'm still questioning the logic of "if board_image_count not None" when combined with the comment block about multiple images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
board_image_count is init in the common function, it should not not ever be none. As for why I put the none check in, I do not remember
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in this particular case, the len(board_image_count) was 0 and we needed to have this check to keep the rest of your logic from doing another reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm Do you have any more comments on this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change here doesn't make sense, it should never have a len of 0 so you need to look into what is causing that, this change is not acceptable though without some in depth analysis showing there is a bug and given this code is in use and works fine on multiple boards by multiple vendors... I'm not seeing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm it is working because likely most platforms have no problem with back to back resets from the debugger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can disable reset so that's not an issue
removing block because lost track of this PR
cd5a1a8 to
edae439
Compare
edae439 to
add46b5
Compare
|
Re-assigned as I can't comment on the changes. |
|
After some time has passed, I noticed some context has been lost on this PR allow me to give a quick summary. This did not seem to work for Linkserver but I noticed it also did not work for jlink, this PR solves both issues. |
dea5d4f to
440b5fa
Compare
decsny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having a hard time understanding the commit message, and this PR looks like it is fixing 3 separate things, should it be 3 separate commits ?
You make a good point, I certainly can clean up the commit message and I will test to see if nothing breaks when I split these up, if not I can separate them into different commits. |
440b5fa to
ab87489
Compare
| # If the flash node points to ram, linserver treats | ||
| # the ram as inaccessible and does not flash. | ||
| gdb_cmd += ['-ex', 'set mem inaccessible-by-default off'] | ||
| gdb_cmd += ['-ex', 'monitor reset', '-ex', 'load'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the order of load, reset being reversed to reset, load?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to reset, we would want it to be before a load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I do see a typo I will fix in the next push if I don't get further comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to reset, we would want it to be before a load.
Yes, I see that is what you are doing, you are restating what it is that I said that am observing, my question was why. You should explain in the commit message also why you are reversing this order. I don't know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the commit message, let me know if that helps!
72b66e7 to
b983d94
Compare
scripts/west_commands/run_common.py
Outdated
| # reset or not. If this is not specified in the board/soc file, leave it up to | ||
| # the runner's default configuration to decide if a reset should occur. | ||
| if runner_cls.capabilities().reset and '--no-reset' not in final_argv: | ||
| if runner_cls.capabilities().reset and '--no-reset' not in final_argv and len(board_image_count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code change is unexplained and does not make sense so requires a valid justification for why this would ever be none, there is a --no-reset argument for west flash already that can be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be completely fair this PR has nothing to do with me justifying that this value should be zero, I am only adding a check to ensure that this value is NOT zero. The fact is right now the value literally is zero during a flash and probably is for multiple platforms. The check only stops it from running the code intended for multi-image flashing. It's probably never been caught before since the mutli image code probably can handle a single image too. The thing that breaks right now and that is making this apparent is the additional reset that is added when going down this path for the "second" image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you read and applied https://docs.zephyrproject.org/latest/build/flashing/configuration.html ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out. I was not aware of the configuration pointed out by this documentation, but maybe I am not being clear enough on the issue as the documentation seems to be calling out a workaround for sysbuild. So explicitly just before this function now I have added a
just before this if statement occurs in the python.
I built then flash hello world using nrf5340dk/nrf5340/cpuapp and python hit the breakpoint and
a print out shows board_image_count == 0
Is this not an issue? I only bring up the nrf5340dk to show that I see this with multiple platforms.
I would not be surprised if this is happening to all the platforms in zephyr? But please let me know if I am
misunderstanding or missing some context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the issue? You are flashing with one image and without --no-reset and the runner supports reset, so it is doing a reset which is exactly what people would expect it to do. Can see the folow here:
diff --git a/scripts/west_commands/run_common.py b/scripts/west_commands/run_common.py
index b13ca502eb6..0022691a395 100644
--- a/scripts/west_commands/run_common.py
+++ b/scripts/west_commands/run_common.py
@@ -421,8 +421,13 @@ def do_run_common_image(command, user_args, user_runner_args, used_cmds,
# the board has enabled this functionality, check if the board should be
# reset or not. If this is not specified in the board/soc file, leave it up to
# the runner's default configuration to decide if a reset should occur.
+ print("aa")
+ print(len(board_image_count))
+ print(board_image_count)
if runner_cls.capabilities().reset and '--no-reset' not in final_argv:
+ print("bb")
if board_image_count is not None:
+ print("cc")
reset = True
for cmd in used_cmds:
which for a sysbuild project gives:
-- west flash: using runner nrfutil
aa
2
defaultdict(<class 'run_common.ImagesFlashed'>, {'nrf5340dk/nrf5340/cpunet': ImagesFlashed(flashed=1, total=1), 'nrf5340dk/nrf5340/cpuapp': ImagesFlashed(flashed=0, total=1)})
bb
cc
-- runners.nrfutil: reset after flashing requested
Using board 001050097030
-- runners.nrfutil: Flashing file: /tmp/bb/zephyr/samples/subsys/mgmt/mcumgr/smp_svr/_AA/merged_CPUNET.hex
-- runners.nrfutil: Erasing address ranges touched by firmware
-- runners.nrfutil: Programming image
-- runners.nrfutil: Verifying image
-- runners.nrfutil: Reset
-- runners.nrfutil: Board(s) with serial number(s) 1050097030 flashed successfully.
-- west flash: using runner nrfutil
aa
2
defaultdict(<class 'run_common.ImagesFlashed'>, {'nrf5340dk/nrf5340/cpunet': ImagesFlashed(flashed=1, total=1), 'nrf5340dk/nrf5340/cpuapp': ImagesFlashed(flashed=1, total=1)})
bb
cc
-- runners.nrfutil: reset after flashing requested
-- runners.nrfutil: Flashing file: /tmp/bb/zephyr/samples/subsys/mgmt/mcumgr/smp_svr/_AA/merged.hex
-- runners.nrfutil: Erasing address ranges touched by firmware
-- runners.nrfutil: Programming image
-- runners.nrfutil: Verifying image
-- runners.nrfutil: Reset
-- runners.nrfutil: Board(s) with serial number(s) 1050097030 flashed successfully.
and for a single image for nrf5340dk//cpunet gives:
-- west flash: rebuilding
ninja: no work to do.
-- west flash: using runner nrfutil
aa
0
defaultdict(<class 'run_common.ImagesFlashed'>, {})
bb
cc
-- runners.nrfutil: reset after flashing requested
Using board 001050097030
-- runners.nrfutil: Flashing file: /tmp/bb/zephyr/samples/hello_world/_AA/merged.hex
-- runners.nrfutil: Erasing address ranges touched by firmware
-- runners.nrfutil: Programming image
-- runners.nrfutil: Verifying image
-- runners.nrfutil: Reset
-- runners.nrfutil: Board(s) with serial number(s) 1050097030 flashed successfully.
and for the same as before but with --no-reset gives:
-- west flash: rebuilding
ninja: no work to do.
-- west flash: using runner nrfutil
aa
0
defaultdict(<class 'run_common.ImagesFlashed'>, {})
Using board 001050097030
-- runners.nrfutil: Flashing file: /tmp/bb/zephyr/samples/hello_world/_AA/merged.hex
-- runners.nrfutil: Erasing address ranges touched by firmware
-- runners.nrfutil: Programming image
-- runners.nrfutil: Verifying image
-- runners.nrfutil: Board(s) with serial number(s) 1050097030 flashed successfully.
so it works fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for confirming this on your side, so it seems board_image_count can be 0. This was confusing to me because here I read "it should never be zero". #87423 (comment)
So I started to worry that this might be an intended issue.
I have just tested with the specific case of adding '--no-reset' and this does seem to work for me. But this PR is specific to the case that is dependent of setting zephyr,flash to a ram sector. As a result, the user will have to manually have to add --no-reset as I am not sure yet if I can make this automatic using the yml files.
But as you pointed out, this will not need to depend on the len(board_image_count) so I am going to drop this change.
Thank you for working with me to try and figure out if this check is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made an update to add if(NOT SYSBUILD) in the board.cmake instead, gives the same effect I was trying to accomplish.
b983d94 to
317b0cf
Compare
| board_runner_args(jlink "--no-reset") | ||
| board_runner_args(linkserver "--no-reset") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 space indent for cmake, is also double space here between e.g. jlink and "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double check on this, you did mean for me to put two spaces to indent these lines correct? The rest of the file is using a single space but due to the suggestion I used double space here, I may need to update the rest of the file in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 space indent for cmake e.g.:
if(BLAH)
set()
1 space delimiter for cmake e.g.:
set(my_var "my_val")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, my recent update fixed this, but I may need to update the rest of the file in a different PR.
Updated board.cmake to check for single image builds to indicate rather an additional reset is needed for cases such as sysbuild. Signed-off-by: Emilio Benavente <[email protected]>
Remove cmake code that updated the linkserver memory view, however the values passed here are wrong and linkserver already includes the correct space e.g location = 0x0 && size = 0x00080000 Signed-off-by: Emilio Benavente <[email protected]>
Update gdb arguments to enable linkserver debugging when app does not reset after load. This is useful for when the app is loading to ram. In this case if gdb reset after loaded the ram would be flushed when debugging and there would be nothing no image in ram to debug. Signed-off-by: Emilio Benavente <[email protected]>
317b0cf to
9dc3b5d
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in my mimxrt1170_evk board rev A, west flash does not, but west debug works
((.venv) ) ubuntu@ubuntu-OptiPlex-7050:/home/shared/disk/zephyr_project/zephyr_test/zephyr/tests/drivers/flash/common$ LinkServer --version
LinkServer v25.3.31 [Build 31] [2025-03-26 14:12:20]
what I change is
diff --git a/boards/nxp/mimxrt1170_evk/mimxrt1170_evk_mimxrt1176_cm7.dts b/boards/nxp/mimxrt1170_evk/mimxrt1170_evk_mimxrt1176_cm7.dts
index f6216018957..71ef4f19443 100644
--- a/boards/nxp/mimxrt1170_evk/mimxrt1170_evk_mimxrt1176_cm7.dts
+++ b/boards/nxp/mimxrt1170_evk/mimxrt1170_evk_mimxrt1176_cm7.dts
@@ -26,7 +26,7 @@
zephyr,shell-uart = &lpuart1;
zephyr,canbus = &flexcan3;
zephyr,flash-controller = &is25wp128;
- zephyr,flash = &is25wp128;
+ zephyr,flash = &itcm;
zephyr,code-partition = &slot0_partition;
zephyr,uart-mcumgr = &lpuart1;
zephyr,cpu1-region = &ocram;
run with
west flash --runner linserver --probe=<linkserver id>
Nc: Debug protocol: SWD. RTCK: Disabled. Vector catch: Disabled.
Ns: Content of CoreSight Debug ROM(s):
Nc: RBASE E00FD000: CID B105100D PID 000008E88C ROM (type 0x1)
Nc: ROM 1 E00FE000: CID B105100D PID 04000BB4C8 ROM (type 0x1)
Nc: ROM 2 E00FF000: CID B105100D PID 04000BB4C7 ROM (type 0x1)
Nc: ROM 3 E000E000: CID B105E00D PID 04000BB00C Gen SCS (type 0x0)
Nc: ROM 3 E0001000: CID B105E00D PID 04000BB002 Gen DWT (type 0x0)
Nc: ROM 3 E0002000: CID B105E00D PID 04000BB00E Gen (type 0x0)
Nc: ROM 3 E0000000: CID B105E00D PID 04000BB001 Gen ITM (type 0x0)
Nc: ROM 2 E0041000: CID B105900D PID 04001BB975 CSt ARM ETMv4.0 type 0x13 Trace Source - Core
Nc: ROM 2 E0042000: CID B105900D PID 04004BB906 CSt type 0x14 Debug Control - Trigger, e.g. ECT
Nc: ROM 1 E0043000: CID B105900D PID 04001BB908 CSt CSTF type 0x12 Trace Link - Trace funnel/router
Nc: NXP: MIMXRT1176xxxxx
Nc: DAP stride is 1024 bytes (256 words)
Nc: Inspected v.2 External Flash Device on SPI using SFDP JEDEC ID MIMXRT1170_SFDP_QSPI.cfx
Nc: Image 'iMXRT1170_SFDP_FlexSPI1_A_QSPI Mar 25 2025 17:33:21'
Nc: Opening flash driver MIMXRT1170_SFDP_QSPI.cfx
Nc: Sending VECTRESET to run flash driver
Nc: Flash variant 'iMXRT1170_SFDP_FlexSPI1_A_QSPI Mar 25 2025 17:33:21' detected (16MB = 256*64K at 0x30000000)
Nc: Closing flash driver MIMXRT1170_SFDP_QSPI.cfx
Pc: ( 65) Chip Setup Complete
Pc: ( 70) License Check Complete
Nt: Loading 'zephyr.bin' Binary 0x00000000 len 0x109EC
Ed:05: File 'zephyr.bin' load failure: Ef(11). No flash configured at the specified address.
Pc: (100) Target Operation Failed
CRITICAL: Critical error
ERRMSG: Exception: Flash operation exited with code 1
FATAL ERROR: command exited with status 255: /usr/local/LinkServer/LinkServer flash --probe JTAQCQAR MIMXRT1176xxxxx:MIMXRT1170-EVK load --addr 0 /home/shared/disk/zephyr_project/zephyr_test/zephyr/tests/drivers/flash/common/build/zephyr/zephyr.bin
=== Traceback (enabled by -vvv):
Traceback (most recent call last):
File "/home/shared/disk/zephyr_project/.venv/lib/python3.12/site-packages/west/app/main.py", line 584, in run_command
self.run_extension(args.command, argv)
File "/home/shared/disk/zephyr_project/.venv/lib/python3.12/site-packages/west/app/main.py", line 739, in run_extension
self.cmd.run(args, unknown, self.topdir, manifest=self.manifest,
File "/home/shared/disk/zephyr_project/.venv/lib/python3.12/site-packages/west/commands.py", line 200, in run
self.do_run(args, unknown)
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/flash.py", line 32, in do_run
do_run_common(self, my_args, runner_args, domain_file=domains_file)
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/run_common.py", line 349, in do_run_common
prev_runner = do_run_common_image(command, user_args, user_runner_args, used_cmds,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/run_common.py", line 524, in do_run_common_image
runner.run(command_name)
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/runners/core.py", line 743, in run
self.do_run(command, **kwargs)
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/runners/linkserver.py", line 125, in do_run
self.flash(**kwargs)
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/runners/linkserver.py", line 227, in flash
self.check_call(linkserver_cmd, **kwargs)
File "/home/shared/disk/zephyr_project/zephyr_test/zephyr/scripts/west_commands/runners/core.py", line 890, in check_call
subprocess.check_call(cmd, **kwargs)
File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/LinkServer/LinkServer', 'flash', '--probe', 'JTAQCQAR', 'MIMXRT1176xxxxx:MIMXRT1170-EVK', 'load', '--addr', '0', '/home/shared/disk/zephyr_project/zephyr_test/zephyr/tests/drivers/flash/common/build/zephyr/zephyr.bin']' returned non-zero exit status 255.
but if I using
west debug --runner linkserver --probe=<linkserver id>
it is OK
|
@hakehuang previous comment raises a good point. This behavior will happen in main too when changing the flash node |
|
@nordicjm looking for your input on this for 4.2 inclusion |
Is fixing a problem with a vendor's flash programming so could go in |





Fixes #87422
When users try to debug while flashing into ram,
JLink/Linkserver would fail to flash straight to ram. Each with their own issues. Both would reset after load which would flush the ram and point the PC to flash. Linkserver treated the ram as inaccessible, and the added memory region in board.cmake has the incorrect size for the region and is already included inside linkserver.