ci: Add remaining passing evergreen rdk unit tests#9216
ci: Add remaining passing evergreen rdk unit tests#9216oxve wants to merge 10 commits intoyoutube:mainfrom
Conversation
🤖 Gemini Suggested Commit Message💡 Pro Tips for a Better Commit Message:
|
There was a problem hiding this comment.
Code Review
This pull request adds more unit tests for the eg-rdk target, which is a great step towards improving test coverage. The changes to the CI workflow and test configurations are logical. However, I found a critical issue in the new script starboard/rdk/generate_runfile.py which is completely broken as written. I've provided a detailed comment with a full code suggestion to fix it.
|
/gemini-review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
The pull request enables more unit tests for the evergreen RDK target by adding new test executables and filter files for known failures. It also introduces a new RDK-specific runfile generator to handle platform-specific setup, and refactors a GitHub Action workflow for better clarity.
My review focuses on the new runfile generator script (starboard/rdk/generate_runfile.py), where I've identified a critical issue with resource cleanup, a high-severity issue with ASAN options that could mask memory errors, and a medium-severity issue regarding a fragile time.sleep. The other changes in CI configuration and test lists appear correct and improve the test infrastructure.
| try: | ||
| result = subprocess.run(command, check=False) | ||
| sys.exit(result.returncode) | ||
| except subprocess.CalledProcessError: | ||
| # A subprocess failed, so don't log the python traceback. | ||
| raise SystemExit(1) | ||
| except Exception as e: | ||
| print("An unexpected error occurred: " + str(e), file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
||
| subprocess.run(['rdkDisplay', 'remove']) |
There was a problem hiding this comment.
The subprocess.run(['rdkDisplay', 'remove']) call on line 72 is unreachable because sys.exit() is called within the try block on line 64, which terminates the script prematurely. This will lead to leaked display resources on the test device.
Additionally, the except subprocess.CalledProcessError: block (lines 65-67) is dead code, as this exception is only raised when check=True is used with subprocess.run().
To ensure cleanup always runs and to remove the dead code, the logic should be restructured using a try...finally block.
| try: | |
| result = subprocess.run(command, check=False) | |
| sys.exit(result.returncode) | |
| except subprocess.CalledProcessError: | |
| # A subprocess failed, so don't log the python traceback. | |
| raise SystemExit(1) | |
| except Exception as e: | |
| print("An unexpected error occurred: " + str(e), file=sys.stderr) | |
| sys.exit(1) | |
| subprocess.run(['rdkDisplay', 'remove']) | |
| return_code = 1 | |
| try: | |
| result = subprocess.run(command, check=False) | |
| return_code = result.returncode | |
| except Exception as e: | |
| print("An unexpected error occurred: " + str(e), file=sys.stderr) | |
| # return_code is already 1 | |
| finally: | |
| subprocess.run(['rdkDisplay', 'remove'], check=False) | |
| sys.exit(return_code) |
| # TODO: Move? | ||
| "ASAN_OPTIONS": "exitcode=0", |
There was a problem hiding this comment.
Setting ASAN_OPTIONS to exitcode=0 will cause the process to exit with code 0 even if AddressSanitizer detects a memory error. This can mask critical bugs and cause failing tests to be reported as passing. This option should be removed to ensure that memory errors are properly reported as test failures. The associated TODO comment suggests uncertainty about this option, reinforcing that it should likely be removed.
There was a problem hiding this comment.
lol , might be good to have a proper bug assovciated with this ( more formal todo)
| if subprocess.run(['pgrep', 'westeros-init'], capture_output=True).returncode != 0: | ||
| subprocess.Popen(['/usr/bin/westeros-init']) | ||
| import time | ||
| time.sleep(2) |
There was a problem hiding this comment.
Using time.sleep(2) to wait for westeros-init to start can lead to flaky tests. The 2-second delay may not be sufficient on a heavily loaded system, causing subsequent commands to fail. A more robust approach would be to implement a polling mechanism that actively checks if the westeros-init service is ready before proceeding, for example by checking for a PID file or a listening socket.
| if subprocess.run(['pgrep', 'westeros-init'], capture_output=True).returncode != 0: | ||
| subprocess.Popen(['/usr/bin/westeros-init']) | ||
| import time | ||
| time.sleep(2) |
There was a problem hiding this comment.
can you add this as a variable and write a comment about why "2) was selcted ?
| "ASAN_OPTIONS": "exitcode=0", | ||
| }}) | ||
|
|
||
| # TODO: Remove this when westeros-init is started by default. |
There was a problem hiding this comment.
this should be tracked in a partner bug
| import subprocess | ||
| import sys | ||
|
|
||
| # TODO: Remove when lab devices are updated to RDK_V6_20251218+. |
There was a problem hiding this comment.
good to track in actual buganizer issue
|
|
||
| action("${original_target_name}_runfile_gen") { | ||
| script = "//starboard/build/util/generate_runfile.py" | ||
| # TODO: Investigate why starboard_target_platform is not defined for evergreen-arm. |
There was a problem hiding this comment.
can you add it to rdk's args.gn ?
bit surprised this didnt trigger: https://source.corp.google.com/h/github/youtube/cobalt/+/main:starboard/build/platform_path.gni;l=70?q=starboard_target_platform%20%3D&ss=h%2Fgithub%2Fyoutube%2Fcobalt%2F%2B%2Frefs%2Fheads%2Fmain
There was a problem hiding this comment.
this seems wrong , this file is suppsoed to be platform agnostic. Is there a better way to do this inside the RDK code ?
Enable remaining passing unit tests and add test filters for:
Also add an RDK-specific runfile generator. The new generator sets
environment variables and initializes the display system to work around
EGL init issues.
Bug: 484121488