Fix: vm-connector used obsolete software by hoh · Pull Request #594 · aleph-im/aleph-vm

hoh · 2024-04-10T13:25:47Z

The vm-connector service used to run with obsolete versions of the aleph-sdk and Python.

Solution: Use the latest version of the SDK and pin the versions of dependencies.

Solution: Modify the hatch configuration so that the environment is properly set up using the built-in virtual environment plugin https://hatch.pypa.io/1.3/plugins/environment/virtual/

This problem was surfaced by another one. The visible problem was that the new execution tests were hanging inside the runtime during the import at the line `from aleph.sdk.chains.remote import RemoteAccount`. After more investigation, it was pinpointed to an inner import of the `eth_utils` module (specifically `eth_utils.network`).

A second problem made the first one visible: in the runtime, the pre-compiled bytecode created during runtime creation in `create_disk_image.sh` was not used, which made module imports slower. The cause of this second problem was that the `init1.py` code, which runs the user code, was not launched with the same optimization level as the pre-compiled bytecode, so everything was recompiled. (The optimization level is specified in the shebang on the first line of `init1.py`.)

Solution: Compile the bytecode with the same optimization level (`-o 2`) as during the run.

We haven't found out yet why the `eth_utils.network` import hangs when it is not precompiled, but this fixes the test hanging issue.
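CPython stores cached bytecode in a separate file per optimization level (PEP 488): `.pyc` for the default level, `.opt-1.pyc` for `-O` and `.opt-2.pyc` for `-OO`. The standard-library sketch below, using a hypothetical throwaway module, shows why bytecode precompiled at level 2 is ignored by an interpreter running at another level, which is the recompilation described above. It is an illustration only, not the code used in `create_disk_image.sh`.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def cached_pyc(source: Path, opt_level: int) -> Path:
    """Path the import system checks for `source` at a given optimization level.

    Level 0 (plain `python`) has no tag, level 1 (`-O`) uses `.opt-1`,
    level 2 (`-OO`) uses `.opt-2`.
    """
    optimization = "" if opt_level == 0 else opt_level
    return Path(importlib.util.cache_from_source(str(source), optimization=optimization))


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "example_module.py"  # hypothetical module
    module.write_text("VALUE = 42\n")

    # Precompile at level 2, as `python -m compileall -o 2` would.
    written = Path(py_compile.compile(str(module), optimize=2, doraise=True))

    # An interpreter started with -OO finds this file...
    found_by_level_2 = cached_pyc(module, 2).exists()
    # ...but a plain `python` looks for a different file and must recompile.
    found_by_level_0 = cached_pyc(module, 0).exists()

print(written.name, found_by_level_2, found_by_level_0)
```

The level-2 lookup finds the precompiled file while the level-0 lookup does not, so launching the code without `-OO` silently pays the compilation cost again.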

…asyncio' Fix: `async_sessionmaker` was introduced in SQLAlchemy 2.0; ensure we have at least this version, otherwise an older system package was being used.
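The fix itself is a minimum-version constraint on the dependency. As an illustration of the same rule, here is a hypothetical startup guard (not part of aleph-vm) that reports a too-old SQLAlchemy clearly instead of failing later with an `ImportError` on `async_sessionmaker`. It uses only `importlib.metadata` and a simple major/minor comparison, which is an assumption that is sufficient for this check but is not a full version parser.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# async_sessionmaker first appeared in SQLAlchemy 2.0.
MIN_SQLALCHEMY = (2, 0)


def major_minor(text: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.0.29' or '1.4.46'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def provides_async_sessionmaker(installed: str) -> bool:
    return major_minor(installed) >= MIN_SQLALCHEMY


def check_sqlalchemy() -> str:
    """Fail early with a clear message instead of an ImportError deep in the code."""
    try:
        installed = version("sqlalchemy")
    except PackageNotFoundError as exc:
        raise RuntimeError("SQLAlchemy is not installed") from exc
    if not provides_async_sessionmaker(installed):
        raise RuntimeError(
            f"SQLAlchemy {installed} is too old for async_sessionmaker (needs >= 2.0); "
            "an older system package may be shadowing the pinned dependency"
        )
    return installed
```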

A frozen copy of the requirements.txt extracted from different systems was present in the repository but was neither used nor maintained.

github-actions · 2024-04-10T15:32:48Z

Failed to retrieve llama text: POST 504:

504 Gateway Time-out

The server didn't respond in time.

codecov · 2024-04-15T09:08:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.50%. Comparing base (eff82dc) to head (f8eab86).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #594   +/-   ##
=======================================
  Coverage   64.50%   64.50%           
=======================================
  Files          78       78           
  Lines        7088     7088           
  Branches      598      598           
=======================================
  Hits         4572     4572           
  Misses       2314     2314           
  Partials      202      202

We published multiple changes to the diagnostic VM recently, but none of them were released. This provides a new diagnostic VM, based on a new runtime [1], with the following fixes:

- Reading messages with the newer SDK
- Better handling of IPv6 detection errors
- Two different tests for signing messages (local and remote)
- The aleph-message version was not specified
- Fetching a single message was not tested

Problem: Running system tests on DigitalOcean for every push consumed a lot of resources and failed frequently. We are now starting to have integration tests using `pytest`, which provide better confidence that things actually work.

Solution: Only run the DigitalOcean tests on open pull requests, not on every push. In the future, consider only running them when merged into `main`.

Due to an error reading the denylist. The error was:

```
Error: constructing the node (see log for full detail): error walking /home/ipfs/.config/ipfs/denylists: lstat /home/ipfs/.config/ipfs/denylists: permission denied
```

The [documentation](https://specs.ipfs.tech/compact-denylist-format/) mentions that:

> Implementations SHOULD look in /etc/ipfs/denylists/ and
> $XDG_CONFIG_HOME/ipfs/denylists/ (default: ~/.config/ipfs/denylists)
> for denylist files.

I am not sure why this only failed on Ubuntu 22.04 and not on Debian 12 or Ubuntu 24.04. My first assumption would be a difference in systemd.
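To check a host for this failure mode, a hypothetical diagnostic helper could list the two locations quoted from the spec and report whether each one can be `lstat`-ed by the current user. This is a sketch only: it does not reproduce Kubo's own lookup logic, and the `HOME`/`XDG_CONFIG_HOME` fallback follows the XDG convention quoted above.

```python
from __future__ import annotations

import os
from pathlib import Path


def denylist_dirs(env: dict[str, str] | None = None) -> list[Path]:
    """Directories the compact denylist spec says implementations SHOULD search."""
    env = dict(os.environ) if env is None else env
    home = env.get("HOME") or os.path.expanduser("~")
    # XDG rule: fall back to $HOME/.config when XDG_CONFIG_HOME is unset or empty.
    config_home = env.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return [Path("/etc/ipfs/denylists"), Path(config_home) / "ipfs" / "denylists"]


def probe(path: Path) -> str:
    """Classify a directory the way an lstat() call would experience it."""
    try:
        os.lstat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"  # the failure mode Kubo reported
    return "ok"


if __name__ == "__main__":
    for directory in denylist_dirs():
        print(directory, probe(directory))
```

Run as the service user (for example `ipfs`), a `permission denied` result for the second path would reproduce the error above outside of Kubo.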

* Make CI a bit more resilient
  * Do not run export log if cancelled or not set up
  * Attempt to always do the proper cleanup
  * Print more debug information in case the droplet IPv4 cannot be parsed
* CI: Add timeouts for runtime workflow
* CI: Document the runtime workflow
* CI: Increase timeout to bring up droplet
* CI: pytest. Prevent "Ouput modules" step failure

  The dependencies were not always installed, depending on where the failure occurred. Since this is a debug step, we don't want it to fail. Do not run it if the workflow was cancelled.
* CI: Merge Package workflow and Droplet workflow

  This allows reusing the packages built in the previous workflow for the droplet test, reducing the number of package builds in these workflows from 12 to 3. This:
  * fixes the issue of packages failing to build because of rate limits on other resources.
  * reduces the chances of random errors.
  * reduces the total CI time required.
  * does not attempt to run the droplet test if the package building phase fails. (Previously, all the package builds were launched in parallel, which meant they all failed unnecessarily.)
  * makes it cheaper and faster to rerun the failed jobs.

  With these changes, the number of CI failures is greatly reduced, and the cause of a failure is clearer.
* CI: Wait until the droplet has an IPv4 address before proceeding

  The Digital Ocean API returns before the network is set up, with empty network info. This didn't seem to occur before, so it might be an API change on their part.
* CI: Ensure we reuse the previously calculated IP
* CI: Ensure we use the public IPv4 of the Droplet

  Digital Ocean droplets always have a private IP in addition to the public one. The API returns them in random order, so the CI job occasionally tried to use the internal one and failed.
* CI: Merge the runtime test workflow with the package workflow

  Same operation as moving the Droplet workflow: we reuse the already built packages. The resilience and speed advantages are the same and add up.
* CI: Remove the deleted workflow, update doc

  Rename the main workflow field. Document more.
* improve doc
* CI: Do not stop if Export aleph logs command fails
* CI: Print commands run in install aleph step for debug
* CI: Do not fetch whole repository where not needed

  In the run_on_droplet job we only require the .github/scripts dir.
* CI: Fix apt broken progress bar output
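The two droplet networking rules above (wait until an IPv4 address is reported, and use the public address rather than the private one) can be sketched in Python. This is a hypothetical helper, not the CI's actual scripts; it assumes the DigitalOcean API v2 droplet JSON, where `networks.v4` is a list of entries carrying `ip_address` and a `type` of `public` or `private`, and it takes the HTTP call as an injected `fetch_droplet` function.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def public_ipv4(droplet: dict[str, Any]) -> str | None:
    """Return the droplet's public IPv4 address, or None if none is listed yet.

    Assumes the DigitalOcean API v2 droplet shape:
    {"networks": {"v4": [{"ip_address": "...", "type": "public" | "private"}, ...]}}
    """
    for network in droplet.get("networks", {}).get("v4", []):
        # Public and private entries come in no guaranteed order, so filter on type.
        if network.get("type") == "public":
            return network.get("ip_address")
    return None


def wait_for_public_ipv4(
    fetch_droplet: Callable[[], dict[str, Any]],
    timeout: float = 300.0,
    interval: float = 5.0,
) -> str:
    """Poll until the droplet reports a public IPv4, instead of parsing an empty answer."""
    deadline = time.monotonic() + timeout
    while True:
        address = public_ipv4(fetch_droplet())
        if address:
            return address
        if time.monotonic() >= deadline:
            raise TimeoutError("droplet has no public IPv4 address yet")
        time.sleep(interval)
```

Filtering on `type` rather than taking the first entry is what removes the dependence on the order in which the API lists addresses.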

- Add `supports_x_vga` flag to QemuGPU model
- Add system check to detect if a GPU supports the x-vga feature
- Only add the x-vga parameter to the QEMU command when supported
- Fall back gracefully when professional GPUs don't support x-vga

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
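A minimal sketch of the conditional QEMU argument, assuming a GPU passed through with QEMU's `vfio-pci` device, whose `host=` and `x-vga=on` properties are used below. `PassthroughGpu` is a hypothetical, simplified stand-in for the QemuGPU model, and the real command line may carry additional properties not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PassthroughGpu:
    """Hypothetical, simplified stand-in for the QemuGPU model."""

    pci_host: str  # PCI address for vfio-pci's `host=` property, e.g. "01:00.0"
    supports_x_vga: bool


def gpu_device_args(gpu: PassthroughGpu) -> list[str]:
    """Build the QEMU `-device` arguments for a passed-through GPU."""
    properties = [f"host={gpu.pci_host}"]
    if gpu.supports_x_vga:
        # Only request legacy VGA routing when the card can provide it.
        properties.append("x-vga=on")
    return ["-device", "vfio-pci," + ",".join(properties)]
```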

- Test QemuVM device args generation with and without x-vga support
- Test x-vga detection process in AlephQemuInstance
- Test error handling for GPU x-vga detection
- Test configuration flow with x-vga support detection

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Simplify x-vga support detection by using the GPU device class from lspci:
  - Class 0300 (VGA compatible controller) supports x-vga
  - Class 0302 (3D controller) does not support x-vga
- Remove the complex subprocess-based detection method in favor of this simpler approach
- Update unit tests to test the device class-based detection

This approach is faster, more reliable, and doesn't require running test QEMU commands.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
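The class-based rule can be sketched as follows. It assumes the default `lspci -n` line format (slot, then `CCCC:` class code, then vendor:device IDs, for example `01:00.0 0300: 10de:2204 (rev a1)`); the parsing helper is hypothetical, and treating every class other than 0300 as unsupported is an assumption that goes slightly beyond the two classes named above.

```python
from __future__ import annotations

# PCI class codes (class + subclass) as printed by `lspci -n`.
VGA_COMPATIBLE_CONTROLLER = "0300"
CONTROLLER_3D = "0302"


def parse_lspci_n(line: str) -> tuple[str, str]:
    """Split a `lspci -n` line such as '01:00.0 0300: 10de:2204 (rev a1)'
    into (slot, class_code). Assumes the default format without `-D`/`-v`."""
    slot, rest = line.split(maxsplit=1)
    class_code = rest.split(":", 1)[0].strip()
    return slot, class_code


def class_supports_x_vga(class_code: str) -> bool:
    """0300 (VGA compatible controller) -> True; 0302 (3D controller) -> False.

    Any other class is also treated as unsupported (an assumption)."""
    return class_code.lower() == VGA_COMPATIBLE_CONTROLLER
```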

The test was trying to set attributes on HostGPU objects, but the HostGPU class doesn't have the supports_x_vga field. Changed to use GpuDevice, which has the required field, to fix the test.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

When aleph-vm-supervisor restarts, it properly loads existing VM interfaces but doesn't re-add their IPv6 ranges to the ndppd proxy configuration. This change explicitly re-adds ndp_proxy rules for existing interfaces during VM recovery.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
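A minimal sketch of rebuilding the proxy rules for every recovered interface and rendering them into a single ndppd configuration. It assumes ndppd's `proxy <iface> { rule <prefix> { iface <vm-iface> } }` syntax; the interface names and prefixes are hypothetical, and this is not aleph-vm's ndp_proxy implementation. Rendering the whole file at once means ndppd only needs to be reloaded once after recovery rather than once per VM.

```python
from __future__ import annotations

from ipaddress import IPv6Network


def recover_rules(existing: dict[str, str]) -> dict[str, IPv6Network]:
    """Rebuild the rule set from interfaces found after a supervisor restart."""
    return {iface: IPv6Network(prefix) for iface, prefix in existing.items()}


def render_ndppd_conf(host_iface: str, vm_ranges: dict[str, IPv6Network]) -> str:
    """Render one ndppd config that proxies NDP for every VM's IPv6 range.

    `vm_ranges` maps each VM interface (hypothetical names) to its IPv6 range.
    """
    lines = [f"proxy {host_iface} {{"]
    for vm_iface, network in sorted(vm_ranges.items()):
        lines += [f"  rule {network} {{", f"    iface {vm_iface}", "  }"]
    lines.append("}")
    return "\n".join(lines) + "\n"
```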

olethanh and others added 5 commits April 10, 2024 17:25

Problem: hatch envs needed manual manipulation for testing

53b52b6

Solution: Modify the hatch configuration so that the environment is properly set up using the built-in virtual environment plugin https://hatch.pypa.io/1.3/plugins/environment/virtual/

Fix: Missing comments in workflow

64af5a1

Problem cannot import name 'async_sessionmaker' from 'sqlalchemy.ext.…

fe2e74f

…asyncio' Fix: `async_sessionmaker` was introduced in SQLAlchemy 2.0; ensure we have at least this version, otherwise an older system package was being used.

Cleanup: Frozen requirements were not maintained

0f77070

A frozen copy of the requirements.txt extracted from different systems was present in the repository but was neither used nor maintained.

hoh force-pushed the hoh-update-vm-connector branch from cc6f432 to 6b1781f Compare April 15, 2024 09:03

nesitor and others added 22 commits April 16, 2024 09:27

Fix: Prevent the diagnostic VM from failing if the IPv6 or IPv4 check raises a timeout.

57695a1

Fix: Solved CORS issues on PAYG creation.

8bbd65b

Fix: Internet diagnostic due to single endpoint

c74ed5a

Internet connectivity checks by the diagnostic VM relied on a single URL. If that endpoint was down, the internet connectivity of the system was assumed to be down. Solution: Check connectivity to multiple endpoints in parallel.

Problem: Makefile for publishing examples was not working

b7d9202

'aleph program' now needs an 'update' argument. Solution: Update the Makefile and documentation.

Problem: could not start Instances from command line (#597)

54680ba

Problem: could not start Instances from the command line. The problem happened when launching with `--run-fake-instance`. Solution: Adapt to the new VMPool API, which takes a loop. Also fix the benchmarks function.

Solve last CORS issues about duplicated headers (#604)

ab79b77

Fix: Solve the last CORS errors, caused by duplicated headers in the responses.

Fix: Pytest did not test legacy diagnostic

c7db4ef

Installation documentation was moved to aleph doc

8318c43

Point to it in the documentation and remove the duplicated information here.

CI check system usage endpoint

ab1b9cd

add unit test for system usage

ca8ae7b

Set up a fresh web_app for each test as required by aiohttp

d100df1

revert local compat change

4303223

Apparently CI also doesn't have a matching arch

143112d

Fix test description

448d97d

Better model real usage in Droplet test

4725269

Add a test with mock

8ac2c1b

isort

47f1ab5

Problem: allocation endpoints were not tested

24f9339

Solution: Start by adding some simple tests. We don't test the full allocation and deallocation here, just auth.

Update docstring src/aleph/vm/orchestrator/views/__init__.py

9bd3c03

Co-authored-by: nesitor <amolinsdiaz@yahoo.es>

Fix: Backquote in shell script executed command

ffa2322

When executed using `bash`, the `create_disk_image` was interrupted by a Python REPL due to the `python -OO` command being surrounded by backquotes.

hoh added 2 commits March 12, 2025 09:27

Fix: Packages used obsolete Kubo

02db781

This bumps their versions to:

- Kubo 0.23.0 -> 0.33.1

The list of changes regarding Kubo is too large to be mentioned here, but I mostly expect performance improvements, as the main API has not changed much.

hoh force-pushed the hoh-update-vm-connector branch from 493337e to df70293 Compare March 12, 2025 08:43

hoh marked this pull request as ready for review March 12, 2025 09:46

hoh force-pushed the hoh-update-vm-connector branch from df70293 to 03b780b Compare March 12, 2025 09:46

olethanh and others added 23 commits March 18, 2025 15:38

Fix buggy settings aggregate cache implementation

7444c81

Allow /56 subnet for ipv6 pool (#761)

73013a2

Needed for some crn

Clean optional and list with hatch fmt

7060a90

hatch fmt on examples and tests/

4ae14ce

Mod: Transform pool.setup() to async func

57e73fb

Previous way was preventing them from working inside tests

Add testing for the listing of GPU

b356827

set up scaffolding to test reservation system

Add GPU reservation endpoints with tests

e67f2c3

Jira: ALEPH-421

Use reservation at vm creation

a14a78c

Part of Jira ALEPH-421

mod: Move Reservation class out of VMPool

bd20595

Fix: Solve code quality issues.

dff4df3

Fix: Solve issue on missing HostGPU field.

bc693c7

Fix: Update GPU x-vga support tests to use has_x_vga_support property

9f6351e

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Fix: Improved performance on restart avoiding to re-create and restar…

db49c0e

…t ndppd service for every VM.

Usage system: remove reserved usage for volumes

c45346c

of VMs when calculating available disk space. Jira: ALEPH-420. This will help the scheduler account more accurately for the available resources on the CRN. Alternative approach to #780, as sparse files were not properly created.

Task monitor_payments: make fewer useless checks

a98ad0e

Don't call pyaleph if there are no executions for the user. Remove debug executions.

Do not run DB migration if --do-not-run

eff82dc

Fix: vm-connector used obsolete software

f8eab86

The vm-connector service used to run with obsolete versions of the aleph-sdk and Python. Solution: Use the latest version of the SDK and pin the versions of dependencies.

olethanh force-pushed the hoh-update-vm-connector branch from 03b780b to f8eab86 Compare April 1, 2025 08:48

aliel force-pushed the main branch from a8aa651 to 7a0bff1 Compare February 18, 2026 09:22

hoh wants to merge 992 commits into main from hoh-update-vm-connector

9 participants
