Add CPU information (model, vendor, frequency) and memory details (clock, size, type) to API#544

Open
aliel wants to merge 794 commits into main from aliel-feature-add-cpu-mem-capability

@aliel aliel commented Feb 16, 2024

Update CPU and memory details by switching to lshw method instead of cpuinfo
Add CPU information (model, vendor, frequency) and memory details (clock, size, type) to API
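
As a rough illustration of the `lshw` approach, the sketch below parses the JSON output of `lshw -class processor -json` into the three CPU fields named above. The function names and the mapping of the `product`, `vendor`, and `size` keys to model, vendor, and frequency are assumptions made for this example; they are not taken from the PR's code.

```python
import json
import subprocess


def parse_lshw_cpu(raw_json: str) -> dict:
    """Extract model, vendor and frequency from lshw JSON output (hypothetical helper)."""
    data = json.loads(raw_json)
    # Depending on the lshw version, the output is a single object or a list.
    entries = data if isinstance(data, list) else [data]
    cpu = entries[0]
    return {
        "model": cpu.get("product"),
        "vendor": cpu.get("vendor"),
        "frequency": cpu.get("size"),  # assumed to be the current clock in Hz
    }


def read_cpu_info() -> dict:
    """Run lshw and parse its output; full details usually require root."""
    raw = subprocess.run(
        ["lshw", "-class", "processor", "-json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_lshw_cpu(raw)
```

Keeping the parsing separate from the `subprocess` call makes it testable on machines where `lshw` is not installed.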

olethanh and others added 30 commits November 17, 2023 09:31
After the VM process and its stdout and stderr were closed, the task kept trying to read them, creating a busy loop
that wasted CPU.

Solution: End the task when the file descriptor is closed.
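
A minimal sketch of this kind of fix, assuming the output is read through an `asyncio.StreamReader`; `forward_output` and the queue are hypothetical names, not the project's code:

```python
import asyncio


async def forward_output(reader: asyncio.StreamReader, queue: asyncio.Queue) -> None:
    """Forward lines from a process stream until the stream is closed."""
    while True:
        line = await reader.readline()
        if not line:
            # readline() returns b"" at EOF: the descriptor is closed, so end
            # the task instead of spinning on empty reads.
            return
        await queue.put(line)
```

Without the EOF check, every `readline()` call on a closed stream returns immediately with an empty result, which is what turns the loop into a busy loop.
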
This fixes the error cascade that happened when opening the logs endpoint
without a WebSocket. This was problematic because it returned a 500 error
to the HTTP client instead of the actual error code and message.

To reproduce
* start aleph supervisor
* open in Firefox: http://localhost:4020/vm/3fc0aa9569da840c43e7bd2033c3c580abb46b007527d6d20f2d4e98e867f7af
* then in another tab http://localhost:4020/control/machine/3fc0aa9569da840c43e7bd2033c3c580abb46b007527d6d20f2d4e98e867f7af/logs

Error that happened before
2023-11-16 16:16:58,089 | ERROR | Error handling request
Traceback (most recent call last):
  File "/home/ubuntu/remote-aleph/src/aleph/vm/orchestrator/views/operator.py", line 147, in stream_logs
    await ws.prepare(request)
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_ws.py", line 137, in prepare
    protocol, writer = self._pre_start(request)
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_ws.py", line 232, in _pre_start
    headers, protocol, compress, notakeover = self._handshake(request)
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_ws.py", line 149, in _handshake
    raise HTTPBadRequest(
aiohttp.web_exceptions.HTTPBadRequest: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/remote-aleph/src/aleph/vm/orchestrator/views/operator.py", line 157, in stream_logs
    await ws.close()
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_ws.py", line 337, in close
    raise RuntimeError("Call .prepare() first")
RuntimeError: Call .prepare() first

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_protocol.py", line 433, in _handle_request
    resp = await request_handler(request)
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
  File "/home/ubuntu/.virtualenvs/aleph-vm/lib/python3.10/site-packages/aiohttp/web_middlewares.py", line 117, in impl
    return await handler(request)
  File "/home/ubuntu/remote-aleph/src/aleph/vm/orchestrator/supervisor.py", line 46, in server_version_middleware
    resp: web.StreamResponse = await handler(request)
  File "/home/ubuntu/remote-aleph/src/aleph/vm/orchestrator/views/operator.py", line 159, in stream_logs
    execution.vm.fvm.log_queues.remove(queue)
ValueError: list.remove(x): x not in list
Solution: Print a dot "." instead.
This produced log entries like this:
<aleph.vm.hypervisors.firecracker.microvm.MicroVM object at 0x7fa7f4f1ca90> 2092.023134 |V DEBUG | Init received msg

Solution: Add a `__str__` method on MicroVM.
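
For illustration only, a simplified stand-in for the class shows the effect; the `vm_id` attribute and the output format are assumptions, not the real implementation:

```python
class MicroVM:
    """Simplified stand-in for the real class; `vm_id` is a hypothetical attribute."""

    def __init__(self, vm_id: int) -> None:
        self.vm_id = vm_id

    def __str__(self) -> str:
        # Used by str() and f-strings, so log lines no longer show the default
        # "<... object at 0x...>" representation.
        return f"<MicroVM {self.vm_id}>"
```
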
Solution: Decouple VMType class to be outside orchestrator path.
…rocess.

Solution: Create an Instance CLI command tool to manage it.
Cancelling the expiration must be done after the execution becomes ready.

Solution: First wait for the execution to become ready before marking it as persistent and cancelling its expiration.
Problem: Checking the existence of the execution in one coroutine and creating it in another could lead to an inconsistent state where two executions are starting for the same VM.

Solution: Check for the existence of the execution or register it on the pool inside the same coroutine, without any `await` in between.
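
The idea can be sketched as follows, assuming an asyncio-based pool; `Pool`, `get_or_create`, and `_start` are hypothetical names chosen for this example:

```python
import asyncio


class Pool:
    """Hypothetical pool that maps a VM hash to the task starting its execution."""

    def __init__(self) -> None:
        self.executions = {}
        self.started = 0

    async def _start(self, vm_hash: str) -> str:
        self.started += 1
        await asyncio.sleep(0)  # stands in for the slow VM start-up
        return vm_hash

    async def get_or_create(self, vm_hash: str) -> str:
        # There is no `await` between the lookup and the registration, so the
        # event loop cannot switch to another coroutine in between.
        task = self.executions.get(vm_hash)
        if task is None:
            task = asyncio.create_task(self._start(vm_hash))
            self.executions[vm_hash] = task
        return await task
```

Two concurrent calls for the same hash then share a single start-up task instead of each creating their own.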
Problem: The method `prepare()` of a VM could be launched multiple times in parallel.

Solution: Use a lock and skip the preparation if it has already been done.
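
A minimal sketch of the lock-and-skip pattern with `asyncio.Lock`; the `VM` class and its attribute names are assumptions for this example:

```python
import asyncio


class VM:
    """Hypothetical VM whose preparation must run at most once."""

    def __init__(self) -> None:
        self.prepare_lock = asyncio.Lock()
        self.prepared = False
        self.prepare_runs = 0

    async def prepare(self) -> None:
        async with self.prepare_lock:
            if self.prepared:
                # Another caller already finished the preparation: skip it.
                return
            self.prepare_runs += 1
            await asyncio.sleep(0)  # stands in for the real preparation work
            self.prepared = True
```

The flag is checked inside the lock, so callers that were waiting while the first preparation ran see `prepared` set and return without repeating the work.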
Use renamed, latest release of rate-my-pr/difficulty
Solution: Don't try to create the file that we won't use anyway when not using the jailer.

This is a regression introduced in #454
This uses pydantic classes in control endpoints.

The reboot functionality of VMs is now implemented.

Authentication schemes are now fully used and functional.
* Change HTTP response to 403 where applicable

* Clarify is_token_still_valid()

* Fix typo
olethanh and others added 28 commits May 3, 2024 13:42
Solution: Start by adding some simple tests
We don't test the full allocation and deallocation here, just auth.
Co-authored-by: nesitor <amolinsdiaz@yahoo.es>
When executed using `bash`, the `create_disk_image` was interrupted by a Python REPL due to the `python -OO` command being surrounded by backquotes.
Problem: Running system testing on DigitalOcean for every push consumed a lot of resources and failed frequently.

We are now starting to have integration tests using `pytest`, which provide better confidence that things actually work.

Solution: Only run the DigitalOcean tests on open pull requests, not on every push. In the future, consider only running them when merged on `main`.
* Feature: Added automatic tests to check if QEmu runs.

* Fix: Added code quality fixes.

* Fix: Changed runtime generation script name.

* Fix: Solve conflicts with main branch.
This deploys the main branch automatically on the staging servers for system testing.
Problem: Many crawlers called URLs that do not exist on CRNs.

The current implementation raises an error when the hash of the VM cannot be found, which fills the logs on Sentry.

Solution: Return an HTTP Not Found status instead.
Use the standard jwcrypto package (apt install python3-jwcrypto) instead of the jwskate wrapper for ECDSA verification.
To use in the auth code for checking the owner.

Co-authored-by: Bonjour Internet <BjrInt@users.noreply.github.com>
Co-authored-by: Hugo Herter <git@hugoherter.com>
Merged with the previous test_jwk. Reused one of the old tests.
was causing a compatibility problem with another package
Solution: Reimplement their hex method to ensure it works on all versions.
Restore the eth-account version that was reverted by mistake.
Labels

BLUE This PR is simple and straightforward.

9 participants