Fix: Could not save multiple execution records for the same execution#226
Fix: Could not save multiple execution records for the same execution#226
Conversation
This allows VM publishers to specify other volumes to mount on the VM. Volumes must be formatted as SquashFS partitions.
These builds run the entrypoints from the example example_fastapi_2 to check that the code is working properly.
Warning: Concurrent access to the same partition is not prevented
This should solve issues regarding concurrent access to the same partition
…eached This exception can be used to diagnose when the init system may have crashed and should be investigated.
`self.__annotations__` did not include attributes with implicit type annotation (the type of the default value), and those were therefore not displayed when using `--print-settings`
Firecracker supports Linux kernels 4.20 and 5.10. This updates the Linux kernel used to 5.10. The configuration is based on the Firecracker config [1] on which `make menuconfig` is run to disable the following fields: CONFIG_INPUT_KEYBOARD CONFIG_INPUT_MISC CONFIG_INPUT_FF_MEMLESS and CONFIG_SERIO. [1] https://github.com/firecracker-microvm/firecracker/blob/0fa080b137fd29e5bcd95073473b0a57c3868d86/resources/guest_configs/microvm-kernel-x86_64-5.10.config
Subscribers did not get triggered on 'item_hash' from message with a 'ref'.
# Problem: The endpoint '/status/check/fastapi' is monitored by multiple operators and therefore called very frequently. The crash handling check results in a new virtual machine being started at every call, and memory leaks (likely pipes accumulating) can cause the supervisor to crash after some time with 'Too many open files'. # Solution: In the short term, avoid calling the crash check in the monitored endpoint. In the longer term, investigate and fix the accumulating file descriptors.
Due to a change in GitHub Actions
Changed to make a single command working for both Linux/macOs
This would have been the source of many duplicate asyncio tasks.
|
|
||
| uuid = Column(String, primary_key=True) | ||
|
|
||
| execution_uuid = Column(String, nullable=False) |
There was a problem hiding this comment.
You recreate the DB on startup, right? Otherwise adding a column will crash.
There was a problem hiding this comment.
Indeed, the database would require a migration.
|
Related to #235 |
|
Ah right, this is the example I was looking for in #235. Why is it needed to have multiple ExecutionRecords for a single execution? |
foxpatch-aleph
left a comment
There was a problem hiding this comment.
This is a large refactoring PR that bundles many new features (firecracker config-file approach, guest API, metrics, pub/sub, reactor, status checks) with the stated fix for duplicate execution record UUIDs. The fix itself is correct — using uuid.uuid4() as the primary key while keeping execution_uuid as a separate reference column prevents primary-key conflicts on repeated saves for the same execution. However, several issues warrant attention before merging: an uninitialized attribute that causes AttributeError on teardown, a shell-injection risk, a hardcoded program hash that will crash the supervisor on startup in production, and a complete absence of tests for all the new code paths.
firecracker/microvm.py (line 67): _unix_socket: Server is declared as a class annotation without a default value. If teardown() is called before wait_for_init() (e.g., when start() raises), the check if self._unix_socket: in teardown() will raise AttributeError instead of silently skipping. Initialize it to None in __init__ and annotate it as Optional[Server] = None.
vm_supervisor/vm/firecracker_microvm.py (line 348): subprocess.run(f"chown jailman:jailman {vsock_path}", shell=True, check=True) uses shell=True with a variable path. While vsock_path is currently derived from a controlled template (/tmp/v.sock_53), shell=True is unnecessary and brittle. Prefer the list form: subprocess.run(["chown", "jailman:jailman", vsock_path], check=True) which avoids any shell-injection risk if the path construction ever changes.
vm_supervisor/tasks.py (line 116): A hardcoded message hash (cad11970efe9b7478300fd04d7cc91c646ca0a792b9cc718650f86e1ccfac73e) is fetched unconditionally on every supervisor startup. If this message is unavailable (network outage, hash changes, or test environment), the startup task will fail. This also couples every deployment to one specific program. Either make this configurable via settings, guard it with a try/except so that failure is non-fatal, or remove it and document how programs subscribe to message events.
vm_supervisor/vm/firecracker_microvm.py (line 255): self.guest_api_process._popen accesses a private attribute of multiprocessing.Process. This relies on CPython internals and may break across Python versions. Use the public API instead: self.guest_api_process.is_alive() to check whether the process is still running before terminating it.
vm_supervisor/metrics.py (line 1): The module-level Session variable is set only when setup_engine() is called. save_record and get_execution_records will raise NameError if called before setup_engine(). Consider raising a clear error or using a lazy initialisation pattern (e.g., a module-level Optional[sessionmaker] = None with an explicit guard).
vm_supervisor/models.py (line 195): The record_usage method duplicates almost identical save_record(ExecutionRecord(...)) blocks — one for when process info is available, one when it is not. The only difference is the cpu_time_* and io_* fields. Extract a single call with cpu_time_user=pid_info["process"]["cpu_times"].user if pid_info else None etc. to eliminate the duplication.
vm_supervisor/views.py (line 95): The secret token is accepted in the URL query string (?token=...) and echoed into a cookie without httponly or secure flags. Query parameters appear in server logs and browser history, which leaks the token. Consider accepting the token only via the cookie on subsequent requests, and set response.cookies['token']['httponly'] = True and response.cookies['token']['secure'] = True for production use.
examples/example_django/example_django/settings.py (line 28): ALLOWED_HOSTS contains "vm.demo.okeso.fr" — a specific developer domain that appears to have been left in accidentally. Example files should use example.org or a placeholder, not real hostnames. The hardcoded SECRET_KEY is acceptable here given the django-insecure- prefix, but should have a comment clarifying it must be replaced in any real deployment.
vm_supervisor/run.py (line 1): No tests exist for any of the new run.py, metrics.py, pool.py, pubsub.py, reactor.py, or tasks.py modules. Given that this PR introduces the core VM lifecycle, metrics collection, and event-driven execution paths, at minimum unit tests for VmPool, VmExecution.record_usage, PubSub, and Reactor.trigger should accompany these changes.
vm_supervisor/metrics.py (line 47): The uuid primary key column is a plain String. The comment in the PR title mentions fixing duplicate-record saves by switching from execution_uuid to a new uuid.uuid4() per save. This is the correct fix. However, note that uuid.uuid1() (used for VmExecution.uuid) embeds the MAC address and timestamp, which may be a privacy consideration on multi-tenant nodes. Consider uuid.uuid4() there as well, or document the rationale.
No description provided.