
Switch to using the local IPFS Kubo daemon for IPFS files#779

Open
hoh wants to merge 973 commits into main from hoh-use-kubo

Conversation


@hoh hoh commented Mar 12, 2025

IPFS Kubo is now present on all CRNs but unused.

Downloading files from this service would be more reliable than using a
third party, and would provide content integrity checks.

This bypasses the vm-connector when fetching resources from IPFS.

Related Jira ticket: ALEPH-36
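The new IPFS_SERVER setting points downloads at the local Kubo gateway instead of the vm-connector. As a rough illustration only (the setting name comes from this PR, but the default gateway address and the helper below are assumptions, not the actual implementation):

```python
# Sketch: build a download URL for an IPFS CID from the local Kubo gateway.
# The IPFS_SERVER default and this helper are illustrative assumptions.
IPFS_SERVER = "http://127.0.0.1:8080"  # assumed local Kubo gateway address


def ipfs_download_url(server: str, cid: str) -> str:
    """Return the gateway URL for a given IPFS CID."""
    return f"{server.rstrip('/')}/ipfs/{cid}"


print(ipfs_download_url(IPFS_SERVER, "QmExampleCid"))
# → http://127.0.0.1:8080/ipfs/QmExampleCid
```

Fetching through the local gateway means Kubo itself verifies that the content matches the CID, which is where the content integrity guarantee comes from.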

Self proofreading checklist

  • The new code is clear, easy to read, and well commented.
  • New code does not duplicate the functions of builtin or popular libraries.
  • An LLM was used to review the new code and look for simplifications.
  • New classes and functions contain docstrings explaining what they provide.
  • All new code is covered by relevant tests.
  • Documentation has been updated regarding these changes.
  • Dependency updates in project.toml have been mirrored in the Debian package build script packaging/Makefile.

Changes

  • Fixes how the configuration of Kubo is applied (previous version did not work anymore).
  • Adds the IPFS_SERVER setting to aleph-vm.
  • Refactors the relevant code to download from the IPFS gateway instead of the vm-connector.

How to test

Stop the services, purge all application data, then restart the services.

```
sudo systemctl stop aleph-vm-supervisor
sudo systemctl stop ipfs
sudo rm -fr /var/lib/aleph/vm/ /var/cache/aleph/ /var/lib/ipfs/
sudo systemctl start ipfs
sudo systemctl start aleph-vm-supervisor
```

Then check the index page of the CRN.

hoh and others added 30 commits April 9, 2024 14:17
Use f-strings, black, ...
The result sent was always a string containing encoded JSON.

Solution: Use the argument `text` instead of `data` to pass the data as a UTF-8 string.
Problem: Posting invalid item hashes to the `update_allocations` API failed long after the initial validation, with:

```
  File "/opt/aleph-vm/aleph_message/models/item_hash.py", line 25, in from_hash
    raise UnknownHashError(f"Could not determine hash type: '{item_hash}'")
aleph_message.exceptions.UnknownHashError: Could not determine hash type: '${{ matrix.check_vm.item_hash }}'
```

Solution: Require `ItemHash` as part of the `Allocation` schema, in order to fail validation earlier, in the section dedicated to the validation of the uploaded data.
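Failing early can be sketched like this; the checks below (sha256 hex for storage hashes, CIDv0 for IPFS) are an assumption for illustration and not the actual `ItemHash` logic from `aleph_message`:

```python
import re

# Hypothetical stand-in for aleph_message's ItemHash validation:
# a 64-char sha256 hex digest (storage) or a CIDv0 ("Qm" + 44 base58 chars).
STORAGE_RE = re.compile(r"^[0-9a-f]{64}$")
IPFS_CIDV0_RE = re.compile(r"^Qm[1-9A-HJ-NP-Za-km-z]{44}$")


def looks_like_item_hash(value: str) -> bool:
    """Cheap early check so bad input is rejected during schema validation."""
    return bool(STORAGE_RE.match(value) or IPFS_CIDV0_RE.match(value))


# An unexpanded CI template string is rejected immediately:
print(looks_like_item_hash("${{ matrix.check_vm.item_hash }}"))  # → False
```

Declaring the field with a validating type in the `Allocation` schema makes the API return a clean 4xx validation error instead of an exception deep inside the request handler.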
Solution: Add a test that creates a new VM execution and checks that it starts properly.
Solution: Accept a `pathlib.Path` as argument and convert it to a string.
Co-authored-by: Olivier Le Thanh Duong <olivier@lethanh.be>
This tests that a VM from the aleph.im network can be downloaded and launched.

Co-authored-by: ajin <ajin@Aleph.ajin.virtualbox.org>
Solution: Modify the hatch configuration to have the environment properly set up using the built-in virtual environment module.

https://hatch.pypa.io/1.3/plugins/environment/virtual/
It's a problem surfaced by another.

The visible problem was that the new execution tests were hanging inside the runtime, during the import at the line
 from aleph.sdk.chains.remote import RemoteAccount

After some more investigative work, it was pinpointed to an inner import of the eth_utils module (specifically eth_utils.network).

A second problem made the first visible: in the runtime, the pre-compiled bytecode created during runtime creation in create_disk_image.sh was not used, which made module imports slower and surfaced the first problem. The cause of this second problem was that the init1.py code, which runs the user code, was not launched with the same optimization level as the pre-compiled bytecode and thus recompiled everything (this is specified in the init1.py shebang on the first line).

Solution: Compile the bytecode with the same optimisation level (-o 2) as during run.

We haven't found out yet why the eth_utils.network import hangs when it is not precompiled, but this fixes the test hanging issue.
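The mismatch can be reproduced with the standard library: bytecode compiled at optimization level 2 is cached under a different name (`.opt-2` suffix), so an interpreter running at a different level ignores it and recompiles. A minimal sketch (the file name is illustrative):

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

# Write a throwaway module, then compile it the way a build script
# like create_disk_image.sh might: at optimization level 2.
src = Path(tempfile.mkdtemp()) / "example.py"
src.write_text("X = 1\n")

cached = py_compile.compile(str(src), optimize=2)
print(cached)  # ...__pycache__/example.cpython-3XX.opt-2.pyc

# The cache path encodes the optimization level: a run at a different
# level will not find this file and recompiles everything from source.
expected = importlib.util.cache_from_source(str(src), optimization=2)
print(cached == expected)  # → True
```

This is why the shebang of the entry script and the pre-compilation step must agree on the level: the `.pyc` lookup is keyed on it.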
…asyncio'

Fix: async_sessionmaker was introduced in SQLAlchemy 2.0; ensure we have at least this version, otherwise an older system package was used.
A frozen copy of the requirements.txt extracted from different systems
was present in the repository but neither used nor maintained.
Internet connectivity checks by the diagnostic VM relied on a single URL. If that endpoint was down, the internet connectivity of the system was assumed to be down.

Solution: Check connectivity to multiple endpoints in parallel.
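The parallel check can be sketched with `asyncio.gather`: the system is considered online if any endpoint answers. The helper below is an illustrative assumption, not the diagnostic VM's actual code; the demo uses stand-in probes instead of real HTTP endpoints:

```python
import asyncio


async def check_any(checks) -> bool:
    """Run all connectivity checks concurrently; one success is enough."""
    # return_exceptions=True keeps one failing endpoint from cancelling
    # the others; failures simply don't count as successes.
    results = await asyncio.gather(*checks, return_exceptions=True)
    return any(r is True for r in results)


async def probe(ok: bool) -> bool:
    """Stand-in for an HTTP reachability check against one endpoint."""
    await asyncio.sleep(0)
    if not ok:
        raise ConnectionError("endpoint down")
    return True


print(asyncio.run(check_any([probe(True), probe(False)])))  # → True
```

Checking several endpoints this way means a single dead URL no longer makes the whole system look offline.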
'aleph program' now needs an 'update' argument.
Solution: Update the Makefile and documentation.
Problem: could not start Instances from the command line.

The problem happened when launching with --run-fake-instance.
Solution: Adapt to the new VMPool API that takes a loop.
Also fix the benchmarks function.
Fix: Solve the last CORS errors, caused by duplicated headers being returned.
olethanh and others added 16 commits February 21, 2025 15:04
This fixes an issue reported by node owners: if IPv6 is enabled but not working, it also breaks the diagnostic /internet check.
Error in CI https://github.com/aleph-im/aleph-vm/actions/runs/13458482852/job/37788353003

 pip3 install --progress-bar off --target ./aleph-vm/opt/aleph-vm/ 'aleph-message==0.6' 'eth-account==0.10' 'sentry-sdk==1.31.0' 'qmp==1.1.0' 'aleph-superfluid~=0.2.1' 'sqlalchemy[asyncio]>=2.0' 'aiosqlite==0.19.0' 'alembic==1.13.1' 'aiohttp_cors==0.7.0' 'pyroute2==0.7.12' 'python-cpuid==0.1.0' 'solathon==1.0.2' 'protobuf==5.28.3'
Collecting aleph-message==0.6
  Downloading aleph_message-0.6.0-py3-none-any.whl (17 kB)
Collecting eth-account==0.10
  Downloading eth_account-0.10.0-py3-none-any.whl (109 kB)
Collecting sentry-sdk==1.31.0
  Downloading sentry_sdk-1.31.0-py2.py3-none-any.whl (224 kB)
Collecting qmp==1.1.0
  Downloading qmp-1.1.0-py3-none-any.whl (11 kB)
Collecting aleph-superfluid~=0.2.1
  Downloading aleph_superfluid-0.2.1.tar.gz (20 kB)
  Preparing metadata (setup.py) ... done
Collecting sqlalchemy[asyncio]>=2.0
  Downloading SQLAlchemy-2.0.38-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting aiosqlite==0.19.0
  Downloading aiosqlite-0.19.0-py3-none-any.whl (15 kB)
Collecting alembic==1.13.1
  Downloading alembic-1.13.1-py3-none-any.whl (233 kB)
Collecting aiohttp_cors==0.7.0
  Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Collecting pyroute2==0.7.12
  Downloading pyroute2-0.7.12-py3-none-any.whl (460 kB)
Collecting python-cpuid==0.1.0
  Downloading python-cpuid-0.1.0.tar.gz (21 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [40 lines of output]

      An error occurred while building the project, please ensure you have the most updated version of setuptools, setuptools_scm and wheel with:
         pip install -U setuptools setuptools_scm wheel

      Traceback (most recent call last):
        File /usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py, line 363, in <module>
          main()
        File /usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py, line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File /usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py, line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File /usr/lib/python3/dist-packages/setuptools/build_meta.py, line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File /usr/lib/python3/dist-packages/setuptools/build_meta.py, line 143, in _get_build_requires
          self.run_setup()
        File /usr/lib/python3/dist-packages/setuptools/build_meta.py, line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File setup.py, line 18, in <module>
          setup(
        File /usr/lib/python3/dist-packages/setuptools/__init__.py, line 153, in setup
          return distutils.core.setup(**attrs)
        File /usr/lib/python3/dist-packages/setuptools/_distutils/core.py, line 109, in setup
          _setup_distribution = dist = klass(attrs)
        File /usr/lib/python3/dist-packages/setuptools/dist.py, line 459, in __init__
          _Distribution.__init__(
        File /usr/lib/python3/dist-packages/setuptools/_distutils/dist.py, line 293, in __init__
          self.finalize_options()
        File /usr/lib/python3/dist-packages/setuptools/dist.py, line 836, in finalize_options
          for ep in sorted(loaded, key=by_order):
        File /usr/lib/python3/dist-packages/setuptools/dist.py, line 835, in <lambda>
          loaded = map(lambda e: e.load(), filtered)
        File /usr/lib/python3/dist-packages/pkg_resources/__init__.py, line 2464, in load
          self.require(*args, **kwargs)
        File /usr/lib/python3/dist-packages/pkg_resources/__init__.py, line 2487, in require
          items = working_set.resolve(reqs, env, installer, extras=self.extras)
        File /usr/lib/python3/dist-packages/pkg_resources/__init__.py, line 782, in resolve
          raise VersionConflict(dist, req).with_context(dependent_req)
      pkg_resources.VersionConflict: (setuptools 59.6.0 (/usr/lib/python3/dist-packages), Requirement.parse('setuptools>=61'))
      [end of output]
Since Python code runs asynchronously in the same process, sharing the global sys.stdout, prints from an
individual call cannot be isolated from other calls.
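Why per-call isolation is impossible can be demonstrated with `contextlib.redirect_stdout`, which swaps the process-wide `sys.stdout`: output from a concurrent task leaks into another call's capture. A minimal demonstration:

```python
import asyncio
import contextlib
import io


async def chatty(name: str) -> None:
    for _ in range(3):
        print(name)  # goes to whatever sys.stdout is *right now*
        await asyncio.sleep(0)


async def main() -> str:
    buf = io.StringIO()
    # Intended to capture only task A, but sys.stdout is shared by the
    # whole process, so task B's prints land in the same buffer.
    with contextlib.redirect_stdout(buf):
        await asyncio.gather(chatty("A"), chatty("B"))
    return buf.getvalue()


captured = asyncio.run(main())
print("A" in captured and "B" in captured)  # → True: both tasks leaked in
```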
This was causing an issue when instances failed to start and there was no
persistent VM.

Jira ticket ALEPH-436
Wants=

    Configures (weak) requirement dependencies on other units.

Requires=

    Similar to Wants=, but declares a stronger requirement dependency.

    If this unit gets activated, the units listed will be activated as well. If one of the other units fails to activate, and an ordering dependency After= on the failing unit is set, this unit will not be started. Besides, with or without specifying After=, this unit will be stopped (or restarted) if one of the other units is explicitly stopped (or restarted).

Documentation: https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html
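In unit-file terms the difference looks like this (a sketch; the service names are illustrative, not the actual aleph-vm units):

```ini
# /etc/systemd/system/example.service  (illustrative sketch)
[Unit]
# Weak dependency: start ipfs.service too, but keep running if it fails.
Wants=ipfs.service
# Strong dependency (alternative): stop/restart with ipfs.service; combined
# with After=, do not start at all if ipfs.service fails to activate.
#Requires=ipfs.service
After=ipfs.service
```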
Server api1.aleph.im is very slow and outdated
(Core i7 from 2018, up to 40 seconds to respond
to `/metrics` in the monitoring).

We suspect that this causes issues in the
monitoring and performance of the network.

This branch removes all references to api1 and
replaces them with api3 where relevant.

Co-authored-by: Bram <cortex@worlddomination.be>
Co-authored-by: Olivier Le Thanh Duong <olivier@lethanh.be>
Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 1.31 to 2.8.0.
- [Release notes](https://github.com/getsentry/sentry-python/releases)
- [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md)
- [Commits](getsentry/sentry-python@1.31.0...2.8.0)

---
updated-dependencies:
- dependency-name: sentry-sdk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Fix: Put the correct aleph-message version in the package generation script.
Adding type annotations for better code clarity and safety.
This was done on the pydantic migration branch, but is added here first because it should not
break the code and will facilitate the merge of the pydantic PR.
This bumps their versions to
- Kubo 0.23.0 -> 0.33.1

The list of changes in Kubo is too large to be mentioned here, but I mostly expect performance improvements, as the main API has not changed much.
Due to an error reading the denylist.

The error was:
```
Error: constructing the node (see log for full detail):
error walking /home/ipfs/.config/ipfs/denylists:
lstat /home/ipfs/.config/ipfs/denylists: permission denied
```

The [documentation](https://specs.ipfs.tech/compact-denylist-format/)
mentions that:

> Implementations SHOULD look in /etc/ipfs/denylists/ and
> $XDG_CONFIG_HOME/ipfs/denylists/ (default: ~/.config/ipfs/denylists)
> for denylist files.

I am not sure why this only failed on Ubuntu 22.04
and not Debian 12 or Ubuntu 24.04. My first assumption
would be a difference in Systemd.
@hoh hoh requested a review from olethanh March 12, 2025 16:56
@hoh hoh self-assigned this Mar 18, 2025
Comment thread packaging/aleph-vm/etc/systemd/system/ipfs.service Outdated
Comment thread src/aleph/vm/storage.py
tmp_path.unlink(missing_ok=True)


async def download_file_from_ipfs_or_connector(ref: str, cache_path: Path, filetype: str) -> None:

Suggested change
async def download_file_from_ipfs_or_connector(ref: str, cache_path: Path, filetype: str) -> None:
async def download_file_from_ipfs_or_connector(item_hash: ItemHash, cache_path: Path, filetype: str) -> None:


codecov Bot commented Mar 19, 2025

Codecov Report

❌ Patch coverage is 70.75027% with 807 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.41%. Comparing base (b6053f6) to head (dadad88).
⚠️ Report is 1289 commits behind head on main.

Files with missing lines Patch % Lines
src/aleph/vm/orchestrator/views/operator.py 57.14% 77 Missing and 13 partials ⚠️
src/aleph/vm/garbage_collector.py 0.00% 78 Missing ⚠️
src/aleph/vm/orchestrator/views/__init__.py 16.66% 70 Missing ⚠️
...c/aleph/vm/hypervisors/qemu_confidential/qemuvm.py 29.82% 40 Missing ⚠️
src/aleph/vm/orchestrator/tasks.py 50.81% 27 Missing and 3 partials ⚠️
...aleph/vm/controllers/qemu_confidential/instance.py 58.57% 29 Missing ⚠️
src/aleph/vm/models.py 57.81% 24 Missing and 3 partials ⚠️
src/aleph/vm/pool.py 27.02% 27 Missing ⚠️
src/aleph/vm/conf.py 78.51% 11 Missing and 15 partials ⚠️
src/aleph/vm/orchestrator/status.py 25.00% 24 Missing ⚠️
... and 38 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #779       +/-   ##
===========================================
+ Coverage   35.09%   63.41%   +28.32%     
===========================================
  Files          53       77       +24     
  Lines        4893     6877     +1984     
  Branches      583      578        -5     
===========================================
+ Hits         1717     4361     +2644     
+ Misses       3154     2326      -828     
- Partials       22      190      +168     


hoh added 2 commits March 19, 2025 15:49
IPFS Kubo is now present on all CRNs but unused.

Downloading files from this service would be more reliable than using a
third party, and would provide content integrity checks.

This bypasses the vm-connector when fetching resources from IPFS.
The Kubo service could also use unlimited memory.

hoh commented Mar 19, 2025

@olethanh I moved the resource limits into this branch.

@hoh hoh removed their assignment May 8, 2025
