@alexcos20
Member

@alexcos20 alexcos20 commented Oct 1, 2024

Breaking Changes proposed in this PR (requires other repos to be updated):

Changes proposed in this PR:

  • new freeStartCompute and getComputeStreamableLogs commands
  • c2d using docker engine

All additional work should be done in PRs based against this one.

Usage:

 export DOCKER_SOCKET_PATH='/var/run/docker.sock'

and then do a directCommand:

{
    "command": "freeStartCompute",
    "consumerAddress": "0xC7EC1970B09224B317c52d92f37F5e1E4fF6B687",
    "nonce": 1,
    "signature": "0x123",
    "datasets": [
        {
            "fileObject": {
                "type": "url",
                "url": "SOME_DATASET_URL",
                "method": "GET"
            }
        }
    ],
    "algorithm": {
        "fileObject": {
            "type": "url",
            "url": "SOME_ALGO_URL",
            "method": "GET"
        },
        "meta": {
            "container": {
                "image": "SOME_CONTAINER",
                "tag": "latest",
                "entrypoint": "python $ALGO"
            }
        }
    }
}
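The directCommand above can also be sent from code. The sketch below is a minimal TypeScript example that only builds the HTTP request. It assumes the node exposes a POST /directCommand endpoint that accepts this JSON body and listens on port 8000 (the port of the deployed nodes mentioned later in this thread). The interface and helper names are hypothetical, and the signature is still the placeholder from the example, so a real call needs a signature from the consumer's wallet.

```typescript
// Shapes mirroring the JSON payload above (field names copied from the example).
interface UrlFileObject {
  type: "url";
  url: string;
  method: "GET" | "POST";
}

interface FreeStartComputeCommand {
  command: "freeStartCompute";
  consumerAddress: string;
  nonce: number;
  signature: string;
  datasets: { fileObject: UrlFileObject }[];
  algorithm: {
    fileObject: UrlFileObject;
    meta: { container: { image: string; tag: string; entrypoint: string } };
  };
}

// Hypothetical helper: builds the fetch() arguments for the directCommand endpoint.
// Assumption: the endpoint path is /directCommand and it accepts a JSON body.
function buildDirectCommandRequest(
  nodeUrl: string,
  cmd: FreeStartComputeCommand
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: new URL("/directCommand", nodeUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(cmd),
    },
  };
}

const request = buildDirectCommandRequest("http://localhost:8000", {
  command: "freeStartCompute",
  consumerAddress: "0xC7EC1970B09224B317c52d92f37F5e1E4fF6B687",
  nonce: 1,
  signature: "0x123", // placeholder, as in the example above
  datasets: [{ fileObject: { type: "url", url: "SOME_DATASET_URL", method: "GET" } }],
  algorithm: {
    fileObject: { type: "url", url: "SOME_ALGO_URL", method: "GET" },
    meta: { container: { image: "SOME_CONTAINER", tag: "latest", entrypoint: "python $ALGO" } },
  },
});
console.log(request.url); // http://localhost:8000/directCommand
```

Sending it would then be a single fetch(request.url, request.init) call on Node 18+, where fetch is available globally.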

@paulo-ocean
Contributor

paulo-ocean commented Oct 24, 2024

  • Changes: instead of separate dataset and additionalDatasets fields, all datasets are now merged under datasets
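For clients that still build the old request shape, the change amounts to concatenating the two fields into one array. A minimal sketch, assuming the old request carried a single dataset plus an optional additionalDatasets array; the types and the migrateDatasets name are hypothetical, and the asset fields are reduced to the ones needed for illustration:

```typescript
// Hypothetical asset reference: only the fields relevant to this change.
interface ComputeAssetRef {
  documentId?: string;
  serviceId?: string;
  fileObject?: { type: string; url: string; method: string };
}

// Hypothetical legacy request: one primary dataset plus optional extras.
interface LegacyComputeRequest {
  dataset?: ComputeAssetRef;
  additionalDatasets?: ComputeAssetRef[];
}

// Merges the legacy fields into the single `datasets` array,
// keeping the primary dataset first so its position is preserved.
function migrateDatasets(req: LegacyComputeRequest): ComputeAssetRef[] {
  const datasets: ComputeAssetRef[] = [];
  if (req.dataset) datasets.push(req.dataset);
  if (req.additionalDatasets) datasets.push(...req.additionalDatasets);
  return datasets;
}

const merged = migrateDatasets({
  dataset: { documentId: "did:op:primary" },
  additionalDatasets: [{ documentId: "did:op:extra" }],
});
console.log(merged.map((d) => d.documentId)); // [ 'did:op:primary', 'did:op:extra' ]
```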

FYI, I've created a PR on Ocean JS related to the breaking changes:
oceanprotocol/ocean.js#1867

Other changes are also included in this PR: #735 #737 #736 #739

@paulo-ocean paulo-ocean marked this pull request as ready for review October 24, 2024 12:29
@paulo-ocean paulo-ocean marked this pull request as draft October 24, 2024 12:30
@paulo-ocean paulo-ocean marked this pull request as ready for review October 24, 2024 13:37
@paulo-ocean paulo-ocean marked this pull request as draft October 24, 2024 14:41
This was linked to issues Oct 25, 2024
@paulo-ocean
Contributor

paulo-ocean commented Nov 6, 2024

Hi @alexcos20, what if the fileObject is not present on the datasets or on the algorithm? Do we need to get it from the service files object, i.e. fetch the DDOs and decrypt service.files to get it?
It's also not clear to me how we pass this from, for instance, the CLI -> start compute... We have no way of knowing the fileObject info without decrypting the service files, or am I missing something here?
Thanks
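The fallback described in the question (use the fileObject from the request when present, otherwise resolve it from the asset's encrypted service files) could look roughly like the sketch below. It only illustrates the question and is not the node's implementation: fetchDdo and decryptServiceFiles are hypothetical helpers passed in as parameters, and the decrypted files are assumed to be a list of file objects of which the first one is used.

```typescript
interface FileObject {
  type: string;
  url: string;
  method: string;
}

interface AssetInput {
  documentId?: string;
  serviceId?: string;
  fileObject?: FileObject;
}

// Hypothetical dependencies, injected so the resolution logic stays testable.
interface Resolvers {
  fetchDdo(did: string): Promise<{ services: { id: string; files: string }[] }>;
  decryptServiceFiles(encrypted: string): Promise<FileObject[]>;
}

// Prefer an explicit fileObject; otherwise look up the DDO and decrypt
// the files of the referenced service.
async function resolveFileObject(asset: AssetInput, r: Resolvers): Promise<FileObject> {
  if (asset.fileObject) return asset.fileObject;
  if (!asset.documentId || !asset.serviceId) {
    throw new Error("Either fileObject or documentId + serviceId is required");
  }
  const ddo = await r.fetchDdo(asset.documentId);
  const service = ddo.services.find((s) => s.id === asset.serviceId);
  if (!service) throw new Error(`Service ${asset.serviceId} not found`);
  const files = await r.decryptServiceFiles(service.files);
  if (files.length === 0) throw new Error("Service has no files");
  return files[0]; // assumption: the first file is the one to use
}
```

Keeping the DDO lookup and decryption behind injected functions means a CLI that does not know the fileObject could send only documentId and serviceId and let the node do the resolution.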

@jamiehewitt15
Contributor

We should update the documentation to explain how this works and how to set it up.

@jamiehewitt15
Contributor

I tried to test it via the directCommand endpoint but I found it causes the node to crash. I'll give it another go when it's marked as ready for review, as changes are still being made.

@jamiehewitt15
Contributor

Update: it seems to work via the direct command now but I will do some more testing to see if there are any more bugs

@paulo-ocean
Contributor

paulo-ocean commented Nov 19, 2024

> Update: it seems to work via the direct command now but I will do some more testing to see if there are any more bugs

Sure, but this is still a draft in progress :-) so expect it.

@@ -0,0 +1 @@
declare module 'docker-registry-client'
Contributor

This file seems to have a typo. Presumably it should have been src/@types/docker-registry-client.ts

import {
ComputeGetEnvironmentsHandler,
ComputeStartHandler,
// ComputeStartHandler,
Contributor

@jamiehewitt15 jamiehewitt15 Mar 11, 2025

There are a lot of imports and variables in this file that are commented out. Can we just remove them?

@@ -0,0 +1,236 @@
import { C2DDatabase } from '../../components/database/C2DDatabase.js'
// import { existsEnvironmentVariable, getConfiguration } from '../../utils/config.js'
Contributor

Suggested change
// import { existsEnvironmentVariable, getConfiguration } from '../../utils/config.js'

C2DStatusText,
ComputeAlgorithm,
ComputeAsset,
// ComputeEnvironment,
Contributor

Suggested change
// ComputeEnvironment,

DBComputeJob,
RunningPlatform
} from '../../@types/C2D/C2D.js'
// import { computeAsset } from '../data/assets'
Contributor

Suggested change
// import { computeAsset } from '../data/assets'

@paulo-ocean
Contributor

We need to fix the conflicts and eventually merge this.

Contributor

@jamiehewitt15 jamiehewitt15 left a comment

It looks like there was a bug introduced in #860. Now the job gets stuck on publishing results status, which wasn't happening before:

Logs
Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Job started

Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Configuring volumes

Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Running algorithm 

Using nonce: 1742294255389

Generated result signature: 0xc183940019df1ed5b07c5fb24078ab61daf38b2d4708fe2528159eccb470ccb41bd3c689b27af3738cec07d40fff8a3f4c00439df418416c1ff66b99312debfd1c

Result signature valid: true

Response:  Response {size: 0, timeout: 0, Symbol(Body internals): {…}, Symbol(Response internals): {…}}

Response body:  PassThrough {_events: {…}, _readableState: ReadableState, _writableState: WritableState, allowHalfOpen: true, _maxListeners: undefined, …}

Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Running algorithm 

Stream complete

Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Publishing results

Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Publishing results

[... the same "Publishing results" status is returned on every subsequent poll ...]

Checking job status...

Job status: {owner: '0x4c495310EF3259830FEd12ea90DbB0B3D39103FD', did: null, jobId: '540f8543-ef1a-4623-9f5a-cef34a8c1cbb', dateCreated: '1742294240.728', dateFinished: null, …}

Status text: Publishing results
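The log above comes from a client that polls the job status in a loop and never leaves the "Publishing results" state. While the node-side bug is being fixed, a client can at least detect the hang by stopping once the same status has been returned too many times in a row. A minimal sketch, assuming a caller-supplied getStatusText function (hypothetical) and a caller-chosen set of terminal status strings:

```typescript
// Outcome of a polling session.
type PollResult =
  | { kind: "finished"; status: string }
  | { kind: "stuck"; status: string; polls: number };

// Polls `getStatusText` until it returns a terminal status, or until the same
// status has been seen `maxRepeats` times in a row (treated as "stuck").
async function pollJob(
  getStatusText: () => Promise<string>,
  terminal: Set<string>,
  maxRepeats: number,
  intervalMs: number
): Promise<PollResult> {
  let last = "";
  let repeats = 0;
  for (;;) {
    const status = await getStatusText();
    if (terminal.has(status)) return { kind: "finished", status };
    // Count consecutive identical statuses; any change resets the counter.
    repeats = status === last ? repeats + 1 : 1;
    last = status;
    if (repeats >= maxRepeats) return { kind: "stuck", status, polls: repeats };
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning a "stuck" result instead of throwing lets the caller decide whether to keep waiting, cancel the job, or report the problem.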

Comment on lines +349 to +353
DOCKER_SOCKET_PATH: {
name: 'DOCKER_SOCKET_PATH',
value: process.env.DOCKER_SOCKET_PATH,
required: false
},
Contributor

We're not using this anymore, can we remove it?

Comment on lines 127 to 140
read -p "Enter the docker socket path: " DOCKER_SOCKET_PATH
DOCKER_SOCKET_PATH=${DOCKER_SOCKET_PATH:-''}
read -p "Enter the docker protocol: " DOCKER_PROTOCOL
DOCKER_PROTOCOL=${DOCKER_PROTOCOL:-''}
read -p "Enter the docker host: " DOCKER_HOST
DOCKER_HOST=${DOCKER_HOST:-''}
read -p "Enter the docker port: " DOCKER_PORT
DOCKER_PORT=${DOCKER_PORT:-0}
read -p "Enter the docker certificate authority path: " DOCKER_CA_PATH
DOCKER_CA_PATH=${DOCKER_CA_PATH:-''}
read -p "Enter the docker certificate path: " DOCKER_CERT_PATH
DOCKER_CERT_PATH=${DOCKER_CERT_PATH:-''}
read -p "Enter the docker key path: " DOCKER_KEY_PATH
DOCKER_KEY_PATH=${DOCKER_KEY_PATH:-''}
Contributor

As far as I know, we are not using any of these anymore. Can we remove them?

Member Author

Make the socket path hardcoded if you want, and remove the rest.

@jamiehewitt15
Contributor

There seems to be something wrong with the deployed version we have. Sometimes, I get this error:

Free start compute error:  AxiosError {code: 'ETIMEDOUT', errors: Array(2), message: '', name: 'AggregateError', config: {…}, …}

I've never seen this error when running the node on my own VM.

@mariacarmina
Contributor

> It looks like there was a bug introduced in #860. Now the job gets stuck on publishing results status, which wasn't happening before:
>
> Logs

I can confirm this behaviour. I tested with the vscode-extension, whose Node URL now points to the latest C2D modifications (including #860), and it gets stuck at the Publishing results phase:
[Screenshot 2025-03-18 at 17 34 43]
I'll create a separate issue so this can be tackled independently of this PR.

@mariacarmina
Contributor

Let's also fix the conflicts here, because they might be causing the job to get stuck when publishing results in the vscode-extension.
These logs keep popping up while publishing the results:

DATABASE:	Could not find any running C2D jobs!
2025-03-18T08:50:00.590Z info: DATABASE:	Could not find any jobs for the specified enviroment:

@mariacarmina
Contributor

Testing with the updated branch still produces an infinite loop when publishing results, in both scenarios (success and failure of the algorithm execution).
By the way, in further vscode-extension tasks we should stop hardcoding the docker image and tag: right now ocean-node configures volumes based on the branin docker image while running a different algorithm code:

algorithm: {
    meta: {
      rawcode: 'import time\n' +
        'import asyncio\n' +
        'import os\n' +
        '\n' +
        '# Constants for timing (in seconds)\n' +
        'TOTAL_DURATION = 10  # 10 seconds\n' +
        'LOG_INTERVAL = 1     # 1 second\n' +
        '\n' +
        'async def run_logging():\n' +
        "    print('RAW CODE: Starting logging process...')\n" +
        '    \n' +
        '    start_time = time.time()\n' +
        '    current_iteration = 1\n' +
        '    results = []\n' +
        '    \n' +
        '    while True:\n' +
        '        elapsed_time = time.time() - start_time\n' +
        '        \n' +
        "        log_entry = f'Log iteration {current_iteration}: {elapsed_time:.3f} seconds elapsed'\n" +
        '        print(log_entry)\n' +
        '        results.append(log_entry)\n' +
        '        current_iteration += 1\n' +
        '        \n' +
        '        if elapsed_time >= TOTAL_DURATION:\n' +
        "            print('Completed')\n" +
        '            \n' +
        "            # Create the output directory if it doesn't exist\n" +
        "            output_dir = '/data/outputs'\n" +
        '            os.makedirs(output_dir, exist_ok=True)\n' +
        '            \n' +
        '            # Save results to a text file\n' +
        "            txt_file = f'{output_dir}/results.txt'\n" +
        '            \n' +
        "            with open(txt_file, 'w') as f:\n" +
        "                f.write(f'PY Algorithm Results\\n')\n" +
        "                f.write(f'Total time: {elapsed_time:.3f} seconds\\n')\n" +
        "                f.write(f'Total iterations: {current_iteration - 1}\\n')\n" +
        '            \n' +
        '            print(f"Results saved as {txt_file}")\n' +
        "            return 'completed'\n" +
        '            \n' +
        '        await asyncio.sleep(LOG_INTERVAL)\n' +
        '\n' +
        'if __name__ == "__main__":\n' +
        '    asyncio.run(run_logging())',
      container: [Object]
    }
  },
  assets: [],
  isRunning: true,
  isStarted: false,
  containerImage: 'oceanprotocol/algo_dockers:python-branin',

@mariacarmina
Contributor

If I select the branin algorithm, I get the following "info" log:

2025-03-18T20:50:18.750Z info: CORE:	CORE:	consumer address and nonce signature mismatch
Stopped

And still the job execution gets stuck at Publishing results.

@alexcos20
Member Author

> It looks like there was a bug introduced in #860. Now the job gets stuck on publishing results status, which wasn't happening before:
>
> Logs

This is fixed now

@jamiehewitt15
Contributor

> This is fixed now

Yeah, I've tested it, and it works end to end now.

@jamiehewitt15
Contributor

I'm still occasionally getting a timeout error:

Sending compute request with body: {command: 'freeStartCompute', consumerAddress: '0xE215efFDD674EA9922Dc69048cf8639f1F39a881', environment: '0xf7cc480d925bb41fc6e0f03124faa569d53d626e…32415e357dbf527a863b5c97fb3405e08462907193', nonce: 1742388345394, signature: '0x7fea164abfa1dbb232dc16204a04ad5fe5e22ae34…bef760bdc61174d970930b2b3a76681a2e98f6ed71c', …}
extensionHostProcess.js:178
Free start compute error:  AxiosError {code: 'ETIMEDOUT', errors: Array(2), message: '', name: 'AggregateError', config: {…}, …}
extensionHostProcess.js:178
Error details: AxiosError {code: 'ETIMEDOUT', errors: Array(2), message: '', name: 'AggregateError', config: {…}, …}

Not sure what could be causing this; it doesn't happen every time. When it occurs, there are no errors in the node logs. I've seen it on both https://1.c2d.nodes.oceanprotocol.com:8000/ and https://2.c2d.nodes.oceanprotocol.com:8000/
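Since the timeout is intermittent and leaves nothing in the node logs, one client-side mitigation (not a fix for the root cause, which this thread does not identify) is to retry on ETIMEDOUT with exponential backoff. A minimal sketch; withRetry and its parameters are hypothetical, and it only retries errors whose code property is "ETIMEDOUT", as in the AxiosError above. Retrying freeStartCompute is only safe if the timed-out attempt did not start a job on the node, and because the request is signed over a nonce, a retry may need a fresh nonce and signature.

```typescript
// Retries `fn` with exponential backoff while it fails with a timeout error.
// Any other error is rethrown immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number,
  baseDelayMs: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const code = (err as { code?: string } | null)?.code;
      if (code !== "ETIMEDOUT") throw err; // only retry timeouts
      if (attempt < attempts - 1) {
        // Wait baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

Passing a function that rebuilds the request on each attempt (new nonce, new signature), rather than a fixed payload, keeps that concern with the caller.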

@alexcos20
Member Author

Are there any logs in the logs folder?

alexcos20 and others added 2 commits March 20, 2025 16:06
* add some basic docs for c2d

Co-authored-by: giurgiur99 <giurgiur99@gmail.com>
@paulo-ocean paulo-ocean requested a review from giurgiur99 as a code owner March 21, 2025 13:09
@alexcos20 alexcos20 merged commit bc6ec5f into main Mar 21, 2025
13 checks passed
@alexcos20 alexcos20 deleted the feature/c2d_docker branch March 21, 2025 13:44
Development

Successfully merging this pull request may close these issues.

C2D.2: Execution layer Docker API
C2D.2: Orchestration layer

6 participants