Skip to content

Conversation

@chungquantin
Copy link
Contributor

Milestone Delivery Checklist

  • The milestone-delivery-template.md has been copied and updated.
  • This pull request is being made by the same account as the accepted application.
  • I have disclosed any and all sources of reused code in the submitted repositories and have done my due diligence to meet its license requirements.
  • In case of acceptance, invoices must be submitted and payments will be transferred to the Polkadot AssetHub and/or fiat account provided in the application.
  • The delivery is according to the Guidelines for Milestone Deliverables.

Link to the application pull request: w3f/Grants-Program#2549 < please fill this in with the PR number of your application.

@chungquantin chungquantin changed the title feat: upload delivery proposal for agent kit milestone 1 Elastic Labs - Polkadot Agent Kit - Milestone 1 Jul 16, 2025
@chungquantin chungquantin marked this pull request as ready for review July 22, 2025 03:21
@chungquantin
Copy link
Contributor Author

Dear reviewers / curators.

In the official pull request of the application w3f/Grants-Program#2549, I requested to receive the payment onchain, however, I would love to change to receive fiat for the first proposal (10k$) and receive the rest payment of the proposal in DOT and USDC as we had some changes in our company's operational plan. May I know the proper process to make the changes?

@github-actions github-actions bot added the stale label Aug 5, 2025
@PieWol PieWol self-assigned this Aug 5, 2025
@PieWol
Copy link
Member

PieWol commented Aug 5, 2025

Hey @chungquantin ,
thanks for your delivery. As you can see here in our FAQ, we pay in both USDC and DOT. so it won't be possible to switch to FIAT. I hope you can make it work. Let me know if you have any other questions or concerns.

Since this PR is now ready for review we'll get to evaluating.

@PieWol PieWol removed the stale label Aug 5, 2025
@chungquantin
Copy link
Contributor Author

Hi @PieWol, thanks for letting me know, I was clarified in the main proposal PR and I'm happy with receiving stablecoins.

@chungquantin
Copy link
Contributor Author

Hi @PieWol may I know the status of the submission? It has been here for nearly a month.

Copy link
Contributor

@Lederstrumpf Lederstrumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @chungquantin, apologies for the wait.

Thanks for delivering the first milestone - I've started review of your submission.

Your unit/e2e test suite passes on my side, and the integration tests pass on an M2's GPU and an RTX5090. I have not studied the test suite in detail yet though, so may have more feedback here.

In my review, I've focused on your instructions for client side, see my instantiation here: https://github.com/Lederstrumpf/polkadot-agent-kit-testing
This does unfortunately not work yet:

/> pnpm install dist/src/quickstart.js
/> tsc
/> node dist/src/quickstart.js
node:internal/modules/cjs/loader:664
      throw e;
      ^

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './src/api/client' is not defined by "exports" in /home/lederstrumpf/polkadot/w3f-grants/evaluation-repos/polkadot-agent-kit-2549/polkadot-agent-kit-testing/node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]__89468b8fe91534884d194ff6c58c1ff8/node_modules/@polkadot-agent-kit/core/package.json
    at exportsNotFound (node:internal/modules/esm/resolve:314:10)
    at packageExportsResolve (node:internal/modules/esm/resolve:661:9)
    at resolveExports (node:internal/modules/cjs/loader:657:36)
    at Function._findPath (node:internal/modules/cjs/loader:749:31)
    at Function._resolveFilename (node:internal/modules/cjs/loader:1387:27)
    at defaultResolveImpl (node:internal/modules/cjs/loader:1057:19)
    at resolveForCJSWithHooks (node:internal/modules/cjs/loader:1062:22)
    at Function._load (node:internal/modules/cjs/loader:1211:37)
    at TracingChannel.traceSync (node:diagnostics_channel:322:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:235:24) {
  code: 'ERR_PACKAGE_PATH_NOT_EXPORTED'
}

Node.js v22.17.0

Adding your other @polkadot-agent-kit packages published via https://www.npmjs.com/~cqtin0903 did not address this issue.
If you have any changes to the sdk that are not published yet and fix this, please publish them.

Notwithstanding, please find below my preliminary concerns, in my subjective descending order of priority, tagged by applicable sub-milestone:

M1.2: transferNativeTool only supports polkadot & westend relay chains

The published version:

type DescriptorsRelayType = {
    polkadot: typeof polkadot;
    west: typeof west;
};

Very easy fixes: Please add kusama and paseo too. I see you have added paseo relay to your repo's implementation, but not published this, yet kusama is certainly still missing there too, as are both kusama & paseo asset hubs.

I also note that, of the parachains, only bifrost (DOT), hydra, and paseo's people chain are supported. Is there a material technical reason you've omitted the others? To my understanding, your grant proposal's milestones are for a general XCM agent solution (in particular here M1.1b&c).

Unspecific: Agent can only be instantiated by providing private key

This isn't specific to a particular M1 sub-milestone, but the grant application specified that the sdk would support agent creation using mnemonics. However, it seems that the interface currently only takes raw private keys as the first (non-optional) argument to the PolkadotAgentKit constructor. Please add support for instantiation using mnemonics.

M1.1b&c: incorrect/outdate transferNativeTool usage in quickstart & missing xcmTransferNativeTool call.

The transferNativeTool call requires a KnowChainId type arg, which doesn't reflect in the quickstart instructions. Likewise, the published version only has transferNativeTool and no xcmTransferNativeTool. Please address.

M1.0b: unclear/fragmented quickstart instructions

Please spell out the configuration steps for testers & users of your sdk fully.
For instance, a short stub on setting up ollama would be helpful so users unfamiliar with this process don't spend time troubleshooting unnecessarily.

It would be nice if you could create a template repo for instantiating polkadot agents that future users of your sdk could use as a starting point. Apologies if this already exists, and I missed it in your sdk/docs.

M1.0c: CI doesn't cover e2e nor integration tests

Only the unit tests are within your CI suite, at least on GH.

M1.0c: integration tests fail when running on CPU

Please a timeout bypass, or set a reasonable timeout for slow ollama instances, in particular when running the model outside a GPU.
This isn't just a nit: I don't imagine that unless you get GPU instances for the pnpm test:integration, you'll be able to add these tests (successfully) in GH CI without increasing the timeout bound.


I'll review further once you have addressed the above issues.

For future deliveries:
Please test your setup & testing instructions fully before sending them in for evaluation.

Look forward to receiving your updates - I think this could be an exciting tool for users :)

@Lederstrumpf
Copy link
Contributor

Lederstrumpf commented Aug 18, 2025

Hi @chungquantin, please see my initial evaluation here - it largely overlaps with my comment above.

Your unit/e2e test suite passes on my side, and the integration tests pass on an M2's GPU and an RTX5090. I have not studied the test suite in detail yet though, so may have more feedback here.

I've looked a bit into the details of your e2e/integration tests now.
Is my understanding correct that both the e2e tests and the integration tests don't check the test outcome for its intended consequence? The only test I see an actual constraint for is the balance verification.

For instance, your integration test should xcm transfer 0.001 WND from Westend Asset Hub to Westend using natural language should, in my mind, clearly also check that

  1. the sender's balance on Westend Asset Hub decreased by 0.001 WND (+ tx fees),
  2. the recipient's balance on Westend increased by 0.001 WND
  3. no other unexpected state mutations occurred

Point 3. might seem pedantic, but since the ollama agent could also be initiating transacts other than intended by the user query transfer 0.001 WND to ${RECIPIENT} from Westend Asset Hub to Westend, I deem it crucial to test that no such unintended side-effects occurred.

Likewise, as mentioned in my M1.2 eval notes, I feel the scope of your xcm tests is currently too narrow.

If I misunderstood the testing harness or missed some key code, then apologies for raising this comment's concerns.

@Lederstrumpf Lederstrumpf self-assigned this Aug 20, 2025
@chungquantin
Copy link
Contributor Author

HI @Lederstrumpf, below is our report for the issues you mentioned above:

M1.0a – Unable to run @polkadot-agent-kit/sdk

  • Issue: The SDK could not be run because an outdated version was being used.
  • Fix: Updated to the latest release (v1.1.3).

Find the v1.1.3 here: https://www.npmjs.com/package/@polkadot-agent-kit/sdk/v/1.1.3


M1.1a – transferNativeTool limited to Polkadot & Westend

  • Issue: transferNativeTool only supported Polkadot and Westend relay chains.
  • Fix: Extended support to additional networks (Kusama and Paseo).
  • Commit: 1e1f35b

M1.1b & M1.1c – Outdated usage examples & missing xcmTransferNativeTool

  • Issue: Documentation contained outdated transferNativeTool usage and lacked an example for xcmTransferNativeTool.
  • Fix: Provided updated examples and a testing repo to demonstrate step-by-step setup.
  • Reference: Testing Repository
  • Note: A template app will be released soon.

M1.2 – Agent instantiation requires private key only

  • Issue: Agents could only be instantiated using a private key.
  • Fix: Added support for mnemonic-based instantiation.
  • Commit: 3e1559d

M1.0b – Quickstart instructions unclear/fragmented

  • Issue: Quickstart documentation was fragmented and confusing.
  • Fix: Documentation is being gradually improved to provide clear guidance for developers.

M1.0c – CI coverage & test stability

  • Issue A: CI did not include E2E or integration tests.

    • Fix: Added E2E test coverage to CI.
    • Commit: 159e343
  • Issue B: Integration tests failed on CPU (timeouts with smoldot light client).

    • Fix: Increased timeout handling to improve stability.
    • Note: Some integration tests target mainnet, which cannot always run in CI frequently.

Let me know if there are any issues!

Co-authored-by: Robert Hambrock <[email protected]>
@chungquantin
Copy link
Contributor Author

HI @Lederstrumpf we have submitted our response for the evaluation report

0b. Instructions for client integration do not work yet, perhaps some will once latest sdk changes are published.

Done: This is template testing for polkadot agent kit - Client Integration https://github.com/CocDap/polkadot-agent-kit-testing
Future: we will update the latest docs in this website https://cocdap.github.io/agent-docs/

0c. E2E & integration tests not part of CI pipeline yet

We have added the E2E on CI pipeline, because we have perfomed some integration tests on mainnet , then we dont add it into CI

1b. Firstly: Integration tests don't test constrain test outcome by any criteria; see 2. Secondly: No parachain->parachain tests in E2E or integration tests. Thirdly want to test client integration before approving; see 0b.

PR for adding parachain and parachain testing: elasticlabs-org/polkadot-agent-kit#85

1c. Cannot assess this with certainty while E2E tests only assert that something™ happened - only checks for "success" & return of mock results

PR for clear assertion: make sure the agent call correct tool call : elasticlabs-org/polkadot-agent-kit#85

2. Firstly: Doesn't seem the integration test suite actually checks for successful results - only that something™ happened - ein nischt-nischt. Secondly: Missing both paseo & kusama relay & asset hub chains, as well as most parachains. Thirdly: Only three xcm test queries in integration test suite; these should at least be supplemented by equivalent tests for other parachain/relaychain pairings.

We have replied you on this comment #1270 (comment)

@chungquantin
Copy link
Contributor Author

Hi reviewers, cc @Lederstrumpf can you please let us know the next step? The submission has been in the review stage for more than a month and a half.

@Lederstrumpf
Copy link
Contributor

Hi @chungquantin,

Thank you for the updates to the delivery.
I appreciate in particular the improvement to the integration tests, i.e. that you now check that your tool is invoked with the correct parameters (matching the query).

0c. Unit tests + E2E tests guide + CI pipeline

from @chungquatin: Issue B: Integration tests failed on CPU (timeouts with smoldot light client).
Fix: Increased timeout handling to improve stability.

Thanks, note that even with the timeout increase, it still times out on an M2 Max CPU, so imagine it will likely do the same once in your CI, lest you get some GPU instances for the CI.

from @chungquatin: Note: Some integration tests target mainnet, which cannot always run in CI frequently.

Noted, and understand the mainnet concern. In this case, could you please separate out the mainnet integration tests and only run the testnet integration tests within CI by default?
I'd add a flag then to toggle also running the mainnet tests, and then suggest doing this at least before new releases.

2. Thorough test coverage for XCM functionalities

via https://github.com/elasticlabs-org/polkadot-agent-kit/blob/b4bd2b24371c885ec0c42146039b21afcb71f03b/packages/sdk/tests/integration-tests/sdk.itest.ts#L98-L112:

    expect(result.output).toBeDefined();
    expect(result.intermediateSteps).toBeDefined();
    expect(result.intermediateSteps.length).toBeGreaterThan(0);
    
    const xcmTransferCall = result.intermediateSteps.find((step: any) => 
      step.action?.tool === 'xcm_transfer_native_asset'
    );
    
    expect(xcmTransferCall).toBeDefined();
    expect(xcmTransferCall.action.toolInput).toMatchObject({
      amount: '0.001',
      to: RECIPIENT,
      sourceChain: 'AssetHubWestend',
      destChain: 'Westend'
    });

While checking that your tool receives the correct inputs is clearly a marked improvement over the prior test structure, I still deem it crucial though that you test for the on-chain result - this is a financial tool moving real value, after all.
From my prior comment:

Point 3. might seem pedantic, but since the ollama agent could also be initiating transacts other than intended by the user query transfer 0.001 WND to ${RECIPIENT} from Westend Asset Hub to Westend, I deem it crucial to test that no such unintended side-effects occurred.

To spell the above out: please check the onchain outcome of the tests. In the above example, this would be checking that

  1. the sender's balance on Westend AH has decreased by 0.001 WND (+ transfer fees - feel free to just add a reasonable upper bound for these when checking the balance value)
  2. the recipient's balance on Westend has increased by 0.001 WND
  3. no other side-effects occurred

Note also that 3 of 7 integration tests fail, see attached logs from both nixos & ubuntu testing (with an RTX5090); logs below:

For example:

@polkadot-agent-kit/sdk:test:integration: AssertionError: expected undefined to be defined
@polkadot-agent-kit/sdk:test:integration:  ❯ tests/integration-tests/sdk.itest.ts:161:22
@polkadot-agent-kit/sdk:test:integration:     159|     expect(checkBalanceCalls.length).toBeGreaterThanOrEqual(1);
@polkadot-agent-kit/sdk:test:integration:     160|
@polkadot-agent-kit/sdk:test:integration:     161|     expect(initCall).toBeDefined();
@polkadot-agent-kit/sdk:test:integration:        |                      ^
@polkadot-agent-kit/sdk:test:integration:     162|     expect(initCall.action.toolInput).toEqual({
@polkadot-agent-kit/sdk:test:integration:     163|       chainId: 'polkadot'

For future deliveries:
Please test your setup & testing instructions fully before sending them in for evaluation.

Reiterating this: please test your instructions on a clean system (and run all tests) before sending in for evaluation.
In addition to the failing test above, the delivery lacked a package import, see elasticlabs-org/polkadot-agent-kit#92

Copy link
Contributor

@Lederstrumpf Lederstrumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

via above comment

CocDap pushed a commit to elasticlabs-org/polkadot-agent-kit that referenced this pull request Sep 2, 2025
@CocDap
Copy link

CocDap commented Sep 11, 2025

  1. Enhanced Integration tests

PR: elasticlabs-org/polkadot-agent-kit#93

This change introduces pre- and post-execution balance assertions
for the agent and recipient within the integration tests.

The test suite was executed successfully over times to validate
stability and performance on an M1 macOS system with 32GB of RAM.

  1. Integration Test ‘s CI Issue

Setting up integration tests with Ollama in our CI pipeline is not feasible at the moment. My testing with an Ollama GitHub Action showed that large models like Llama 3.2 cause fetch failed errors and consume excessive CI resources.

We have two options:

  • Use a different, lighter model for CI, which doesn't suit our current testing needs.
  • Skip Ollama integration tests in CI and run them exclusively on our local machines.

For now, we will proceed with Option 2: integration tests involving Ollama will be performed locally and will not run in the CI pipeline.

image

CC: @chungquantin @Lederstrumpf

@chungquantin
Copy link
Contributor Author

Hi @Lederstrumpf, can you test it again?

Copy link
Contributor

@Lederstrumpf Lederstrumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @CocDap and @chungquantin,

Thanks for the updates. I appreciate that you followed through on my suggestions - especially the segregation into testnet and mainnet tests.
On a high level review of the changes from elasticlabs-org/polkadot-agent-kit#93, your changes look plausible in spirit.

The rest of my feedback is unfortunately not as positive; hopefully Thom Yorke will mellow it:

A. The Bends

The integration:testnet tests fail for me:
1.

    PolkadotAgentKit Integration with OllamaAgent > should call check_balance tool with correct chain parameter  51262ms
   × PolkadotAgentKit Integration with OllamaAgent > should call transfer_native tool with correct parameters 65403ms
      expected 13892884555414 to be less than 13892884555414
   × PolkadotAgentKit Integration with OllamaAgent > should call xcm_transfer_native_asset tool for Westend to Asset Hub transfer 209811ms
      expected 13892884555414 to be less than 13808401819036
    PolkadotAgentKit Integration with OllamaAgent > should call xcm_transfer_native_asset tool for Asset Hub to Westend transfer  575902ms
    PolkadotAgentKit Integration with OllamaAgent > should call xcm_transfer_native_asset tool for West Asset Hub to West People Chain transfer  257335ms
FAIL  tests/integration-tests/sdk.itest.ts > PolkadotAgentKit Integration with OllamaAgent > should call xcm_transfer_native_asset tool for Westend to Asset Hub transfer
AssertionError: expected { amount: '0.1', …(3) } to match object { amount: '0.1', …(3) }

- Expected
+ Received

  {
    "amount": "0.1",
-   "destChain": "AssetHubWestend",
+   "destChain": "WestendAssetHub",
    "sourceChain": "Westend",
    "to": "5D7jcv6aYbhbYGVY8k65oemM6FVNoyBfoVkuJ5cbFvbefftr",
  }

 ❯ tests/integration-tests/sdk.itest.ts:111:46
    109|
    110|     expect(xcmTransferCall).toBeDefined();
    111|     expect(xcmTransferCall.action.toolInput).toMatchObject({
       |                                              ^
    112|       amount: '0.1',
    113|       to: RECIPIENT2,

I could "fix" these failures by changing:

  1. s/toBeLessThan/toBeLessThanOrEqual/g in sdk.itest.ts
  2. sdk.itest.ts#L91 to:
- const userQuery = `transfer 0.1 WND to ${RECIPIENT2} from Westend to Westend Asset Hub`;
+ const userQuery = `transfer 0.1 WND to ${RECIPIENT2} from Westend to Asset Hub Westend`;

Please address failure 1. correctly, since I'm sure my hack isn't applicable to all cases.
For failure 2. I note that @CocDap has earlier today pushed a similar "fix" elasticlabs-org/polkadot-agent-kit@1233a39 that might get your test to pass.
But I'll insist here that
👏 massaging semantically valid user prompts to mirror the code is NOT an appropriate approach for building an LLM tool intended to turn natural language prompts into valid xcm👏:

B: The words are coming out all weird

The queries

  1. "transfer 0.1 WND to ${RECIPIENT2} from Westend to Westend Asset Hub"
  2. "transfer 0.1 WND to ${RECIPIENT2} from Westend to Asset Hub Westend"
  3. "transfer 0.1 WND to ${RECIPIENT2} from Westend's relay chain to Westend's Asset Hub"
    ...

are all semantically equivalent; all these queries align on the user's intent, and are expressed clearly.
In fact: The "fix" pushed today (changing the user prompt to "transfer 0.1 WND to ${RECIPIENT2} from Westend to Asset Hub") is strictly speaking not even equivalent to these, since it is ambiguous which Asset Hub is getting referred to:

  • Westend's Asset Hub?
  • Polkadot's Asset Hub?
  • Kusama's Asset Hub?
  • Paseo's Asset Hub?

It's most likely Westend's Asset Hub since the other options would require bridging, but this is not unambiguously clear, and we certainly can't assume it's clear for the user.

The failure of parsing the test prompt correctly should tell you that your system prompt for xcm transfer rules is not doing its job, and hence that should be fixed instead of massaging a very reasonable user prompt into a semantically equivalent form that your LLM backend.

If the expectation is that users feed in "natural language" that matches the word ordering of your prompts exactly, then what's the gain for them against simply writing the xcm on their own / using polkadot.js?

C. Where do we go from here?

Note: I have not run your mainnet tests yet since the testnet ones already had issues. So far, I've specifically spelled out the issues I found in your code, but especially given the time put into the evaluation on my part so far, I cannot continue doing so. Generally, it concerns me that W3F is requested to evaluate submissions where the tests don't pass yet.

Important note: I expect that, from now on, all tests pass straight away on my system.
If they do not, then subject to the severity of the issue, I may either

  1. respond with something like "does not work" without providing further context
  2. close the PR

To avoid any false alarms in my next evaluation, I suggest that you

  1. when asking me to review again, also link the exact git revisions of the repos you tested with. Having said that, it is also reasonable to expect that the integration tests continue to pass after said revision on the main branch. If they do not, I will probably assume you're not running the integration tests before pushing changes to the main.
  2. run your tests not only on a MacBook, but also other reasonable LLM hosts (Nvidia/AMD GPUs and CPU Ollama) to have some degree of confidence that your users won't run into significant issues.

@CocDap
Copy link

CocDap commented Sep 30, 2025

1. Problem: Root Cause of Test Case Failure

I've identified the root cause for the failure of the should call xcm_transfer_native_asset tool for Westend to Asset Hub transfer test case.

The issue stems from the qwen3:latest model's inability to correctly map internal chain IDs to their corresponding chain names. Despite being provided with clear conversion rules in the prompt (https://github.com/elasticlabs-org/polkadot-agent-kit/blob/main/packages/llm/src/prompt/index.ts#L113) , the model confuses the chain names, leading to an incorrect tool call and subsequent failure.

2. Solution: Specialized Prompts

To resolve this, I have refactored the prompts. Instead of a single, large prompt for all tools, I have created smaller, specialized prompts for each distinct task (e.g., assets, xcm). This reduces ambiguity and leads to more accurate tool calls, especially for less advanced models.

3. Evidence: Model Performance Comparison

Link Testing: https://github.com/CocDap/polkadot-agent-kit-testing

To demonstrate the reasoning limitations of qwen3:latest compared to more advanced models, I created a test script. The results clearly show the difference:

  • With Ollama qwen3:latest:

    • Query:
    runAgent("transfer 0.1 WND to 5Ccmxb84eREZmtSkrLJSYp6QxJwNvmNbrfBm4p5B5VnKrB8z from Westend to West Asset Hub");
    • Result: Failure
    - Tool Result (xcm_transfer_native_asset): {"content":"{\"success\":false,\"error\":{\"code\":1005,\"message\":\"Unknown error occurred\",\"name\":\"GenericError\"},\"tool\":\"xcm_transfer_native_asset\",\"timestamp\":\"2025-09-26T08:38:40.061Z\"}","tool_call_id":"xcm_transfer_native_asset_error_1758875920061"}
  • With gemini-2.0-flash:

    • Query:
    runAgent("transfer 0.1 WND to 5Ccmxb84eREZmtSkrLJSYp6QxJwNvmNbrfBm4p5B5VnKrB8z from Westend to West Asset Hub", "gemini");
    • Result: Success
    - Tool Result (xcm_transfer_native_asset): {"content":"{\"success\":true,\"data\":\"Tx Hash Successful: 0xef3e18771dcd73baea22fc1fd0e8588e4a90f9fec90e67d33c04865145622eee\",\"tool\":\"xcm_transfer_native_asset\",\"timestamp\":\"2025-09-26T08:31:11.293Z\"}","tool_call_id":"xcm_transfer_native_asset_1758875471293"}

This confirms that while the user's query is correct, a less capable model can produce an incorrect outcome. The solution for these models is to use more specific, less complex prompts.

4. Additional Improvements

Link PR: elasticlabs-org/polkadot-agent-kit#109

  • Hardware Limitations: Regarding testing on other LLM hosts (Nvidia/AMD GPUs, CPU Ollama), we currently do not have the necessary hardware resources to perform these tests.

5. Noted:

Kindly checking the integration tests on this PR (elasticlabs-org/polkadot-agent-kit#109)

Command to run :

pnpm run test:integration:testnet

6. Conclusion

I've spammed the tests a lot, and all were successful. You can check the results on Subscan with this wallet: 5CwaVNAmJ6hWvu9tgxAPgxLD27SK36Vk4hgDcSZGK7z2eYiE

image

CC: @chungquantin @Lederstrumpf

PS: Thank you for your careful review and thoughtful suggestion

slrgax34358453anthonyallen added a commit to slrgax34358453anthonyallen/polkadot-agent-kit that referenced this pull request Oct 10, 2025
@chungquantin
Copy link
Contributor Author

chungquantin commented Oct 10, 2025

Hi @Lederstrumpf, I understand the situation but our team has pushed this proposal so hard and also covered parts that was not in our declared approved proposal. This took us more 2 months for the very first milestone. Could you please take a quick look and see if what is the best way to unblock this? Because this takes time for both curator and developer. And we seriously want to push this project ahead not just blocked by the integration test - which is already covered as can be seen.

Copy link
Contributor

@Lederstrumpf Lederstrumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @CocDap (cc. @chungquantin),

Thank you for the updates, and apologies for my delayed re-review.
I appreciate the detailed explanation+justification you provided, the increased testing scope of phrasing variations, and that the instructions for review in your latest comment were much clearer.

The new submission unfortunately also doesn't work: wrt testnet tests, potentially for me, but for the mainnet test, certainly in general too.
Given my delay in responding, I'll give more details as to what's failing than I intended to:

  1. I recognize that the testnet integration tests must have passed for you (assuming on elasticlabs-org/polkadot-agent-kit#109 / elasticlabs-org/polkadot-agent-kit@ca8ba58 based on the test count in the screenshot), but they don't for me, either on elasticlabs-org/polkadot-agent-kit@ca8ba58 or its two followup commits. Please find below logs of all the failures I encountered in said revisions at the bottom of this comment. When exactly did you run the tests again? Asset Hub migration for Paseo was ~ a month before your resubmission (https://forum.polkadot.network/t/asset-hub-migration-2025/11129/71), so I don't believe this is related.

  2. More critical than the testnet integration test failure, the mainnet test doesn't work, and given the changes made in elasticlabs-org/polkadot-agent-kit#109, I must assume you did not run this test again before requesting my re-review.

  3. The correction I pushed for 2. is only the first step. Please also test the observation of that test: the observation observation: {"content":"{\\"success\\":true,\\"data\\":\\"Cross-chain swap failed: 0.1 DOT to USDt from polkadot to polkadot_asset_hub. Error: Cannot read properties of undefined (reading 'otherAssets')\\",\\"tool\\":\\"swap_tokens\\",\\"timestamp\\":\\"2025-10-15T21:11:57.842Z\\"}","tool_call_id":"swap_tokens_1760562717842"} is currently passing and obviously should not.

  4. Please also extend the testing for mainnet operation - that single DOT/USDt swap test is not sufficient (very narrow scope) - I'd expect something comparable to the testnet scope - especially given your recent move to a more "handwired" solution with context-specialized prompts, as you noted in #1270 (comment).


Test logs of 1.:

@CocDap
Copy link

CocDap commented Oct 18, 2025

Hi @chungquantin @Lederstrumpf ,

Thank you for the detailed review and for running the integration tests. I've investigated the failures and would like to provide some context and clarification on the issues you've encountered.

1. Integration Test Failures on PR-109 (elasticlabs-org/polkadot-agent-kit#109)

I believe the test failures you observed are not related to the code in this PR but are due to external factors.

image
  • Paseo Network Instability: I strongly suspect the root cause is the intermittent instability of the Paseo network's RPCs. This isn't just an isolated issue; I have confirmed with a core contributor from the Paraspell project that they have also been experiencing similar network problems.
  • Evidence of Instability: As a specific example, two or 3 days ago, it was impossible to perform an XCM transfer from Asset Hub Paseo to the Paseo relay chain network-wide. Today, that functionality is working again. This confirms that the testnet itself can be unreliable.

Action Requested:

Could you please try running the integration tests for PR #109 again? It is likely the network is stable now.

If the Paseo Asset Hub to Paseo test continues to fail due to network issues, I recommend switching that specific test case to use the more reliable Westend Asset Hub and Westend, as the underlying XCM logic being tested is identical.

  // utils.ts
export const RECIPIENT7 = '5HWWpa7SnP81UdxKGD5neyqMbjRdi324txAFZ2z4LgcBpC21';
export const RECIPIENT8 = '5G8uJUCZMhihBdBMooxBE1pWhmvvyr2nWvaJNUKb6dNiuEjS';
  ...
  describe('2. Parachain to Relay Chain Transfers', () => {
    const parachainToRelayCases = [
      {
        name: 'Westend Asset Hub to Paseo 1 ',
        query: `transfer 0.1 WND to ${RECIPIENT7} from Westend Asset Hub to Westend`,
        recipient: RECIPIENT78,
        sourceChainId: 'west_asset_hub',
        destChainId: 'west',
        expectedSourceChain: 'AssetHubWestend',
        expectedDestChain: 'Westend'
      },
      {
        name: 'Paseo Asset Hub to Paseo 2',
        query: `transfer 0.1 WND to ${RECIPIENT8} from Asset Hub Westend to Westend`,
        recipient: RECIPIENT8,
        sourceChainId: 'west_asset_hub',
        destChainId: 'west',
        expectedSourceChain: 'AssetHubWestend',
        expectedDestChain: 'Westend'
      }
    ];

    parachainToRelayCases.forEach(({ name, query, recipient, sourceChainId, destChainId, expectedSourceChain, expectedDestChain }) => {
      it(`should call xcm_transfer_native_asset tool for "${name}"`, async () => {
        await testXcmTransfer(name, query, recipient, sourceChainId, destChainId, expectedSourceChain, expectedDestChain);
      }, 3500000);
    });
  });

2. Clarification on Mainnet Swap Tests

I saw your notes on the mainnet swap tests failing. I am aware of this issue. However, please note that this functionality is explicitly a requirement for Milestone 2, not Milestone 1.

I will address these swap-related issues in a separate PR as part of the work for the next milestone.

3. Clarification on Failures in the main Branch

You reported a Cannot read properties of undefined (reading 'points') error when running tests on the main branch. This is expected behavior under a specific condition.

  • Root Cause: This error occurs because I recently added integration tests for nomination pools. The tests require the agent's account to already be a member of a pool. If you run the tests with a new agent that has never joined a pool, the test that queries pool information will fail, causing a cascade failure in subsequent tests.

  • Solution (In Progress): I have already developed a fix for this scenario in a separate PR-114(feat: add gemini model choice to integration tests  elasticlabs-org/polkadot-agent-kit#114) . This PR adds logic to handle both cases:

    • If the agent has already joined a pool -> Tests pass.
    • If the agent has never joined a pool -> The agent automatically finds and joins the best pool before proceeding.

However, please note that this enhanced nomination pool logic is also a deliverable for Milestone 2.


To summarize, the issues you've flagged appear to be caused by temporary testnet instability or are related to features that are out of scope for the current milestone. The code within PR #109 is correct and meets all requirements for Milestone 1.

Let me know if you have any other questions. Thank you!

@CocDap
Copy link

CocDap commented Oct 20, 2025

To provide proof for my assertion, I manually observed the balances, and they correctly match the asserted before and after states

    expect(balanceRecipientAfter.data.free).toBeLessThan(balanceRecipientBefore.data.free + amountParsed);

    expect(balanceAgentAfter.data.free).toBeLessThan(balanceAgentBefore.data.free - amountParsed - feeXCM.fee);

My manual check:

Before Balance of Recipient on Paseo
{
  nonce: 2
  consumers: 0
  providers: 1
  sufficients: 0
  data: {
    free: 85,290,420,868
    reserved: 0
    frozen: 0
    flags: 170,141,183,460,469,231,731,687,303,715,884,105,728
  }
}

Before Balance of Agent on Paseo Asset Hub 
{
  nonce: 281
  consumers: 1
  providers: 2
  sufficients: 0
  data: {
    free: 49,521,302,891,008
    reserved: 432,419,832,838
    frozen: 0
    flags: 170,141,183,460,469,231,731,687,303,715,884,105,728
  }
}

Amount Transfer  : 0.1 PAS -> 1000000000

Agent Transfer 0.1 PAS to RECIPIENT from Paseo Asset Hub to Paseo 


Tx Hash : 0xbaba9ad237cfdf6ad83132111f948db159f452e341b068709126f66d843f19db


After Balance of Recipient on Paseo

{
  nonce: 2
  consumers: 0
  providers: 1
  sufficients: 0
  data: {
    free: 86,266,959,406
    reserved: 0
    frozen: 0
    flags: 170,141,183,460,469,231,731,687,303,715,884,105,728
  }
}

After Balance of Agent on Paseo Asset Hub 

{
  nonce: 282
  consumers: 1
  providers: 2
  sufficients: 0
  data: {
    free: 49,519,982,023,644
    reserved: 432,419,832,838
    frozen: 0
    flags: 170,141,183,460,469,231,731,687,303,715,884,105,728
  }
}

My Integration Tests Assertion to check after and before balance 

balance recipient after < balance recipient before + amount 
86,266,959,406 < 85,290,420,868 + 1000000000(86,290,420,868) -> correct 


balance agent after < balance agent before - amount - xcm fee 
49,519,982,023,644 < 49,521,302,891,008 - 1000000000 - 18375000 (49,520,284,516,008) -> correct 

I trust that the core Parity team has already thoroughly tested the before-and-after balance logic also

@Lederstrumpf
Copy link
Contributor

Hi @CocDap,

Thank you for your detailed investigation.

Could you please try running the integration tests for PR #109 again? It is likely the network is stable now.

Thanks - I have tried your commit #109 on Paseo again today and it worked, so I agree with you this must have been a network over the last week (the tests added after said commit still fail for me).

I'll review the rest of your update the coming days - just sharing this quick followup to avoid that you unnecessarily fret about the Paseo test failures anymore. Thanks again for looking into it.

@Lederstrumpf
Copy link
Contributor

Hi @CocDap,

Hopefully my at-most penultimate followup here ;)

  1. Clarification on Mainnet Swap Tests
    I saw your notes on the mainnet swap tests failing. I am aware of this issue. However, please note that this functionality is explicitly a requirement for Milestone 2, not Milestone 1.
    I will address these swap-related issues in a separate PR as part of the work for the next milestone.

Milestone 2 does not mention mainnet tests anywhere and M1.2 promises integration tests with "Thorough test coverage for XCM functionalities" already. So I really read this as being part of milestone 1.
But I'm fine if you only address the mainnet failures in milestone 2 - the timing is not so important since it seems clear to me you'll deliver M2 as well.

  1. Clarification on Failures in the main Branch

Acknowledged - and the nomination pools are indeed only part of milestone 2.


Please find my updated evaluation over here:
You'll note that it now accepts all deliveries except for M1.0b

M1.0b – Quickstart instructions unclear/fragmented

Issue: Quickstart documentation was fragmented and confusing.
Fix: Documentation is being gradually improved to provide clear guidance for developers.

Unfortunately, the docs linked in your PR (https://cocdap.github.io/agent-docs/quickstart.html) have still not been updated - the latest state is still from July: elasticlabs-org/polkadot-agent-kit-docs@a97343d

And your interfaces have changed quite a bit in the meantime - you'll see this once you try running those instructions.
So: please update the quickstart instructions so that they allow a developer to ... well.. get started quickly.
I reckon it makes sense to reuse parts of https://github.com/CocDap/polkadot-agent-kit-testing for this.

Once you update the quickstart instructions and they all work smoothly in my review, I'm happy to approve your PR.

@CocDap
Copy link

CocDap commented Nov 3, 2025

Hi @Lederstrumpf ,

I'm hoping this will be the final round of feedback and fixes for this milestone 😃

I've made several updates based on your feedback.

1. Documentation Updates

Link Docs : elasticlabs-org/polkadot-agent-kit-docs@97dd0f5

  • Regarding your suggestion to use the testing script for the "Basic Usage" section, I've opted to link directly to the full examples (like the Telegram Bot instead. I believe these complete examples provide a more practical and comprehensive starting point for developers.
  • I also added a note clarifying that the agent kit is a customizable set of Polkadot tool-calling APIs designed to be compatible with any LLM provider.

2. Mainnet Integration Test

I have successfully updated and run the mainnet integration tests, including the changes from PR #114.

@polkadot-agent-kit/sdk:test:integration:mainnet: Swap Tokens Query Result (Polkadot → Polkadot Asset Hub): {
@polkadot-agent-kit/sdk:test:integration:mainnet:   input: 'swap 0.2 DOT from Polkadot to USDC on Polkadot Asset Hub',
@polkadot-agent-kit/sdk:test:integration:mainnet:   output: '<think>\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     'Okay, let me process this. The user wanted to swap 0.2 DOT from Polkadot to USDC on Polkadot Asset Hub. I used the swap_tokens function with the converted chain names. The response from the tool indicates success with a transaction hash. Now, I need to relay this information back to the user in a clear way.\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     "First, confirm the swap was successful. Mention the amount, the currencies, and the chains involved. Include the transaction hash for reference. Also, offer further assistance in case they need anything else. Keep the tone friendly and professional. Make sure the chain names are correctly formatted as per the conversion table. Check if all parameters were correctly applied: from Polkadot, to AssetHubPolkadot, DOT to USDC. Yep, that's all there. Alright, time to put it all together in a concise response.\n" +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '</think>\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     'The cross-chain swap of **0.2 DOT** from **Polkadot** to **USDC** on **Polkadot Asset Hub** was successful! 🎉  \n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '**Transaction Hash:**  \n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '`0x8309d914bbcb3123d9d99aabba63665f2fe2e8176ac54ef183e870458a11aab8`  \n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     '\n' +
@polkadot-agent-kit/sdk:test:integration:mainnet:     'Let me know if you need further assistance!',
@polkadot-agent-kit/sdk:test:integration:mainnet:   intermediateSteps: [
@polkadot-agent-kit/sdk:test:integration:mainnet:     {
@polkadot-agent-kit/sdk:test:integration:mainnet:       action: [Object],
@polkadot-agent-kit/sdk:test:integration:mainnet:       observation: '{"content":"{\\"success\\":true,\\"data\\":\\"Cross-chain swap successful: 0.2 DOT to USDC from Polkadot to AssetHubPolkadot. Tx Hash: 0x8309d914bbcb3123d9d99aabba63665f2fe2e8176ac54ef183e870458a11aab8\\",\\"tool\\":\\"swap_tokens\\",\\"timestamp\\":\\"2025-11-03T04:01:34.154Z\\"}","tool_call_id":"swap_tokens_1762142494154"}'
@polkadot-agent-kit/sdk:test:integration:mainnet:     }
@polkadot-agent-kit/sdk:test:integration:mainnet:   ],
@polkadot-agent-kit/sdk:test:integration:mainnet:   provider: 'ollama',
@polkadot-agent-kit/sdk:test:integration:mainnet:   model: 'qwen3:latest'
@polkadot-agent-kit/sdk:test:integration:mainnet: }

image

3. How to Run the Tests

To ensure the tests run correctly on your end, please note these two critical requirements:

  1. Node.js Version: You must use Node.js v22 or higher. This is required due to package upgrades in PR114
  2. Test Command: Please use the following command to run the mainnet-specific tests:
    pnpm run test:integration:mainnet

Thanks again for your careful review. Your feedback is helping us build a more stable and approachable SDK for all developers. 🏆

@CocDap
Copy link

CocDap commented Nov 3, 2025

That's great news, thank you for the approval! We just wanted to add a special thanks for the high-quality review. Your @Lederstrumpf feedback was clear, fair, and has definitely helped us improve.
❤️
#1293
CC: @chungquantin

Copy link
Contributor

@Lederstrumpf Lederstrumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @CocDap,

Thanks for the final fixes & instructions for them.
As noted in the update of my evaluation, I think the quickstart tutorial should still be improved since it doesn't document how to actually use the tools, neither in the snippet https://cocdap.github.io/agent-docs/quickstart.html#basic-usage, nor in the API navigation https://cocdap.github.io/agent-docs/quickstart.html#tools-call-api-navigation. Both only show how to instantiate them.

I understand that, for this purpose, you added the examples https://cocdap.github.io/agent-docs/examples/. Those are sufficient for me, but in general, the quickstart should actually get the dev from a blank slate to a working environment for building their own tool.

In any case, as noted in the evaluation, I'm fine to defer this to the milestone 2 evaluation, which you have already opened #1299 for.

So, luckily this is my last update on this specific PR - thanks for bearing with me - I'm sure this also felt like pulling teeth for you ;)
I'm confident this will be simpler in future.

@Lederstrumpf Lederstrumpf merged commit ce9d23c into w3f:master Nov 17, 2025
3 checks passed
@github-actions
Copy link

🪙 Please fill out the invoice form in order to initiate the payment process. Please make sure that you follow the instructions and requirements as laid out in the form as well as our Terms & Conditions. Thank you!

@github-actions
Copy link

Congratulations on completing the first milestone of this grant! As part of the Grants Program, we want to help grant recipients acknowledge their grants publicly. To that end, we've created a badge for projects that successfully deliver their first milestone. Please use the badge only in reference to the work that has been completed as part of this grant, so please do not display it on your team or project's homepage unless accompanied by a short description of the grant. Furthermore, you're now welcome to announce the grant publicly. Please remember to observe the foundation's guidelines in doing so. If you haven't already, reach out to [email protected] for feedback on your announcement and cross-promotion.

Thank you for your contribution, and good luck! If you have any remaining milestone, let us know if you encounter any delays by leaving a comment on the application PR or submitting an amendment.

@CocDap
Copy link

CocDap commented Nov 18, 2025

I really appreciate your detailed feedback. The SDK has become significantly more user-friendly and maintainable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants