[feat] add Serializer for rpc server #566

CormickKneey · 2025-11-12T10:26:55Z

Description

Implement Flask-based RPC server with serialization and corresponding tests

Related Issue

Task Issue

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not
work as expected)
Documentation update
Code refactoring (no functional changes)
Performance improvement
Test coverage improvement

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have run relevant unit tests and they pass
I have added tests for new functionality
I have updated documentation if needed
My branch is up to date with main
This PR introduces breaking changes (if yes, fill out details below)
If this PR changes documentation, I have built and previewed it locally with
jb build docs
No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


garrett4wade

Thank you for the contribution! The implementation looks good overall, but we should take special care with error handling and data serialization support.

areal/scheduler/rpc/app.py

areal/scheduler/rpc/serializer.py

garrett4wade · 2025-11-12T12:24:15Z

areal/scheduler/rpc/serializer.py

+# Custom extension type codes for msgspec
+CUSTOM_TYPE_PICKLE = 0
+CUSTOM_TYPE_CLOUDPICKLE = 1
+CUSTOM_TYPE_RAW_VIEW = 2


Create an enum class for this.
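
One way to act on this suggestion, as a minimal sketch: the three constants from the diff can be grouped in an IntEnum, so they still behave as plain integers wherever an extension type code is expected. The class name ExtTypeCode and the helper describe_ext_code are hypothetical and do not come from the PR.

```python
from enum import IntEnum, unique


@unique  # reject accidental duplicate codes at class-creation time
class ExtTypeCode(IntEnum):
    """Hypothetical enum grouping the custom extension type codes from the diff."""

    PICKLE = 0
    CLOUDPICKLE = 1
    RAW_VIEW = 2


def describe_ext_code(code: int) -> str:
    """Map a raw integer code back to its name, failing loudly on unknown codes."""
    try:
        return ExtTypeCode(code).name
    except ValueError:
        raise ValueError(f"Unknown extension type code: {code}") from None
```

Because IntEnum members compare equal to their integer values, existing call sites that pass the raw code should keep working, while decoding gains a single place that rejects unknown codes.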

areal/scheduler/rpc/serializer.py

garrett4wade · 2025-11-14T02:46:24Z

areal/scheduler/rpc/serializer.py

+        with tempfile.TemporaryDirectory() as tmpdir:
+            tokenizer.save_pretrained(tmpdir)


Can we just use the original model path from which the tokenizer is loaded? A temp dir may not be synchronized across different nodes.

If we must use a temp dir, make sure it is located under cluster.fileroot. See cli_args.py for details.

Actually, the tokenizer being serialized here may have been modified by internal logic after creation, e.g. by calling add_tokens() or add_special_tokens().
So my solution is: save it into a temp directory, zip it, and write the archive into the IO buffer to be sent.
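
The approach described in this reply can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: pack_dir and unpack_dir are hypothetical helper names, and the directory is assumed to be the one written by tokenizer.save_pretrained(tmpdir).

```python
import io
import os
import tempfile
import zipfile


def pack_dir(src_dir: str) -> bytes:
    """Zip every file under src_dir into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is relocatable.
                zf.write(path, arcname=os.path.relpath(path, src_dir))
    return buf.getvalue()


def unpack_dir(payload: bytes, dst_dir: str) -> None:
    """Extract an archive produced by pack_dir into dst_dir."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dst_dir)
```

The sender would transmit the bytes returned by pack_dir; the receiver would call unpack_dir into a local directory and reload the tokenizer from there. Since only bytes cross the wire, neither side needs a shared filesystem path.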

garrett4wade · 2025-11-14T03:45:14Z

areal/scheduler/rpc/serializer.py

+            compression = (
+                zipfile.ZIP_STORED if total_size < 512 * 1024 else zipfile.ZIP_DEFLATED
+            )
+            with zipfile.ZipFile(
+                zip_buffer, "w", compression=compression, compresslevel=6
+            ) as zf:
+                for root, _, files in os.walk(tmpdir):
+                    for f in files:
+                        zf.write(
+                            os.path.join(root, f),
+                            arcname=os.path.relpath(os.path.join(root, f), tmpdir),
+                        )


I don't think zip-unzip will decrease loading time though.

Yeah... Let me find a replacement~

garrett4wade · 2025-11-14T05:29:05Z

areal/scheduler/rpc/server.py

+            else:
+                return Response(
+                    success=False, message=f"Unknown engine name: {config.class_name}"
+                )


On the client side, we can automatically convert the class to a string, because they are all built-in classes in AReaL. The server can then accept a string path and import the class dynamically. Using enums and if-else is not very extensible.

FYI I implemented such logic in PR #528: https://github.com/inclusionAI/AReaL/pull/528/files#diff-b441cd1ab2d7679b06541869f7a78cc4330650c65ff8dff07bbfe5b568fd337eR124-R130. You don't have to follow my original implementation, but we'd better not drop any error handling or extensibility designs.
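
For illustration, the string-path idea could look roughly like the sketch below. It is not the implementation from PR #528; the function names class_to_string and import_from_string are hypothetical, and the error messages are placeholders.

```python
import importlib


def class_to_string(cls: type) -> str:
    """Client side: turn a class into its dotted import path."""
    return f"{cls.__module__}.{cls.__qualname__}"


def import_from_string(path: str):
    """Server side: resolve 'package.module.ClassName' to the object it names."""
    module_name, sep, attr = path.rpartition(".")
    if not sep or not module_name or not attr:
        raise ValueError(f"Expected 'module.ClassName', got {path!r}")
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"Cannot import module {module_name!r}: {exc}") from exc
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        raise ImportError(
            f"Module {module_name!r} has no attribute {attr!r}"
        ) from exc
```

A server handler could catch ValueError and ImportError from import_from_string and turn them into a failed Response, which keeps the error handling while avoiding a hard-coded if-else over engine names. This sketch handles top-level classes only; nested classes would need extra resolution of the qualified name.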

CormickKneey · 2025-11-14T07:56:03Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new Flask-based RPC server and client, along with a custom serializer for handling complex data types like PyTorch tensors. While the new components are well-structured and the serializer itself is well-tested, there's a critical issue: the new serializer is not actually integrated into the RPC client and server. This means the new RPC implementation currently only supports JSON-serializable types, and will fail when trying to transmit complex objects like tensors, which seems to defeat the main purpose of introducing a custom serializer. I've provided detailed comments on this and a few other points.

areal/scheduler/rpc/client.py

areal/scheduler/rpc/serializer.py

areal/tests/test_rpc_integration.py

pyproject.toml

CormickKneey · 2025-11-17T02:14:36Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant improvement by refactoring the RPC server to use Flask and implementing a robust serialization mechanism for complex data types. The new serialization module is well-designed and thoroughly tested. The RPC server is now more structured with clear endpoints and better error handling. I have a few suggestions regarding performance, maintainability, and robustness that could further enhance the implementation.

areal/scheduler/rpc/rpc_server.py

areal/scheduler/rpc/serialization.py

garrett4wade reviewed Nov 12, 2025


CormickKneey force-pushed the refactor/rpc_framework branch 2 times, most recently from 7640d4d to 4c7e433 Compare November 13, 2025 11:55

garrett4wade reviewed Nov 14, 2025


CormickKneey force-pushed the refactor/rpc_framework branch 3 times, most recently from 8a98f89 to db035ec Compare November 14, 2025 07:48

CormickKneey marked this pull request as ready for review November 14, 2025 07:56

gemini-code-assist bot reviewed Nov 14, 2025


CormickKneey force-pushed the refactor/rpc_framework branch from 421dd56 to 8213732 Compare November 16, 2025 14:29

refactor server

4c17880

CormickKneey force-pushed the refactor/rpc_framework branch from 8213732 to d4fe507 Compare November 16, 2025 14:31

merge: integrate changes from f67dd60

616f6de

CormickKneey force-pushed the refactor/rpc_framework branch from d4fe507 to 616f6de Compare November 16, 2025 15:04

gemini-code-assist bot reviewed Nov 17, 2025


fix for cr

2f72bb6

CormickKneey force-pushed the refactor/rpc_framework branch from 59f60b0 to 2f72bb6 Compare November 17, 2025 03:09
