Handle Uuid FullLoader with yaml de/serializer #310

Merged
unkcpz merged 3 commits into aiidateam:master from unkcpz:handle-uuid-yml-in-message-respond
Feb 21, 2025

Conversation

@unkcpz
Member

@unkcpz unkcpz commented Jan 30, 2025

In #221, the YAML loaders were changed to UnsafeLoader to work around a change in PyYAML (new at the time, now three years old): by default it no longer serializes or deserializes custom types.

The changes in #221 touched two things. One was the deserializer for the bundle type, where the change is needed, as discussed in detail in aiidateam/aiida-core#3709, where the issue originally manifested.

But #221 also switched the kiwipy message decoder to UnsafeLoader, which is not necessary. The rmq.test_communications:test_launch_nowait test was hanging because the response ack message from the channel is a PID of type uuid.UUID, which is not one of the basic types covered by the YAML decoder.
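
To illustrate why a restricted (de)serializer trips over the PID, here is a minimal sketch. It assumes PyYAML is installed and uses yaml.safe_dump and yaml.SafeLoader as stand-ins for a restricted encoder and decoder; it does not reproduce the kiwipy code path. The hand-written document only mimics the python/object tag that PyYAML's default Dumper typically emits for a uuid.UUID.

```python
import uuid

import yaml

pid = uuid.uuid4()

# SafeDumper has no representer registered for uuid.UUID, so dumping fails.
try:
    yaml.safe_dump({'pid': pid})
    dump_error = None
except yaml.representer.RepresenterError as exc:
    dump_error = exc

# Assumption: a document tagged the way the default Dumper tags arbitrary
# Python objects. SafeLoader has no constructor for this tag, so loading fails.
tagged_document = '!!python/object:uuid.UUID\nint: 1\n'
try:
    yaml.load(tagged_document, Loader=yaml.SafeLoader)
    load_error = None
except yaml.constructor.ConstructorError as exc:
    load_error = exc

print(type(dump_error).__name__, type(load_error).__name__)
```

If the decoding side swallows or never surfaces such an error, the caller keeps waiting for a reply, which would be consistent with the hanging test described above.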

I think the more fine-grained fix is to add the UUID representer and constructor explicitly.
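
A minimal sketch of what an explicit UUID representer and constructor can look like with PyYAML. The names uuid_representer, uuid_constructor and the '!uuid' tag appear later in this thread; the UuidSafeDumper and UuidSafeLoader classes are hypothetical, and the code merged in this PR may register them differently.

```python
import uuid

import yaml

UUID_TAG = '!uuid'


def uuid_representer(dumper, data):
    """Serialize a uuid.UUID as a tagged scalar holding its string form."""
    return dumper.represent_scalar(UUID_TAG, str(data))


def uuid_constructor(loader, node):
    """Rebuild a uuid.UUID from the tagged scalar."""
    return uuid.UUID(loader.construct_scalar(node))


# Hypothetical subclasses, so the global SafeDumper/SafeLoader stay untouched.
class UuidSafeDumper(yaml.SafeDumper):
    """SafeDumper that also knows how to represent uuid.UUID."""


class UuidSafeLoader(yaml.SafeLoader):
    """SafeLoader that also knows how to construct uuid.UUID."""


UuidSafeDumper.add_representer(uuid.UUID, uuid_representer)
UuidSafeLoader.add_constructor(UUID_TAG, uuid_constructor)

pid = uuid.uuid4()
encoded = yaml.dump({'pid': pid}, Dumper=UuidSafeDumper)
decoded = yaml.load(encoded, Loader=UuidSafeLoader)
print(encoded, decoded)
```

Only the one extra type is whitelisted here; every other non-basic type is still rejected by the safe loader.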

Here I also explicitly use UnsafeLoader instead of Loader. The two behave the same, but Loader is kept for backward compatibility, while UnsafeLoader is the newer API and makes it clear that the operation is unsafe.

@superstar54 @giovannipizzi, I guess the unsafe_deserializer in workgraph wraps this for a similar purpose. I'll let you think about whether you can do the same thing there, registering all custom types with the YAML deserializer, to avoid the unsafe keyword. I think an API exposed to end users should never be "unsafe" in terms of data or computer security; the "unsafe" label is left for developers, to help them find where in the source code a memory issue or potential security issue might arise.

@superstar54
Member

@unkcpz, thanks for pinging me. I will check the usage of the unsafe loader in WorkGraph.

@superstar54
Member

Hi @unkcpz, is this PR ready for review?

@unkcpz
Member Author

unkcpz commented Feb 18, 2025

Yes, it is ready. It would be nice if you could review it, @superstar54.

@superstar54 superstar54 self-requested a review February 18, 2025 19:47
@superstar54
Member

Could you clarify why UnsafeLoader is still necessary despite the UUID constructor being added?

@unkcpz
Member Author

unkcpz commented Feb 19, 2025

> Could you clarify why UnsafeLoader is still necessary despite the UUID constructor being added?

The UUID fix is for the loader used with message brokers, where the message itself is serialized and deserialized over the wire. The default loader in plumpy is the safe one. The UnsafeLoader here, by contrast, is for checkpoint items.

@superstar54
Member

> The UUID fix is for the loader used with message brokers, where the message itself is serialized and deserialized over the wire. The default loader in plumpy is the safe one.

Thanks! One thing is not clear to me: if the default loader in the plugin is the safe one, why was there no error before this PR? The safe loader raises an error if the UUID is not serializable.

message_exchange=message_exchange,
task_exchange=task_exchange,
task_queue=task_queue,
decoder=functools.partial(yaml.load, Loader=yaml.Loader),
Member Author

> Thanks! One thing is not clear to me: if the default loader in the plugin is the safe one, why was there no error before this PR? The safe loader raises an error if the UUID is not serializable.

@superstar54 This line matters. If I remove this unsafe-loader decoder without adding the uuid_representer and uuid_constructor, the test hangs because the message cannot be deserialized.

Member Author

@superstar54 I guess your previous comment was meant as a reply to this one?

I updated it to pass the decoder and encoder explicitly. Note that this is a test, not the real code path.
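
As a rough sketch of what passing an explicit encoder and decoder can look like, the snippet below builds both with functools.partial, in the same style as the decoder line under review. The MessageDumper and MessageLoader classes and the sample message are hypothetical, and the communicator itself is left out; only the two callables are exercised.

```python
import functools
import uuid

import yaml


class MessageDumper(yaml.SafeDumper):
    """Hypothetical dumper for broker messages."""


class MessageLoader(yaml.SafeLoader):
    """Hypothetical loader for broker messages."""


def uuid_representer(dumper, data):
    return dumper.represent_scalar('!uuid', str(data))


def uuid_constructor(loader, node):
    return uuid.UUID(loader.construct_scalar(node))


MessageDumper.add_representer(uuid.UUID, uuid_representer)
MessageLoader.add_constructor('!uuid', uuid_constructor)

# One-argument callables, suitable for passing as encoder=... and decoder=...
encoder = functools.partial(yaml.dump, Dumper=MessageDumper)
decoder = functools.partial(yaml.load, Loader=MessageLoader)

message = {'pid': uuid.uuid4(), 'state': 'created'}
wire = encoder(message)      # str sent over the wire
received = decoder(wire)     # dict rebuilt on the receiving side
print(received == message)
```

Keeping the encoder and decoder as a matched pair makes it explicit that both ends of the channel must agree on the '!uuid' tag.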

Member

It's still not clear to me. I am guessing you are adding the uuid_representer and uuid_constructor for safe load. But where is the test for them?

Member Author

> uuid_representer and uuid_constructor

If I removed the decoder line in loop_communicator without adding these, the test would fail (hang, actually, because RMQ never gets the correct message).

Member

Since you are adding a constructor and your goal is safe loading, as suggested in my comment, it would be good to use yaml.SafeLoader instead of yaml.FullLoader, unless another message type also needs a constructor.

If there is another message type, you can add a constructor for it, as you already did for the uuid. It would be good if you could make a list of the possible message types.

If you think yaml.FullLoader is already safe enough, I am also fine.

Member Author

> If you think yaml.FullLoader is already safe enough, I am also fine.

The scope of this PR is to revert the original unnecessary change and use kiwipy's default loader.
The plan for the message-passing part is to replace it with msgpack, so I won't change the test above in this PR.

Member

@superstar54 superstar54 left a comment

Thanks for the clarification! I would suggest explicitly using yaml.SafeLoader, both to show that loading is safe and to make it easy to see which representers and constructors are used.

Something like:

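class PlumpySaveLoader(yaml.SafeLoader):
    """SafeLoader that can also construct uuid.UUID values."""

class PlumpySaveDumper(yaml.SafeDumper):  # hypothetical name; representers live on a Dumper, not a Loader
    """SafeDumper that can also represent uuid.UUID values."""

PlumpySaveDumper.add_representer(uuid.UUID, uuid_representer)
PlumpySaveLoader.add_constructor('!uuid', uuid_constructor)

encoder=functools.partial(yaml.dump, Dumper=PlumpySaveDumper),
decoder=functools.partial(yaml.load, Loader=PlumpySaveLoader),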

@unkcpz unkcpz requested a review from superstar54 February 20, 2025 11:51
Member

@superstar54 superstar54 left a comment

LGTM!

@unkcpz unkcpz merged commit 98b3e93 into aiidateam:master Feb 21, 2025
6 of 7 checks passed
@unkcpz unkcpz deleted the handle-uuid-yml-in-message-respond branch February 21, 2025 21:43
2 participants