Handle Uuid FullLoader with yaml de/serializer#310
Conversation
|
@unkcpz , thanks for ping me. I will check the usage of the unsafe loader in WorkGraph. |
|
Hi @unkcpz , is this PR ready for review? |
|
Yes, it is ready. It would be nice if you can review it @superstar54 |
|
Could you clarify why |
The uuid fix is for the loader of message brokers, where the message it self is se/de over the wire. The default loader in plumpy is the safe one. While the |
thanks! One thing is not clear to me: if the default loader in the plugin is the safe one, why is there no error before this PR. The safe loader will raise an error if the uuid is not serializable. |
| message_exchange=message_exchange, | ||
| task_exchange=task_exchange, | ||
| task_queue=task_queue, | ||
| decoder=functools.partial(yaml.load, Loader=yaml.Loader), |
There was a problem hiding this comment.
thanks! One thing is not clear to me: if the default loader in the plugin is the safe one, why is there no error before this PR. The safe loader will raise an error if the uuid is not serializable.
@superstar54 This line matters. If I didn't add the uuid_representer and uuid_constructor, after remove this unsafeloader decoder. The test hang because it can not deserialize the message.
There was a problem hiding this comment.
@superstar54 I guess your previous comment was gonna to reply this one?
I updated, by passing explicitly the decoder and encoder. Here actually it is a test, not the real code path used.
There was a problem hiding this comment.
It's still not clear to me. I am guessing you are adding the uuid_representer and uuid_constructor for safe load. But where is the test for them?
There was a problem hiding this comment.
uuid_representer and uuid_constructor
If I didn't add these and I remove the decoder line in loop_communicator, the test will fail (hang actually, because RMQ never get the correct message).
There was a problem hiding this comment.
Since you are adding a constructor, and your goal is to have the safe load, as suggested in my comment, it would be good to use yaml.SafeLoader instead of yaml.FullLoader, unless there is still another message type that also needs a constructor.
If there is another message type, you can add a constructor for it, as you already did for the uuid. It would be good if you could make a list of the possible message types.
If you think yaml.FullLoader is already safe enough, I am also fine.
There was a problem hiding this comment.
If you think yaml.FullLoader is already safe enough, I am also fine.
The scope if this PR is to revert the original unnecessary change and using the default loader of kiwipy.
The plan for the message passing part is to use msgpack to replace. So for this one I'll not changing the test above.
superstar54
left a comment
There was a problem hiding this comment.
Thanks for the clarification! I would suggest explicitly using the yaml.SafeLoader, to show it is safe, as well as easy to understand which representers and constructors are used.
Code similar to
class PlumpySaveLoader(yaml.SafeLoader):
""""""
PlumpySaveLoader.add_representer(uuid.UUID, uuid_representer)
PlumpySaveLoader.add_constructor('!uuid', uuid_constructor)
decoder=functools.partial(yaml.load, Loader=PlumpySaveLoader),
In #221, the yaml loaders are changed to UnsafeLoader to solve the problem after new (at that time it is new, it has been three years now) PyYAML, by default it won't de/se customize type.
The changes in #221 touched two things, one was the deserializer for bundle type which the change is needed, as detail discussion in aiidateam/aiida-core#3709 where issue was originally manifested.
But the change of using
UnsafeLoaderalso made for the kiwipy message decoder, which is not necessary. The reason that thermq.test_communications:test_launch_nowaittest was hanging is that the response ack message from channel is a PID which has typeuuid.UUID. It is not a basic type that covered by yaml decoder.The more fine tuning change I think should be just adding the UUID representer and constructor explicitly.
Here I also explicitly use
UnsafeLoaderinstead ofLoaderwhich are the same, but Loader is for backward compatibility andUnsafeLoaderis the new API and give the information that it is unsafe operation.@superstar54 @giovannipizzi, for the workgraph the
unsafe_deserializeris what wrapped for the similar purpose I guess. So I'll let you think about whether you can do the same thing by adding all customize types into the range for the yaml deserializer to avoidunsafekeyword. I think for an API exposed to end user, it should never "unsafe" in terms of data/computer security, but the "unsafe" is left for developers to find out where in the source code that cause memory issue or potential security issue.