Description
Describe the bug
When an IdGenerator is created from a sufficiently large JSON file, creating the IdGenerator actor can take a long time because the constructor parameters are slow to serialize and deserialize. It can take so long that the actor is not yet constructed and registered when it is used later in the pipeline, for example in FuzzyDeduplicationWorkflow.
Steps/Code to reproduce bug
Loading a ~3M JSON serialization file reproduces the issue about 90% of the time on my AnyScale/GCS setup. The error occurs the next time the IdGenerator is used, with the message "Failed to look up actor with name 'curator_deduplication_id_generator'".
Expected behavior
Since the actor was created earlier in the pipeline, the lookup should succeed.
Additional context
The cause is the following: actor creation in Ray is not synchronous. When IdGenerator.remote(....) is called, the actor is registered in Ray only once its constructor has finished. If the actor is looked up too soon, the lookup can fail. The usual fix is to add a method def ready(self) -> bool: return True to the actor and call it before the creating function returns. Will send a fix soon.
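A minimal sketch of that readiness pattern, not the actual fix. IdGeneratorSketch, its state argument, and create_id_generator are hypothetical names used only for illustration; the actor name is the one from the error message. It assumes ray.init() has already been called by the surrounding pipeline.

```python
from typing import Any


class IdGeneratorSketch:
    """Hypothetical stand-in for the real IdGenerator actor class."""

    def __init__(self, state: dict[str, Any]) -> None:
        # With a large state, shipping the arguments and running this
        # constructor is the slow part described in this issue.
        self.state = state

    def ready(self) -> bool:
        # An actor runs its methods only after __init__ has completed,
        # so getting a result back means construction is done.
        return True


def create_id_generator(
    state: dict[str, Any],
    name: str = "curator_deduplication_id_generator",
):
    """Create the named actor and block until its constructor has finished."""
    # Imported here so the plain class above can be exercised without Ray.
    import ray

    actor_cls = ray.remote(IdGeneratorSketch)
    handle = actor_cls.options(name=name).remote(state)
    # Wait on a trivial method call before handing the actor to later stages.
    ray.get(handle.ready.remote())
    return handle
```

After create_id_generator returns, a later stage calling ray.get_actor with the same name should no longer race against the constructor. Passing a timeout to ray.get would be a reasonable addition if a stuck constructor should fail loudly instead of blocking forever.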