-
Notifications
You must be signed in to change notification settings - Fork 15
refactor!: Make Actor
initialization stricter and more predictable
#576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nforce explicit configuration adn services
9a3543f
to
8bd59fd
Compare
Actor
initialization stricter and more predictable
await proxy_configuration.initialize() | ||
|
||
assert len(caplog.records) == 1 | ||
assert len(caplog.records) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Soo what was added here? 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will throw this warning:
"'No configuration set, implicitly creating and using default Configuration.'"
Which is fine. Normally, the proxy configuration would be used within the context of the Actor and it would set the configuration.
It still works even this way, it just warns that no one set the configuration before and thus it is doing it now implicitly.
config = Configuration.get_global_configuration() | ||
|
||
async with ApifyEventManager(config): | ||
async with ApifyEventManager(Configuration()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary, but I wanted to avoid Configuration.get_global_configuration() as it warns that it is setting the implicit default configuration as no configuration was set explicitly.
I can also properly set the configuration before the test or just pass the configuration directly and avoid the service_locator related side effects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another batch (from our discussion), I'll wait for your updates now.
name: str | None = None, | ||
alias: str | None = None, | ||
configuration: Configuration | None = None, | ||
configuration: CrawleeConfiguration | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I might just be out of the loop, but wasn't this supposed to work with some "stored" configuration so that we could just remove this parameter? Is there some reason why we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that approach initially, but that led to more problems related to the order in which services can be set in the service locator. Since one service is dependent on another service, but they can be set independently...
So imagine you do:
service_locator.set_configuration(Configuration1)
service_locator.set_storage_client(SomeStorageClient(Configuration2))
What now? Better to prevent this from even happening
return cls.from_configuration(global_configuration) | ||
|
||
@classmethod | ||
def from_configuration(cls, configuration: CrawleeConfiguration) -> Configuration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result should be cached so that two calls to get_global_configuration always return the exact same object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a warning to the only potential path where this could cause a problem. If someone gets there, it is not by intention, but for the sake of backward compatibility, it will still work in most cases as expected.
src/apify/_actor.py
Outdated
|
||
@cached_property | ||
def charging_manager_implementation(self) -> ChargingManagerImplementation: | ||
return ChargingManagerImplementation(self.config, self.apify_client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should call _raise_if_not_initialized too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/apify/_actor.py
Outdated
"""Retrieve the charging manager to access granular pricing information.""" | ||
self._raise_if_not_initialized() | ||
return self._charging_manager | ||
return self.charging_manager_implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this method now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public usage of charging_manager should be allowed only if Actor is initialized and Actor is not initialized before await self._charging_manager_implementation.__aenter__()
Co-authored-by: Vlada Dusek <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Actor
are initialized inasync init,
not in__init__
.Actor
is considered finalized afterActor.init
was run. This also means that the same configuration used by theActor
is set in the globalservice_locator
.service_locator
before theActor.init
service_locator
and set it throughActor.(configuration=...)
and runningActor.init()
service_locator
and no configuration passed toActor
will create and set implicit default configurationApifyFileSystemStorageClient
as local client to support pre-existing input file.ApifyStorageClient
based ontoken
andapi_public_url
and update NDU storage handling.Issues
Rated to: #513, #590
Testing