add job database implementation that uses stac #619
soxofaan
left a comment
This is quite a large PR and I haven't gone through all of it yet, so these are just some initial comments.
to prepare for future extensions, e.g. #619
soxofaan
left a comment
some more notes
df = pd.DataFrame(series)
if len(series) == 0:
    # TODO: What if default columns are overwritten by the user?
    df = MultiBackendJobManager._normalize_df(
Using the private MultiBackendJobManager._normalize_df from get_by_status looks a bit problematic.
It's also used from initialize_from_df, which is OK at the moment but subject to change (see #667).
Using it in a context other than initialize_from_df, like here, seems to indicate that we have to rethink all this initialization business. I'm not sure yet what to do instead.
Indeed, I was hesitant to use it here at first, but I see no other way to return an empty dataframe here that has all the columns required by the MultiBackendJobManager.
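To make the concern concrete, here is a minimal sketch of the behaviour being relied on: filling in missing default columns so that even an empty dataframe carries them. The column names, the defaults, and the `normalize_df` helper are hypothetical stand-ins for illustration, not the actual MultiBackendJobManager._normalize_df implementation.

```python
import pandas as pd

# Hypothetical subset of required columns and their defaults; the real set
# is defined by MultiBackendJobManager and is not reproduced here.
REQUIRED_COLUMNS = {"id": None, "status": "not_started", "backend_name": None}


def normalize_df(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with every required column present.

    Columns already supplied by the user are left untouched.
    """
    missing = {
        name: default
        for name, default in REQUIRED_COLUMNS.items()
        if name not in df.columns
    }
    return df.assign(**missing)


# An empty result set still ends up with all required columns.
empty = normalize_df(pd.DataFrame([]))

# A user-supplied column is preserved; only the missing ones are added.
partial = normalize_df(pd.DataFrame({"status": ["running"]}))
```

In this sketch a user-provided column always wins over the default, which is one possible answer to the TODO in the diff, not necessarily the intended one.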
One way to get MultiBackendJobManager._normalize_df out of initialize_from_df might be to do a read, normalize, persist at the start of MultiBackendJobManager.run_jobs?
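A rough sketch of that read, normalize, persist idea, under stated assumptions: `InMemoryJobDB`, `DEFAULTS`, `normalize`, and this simplified `run_jobs` are all hypothetical stand-ins, and the real job database interface and run_jobs signature may differ.

```python
import pandas as pd

# Hypothetical default columns, for illustration only.
DEFAULTS = {"id": None, "status": "not_started"}


class InMemoryJobDB:
    """Hypothetical stand-in for a job database exposing read/persist."""

    def __init__(self, df: pd.DataFrame):
        self._df = df.copy()

    def read(self) -> pd.DataFrame:
        return self._df.copy()

    def persist(self, df: pd.DataFrame) -> None:
        self._df = df.copy()


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Add any missing default column without touching existing ones."""
    missing = {k: v for k, v in DEFAULTS.items() if k not in df.columns}
    return df.assign(**missing)


def run_jobs(job_db: InMemoryJobDB) -> pd.DataFrame:
    # Read, normalize, persist once up front, so that later queries
    # against the database can assume the columns exist.
    df = normalize(job_db.read())
    job_db.persist(df)
    # ... the actual job handling loop would follow here ...
    return df


db = InMemoryJobDB(pd.DataFrame({"year": [2021, 2022]}))
run_jobs(db)
```

With the normalization done once at the start of the run, initialize_from_df would no longer need to call it, although a query method that builds an empty result from scratch would still need some way to learn the expected columns.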
I would leave it as MultiBackendJobManager._normalize_df for now, for lack of a better alternative.
We just have to be sure it's included in the changes related to #667.
In that regard, it's important that this MultiBackendJobManager._normalize_df stuff is properly covered by unit tests, so that we immediately see if we break something.
I'm a bit short on time to review this deeply, let alone try running some use cases. It's good that half of the PR is unit tests.
To solve still: