Collaborator

@jdries jdries commented Sep 15, 2024

Still to solve:

  • how to detect authentication settings
  • use the aggregate API -> won't do: that API is still on dev and it's not clear whether it works
  • filtering by status doesn't work yet
  • avoid the dependency on an external STAC builder library

@jdries jdries self-assigned this Oct 2, 2024
@VincentVerelst VincentVerelst marked this pull request as ready for review December 5, 2024 17:35
Member

@soxofaan soxofaan left a comment

This is quite a large PR and I haven't gone through all of it yet, so these are just some initial comments.

soxofaan added a commit that referenced this pull request Dec 6, 2024
Member

@soxofaan soxofaan left a comment

some more notes

df = pd.DataFrame(series)
if len(series) == 0:
    # TODO: What if default columns are overwritten by the user?
    df = MultiBackendJobManager._normalize_df(
Member

Using private MultiBackendJobManager._normalize_df from get_by_status looks a bit problematic.

It's also used from initialize_from_df, which is ok at the moment, but subject to change (see #667)

Using it in a context other than initialize_from_df, like here, seems to indicate that we have to rethink all this initialization business. Not sure yet what to do instead.

Collaborator

Indeed, I was hesitant to use it here at first, but I see no other way to return an empty dataframe that has all the columns required by the MultiBackendJobManager.
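
For illustration, the fallback under discussion could be sketched roughly as follows. This is only a sketch: `REQUIRED_COLUMNS`, its defaults, `normalize_df` and `rows_to_df` are hypothetical stand-ins, not the actual `MultiBackendJobManager._normalize_df` implementation or its real column set.

```python
import pandas as pd

# Hypothetical column defaults, for illustration only; the real set of
# columns required by MultiBackendJobManager may differ.
REQUIRED_COLUMNS = {"id": None, "status": "not_started", "backend_name": None}


def normalize_df(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every missing required column added."""
    missing = {c: v for c, v in REQUIRED_COLUMNS.items() if c not in df.columns}
    return df.assign(**missing)


def rows_to_df(series: list) -> pd.DataFrame:
    """Build a dataframe from job rows; an empty result still gets the required columns."""
    df = pd.DataFrame(series)
    if len(series) == 0:
        # Without this step an empty result would have no columns at all.
        df = normalize_df(df)
    return df


empty = rows_to_df([])
```

The point of the sketch is that callers can select columns such as `status` on the result without first checking whether any rows came back.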

Collaborator

One way to get MultiBackendJobManager._normalize_df out of initialize_from_df might be to do a read, normalize, persist cycle at the start of MultiBackendJobManager.run_jobs?
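
A rough sketch of that read, normalize, persist idea, under stated assumptions: `InMemoryJobDb`, `normalize_df`, `REQUIRED_COLUMNS` and this `run_jobs` function are all hypothetical and only mimic the shape of the proposal, not the real job manager code.

```python
import pandas as pd

# Hypothetical column defaults, for illustration only.
REQUIRED_COLUMNS = {"id": None, "status": "not_started"}


class InMemoryJobDb:
    """Hypothetical stand-in for a job database offering read() and persist()."""

    def __init__(self, df: pd.DataFrame):
        self._df = df.copy()

    def read(self) -> pd.DataFrame:
        return self._df.copy()

    def persist(self, df: pd.DataFrame) -> None:
        self._df = df.copy()


def normalize_df(df: pd.DataFrame) -> pd.DataFrame:
    """Add any missing required column with its default value."""
    missing = {c: v for c, v in REQUIRED_COLUMNS.items() if c not in df.columns}
    return df.assign(**missing)


def run_jobs(job_db: InMemoryJobDb) -> pd.DataFrame:
    # Read, normalize, persist once up front, so that later steps can rely
    # on the required columns being present in the stored dataframe.
    df = normalize_df(job_db.read())
    job_db.persist(df)
    # ... the actual job tracking loop would follow here ...
    return df


job_db = InMemoryJobDb(pd.DataFrame({"year": [2020, 2021]}))
run_jobs(job_db)
```

With this arrangement, initialization would no longer need to normalize the dataframe itself; whether that fits the changes planned in #667 is left open here.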

Member

I would leave it as MultiBackendJobManager._normalize_df for now, for lack of a better alternative.
We just have to be sure it's included in the changes related to #667.

So in that regard, it's important that this MultiBackendJobManager._normalize_df usage is properly covered by unit tests, so that we see immediately if we break something.
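
As a hedged illustration of the kind of coverage meant here, the tests below pin down two behaviours of a hypothetical `normalize_df` stand-in (not the real private method): an empty dataframe gains the required columns, and values already supplied by the user are not overwritten. The column names and defaults are assumptions.

```python
import pandas as pd

# Hypothetical column defaults, for illustration only.
REQUIRED_COLUMNS = {"id": None, "status": "not_started"}


def normalize_df(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: add missing required columns, keep existing ones."""
    missing = {c: v for c, v in REQUIRED_COLUMNS.items() if c not in df.columns}
    return df.assign(**missing)


def test_adds_missing_columns_to_empty_df():
    df = normalize_df(pd.DataFrame([]))
    assert list(df.columns) == ["id", "status"]
    assert len(df) == 0


def test_keeps_user_supplied_values():
    # A column provided by the user must not be reset to its default.
    df = normalize_df(pd.DataFrame({"status": ["finished"]}))
    assert df["status"].tolist() == ["finished"]
    assert df["id"].tolist() == [None]


# Plain calls so the sketch runs without a test runner.
test_adds_missing_columns_to_empty_df()
test_keeps_user_supplied_values()
```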

@VincentVerelst VincentVerelst self-assigned this Dec 12, 2024
@soxofaan
Member

I'm a bit short on time to review this deeply, let alone try running some use cases. It's good that half of the PR is unit tests.
The PR also just adds a new class and doesn't touch existing code paths, so maybe we can just merge this as is.

@VincentVerelst VincentVerelst merged commit 7064cf3 into master Dec 18, 2024
15 checks passed
@VincentVerelst VincentVerelst deleted the stac_jobdb branch December 18, 2024 15:57

4 participants