-
Notifications
You must be signed in to change notification settings - Fork 149
[DEV-12772] - Treasury Account Spark Downloads #4429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DEV-12772] - Treasury Account Spark Downloads #4429
Conversation
…into ftr/dev-11771-spark-account-download-from-table
…park-account-download-from-table [DEV-11771] - Spark account download from unfiltered table
…nflitered-account-download-delta-table [DEV-11770] [DEV-11771] - Add account_download create and load commands, Add generate spark download command
…ccount-download-delta-to-postgres [DEV-12234] - Account Download delta to postgres
…able-based-account-download [DEV-12235] - Add generate_postgres_download command
…12574-account-download-dynamic-filters
…ccount-download-dynamic-filters [DEV-12574] - Spark Account Download with Dynamic Filters
…dev-12772-treasury-account-download
|
|
||
| @property | ||
| @abstractmethod | ||
| def source_df(self) -> DataFrame: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Black and Flake8 disagree on this. I vote we override flake8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More detail here: psf/black#4173
…uilder to handle multiple submission types
…hub.com/fedspendingtransparency/usaspending-api into ftr/dev-12772-treasury-account-download
sethstoudenmier
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving because the changes look good to me, but I do have a conceptual question about structure as build this out more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only question I have is around file structure and maybe you have already thought about this. The File A and B downloads will be different columns than the File C (Award Financial). This means that ideally, they could re-use the same dataframe filter logic, but the columns will differ. This gets me to my actual question:
Should we have this structured such that the Abstract classes are broken apart from the specific download implementations? For example, right now you have the Federal and Treasury account implementations in this file for the File C download. But if File A and B are going to need different implementations, would we want to group them into this same file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The organization of these classes feels like it could evolve. It seems like the overall decision tree looks something like
graph LR;
A((Account Level)) --> B[Federal Account];
A((Account Level)) --> C[Treasury Account];
B[Federal Account] --> D((Submission Type));
C[Treasury Account] --> E((Submission Type));
D((Submission Type)) --> F[Account Download];
D((Submission Type)) --> G[Object Class];
D((Submission Type)) --> H[Award Financial];
E((Submission Type)) --> I[Account Download];
E((Submission Type)) --> J[Object Class];
E((Submission Type)) --> K[Award Financial];
In some of my WIP on File A/B downloads I have this on the abstract class:
@property
@abstractmethod
def account_balances(self) -> DataFrame: ...
@property
@abstractmethod
def object_class_program_activity(self) -> DataFrame: ...
@property
@abstractmethod
def award_financial(self) -> DataFrame: ...
@property
def source_dfs(self) -> list[DataFrame]:
return [getattr(self, submission_type) for submission_type in self.submission_types]Many of the filtering methods and other useful functions can be reused across the different submission types.
Description:
This PR adds Treasury Account Downloads to the generate spark downloads command.
Technical details:
The treasury account downloads were able to use the account download delta table with the addition of a few columns specific to the treasury download.
Requirements for PR merge: