Merged
Commits
92 commits
b8311a1
e.desc -> e.message
dpb-bah Apr 10, 2025
bd603ac
trying str(e)
dpb-bah Apr 10, 2025
10d1865
Merge branch 'qat' into mod/dev-11858-reload-transactions-fix
dpb-bah Jun 2, 2025
383d25c
[DEV-12638] Initial work for triggering Spark job from API
sethstoudenmier Jun 6, 2025
c1567b9
[DEV-12638] Small correction
sethstoudenmier Jun 6, 2025
1f79e1d
Migrate to modern logger interface
emmanuel-ferdman Jun 12, 2025
e12ed65
Merge pull request #4415 from fedspendingtransparency/mod/dev-12757-u…
zachflanders-frb Jun 13, 2025
5878caf
[DEV-12600] Update domestic location queries
aguest-kc Jun 17, 2025
5658d9c
Update DELETE query
aguest-kc Jun 17, 2025
3acd07d
[DEV-12638] Correction for local and more reuse for spark config
sethstoudenmier Jun 17, 2025
fcc339d
Merge branch 'qat' of github.com:fedspendingtransparency/usaspending-…
sethstoudenmier Jun 17, 2025
b5f8c9b
Merge branch 'qat' into mod/dev-11858-reload-transactions-fix
sethstoudenmier Jun 18, 2025
46c7e7b
Merge pull request #4359 from fedspendingtransparency/mod/dev-11858-r…
sethstoudenmier Jun 18, 2025
ad9ab30
Merge branch 'qat' into ftr/dev-12638-pattern-to-start-spark-job
sethstoudenmier Jun 18, 2025
db70631
Merge pull request #4412 from fedspendingtransparency/ftr/dev-12638-p…
sethstoudenmier Jun 18, 2025
7dfc204
[DEV-12758] Return unlinked awards from File C or D
aguest-kc Jun 18, 2025
f27284a
[DEV-12758] Update tests
aguest-kc Jun 18, 2025
8bbbdf0
Merge branch 'qat' into ftr/dev-12794-load-offices-query-improvement
aguest-kc Jun 18, 2025
e6f550d
Merge pull request #4419 from fedspendingtransparency/ftr/dev-12794-l…
aguest-kc Jun 18, 2025
32516df
Merge branch 'qat' into fix/dev-12758-agency-overview-fix
aguest-kc Jun 18, 2025
30d3cd5
Merge pull request #4421 from fedspendingtransparency/fix/dev-12758-a…
aguest-kc Jun 18, 2025
df77c76
[DEV-12205] Add CSS that was removed in update to rest framework
sethstoudenmier Jun 23, 2025
e4719d2
[DEV-11342] Consolidate DB queries
aguest-kc Jun 23, 2025
853ea6d
Merge branch 'qat' into ftr/dev-12600-reference-data-for-domestic-loc…
aguest-kc Jun 23, 2025
10e1acf
Merge branch 'qat' of github.com:fedspendingtransparency/usaspending-…
sethstoudenmier Jun 23, 2025
5b2e901
[DEV-12600] Add foreign cities
aguest-kc Jun 24, 2025
5c40ca7
Merge pull request #4423 from fedspendingtransparency/ftr/dev-11342-r…
aguest-kc Jun 24, 2025
6bb6e33
[DEV-12600] Update foreign city CTEs
aguest-kc Jun 24, 2025
3c2a8e6
Merge branch 'qat' into ftr/dev-12600-reference-data-for-domestic-loc…
aguest-kc Jun 24, 2025
43d77eb
[DEV-12600] Add id column
aguest-kc Jun 25, 2025
9a00c25
Merge branch 'qat' into fix/dev-12205-fix-api-ui
sethstoudenmier Jun 25, 2025
1b2b0d7
Merge pull request #4424 from fedspendingtransparency/fix/dev-12205-f…
sethstoudenmier Jun 25, 2025
b566225
[DEV-11770] - Add account_download create and load commands
zachflanders-frb Jan 31, 2025
7629138
[DEV-12235] - Handle case where downloadjob is None
zachflanders-frb Feb 7, 2025
83a9f06
[DEV-11770- - update account download table load query to remove fisc…
zachflanders-frb Feb 14, 2025
6fc1a3a
[DEV-11770] - update partition_column
zachflanders-frb Feb 14, 2025
a4b862e
[DEV-11772] - Update load query to filter by year
zachflanders-frb Feb 14, 2025
5eee5ce
[DEV-11771] - Move filters to download query
zachflanders-frb Feb 24, 2025
f213e2a
[DEV-12574] - WIP - add AccountDownloadDataFrameBuilder
zachflanders-frb Jun 4, 2025
042a654
[DEV-12574] - update spark download dataframe builder
zachflanders-frb Jun 5, 2025
b4080d9
[DEV-12574] - adding dynamic filters for def codes, agency, account id
zachflanders-frb Jun 6, 2025
1c8ba37
[DEV-12574] - Fix table spec
zachflanders-frb Jun 9, 2025
433858b
[DEV-12574] - Adding tests
zachflanders-frb Jun 10, 2025
c99d849
[DEV-12234] - Update account_download schema and table spec
zachflanders-frb Apr 15, 2025
6fcf116
[DEV-12234] - fix type in account download sql
zachflanders-frb Apr 16, 2025
550dee6
[DEV-12574] - Move test file to integration tests
zachflanders-frb Jun 12, 2025
db5677a
[DEV-12574] - update source of select columns
zachflanders-frb Jun 12, 2025
44f16f5
[DEV-12574] - Update fixtures to ensure cleanup of delta tables
zachflanders-frb Jun 20, 2025
c9ce1cb
[DEV-12600] Cleanup SQL
aguest-kc Jun 25, 2025
6c9c493
[DEV-12600] Update tests
aguest-kc Jun 25, 2025
91b2a06
Merge branch 'qat' into ftr/dev-12600-reference-data-for-domestic-loc…
aguest-kc Jun 25, 2025
83790b5
[DEV-12600] SQL cleanup
aguest-kc Jun 25, 2025
9088424
[DEV-12600] Add zips_grouped to USAS ref table list
aguest-kc Jun 25, 2025
3fbf67f
Addressing comments
zachflanders-frb Jun 25, 2025
5689836
Merge branch 'qat' into ftr/spark-account-download
zachflanders-frb Jun 25, 2025
f201296
Addressing comments
zachflanders-frb Jun 25, 2025
658cce2
Adding federal_account_id column to account_download table
zachflanders-frb Jun 25, 2025
85b1d68
[DEV-12579] Update spending_over_time to default to fiscal year calendar
sethstoudenmier Jun 26, 2025
aefeb23
[DEV-12600] Remove zip_grouped from Broker ref tables
aguest-kc Jun 26, 2025
99b2c47
Fix import in test
zachflanders-frb Jun 26, 2025
e92af07
Updating spark conf constants
zachflanders-frb Jun 26, 2025
2b96c83
Updating spark conf constants
zachflanders-frb Jun 26, 2025
0ddcd18
fix syntax error
zachflanders-frb Jun 26, 2025
fbf021a
Updating spark conf constants to account for environments where local…
zachflanders-frb Jun 26, 2025
b4afc0e
[DEV-12574] - Add owning agency id for filtering to account download …
zachflanders-frb Jun 26, 2025
092cfe2
[DEV-12579] Fix test cases
sethstoudenmier Jun 26, 2025
7eed37c
[DEV-12579] Fix test cases
sethstoudenmier Jun 27, 2025
e7a6c0d
[DEV-12574] - Additional Dynamic Filters for Account Downloads
zachflanders-frb Jun 27, 2025
aad4a45
[DEV-12574] - Removing unused logger
zachflanders-frb Jun 27, 2025
f7e14c2
[DEV-12574] - Move account download filter tests to integration tests
zachflanders-frb Jun 30, 2025
232de15
[DEV-12104] Improvements to command for larger datasets
sethstoudenmier Jul 1, 2025
5695a3f
[DEV-12104] Typo on logging
sethstoudenmier Jul 1, 2025
98effda
[DEV-12104] Move WHERE to subquery
sethstoudenmier Jul 1, 2025
3a932e3
[DEV-12600] Remove GROUP BY
aguest-kc Jul 1, 2025
4b92a2d
Merge pull request #4425 from fedspendingtransparency/ftr/dev-12600-r…
aguest-kc Jul 2, 2025
3f7a38b
Merge branch 'qat' into fix/dev-12579-spending-over-time-month
sethstoudenmier Jul 2, 2025
d536c83
Merge branch 'qat' into ftr/spark-account-download
zachflanders-frb Jul 2, 2025
a58906d
[DEV-12574] - Updating download accounts api contract
zachflanders-frb Jul 2, 2025
ca8d18e
Merge pull request #4428 from fedspendingtransparency/ftr/dev-12574-a…
zachflanders-frb Jul 2, 2025
f1fb297
[DEV-12574] - adjusting numbner fo returns for code climate
zachflanders-frb Jul 2, 2025
d7870a8
Merge pull request #4427 from fedspendingtransparency/fix/dev-12579-s…
boozallendanny Jul 2, 2025
9c225e6
Merge pull request #4426 from fedspendingtransparency/ftr/spark-accou…
boozallendanny Jul 2, 2025
925b5f1
Merge pull request #4430 from fedspendingtransparency/qat
zachflanders-frb Jul 2, 2025
02c738d
Merge branch 'qat' of https://github.com/fedspendingtransparency/usas…
sethstoudenmier Jul 2, 2025
9626771
Merge pull request #4431 from fedspendingtransparency/mod/dev-12104-c…
sethstoudenmier Jul 2, 2025
9fa859e
Update PR template, new action, and update readme badge
sethstoudenmier Jul 3, 2025
03de0f5
Merge branch 'qat' of https://github.com/fedspendingtransparency/usas…
sethstoudenmier Jul 3, 2025
bd5b0e5
[DEV-12574] - fix issues with generate covid19 downloads and generate…
zachflanders-frb Jul 8, 2025
5a2c07b
Merge pull request #4434 from fedspendingtransparency/fix/dev-12574-f…
zachflanders-frb Jul 8, 2025
64f77d6
Merge branch 'qat' into minor-updates-to-github
sethstoudenmier Jul 8, 2025
223b921
Merge pull request #4432 from fedspendingtransparency/minor-updates-t…
sethstoudenmier Jul 8, 2025
5c10518
Merge pull request #4435 from fedspendingtransparency/qat
boozallendanny Jul 8, 2025
46 changes: 23 additions & 23 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
**Description:**
High level description of what the PR addresses should be put here. Should be detailed enough to communicate to a PO what this PR addresses without diving into the technical nuances
## Description:
<!-- High level description of what the PR addresses should be put here. Should be detailed enough to communicate to a PO what this PR addresses without diving into the technical nuances. -->

**Technical details:**
The technical details for the knowledge of other developers. Any detailed caveats or specific deployment steps should be outlined here.

**Requirements for PR merge:**

## Technical Details:
<!-- The technical details for the knowledge of other developers. Any detailed caveats or specific deployment steps should be outlined here. -->



## Requirements for PR Merge:
<!-- Items that aren't relevant should be marked as N/A and explained below as needed. -->

1. [ ] Unit & integration tests updated
2. [ ] API documentation updated
3. [ ] Necessary PR reviewers:
- [ ] Backend
- [ ] Frontend <OPTIONAL>
- [ ] Operations <OPTIONAL>
- [ ] Domain Expert <OPTIONAL>
4. [ ] Matview impact assessment completed
5. [ ] Frontend impact assessment completed
6. [ ] Data validation completed
7. [ ] Appropriate Operations ticket(s) created
8. [ ] Jira Ticket [DEV-123](https://federal-spending-transparency.atlassian.net/browse/DEV-123):
- [ ] Link to this Pull-Request
- [ ] Performance evaluation of affected (API | Script | Download)
- [ ] Before / After data comparison

**Area for explaining above N/A when needed:**
```
```
2. [ ] API documentation updated (examples listed below)
1. API Contracts
2. API UI
3. Comments
3. [ ] Data validation completed (examples listed below)
1. Does this work well with the current frontend? Or is the frontend aware of a needed change?
2. Is performance impacted in the changes (e.g., API, pipeline, downloads, etc.)?
3. Is the expected data returned with the expected format?
4. [ ] Appropriate Operations ticket(s) created
5. [ ] Jira Ticket(s)
1. [DEV-0](https://federal-spending-transparency.atlassian.net/browse/DEV-0)

### Explain N/A in above checklist:
18 changes: 0 additions & 18 deletions .github/pull_request_template_future.md

This file was deleted.

27 changes: 27 additions & 0 deletions .github/workflows/pull-request-and-review-updates.yaml
@@ -0,0 +1,27 @@
name: Pull Request and Review Updates

on:
  pull_request:
    types: [opened]
  pull_request_review:
    types: [submitted]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}-${{ github.actor_id }}
  cancel-in-progress: true

jobs:
  Update-Pull-Request-Assignees:
    name: Update Pull Request Assignees
    runs-on: ${{ vars.RUNNER_VERSION }}
    steps:
      - name: Update Assignee
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.addAssignees({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              assignees: [context.actor]
            });
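The `github-script` step above amounts to a single GitHub REST call (`POST /repos/{owner}/{repo}/issues/{issue_number}/assignees`). A minimal Python sketch of building that request — the actual send is omitted, and the function name here is illustrative, not part of the PR:

```python
import json


def build_assignee_request(owner: str, repo: str, issue_number: int, actor: str):
    """Build the (url, body) pair for GitHub's 'add assignees' REST endpoint.

    Illustrative sketch of what the workflow's github-script step does.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/assignees"
    body = json.dumps({"assignees": [actor]})
    return url, body
```

The concurrency group in the workflow keys on both the PR number and the actor, so a reviewer submitting several reviews in quick succession only runs the latest assignment.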
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# <p align="center"><img src="https://www.usaspending.gov/img/[email protected]" alt="USAspending API"></p>

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) [![Pull Request Checks](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml/badge.svg)](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml) [![Test Coverage](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/coverage.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/coverage) [![Code Climate](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/gpa.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) [![Pull Request Checks](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml/badge.svg?branch=staging)](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml) [![Test Coverage](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/coverage.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/coverage) [![Code Climate](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/gpa.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api)

_This API is utilized by USAspending.gov to obtain all federal spending data which is open source and provided to the public as part of the DATA Act._

2 changes: 2 additions & 0 deletions requirements/requirements-app.txt
@@ -2,6 +2,7 @@ asyncpg==0.29.*
attrs==23.2.*
boto3==1.34.*
certifi==2024.7.4
databricks-sdk==0.44.1 # Pinned because newer versions use v2.2 of the API which is not supported by PVC
dataclasses-json==0.6.*
dj-database-url==2.1.0
django-cors-headers==4.3.*
@@ -28,6 +29,7 @@ psutil==5.9.*
psycopg2==2.9.9 # Pinning exact version because this package will drop support for Python versions in patches
py-gfm==2.0.0
pydantic[dotenv]==1.9.*
python-dateutil==2.9.*
python-json-logger==2.0.7
requests==2.31.*
retrying==1.3.4
15 changes: 13 additions & 2 deletions usaspending_api/api_contracts/contracts/v2/download/accounts.md
@@ -21,8 +21,8 @@ Generate files and return metadata using filters on custom account
+ `account_level` (required, enum[string])
The account level is used to filter for a specific type of file.
+ Members
+ `treasury_account`
+ `federal_account`
+ `treasury_account`
+ `file_format` (optional, enum[string])
The format of the file(s) in the zip file containing the data.
+ Default: `csv`
@@ -87,9 +87,20 @@ Generate files and return metadata using filters on custom account
+ `agency` (optional, string)
The agency on which to filter. This field expects an internal toptier agency identifier also known as the `toptier_agency_id`.
+ Default: `all`
+ `budget_function` (optional, string)
The budget function code on which to filter.
+ `budget_subfunction` (optional, string)
The budget subfunction code on which to filter.
+ `federal_account` (optional, string)
This field is an internal id.
+ `submission_types` (required, array)
+ `submission_type` (optional, enum[string])
Either `submission_type` or `submission_types` is required.
+ Members
+ `account_balances`
+ `object_class_program_activity`
+ `award_financial`
+ `submission_types` (optional, array)
Either `submission_type` or `submission_types` is required.
+ (enum[string])
+ `account_balances`
+ `object_class_program_activity`
@@ -144,10 +144,10 @@ def process_data_copy_jobs(self, zip_file_path):
sql_file = None
final_path = self._create_data_csv_dest_path(final_name)
intermediate_data_file_path = final_path.parent / (final_path.name + "_temp")
data_file_names, count = self.download_to_csv(
download_metadata = self.download_to_csv(
sql_file, final_path, final_name, str(intermediate_data_file_path), zip_file_path, df
)
if count <= 0:
if download_metadata.number_of_rows <= 0:
logger.warning(f"Empty data file generated: {final_path}!")

self.filepaths_to_delete.extend(self.working_dir_path.glob(f"{final_path.stem}*"))
@@ -159,7 +159,7 @@ def complete_zip_and_upload(self, zip_file_path):
upload_download_file_to_s3(zip_file_path, settings.UNLINKED_AWARDS_DOWNLOAD_REDIRECT_DIR)
logger.info("Marking zip file for deletion in cleanup")
else:
logger.warn("Not uploading zip file to S3. Leaving file locally")
logger.warning("Not uploading zip file to S3. Leaving file locally")
self.filepaths_to_delete.remove(zip_file_path)

@property
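The hunk above replaces a bare `(data_file_names, count)` tuple with a `download_metadata` object exposing `number_of_rows`. A hypothetical sketch of that pattern — the PR's actual metadata type is not shown in this diff, so the field names below are assumptions:

```python
from collections import namedtuple

# Hypothetical stand-in for the metadata object returned by download_to_csv.
DownloadMetadata = namedtuple("DownloadMetadata", ["data_file_names", "number_of_rows"])


def summarize(metadata: DownloadMetadata) -> str:
    """Named attributes make call sites self-documenting vs. tuple unpacking."""
    if metadata.number_of_rows <= 0:
        return "empty download"
    return f"{metadata.number_of_rows} rows in {len(metadata.data_file_names)} file(s)"
```

Returning a named object also lets new fields be added later without breaking every caller that unpacked the old tuple.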
@@ -30,7 +30,7 @@ def add_arguments(self, parser):
parser.add_argument(
"--load-field-type",
type=str,
required=True,
required=False,
default="text",
help="Postgres data type of the field that will be copied from Broker",
)
@@ -122,24 +122,28 @@ def run_update(self, min_id: int, max_id: int) -> None:
'broker_server','(
SELECT {self.broker_match_field}, {self.broker_load_field}
FROM {self.broker_table_name}
WHERE
{self.broker_match_field} >= {chunk_min_id}
AND {self.broker_match_field} <= {chunk_max_id}
)') AS broker_table
(
lookup_id bigint,
load_field {self.load_field_type}
)
WHERE usas_table.{self.usas_match_field} = broker_table.lookup_id
WHERE
usas_table.{self.usas_match_field} = broker_table.lookup_id
;
"""
)

row_count = cursor.rowcount
total_row_count += row_count
ratio = (chunk_max_id - min_id + 1) / estimated_id_count
logging.info(
logger.info(
f'Updated {row_count:,d} rows with "{self.usas_match_field}" between {chunk_min_id:,d} and {chunk_max_id:,d}.'
f" Estimated time remaining: {timer.estimated_remaining_runtime(ratio)}"
)
logging.info(
logger.info(
f'Finished updating {total_row_count:,d} rows for "{self.usas_table_name}"."{self.usas_load_field}" '
f"in {timer}"
)
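The `run_update` hunk above processes IDs in chunks over dblink and logs an estimated remaining runtime from the fraction of IDs completed. A minimal sketch of that estimation logic — `ChunkTimer` is a hypothetical stand-in; the project's real timer class is not shown in this diff:

```python
import time


def completed_ratio(chunk_max_id: int, min_id: int, estimated_id_count: int) -> float:
    """Fraction of the ID range processed once the chunk ending at chunk_max_id is done."""
    return (chunk_max_id - min_id + 1) / estimated_id_count


class ChunkTimer:
    """Hypothetical sketch: extrapolate remaining seconds from elapsed time and progress."""

    def __init__(self):
        self._start = time.monotonic()

    def estimated_remaining_runtime(self, ratio: float) -> float:
        """Given completed fraction 0 < ratio <= 1, estimate seconds left.

        Assumes a roughly constant processing rate across chunks.
        """
        elapsed = time.monotonic() - self._start
        return elapsed * (1 - ratio) / ratio
```

The estimate is only as good as the assumption of uniform row density across the ID range; skewed IDs will make early estimates misleading.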
52 changes: 27 additions & 25 deletions usaspending_api/common/etl/spark.py
@@ -6,46 +6,47 @@
"""

import logging

from itertools import chain
from typing import List
from pyspark.sql.functions import to_date, lit, expr, concat, concat_ws, col, regexp_replace, transform, when
from pyspark.sql.types import StructType, DecimalType, StringType, ArrayType
from pyspark.sql import DataFrame, SparkSession
import time
from collections import namedtuple
from itertools import chain
from typing import List

from py4j.protocol import Py4JError
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import col, concat, concat_ws, expr, lit, regexp_replace, to_date, transform, when
from pyspark.sql.types import ArrayType, DecimalType, StringType, StructType

from usaspending_api.accounts.models import FederalAccount, TreasuryAppropriationAccount
from usaspending_api.config import CONFIG
from usaspending_api.common.helpers.spark_helpers import (
get_broker_jdbc_url,
get_jdbc_connection_properties,
get_usas_jdbc_url,
get_broker_jdbc_url,
)
from usaspending_api.config import CONFIG
from usaspending_api.download.filestreaming.download_generation import EXCEL_ROW_LIMIT
from usaspending_api.financial_activities.models import FinancialAccountsByProgramActivityObjectClass
from usaspending_api.recipient.models import StateData
from usaspending_api.references.models import (
Cfda,
Agency,
ToptierAgency,
SubtierAgency,
CGAC,
NAICS,
Office,
PSC,
RefCountryCode,
Agency,
Cfda,
CityCountyStateCode,
PopCounty,
PopCongressionalDistrict,
DisasterEmergencyFundCode,
RefProgramActivity,
ObjectClass,
GTASSF133Balances,
CGAC,
ObjectClass,
Office,
PopCongressionalDistrict,
PopCounty,
RefCountryCode,
RefProgramActivity,
SubtierAgency,
ToptierAgency,
ZipsGrouped,
)
from usaspending_api.reporting.models import ReportingAgencyMissingTas, ReportingAgencyOverview
from usaspending_api.submissions.models import SubmissionAttributes, DABSSubmissionWindowSchedule
from usaspending_api.download.filestreaming.download_generation import EXCEL_ROW_LIMIT
from usaspending_api.submissions.models import DABSSubmissionWindowSchedule, SubmissionAttributes

MAX_PARTITIONS = CONFIG.SPARK_MAX_PARTITIONS
_USAS_RDS_REF_TABLES = [
@@ -73,9 +74,10 @@
TreasuryAppropriationAccount,
ReportingAgencyOverview,
ReportingAgencyMissingTas,
ZipsGrouped,
]

_BROKER_REF_TABLES = ["zips_grouped", "cd_state_grouped", "cd_zips_grouped", "cd_county_grouped", "cd_city_grouped"]
_BROKER_REF_TABLES = ["cd_state_grouped", "cd_zips_grouped", "cd_county_grouped", "cd_city_grouped"]

logger = logging.getLogger(__name__)

@@ -444,7 +446,7 @@ def convert_array_cols_to_string(
is_postgres_array_format=False,
is_for_csv_export=False,
) -> DataFrame:
"""For each column that is an Array of ANYTHING, transfrom it to a string-ified representation of that Array.
"""For each column that is an Array of ANYTHING, transform it to a string-ified representation of that Array.

This will:
1. cast each array element to a STRING representation
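The docstring above describes stringifying array columns for export. A pure-Python sketch of the idea, illustrative only — the real function operates on Spark columns, and its exact NULL and quoting semantics are not visible in this truncated hunk:

```python
def stringify_array(values, is_postgres_array_format=False):
    """Sketch: cast each element to a string, join with commas, and wrap.

    Postgres array format wraps in braces; otherwise brackets are used.
    None handling here (empty string) is an assumption, not the project's rule.
    """
    parts = ["" if v is None else str(v) for v in values]
    joined = ",".join(parts)
    return "{" + joined + "}" if is_postgres_array_format else "[" + joined + "]"
```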
@@ -576,7 +578,7 @@ def create_ref_temp_views(spark: SparkSession, create_broker_views: bool = False
for sql_statement in broker_sql_strings:
spark.sql(sql_statement)

logger.info(f"Created the reference views in the global_temp database")
logger.info("Created the reference views in the global_temp database")


def write_csv_file(
@@ -691,7 +693,7 @@ def hadoop_copy_merge(
logger.debug(f"Including part file: {file_path.getName()}")
part_files.append(f.getPath())
if not part_files:
logger.warn("Source directory is empty with no part files. Attempting creation of file with CSV header only")
logger.warning("Source directory is empty with no part files. Attempting creation of file with CSV header only")
out_stream = None
try:
merged_file_path = f"{parts_dir}.{file_format}"
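`hadoop_copy_merge` above falls back to writing a CSV with only the header row when the source directory has no part files. A simplified local-filesystem sketch of the copy-merge idea — the real code works through Hadoop streams, so `copy_merge` here is a hypothetical stand-in:

```python
from pathlib import Path


def copy_merge(parts_dir: Path, header: str, out_path: Path) -> None:
    """Write the CSV header, then append each part file in sorted order.

    With no part files present, the output still contains the header row,
    mirroring the empty-source fallback in the hunk above.
    """
    part_files = sorted(parts_dir.glob("part-*"))
    with out_path.open("w") as out:
        out.write(header + "\n")
        for part in part_files:
            out.write(part.read_text())
```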