Merged
Commits
92 commits
b8311a1
e.desc -> e.message
dpb-bah Apr 10, 2025
bd603ac
trying str(e)
dpb-bah Apr 10, 2025
10d1865
Merge branch 'qat' into mod/dev-11858-reload-transactions-fix
dpb-bah Jun 2, 2025
383d25c
[DEV-12638] Initial work for triggering Spark job from API
sethstoudenmier Jun 6, 2025
c1567b9
[DEV-12638] Small correction
sethstoudenmier Jun 6, 2025
1f79e1d
Migrate to modern logger interface
emmanuel-ferdman Jun 12, 2025
e12ed65
Merge pull request #4415 from fedspendingtransparency/mod/dev-12757-u…
zachflanders-frb Jun 13, 2025
5878caf
[DEV-12600] Update domestic location queries
aguest-kc Jun 17, 2025
5658d9c
Update DELETE query
aguest-kc Jun 17, 2025
3acd07d
[DEV-12638] Correction for local and more reuse for spark config
sethstoudenmier Jun 17, 2025
fcc339d
Merge branch 'qat' of github.com:fedspendingtransparency/usaspending-…
sethstoudenmier Jun 17, 2025
b5f8c9b
Merge branch 'qat' into mod/dev-11858-reload-transactions-fix
sethstoudenmier Jun 18, 2025
46c7e7b
Merge pull request #4359 from fedspendingtransparency/mod/dev-11858-r…
sethstoudenmier Jun 18, 2025
ad9ab30
Merge branch 'qat' into ftr/dev-12638-pattern-to-start-spark-job
sethstoudenmier Jun 18, 2025
db70631
Merge pull request #4412 from fedspendingtransparency/ftr/dev-12638-p…
sethstoudenmier Jun 18, 2025
7dfc204
[DEV-12758] Return unlinked awards from File C or D
aguest-kc Jun 18, 2025
f27284a
[DEV-12758] Update tests
aguest-kc Jun 18, 2025
8bbbdf0
Merge branch 'qat' into ftr/dev-12794-load-offices-query-improvement
aguest-kc Jun 18, 2025
e6f550d
Merge pull request #4419 from fedspendingtransparency/ftr/dev-12794-l…
aguest-kc Jun 18, 2025
32516df
Merge branch 'qat' into fix/dev-12758-agency-overview-fix
aguest-kc Jun 18, 2025
30d3cd5
Merge pull request #4421 from fedspendingtransparency/fix/dev-12758-a…
aguest-kc Jun 18, 2025
df77c76
[DEV-12205] Add CSS that was removed in update to rest framework
sethstoudenmier Jun 23, 2025
e4719d2
[DEV-11342] Consolidate DB queries
aguest-kc Jun 23, 2025
853ea6d
Merge branch 'qat' into ftr/dev-12600-reference-data-for-domestic-loc…
aguest-kc Jun 23, 2025
10e1acf
Merge branch 'qat' of github.com:fedspendingtransparency/usaspending-…
sethstoudenmier Jun 23, 2025
5b2e901
[DEV-12600] Add foreign cities
aguest-kc Jun 24, 2025
5c40ca7
Merge pull request #4423 from fedspendingtransparency/ftr/dev-11342-r…
aguest-kc Jun 24, 2025
6bb6e33
[DEV-12600] Update foreign city CTEs
aguest-kc Jun 24, 2025
3c2a8e6
Merge branch 'qat' into ftr/dev-12600-reference-data-for-domestic-loc…
aguest-kc Jun 24, 2025
43d77eb
[DEV-12600] Add id column
aguest-kc Jun 25, 2025
9a00c25
Merge branch 'qat' into fix/dev-12205-fix-api-ui
sethstoudenmier Jun 25, 2025
1b2b0d7
Merge pull request #4424 from fedspendingtransparency/fix/dev-12205-f…
sethstoudenmier Jun 25, 2025
b566225
[DEV-11770] - Add account_download create and load commands
zachflanders-frb Jan 31, 2025
7629138
[DEV-12235] - Handle case where downloadjob is None
zachflanders-frb Feb 7, 2025
83a9f06
[DEV-11770- - update account download table load query to remove fisc…
zachflanders-frb Feb 14, 2025
6fc1a3a
[DEV-11770] - update partition_column
zachflanders-frb Feb 14, 2025
a4b862e
[DEV-11772] - Update load query to filter by year
zachflanders-frb Feb 14, 2025
5eee5ce
[DEV-11771] - Move filters to download query
zachflanders-frb Feb 24, 2025
f213e2a
[DEV-12574] - WIP - add AccountDownloadDataFrameBuilder
zachflanders-frb Jun 4, 2025
042a654
[DEV-12574] - update spark download dataframe builder
zachflanders-frb Jun 5, 2025
b4080d9
[DEV-12574] - adding dynamic filters for def codes, agency, account id
zachflanders-frb Jun 6, 2025
1c8ba37
[DEV-12574] - Fix table spec
zachflanders-frb Jun 9, 2025
433858b
[DEV-12574] - Adding tests
zachflanders-frb Jun 10, 2025
c99d849
[DEV-12234] - Update account_download schema and table spec
zachflanders-frb Apr 15, 2025
6fcf116
[DEV-12234] - fix type in account download sql
zachflanders-frb Apr 16, 2025
550dee6
[DEV-12574] - Move test file to integration tests
zachflanders-frb Jun 12, 2025
db5677a
[DEV-12574] - update source of select columns
zachflanders-frb Jun 12, 2025
44f16f5
[DEV-12574] - Update fixtures to ensure cleanup of delta tables
zachflanders-frb Jun 20, 2025
c9ce1cb
[DEV-12600] Cleanup SQL
aguest-kc Jun 25, 2025
6c9c493
[DEV-12600] Update tests
aguest-kc Jun 25, 2025
91b2a06
Merge branch 'qat' into ftr/dev-12600-reference-data-for-domestic-loc…
aguest-kc Jun 25, 2025
83790b5
[DEV-12600] SQL cleanup
aguest-kc Jun 25, 2025
9088424
[DEV-12600] Add zips_grouped to USAS ref table list
aguest-kc Jun 25, 2025
3fbf67f
Addressing comments
zachflanders-frb Jun 25, 2025
5689836
Merge branch 'qat' into ftr/spark-account-download
zachflanders-frb Jun 25, 2025
f201296
Addressing comments
zachflanders-frb Jun 25, 2025
658cce2
Adding federal_account_id column to account_download table
zachflanders-frb Jun 25, 2025
85b1d68
[DEV-12579] Update spending_over_time to default to fiscal year calendar
sethstoudenmier Jun 26, 2025
aefeb23
[DEV-12600] Remove zip_grouped from Broker ref tables
aguest-kc Jun 26, 2025
99b2c47
Fix import in test
zachflanders-frb Jun 26, 2025
e92af07
Updating spark conf constants
zachflanders-frb Jun 26, 2025
2b96c83
Updating spark conf constants
zachflanders-frb Jun 26, 2025
0ddcd18
fix syntax error
zachflanders-frb Jun 26, 2025
fbf021a
Updating spark conf constants to account for environments where local…
zachflanders-frb Jun 26, 2025
b4afc0e
[DEV-12574] - Add owning agency id for filtering to account download …
zachflanders-frb Jun 26, 2025
092cfe2
[DEV-12579] Fix test cases
sethstoudenmier Jun 26, 2025
7eed37c
[DEV-12579] Fix test cases
sethstoudenmier Jun 27, 2025
e7a6c0d
[DEV-12574] - Additional Dynamic Filters for Account Downloads
zachflanders-frb Jun 27, 2025
aad4a45
[DEV-12574] - Removing unused logger
zachflanders-frb Jun 27, 2025
f7e14c2
[DEV-12574] - Move account download filter tests to integration tests
zachflanders-frb Jun 30, 2025
232de15
[DEV-12104] Improvements to command for larger datasets
sethstoudenmier Jul 1, 2025
5695a3f
[DEV-12104] Typo on logging
sethstoudenmier Jul 1, 2025
98effda
[DEV-12104] Move WHERE to subquery
sethstoudenmier Jul 1, 2025
3a932e3
[DEV-12600] Remove GROUP BY
aguest-kc Jul 1, 2025
4b92a2d
Merge pull request #4425 from fedspendingtransparency/ftr/dev-12600-r…
aguest-kc Jul 2, 2025
3f7a38b
Merge branch 'qat' into fix/dev-12579-spending-over-time-month
sethstoudenmier Jul 2, 2025
d536c83
Merge branch 'qat' into ftr/spark-account-download
zachflanders-frb Jul 2, 2025
a58906d
[DEV-12574] - Updating download accounts api contract
zachflanders-frb Jul 2, 2025
ca8d18e
Merge pull request #4428 from fedspendingtransparency/ftr/dev-12574-a…
zachflanders-frb Jul 2, 2025
f1fb297
[DEV-12574] - adjusting numbner fo returns for code climate
zachflanders-frb Jul 2, 2025
d7870a8
Merge pull request #4427 from fedspendingtransparency/fix/dev-12579-s…
boozallendanny Jul 2, 2025
9c225e6
Merge pull request #4426 from fedspendingtransparency/ftr/spark-accou…
boozallendanny Jul 2, 2025
925b5f1
Merge pull request #4430 from fedspendingtransparency/qat
zachflanders-frb Jul 2, 2025
02c738d
Merge branch 'qat' of https://github.com/fedspendingtransparency/usas…
sethstoudenmier Jul 2, 2025
9626771
Merge pull request #4431 from fedspendingtransparency/mod/dev-12104-c…
sethstoudenmier Jul 2, 2025
9fa859e
Update PR template, new action, and update readme badge
sethstoudenmier Jul 3, 2025
03de0f5
Merge branch 'qat' of https://github.com/fedspendingtransparency/usas…
sethstoudenmier Jul 3, 2025
bd5b0e5
[DEV-12574] - fix issues with generate covid19 downloads and generate…
zachflanders-frb Jul 8, 2025
5a2c07b
Merge pull request #4434 from fedspendingtransparency/fix/dev-12574-f…
zachflanders-frb Jul 8, 2025
64f77d6
Merge branch 'qat' into minor-updates-to-github
sethstoudenmier Jul 8, 2025
223b921
Merge pull request #4432 from fedspendingtransparency/minor-updates-t…
sethstoudenmier Jul 8, 2025
5c10518
Merge pull request #4435 from fedspendingtransparency/qat
boozallendanny Jul 8, 2025
46 changes: 23 additions & 23 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
**Description:**
High level description of what the PR addresses should be put here. Should be detailed enough to communicate to a PO what this PR addresses without diving into the technical nuances
## Description:
<!-- High level description of what the PR addresses should be put here. Should be detailed enough to communicate to a PO what this PR addresses without diving into the technical nuances. -->

**Technical details:**
The technical details for the knowledge of other developers. Any detailed caveats or specific deployment steps should be outlined here.

**Requirements for PR merge:**

## Technical Details:
<!-- The technical details for the knowledge of other developers. Any detailed caveats or specific deployment steps should be outlined here. -->



## Requirements for PR Merge:
<!-- Items that aren't relevant should be marked as N/A and explained below as needed. -->

1. [ ] Unit & integration tests updated
2. [ ] API documentation updated
3. [ ] Necessary PR reviewers:
- [ ] Backend
- [ ] Frontend <OPTIONAL>
- [ ] Operations <OPTIONAL>
- [ ] Domain Expert <OPTIONAL>
4. [ ] Matview impact assessment completed
5. [ ] Frontend impact assessment completed
6. [ ] Data validation completed
7. [ ] Appropriate Operations ticket(s) created
8. [ ] Jira Ticket [DEV-123](https://federal-spending-transparency.atlassian.net/browse/DEV-123):
- [ ] Link to this Pull-Request
- [ ] Performance evaluation of affected (API | Script | Download)
- [ ] Before / After data comparison

**Area for explaining above N/A when needed:**
```
```
2. [ ] API documentation updated (examples listed below)
1. API Contracts
2. API UI
3. Comments
3. [ ] Data validation completed (examples listed below)
1. Does this work well with the current frontend? Or is the frontend aware of a needed change?
2. Is performance impacted in the changes (e.g., API, pipeline, downloads, etc.)?
3. Is the expected data returned with the expected format?
4. [ ] Appropriate Operations ticket(s) created
5. [ ] Jira Ticket(s)
1. [DEV-0](https://federal-spending-transparency.atlassian.net/browse/DEV-0)

### Explain N/A in above checklist:
18 changes: 0 additions & 18 deletions .github/pull_request_template_future.md

This file was deleted.

27 changes: 27 additions & 0 deletions .github/workflows/pull-request-and-review-updates.yaml
@@ -0,0 +1,27 @@
name: Pull Request and Review Updates

on:
  pull_request:
    types: [opened]
  pull_request_review:
    types: [submitted]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}-${{ github.actor_id }}
  cancel-in-progress: true

jobs:
  Update-Pull-Request-Assignees:
    name: Update Pull Request Assignees
    runs-on: ${{ vars.RUNNER_VERSION }}
    steps:
      - name: Update Assignee
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.addAssignees({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              assignees: [context.actor]
            });
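The `github-script` step above amounts to a single GitHub REST call (`POST /repos/{owner}/{repo}/issues/{issue_number}/assignees`). A minimal Python sketch of building that request — the actual send is omitted, and the function name here is illustrative, not part of the PR:

```python
import json


def build_assignee_request(owner: str, repo: str, issue_number: int, actor: str):
    """Build the (url, body) pair for GitHub's 'add assignees' REST endpoint.

    Illustrative sketch of what the workflow's github-script step does.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/assignees"
    body = json.dumps({"assignees": [actor]})
    return url, body
```

The concurrency group in the workflow keys on both the PR number and the actor, so a reviewer submitting several reviews in quick succession only runs the latest assignment.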
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# <p align="center"><img src="https://www.usaspending.gov/img/[email protected]" alt="USAspending API"></p>

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) [![Pull Request Checks](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml/badge.svg)](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml) [![Test Coverage](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/coverage.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/coverage) [![Code Climate](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/gpa.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black) [![Pull Request Checks](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml/badge.svg?branch=staging)](https://github.com/fedspendingtransparency/usaspending-api/actions/workflows/pull-request-checks.yaml) [![Test Coverage](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/coverage.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/coverage) [![Code Climate](https://codeclimate.com/github/fedspendingtransparency/usaspending-api/badges/gpa.svg)](https://codeclimate.com/github/fedspendingtransparency/usaspending-api)

_This API is utilized by USAspending.gov to obtain all federal spending data which is open source and provided to the public as part of the DATA Act._

2 changes: 2 additions & 0 deletions requirements/requirements-app.txt
@@ -2,6 +2,7 @@ asyncpg==0.29.*
attrs==23.2.*
boto3==1.34.*
certifi==2024.7.4
databricks-sdk==0.44.1 # Pinned because newer versions use v2.2 of the API which is not supported by PVC
dataclasses-json==0.6.*
dj-database-url==2.1.0
django-cors-headers==4.3.*
@@ -28,6 +29,7 @@ psutil==5.9.*
psycopg2==2.9.9 # Pinning exact version because this package will drop support for Python versions in patches
py-gfm==2.0.0
pydantic[dotenv]==1.9.*
python-dateutil==2.9.*
python-json-logger==2.0.7
requests==2.31.*
retrying==1.3.4
15 changes: 13 additions & 2 deletions usaspending_api/api_contracts/contracts/v2/download/accounts.md
@@ -21,8 +21,8 @@ Generate files and return metadata using filters on custom account
+ `account_level` (required, enum[string])
The account level is used to filter for a specific type of file.
+ Members
+ `treasury_account`
+ `federal_account`
+ `treasury_account`
+ `file_format` (optional, enum[string])
The format of the file(s) in the zip file containing the data.
+ Default: `csv`
@@ -87,9 +87,20 @@ Generate files and return metadata using filters on custom account
+ `agency` (optional, string)
The agency on which to filter. This field expects an internal toptier agency identifier also known as the `toptier_agency_id`.
+ Default: `all`
+ `budget_function` (optional, string)
The budget function code on which to filter.
+ `budget_subfunction` (optional, string)
The budget subfunction code on which to filter.
+ `federal_account` (optional, string)
This field is an internal id.
+ `submission_types` (required, array)
+ `submission_type` (optional, enum[string])
Either `submission_type` or `submission_types` is required.
+ Members
+ `account_balances`
+ `object_class_program_activity`
+ `award_financial`
+ `submission_types` (optional, array)
Either `submission_type` or `submission_types` is required.
+ (enum[string])
+ `account_balances`
+ `object_class_program_activity`
@@ -144,10 +144,10 @@ def process_data_copy_jobs(self, zip_file_path):
sql_file = None
final_path = self._create_data_csv_dest_path(final_name)
intermediate_data_file_path = final_path.parent / (final_path.name + "_temp")
data_file_names, count = self.download_to_csv(
download_metadata = self.download_to_csv(
sql_file, final_path, final_name, str(intermediate_data_file_path), zip_file_path, df
)
if count <= 0:
if download_metadata.number_of_rows <= 0:
logger.warning(f"Empty data file generated: {final_path}!")

self.filepaths_to_delete.extend(self.working_dir_path.glob(f"{final_path.stem}*"))
@@ -159,7 +159,7 @@ def complete_zip_and_upload(self, zip_file_path):
upload_download_file_to_s3(zip_file_path, settings.UNLINKED_AWARDS_DOWNLOAD_REDIRECT_DIR)
logger.info("Marking zip file for deletion in cleanup")
else:
logger.warn("Not uploading zip file to S3. Leaving file locally")
logger.warning("Not uploading zip file to S3. Leaving file locally")
self.filepaths_to_delete.remove(zip_file_path)

@property
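The hunk above replaces a bare `(data_file_names, count)` tuple with a `download_metadata` object exposing `number_of_rows`. A hypothetical sketch of that pattern — the PR's actual metadata type is not shown in this diff, so the field names below are assumptions:

```python
from collections import namedtuple

# Hypothetical stand-in for the metadata object returned by download_to_csv.
DownloadMetadata = namedtuple("DownloadMetadata", ["data_file_names", "number_of_rows"])


def summarize(metadata: DownloadMetadata) -> str:
    """Named attributes make call sites self-documenting vs. tuple unpacking."""
    if metadata.number_of_rows <= 0:
        return "empty download"
    return f"{metadata.number_of_rows} rows in {len(metadata.data_file_names)} file(s)"
```

Returning a named object also lets new fields be added later without breaking every caller that unpacked the old tuple.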
@@ -30,7 +30,7 @@ def add_arguments(self, parser):
parser.add_argument(
"--load-field-type",
type=str,
required=True,
required=False,
default="text",
help="Postgres data type of the field that will be copied from Broker",
)
@@ -122,24 +122,28 @@ def run_update(self, min_id: int, max_id: int) -> None:
'broker_server','(
SELECT {self.broker_match_field}, {self.broker_load_field}
FROM {self.broker_table_name}
WHERE
{self.broker_match_field} >= {chunk_min_id}
AND {self.broker_match_field} <= {chunk_max_id}
)') AS broker_table
(
lookup_id bigint,
load_field {self.load_field_type}
)
WHERE usas_table.{self.usas_match_field} = broker_table.lookup_id
WHERE
usas_table.{self.usas_match_field} = broker_table.lookup_id
;
"""
)

row_count = cursor.rowcount
total_row_count += row_count
ratio = (chunk_max_id - min_id + 1) / estimated_id_count
logging.info(
logger.info(
f'Updated {row_count:,d} rows with "{self.usas_match_field}" between {chunk_min_id:,d} and {chunk_max_id:,d}.'
f" Estimated time remaining: {timer.estimated_remaining_runtime(ratio)}"
)
logging.info(
logger.info(
f'Finished updating {total_row_count:,d} rows for "{self.usas_table_name}"."{self.usas_load_field}" '
f"in {timer}"
)
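The `run_update` hunk above processes IDs in chunks over dblink and logs an estimated remaining runtime from the fraction of IDs completed. A minimal sketch of that estimation logic — `ChunkTimer` is a hypothetical stand-in; the project's real timer class is not shown in this diff:

```python
import time


def completed_ratio(chunk_max_id: int, min_id: int, estimated_id_count: int) -> float:
    """Fraction of the ID range processed once the chunk ending at chunk_max_id is done."""
    return (chunk_max_id - min_id + 1) / estimated_id_count


class ChunkTimer:
    """Hypothetical sketch: extrapolate remaining seconds from elapsed time and progress."""

    def __init__(self):
        self._start = time.monotonic()

    def estimated_remaining_runtime(self, ratio: float) -> float:
        """Given completed fraction 0 < ratio <= 1, estimate seconds left.

        Assumes a roughly constant processing rate across chunks.
        """
        elapsed = time.monotonic() - self._start
        return elapsed * (1 - ratio) / ratio
```

The estimate is only as good as the assumption of uniform row density across the ID range; skewed IDs will make early estimates misleading.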
52 changes: 27 additions & 25 deletions usaspending_api/common/etl/spark.py
@@ -6,46 +6,47 @@
"""

import logging

from itertools import chain
from typing import List
from pyspark.sql.functions import to_date, lit, expr, concat, concat_ws, col, regexp_replace, transform, when
from pyspark.sql.types import StructType, DecimalType, StringType, ArrayType
from pyspark.sql import DataFrame, SparkSession
import time
from collections import namedtuple
from itertools import chain
from typing import List

from py4j.protocol import Py4JError
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import col, concat, concat_ws, expr, lit, regexp_replace, to_date, transform, when
from pyspark.sql.types import ArrayType, DecimalType, StringType, StructType

from usaspending_api.accounts.models import FederalAccount, TreasuryAppropriationAccount
from usaspending_api.config import CONFIG
from usaspending_api.common.helpers.spark_helpers import (
get_broker_jdbc_url,
get_jdbc_connection_properties,
get_usas_jdbc_url,
get_broker_jdbc_url,
)
from usaspending_api.config import CONFIG
from usaspending_api.download.filestreaming.download_generation import EXCEL_ROW_LIMIT
from usaspending_api.financial_activities.models import FinancialAccountsByProgramActivityObjectClass
from usaspending_api.recipient.models import StateData
from usaspending_api.references.models import (
Cfda,
Agency,
ToptierAgency,
SubtierAgency,
CGAC,
NAICS,
Office,
PSC,
RefCountryCode,
Agency,
Cfda,
CityCountyStateCode,
PopCounty,
PopCongressionalDistrict,
DisasterEmergencyFundCode,
RefProgramActivity,
ObjectClass,
GTASSF133Balances,
CGAC,
ObjectClass,
Office,
PopCongressionalDistrict,
PopCounty,
RefCountryCode,
RefProgramActivity,
SubtierAgency,
ToptierAgency,
ZipsGrouped,
)
from usaspending_api.reporting.models import ReportingAgencyMissingTas, ReportingAgencyOverview
from usaspending_api.submissions.models import SubmissionAttributes, DABSSubmissionWindowSchedule
from usaspending_api.download.filestreaming.download_generation import EXCEL_ROW_LIMIT
from usaspending_api.submissions.models import DABSSubmissionWindowSchedule, SubmissionAttributes

MAX_PARTITIONS = CONFIG.SPARK_MAX_PARTITIONS
_USAS_RDS_REF_TABLES = [
@@ -73,9 +74,10 @@
TreasuryAppropriationAccount,
ReportingAgencyOverview,
ReportingAgencyMissingTas,
ZipsGrouped,
]

_BROKER_REF_TABLES = ["zips_grouped", "cd_state_grouped", "cd_zips_grouped", "cd_county_grouped", "cd_city_grouped"]
_BROKER_REF_TABLES = ["cd_state_grouped", "cd_zips_grouped", "cd_county_grouped", "cd_city_grouped"]

logger = logging.getLogger(__name__)

@@ -444,7 +446,7 @@ def convert_array_cols_to_string(
is_postgres_array_format=False,
is_for_csv_export=False,
) -> DataFrame:
"""For each column that is an Array of ANYTHING, transfrom it to a string-ified representation of that Array.
"""For each column that is an Array of ANYTHING, transform it to a string-ified representation of that Array.

This will:
1. cast each array element to a STRING representation
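The docstring above describes stringifying array columns for export. A pure-Python sketch of the idea, illustrative only — the real function operates on Spark columns, and its exact NULL and quoting semantics are not visible in this truncated hunk:

```python
def stringify_array(values, is_postgres_array_format=False):
    """Sketch: cast each element to a string, join with commas, and wrap.

    Postgres array format wraps in braces; otherwise brackets are used.
    None handling here (empty string) is an assumption, not the project's rule.
    """
    parts = ["" if v is None else str(v) for v in values]
    joined = ",".join(parts)
    return "{" + joined + "}" if is_postgres_array_format else "[" + joined + "]"
```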
@@ -576,7 +578,7 @@ def create_ref_temp_views(spark: SparkSession, create_broker_views: bool = False
for sql_statement in broker_sql_strings:
spark.sql(sql_statement)

logger.info(f"Created the reference views in the global_temp database")
logger.info("Created the reference views in the global_temp database")


def write_csv_file(
@@ -691,7 +693,7 @@ def hadoop_copy_merge(
logger.debug(f"Including part file: {file_path.getName()}")
part_files.append(f.getPath())
if not part_files:
logger.warn("Source directory is empty with no part files. Attempting creation of file with CSV header only")
logger.warning("Source directory is empty with no part files. Attempting creation of file with CSV header only")
out_stream = None
try:
merged_file_path = f"{parts_dir}.{file_format}"
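`hadoop_copy_merge` above falls back to writing a CSV with only the header row when the source directory has no part files. A simplified local-filesystem sketch of the copy-merge idea — the real code works through Hadoop streams, so `copy_merge` here is a hypothetical stand-in:

```python
from pathlib import Path


def copy_merge(parts_dir: Path, header: str, out_path: Path) -> None:
    """Write the CSV header, then append each part file in sorted order.

    With no part files present, the output still contains the header row,
    mirroring the empty-source fallback in the hunk above.
    """
    part_files = sorted(parts_dir.glob("part-*"))
    with out_path.open("w") as out:
        out.write(header + "\n")
        for part in part_files:
            out.write(part.read_text())
```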