Conversation

@loreleitrimberger (Contributor) commented Jul 30, 2025

Description:

This change has File C (award_financial) use Spark to process a download job when award_financial is one of the requested submission_type values.

Technical Details:

Uses the experimental header HTTP-X-DOWNLOAD-API to route the download through Spark
If Files A and B are included in the request, they still go through the SQS queue

Requirements for PR Merge:

  1. Unit & integration tests updated
  2. N/A API documentation updated (examples listed below)
    1. API Contracts
    2. API UI
    3. Comments
  3. Data validation completed (examples listed below)
    1. Does this work well with the current frontend? Or is the frontend aware of a needed change?
    2. Is performance impacted in the changes (e.g., API, pipeline, downloads, etc.)?
    3. Is the expected data returned with the expected format?
  4. N/A Appropriate Operations ticket(s) created
  5. Jira Ticket(s)
    1. DEV-12639

Explain N/A in above checklist:

"""
Returns True or False depending on if the expected_header_value matches what is sent with the request
"""
return request.headers.get(DOWNLOAD_API_HEADER) == DOWNLOAD_HEADER_VALUE
loreleitrimberger (Contributor, Author) commented:
Using the same format as experimental_elasticsearch_api wasn't working for me (using request.META.get and underscores instead of hyphens), but doing it this way did

Contributor commented:

Django does some handling of the request headers (see https://docs.djangoproject.com/en/4.2/ref/request-response/#django.http.HttpRequest.META).

In short, the ES experimental header would normally be supplied to the API as X-Experimental-API and Django would convert it under request.META to be HTTP_X_EXPERIMENTAL_API.

With that said, you should also be good to use request.headers in this case, but I would recommend changing the expected value to not include "HTTP", as that is something Django prepends for request.META.
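The normalization the reviewer describes can be sketched as follows. This is a minimal illustration of Django's documented header-to-request.META key mapping, not code from this PR; `meta_key` is a hypothetical helper name:

```python
def meta_key(header_name: str) -> str:
    """Mimic Django's mapping of an incoming HTTP header name to its
    request.META key: upper-case it, replace hyphens with underscores,
    and prepend "HTTP_"."""
    return "HTTP_" + header_name.upper().replace("-", "_")

print(meta_key("X-Experimental-API"))  # -> HTTP_X_EXPERIMENTAL_API
```

request.headers reverses this mapping, so it is accessed with the original header name (case-insensitively) and without the "HTTP_" prefix — which is why the expected value should not include "HTTP".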

    download_job.json_request = json.dumps(str_to_json_original)
    download_job.save()
else:
    self.process_request(download_job)
loreleitrimberger (Contributor, Author) commented:

I'm not sure if this is the best way to go about it, but this way the Spark job reads from the original download_job without Files A or B in it, and then the download_job runs without File C
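The split being described — File C to Spark, Files A and B staying on the SQS path — can be illustrated with a small sketch. The helper name is hypothetical, and "account_balances" is used here only as an example of a non-File-C submission type:

```python
# Hypothetical sketch: partition the requested submission types so the
# Spark-handled type ("award_financial", i.e. File C) is separated from
# the types that continue through the SQS queue.
SPARK_SUBMISSION_TYPES = {"award_financial"}

def split_submission_types(requested):
    spark_part = [t for t in requested if t in SPARK_SUBMISSION_TYPES]
    queue_part = [t for t in requested if t not in SPARK_SUBMISSION_TYPES]
    return spark_part, queue_part

print(split_submission_types(["account_balances", "award_financial"]))
# -> (['award_financial'], ['account_balances'])
```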

"""
Returns True or False depending on if the expected_header_value matches what is sent with the request
"""
return request.headers.get(DOWNLOAD_API_HEADER) == DOWNLOAD_HEADER_VALUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Django does some handling of the request headers (see https://docs.djangoproject.com/en/4.2/ref/request-response/#django.http.HttpRequest.META).

In short, the ES experimental header would normally be supplied to the API as X-Experimental-API and Django would convert it under request.META to be HTTP_X_EXPERIMENTAL_API.

With that said you should also be good to use request.headers in this case, but I would recommend changing the expected value to no include "HTTP" as that is something that Django prepends for the "request.META".

from usaspending_api.download.v2.request_validations import DownloadValidatorBase
from usaspending_api.routers.replicas import ReadReplicaRouter
from usaspending_api.submissions.models import DABSSubmissionWindowSchedule
from usaspending_api.settings import IS_LOCAL
Contributor commented:

You can use settings.IS_LOCAL since settings is already imported above.
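Beyond avoiding the redundant import, one common reason to prefer the settings.IS_LOCAL attribute over a direct from-import is that a direct import binds the value once at import time, so later changes to the settings module (e.g. test overrides) are not seen. A minimal stand-in illustration, using a fake module rather than the project's real settings:

```python
import types

# Stand-in "settings" module, purely for illustration.
settings = types.ModuleType("settings")
settings.IS_LOCAL = True

IS_LOCAL = settings.IS_LOCAL   # direct-import style: snapshots the value

settings.IS_LOCAL = False      # e.g. a test override flips the flag

print(IS_LOCAL)                # -> True  (stale snapshot)
print(settings.IS_LOCAL)       # -> False (reflects the change)
```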

@loreleitrimberger loreleitrimberger merged commit 8427bba into qat Aug 7, 2025
13 checks passed
@loreleitrimberger loreleitrimberger deleted the ftr/dev-12639-update-custom-download branch August 7, 2025 19:05
from usaspending_api.common.helpers.spark_helpers import configure_spark_session, get_active_spark_session
from usaspending_api.common.spark.configs import LOCAL_EXTENDED_EXTRA_CONF

if settings.IS_LOCAL:
Contributor commented:

IS_LOCAL is hardcoded to True:


This is causing the import below to happen, which errors in non-Spark contexts because pyspark is not a declared dependency of the app.
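One generic way to avoid this class of failure — not necessarily the fix used in this PR — is to defer the optional import until the code path that needs it actually runs. A hedged sketch with a hypothetical helper name:

```python
import importlib

def lazy_import(module_name: str):
    """Import a module only at call time, raising a clearer error when an
    optional dependency (e.g. pyspark) is not installed in this context."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(f"{module_name} is not available in this context") from exc
```

With this pattern, something like lazy_import("pyspark") fails only in contexts that actually reach the Spark code path, rather than at module import time for every consumer of the app.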

@loreleitrimberger @sethstoudenmier @aguest-kc
