CDD-3099 update non public caching#3003

Open
kathryn-dale wants to merge 47 commits into main from task/CDD-3099-update-non-public-caching

Conversation

@kathryn-dale
Contributor

@kathryn-dale kathryn-dale commented Feb 17, 2026

Description

This PR includes the following:

  • Add JWT detection middleware to identify if the request comes from a public or non-public user

  • If the request comes from a non-public user, bypass the cache and calculate the data fresh. This newly calculated data should NOT be saved in the cache

  • This work only covers the private api. It has been agreed to leave the public api for now, as private data is not returned from the public api currently, and it has been agreed not to use it for non public data in the immediate future.

Fixes #CDD-3099


Type of change

Please select the options that are relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Tech debt item (this is focused solely on addressing any relevant technical debt)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests at the right levels to prove my change is effective
  • I have added screenshots or screen grabs where appropriate
  • I have added docstrings in the correct style (google)

@kathryn-dale kathryn-dale force-pushed the task/CDD-3099-update-non-public-caching branch from bf3ecb8 to 80a4c29 Compare February 17, 2026 15:08
@kathryn-dale kathryn-dale marked this pull request as ready for review February 18, 2026 10:03
@kathryn-dale kathryn-dale requested a review from a team as a code owner February 18, 2026 10:03
@kathryn-dale kathryn-dale force-pushed the task/CDD-3099-update-non-public-caching branch 3 times, most recently from 44aa955 to 198e1fd Compare March 9, 2026 10:05
@kathryn-dale
Contributor Author

From a discussion with @mattjreynolds

So the JWT validation (or any Django authentication middleware) will set request.user (to an instance of the User model) and request.auth (which will contain the valid auth, in this case the decoded JWT). So you should be able to do a simple check along the lines of valid_jwt = request.auth (request.auth will be set to None for a public, unauthenticated request). We could decide later on to make that check more specific (once we've figured out exactly how permissions are going to look), at which point we may shift it to calling a function on the permission sets model attached to the user, something like valid_jwt = request.user.permission_sets.is_valid(). But I think to keep it simple for now, just checking request.auth will be fine, and that will mean your code will just start working as long as it gets merged after mine.
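
The simple check described above could look roughly like this. It is only a sketch: `has_valid_jwt` is a hypothetical helper name, and the request objects are `SimpleNamespace` stand-ins rather than real Django or DRF requests.

```python
from types import SimpleNamespace


def has_valid_jwt(request) -> bool:
    """Return True when the authentication layer attached credentials to the request."""
    # request.auth is expected to be None for a public (unauthenticated) request.
    return getattr(request, "auth", None) is not None


public_request = SimpleNamespace(user=None, auth=None)
non_public_request = SimpleNamespace(user="analyst", auth={"sub": "analyst"})
```

Comparing against None rather than relying on truthiness means an empty but present auth payload would still count as authenticated; whether that is the desired behaviour is a decision for the team.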

Blocking this work until CDD-3058 is merged and this can be properly tested and completed

@kathryn-dale kathryn-dale force-pushed the task/CDD-3099-update-non-public-caching branch from d7c4e8e to 7b17f60 Compare March 26, 2026 10:32
@kathryn-dale
Contributor Author

I have reached out to @phill-stanley regarding the sonarqube failures. Namely:

  • There are two versions of the public api which are remarkably similar. This has necessitated duplicate code in the testing, which has crossed the threshold of tolerated levels of duplication

I have also reached out to @mattjreynolds to discuss where the JWT validation should live, to see if it logically can move so it doesn't break the contract

@kathryn-dale kathryn-dale force-pushed the task/CDD-3099-update-non-public-caching branch from c4cab6b to 763be9c Compare March 31, 2026 13:32
@kathryn-dale kathryn-dale force-pushed the task/CDD-3099-update-non-public-caching branch from e529fbf to 22a4bcd Compare April 1, 2026 08:43
return decorator


def _check_if_valid_non_public_request(request):

@jeanpierrefouche-ukhsa jeanpierrefouche-ukhsa Apr 1, 2026

Suggested change:

def _check_if_valid_non_public_request(request) -> bool:
    auth = backend.JSONWebTokenAuthentication()
    result = auth.authenticate(request)
    return result is not None

This makes the return type explicit (bool) and reflects the fact that auth.authenticate(request) does not necessarily return a bool: it can return None.

We could add a docstring here too, to explain the intent of this function. Could you run through the PR changes to check for missing docstrings?

@jeanpierrefouche-ukhsa jeanpierrefouche-ukhsa left a comment

AI Review has raised the following concerns:

HIGH:
"
High: V2 appears to remove the public-data filter without adding an authorization gate.
At api_time_series_request_serializer.py:87, the query flips to restrict_to_public=False. But in base.py:46 and base.py:48, the JWT result is only used to add a Cache-Control header. That means authentication is affecting caching semantics, not data access. If the manager respects restrict_to_public=False literally, an unauthenticated caller can still get non-public values.
"

This is an important concern that goes to the heart of the design approach, i.e. using authentication status to determine caching behaviour and, in turn, conflating this with public/private access.

Please review and confirm OK/Not OK with the current approach, addressing the logic above to ensure consistency.

MEDIUM:

"
Medium: Invalid X-UHD-AUTH headers can now break otherwise public requests.
Both public base views call auth.authenticate(request) directly in base.py:47 and base.py:46. The auth backend raises on malformed bearer headers in backend.py:72, backend.py:78, and backend.py:82. So a request that should be treated as public can start returning 401 just because the client sent a bad X-UHD-AUTH header.
"

This needs to be addressed: if a public request is made with an X-UHD-AUTH header, that header now appears to need to be properly formed. Happy to dismiss this if you are OK with that behaviour.

"
Medium: The same malformed-header risk exists in the private API cache decorator path.
The decorator now derives public/private status by calling auth in decorators.py:59, decorators.py:75, and decorators.py:78. That means cache bypass logic is coupled to authentication exceptions. If a caller sends an invalid token, the decorator will fail before it even reaches the normal response path.
"

Review and clarify whether the approach can be improved or the code bolstered to ensure greater clarity, separation of concerns and reliability.
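
One possible way to decouple the public/non-public check from authentication exceptions is sketched below. The `X-UHD-AUTH` header name comes from the review text; `AuthenticationFailed` and `FakeJWTAuthentication` are hypothetical stand-ins for whatever the real backend raises and exposes, so the exception type would need to match the actual one.

```python
from types import SimpleNamespace


class AuthenticationFailed(Exception):
    """Stand-in for the exception the auth backend raises on a malformed header."""


class FakeJWTAuthentication:
    """Hypothetical stand-in for backend.JSONWebTokenAuthentication."""

    def authenticate(self, request):
        header = request.headers.get("X-UHD-AUTH")
        if header is None:
            return None  # no credentials supplied: public request
        scheme, _, token = header.partition(" ")
        if scheme != "Bearer" or not token.strip():
            raise AuthenticationFailed("Malformed X-UHD-AUTH header")
        return ("stub-user", token)


def is_valid_non_public_request(request, auth=None) -> bool:
    """Return True only for a successfully authenticated request.

    A malformed header is treated as a public request instead of
    propagating the authentication error.
    """
    auth = auth or FakeJWTAuthentication()
    try:
        return auth.authenticate(request) is not None
    except AuthenticationFailed:
        return False


no_header = SimpleNamespace(headers={})
valid = SimpleNamespace(headers={"X-UHD-AUTH": "Bearer abc.def.ghi"})
malformed = SimpleNamespace(headers={"X-UHD-AUTH": "garbage"})
```

Whether a malformed header should fall back to public behaviour or return a 401 is a policy choice; the sketch only shows that the caching decision does not have to fail along with authentication.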

LOW:

"
Low: There is still no test proving the actual access-control behavior.
The new tests mostly assert header-setting and cache bypass behavior, but they do not prove the critical end-to-end rule: unauthenticated requests must remain public-only, and authenticated requests may see non-public data. Given the V2 query change, that is the part most likely to regress.
"

It is worth pursuing this one to ensure that the current requirement is supported by the tests.
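
As an illustration of the kind of test this concern asks for, here is a self-contained sketch. `ROWS` and `fetch_metrics` are hypothetical stand-ins for the real manager and view logic, and the rule they encode (only authenticated requests may see non-public rows) is the one quoted above, not something this PR implements.

```python
from types import SimpleNamespace

# Hypothetical data set: one public row and one non-public row.
ROWS = [
    {"metric": "cases", "is_public": True},
    {"metric": "embargoed_cases", "is_public": False},
]


def fetch_metrics(request) -> list:
    """Hypothetical view logic: unauthenticated requests are restricted to public rows."""
    restrict_to_public = getattr(request, "auth", None) is None
    return [
        row["metric"]
        for row in ROWS
        if row["is_public"] or not restrict_to_public
    ]


def test_unauthenticated_request_only_sees_public_rows():
    request = SimpleNamespace(auth=None)
    assert fetch_metrics(request) == ["cases"]


def test_authenticated_request_may_see_non_public_rows():
    request = SimpleNamespace(auth={"sub": "analyst"})
    assert fetch_metrics(request) == ["cases", "embargoed_cases"]
```

The real tests would exercise the actual views and manager end to end; the point of the sketch is the pair of assertions, one per authentication state.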

) -> Response:
-    response: Response = _calculate_response_from_view(view_function, *args, **kwargs)
+    response: Response = _calculate_response_from_view(
+        view_function, *args, is_public=True, **kwargs

I have a concern about the assumption that is_public is being derived from authentication context (e.g. JWT presence).

It feels like the current logic effectively treats:
JWT present => non-public => not cached

This may be an oversimplification of the data classification model. Authentication status does not necessarily define whether data is public or private, and conflating these concepts could lead to incorrect caching behaviour or unintended data exposure patterns.

More generally, caching decisions (Cache-Control) and data sensitivity classification (public/private) should ideally be derived from the data itself (or an explicit domain-level metadata flag), rather than inferred from request/auth state.
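
To make the alternative concrete, a data-driven version might look like the sketch below. `Payload` and its `is_public` flag are hypothetical; nothing in this PR defines such a domain-level classification, so this is an illustration of the idea rather than a proposal for the current ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Payload:
    body: dict
    is_public: bool  # hypothetical domain-level classification flag


def cache_control_for(payload: Payload) -> Optional[str]:
    """Pick a Cache-Control value from the data's own classification."""
    # Public data keeps the default caching behaviour (no override).
    return None if payload.is_public else "private, no-cache"


public_payload = Payload(body={"metric": "cases"}, is_public=True)
non_public_payload = Payload(body={"metric": "embargoed_cases"}, is_public=False)
```

Here the caching decision depends on what is being returned, not on who asked for it, which is the distinction this comment is drawing.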

Contributor Author

So only non-public requests will have a JWT, all public requests will not. This is the design decision made by the non public team at the outset of this ticket. I agree that the data itself should be checked, but this work is only for updating the caching element, not actually fetching non public data. As it's been agreed that only non-public requests will send a JWT, it was decided that it was the most straightforward approach to determine if the response should be cached or not :)

def _calculate_response_from_view(
view_function, *args, is_public: bool = True, **kwargs
) -> Response:
# Add is_public to response here, and then set header?

It looks like is_public is being derived from request/auth context and then used to determine caching behaviour (via response headers).

If this is the intended model then happy to proceed. I am flagging the coupling between authentication state and caching behaviour.

Contributor Author

Yep, this is the intended model: caching should be directly tied to whether the request is from a non-public user, which can only be derived from whether or not they are authenticated.

kwargs: dict[str, str] = self.get_formatted_kwargs_from_request()
return self.api_time_series_manager.get_distinct_column_values_with_filters(
-    lookup_field=self.lookup_field, restrict_to_public=True, **kwargs
+    lookup_field=self.lookup_field, restrict_to_public=False, **kwargs

The docstring is missing an Args section—could we add one to document lookup_field, restrict_to_public, and **kwargs?

Also, what does restrict_to_public=True enforce in this context?
It seems like an important business rule (e.g. filtering out non-public data), and it may be worth documenting explicitly either here

Contributor Author

I need to remove this, thanks for flagging. It was changed as part of my testing (to allow non-public data to come back) but shouldn't be committed, as I believe this work is being done in a later ticket.

auth = backend.JSONWebTokenAuthentication()
is_valid_non_public_request = auth.authenticate(request)
if is_valid_non_public_request:
    response["Cache-Control"] = "private, no-cache"

Evaluate whether Cache-Control: no-store is required here for sensitive/user-specific responses.
Also ensure CloudFront cache policy aligns with the intended no-caching behavior.

Contributor Author

Setting it to "private, no-cache" was agreed after a review of the documentation, to ensure that it aligned with what we are trying to achieve. @luketowell can you confirm whether you are happy with "private, no-cache" or would prefer we use "no-store"?

auth = backend.JSONWebTokenAuthentication()
is_valid_non_public_request = auth.authenticate(request)
if is_valid_non_public_request:
    response["Cache-Control"] = "private, no-cache"

Consider centralising Cache-Control policy strings into a shared constants/module (or system settings if these vary by environment).
This would avoid duplication and reduce the risk of inconsistent cache behaviour across endpoints.

For example:

response["Cache-Control"] = system_settings.CACHE_POLICY_NO_CACHE

This makes it easier to audit and update caching policy centrally.
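
A rough sketch of what such a shared module could contain. The constant names and the `apply_non_public_cache_policy` helper are assumptions for illustration (only `CACHE_POLICY_NO_CACHE` appears in the example above), and a plain dict stands in for the response object, relying only on item assignment for headers.

```python
# Hypothetical shared constants for cache policy, kept in one place for auditing.
CACHE_CONTROL_HEADER = "Cache-Control"

# Caches may store the response but must revalidate before reuse;
# "private" limits storage to the end user's own cache.
CACHE_POLICY_NO_CACHE = "private, no-cache"

# Caches must not store the response at all.
CACHE_POLICY_NO_STORE = "no-store"


def apply_non_public_cache_policy(response, policy: str = CACHE_POLICY_NO_CACHE):
    """Set the Cache-Control header on a response-like object and return it."""
    response[CACHE_CONTROL_HEADER] = policy
    return response


# A dict is used here as a stand-in for the real response object.
response = apply_non_public_cache_policy({})
strict_response = apply_non_public_cache_policy({}, CACHE_POLICY_NO_STORE)
```

Keeping both policies as named constants would also make a later switch from "private, no-cache" to "no-store" a one-line change.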

auth = backend.JSONWebTokenAuthentication()
is_valid_non_public_request = auth.authenticate(request)
if is_valid_non_public_request:
    response["Cache-Control"] = "private, no-cache"

Derive cache policy string centrally from shared module.
For example:

system_settings.CACHE_CONTROL_HTTP_HEADER

@@ -9,3 +9,6 @@ sonar.organization=ukhsa-internal

# Exclude other generated/auto-generated files
sonar.exclusions=**/migrations/**,**/__pycache__/**

# Exclude files from duplication check

Provide justification for this here in the comment.

Contributor Author

The test files have to contain a lot of duplication: the top-level base files for the two versions of the public api are incredibly similar, so they require incredibly similar tests. These tests are almost full duplicates of each other, but are required to reach the 100% code coverage target. It was agreed with Phill and Josh to exclude these files from the sonarqube duplication check for this reason.

@jeanpierrefouche-ukhsa jeanpierrefouche-ukhsa Apr 1, 2026

OK, no probs. Adding it to the code comment on line 13 would be best, as that allows future readers to see the rationale.

sonarqubecloud bot commented Apr 1, 2026
