Following a discussion with @mattjreynolds, this work is blocked until CDD-3058 is merged, so that it can be properly tested and completed.
I have reached out to @phill-stanley regarding the SonarQube failures. Namely:
I have also reached out to @mattjreynolds to discuss where the JWT validation should live, and whether it can logically move so that it doesn't break the contract.
Update testing
caching/private_api/decorators.py (Outdated)
    return decorator


def _check_if_valid_non_public_request(request):
Suggested change:
def _check_if_valid_non_public_request(request) -> bool:
    auth = backend.JSONWebTokenAuthentication()
    result = auth.authenticate(request)
    return result is not None
This makes the return type explicit (bool) and reflects the fact that auth.authenticate(request) does not necessarily return a bool: it can return None.
We could also add a docstring here to explain the intent of this function. Could you run through the PR changes to make sure the other new functions have docstrings too?
AI review has raised the following concerns:
HIGH:
"High: V2 appears to remove the public-data filter without adding an authorization gate.
At api_time_series_request_serializer.py:87, the query flips to restrict_to_public=False. But in base.py:46 and base.py:48, the JWT result is only used to add a Cache-Control header. That means authentication is affecting caching semantics, not data access. If the manager respects restrict_to_public=False literally, an unauthenticated caller can still get non-public values."
This is an important concern that goes to the heart of the design approach, i.e. using authentication status to determine caching behaviour and, in turn, conflating this with public/private access.
Please review and confirm whether the current approach is OK or not, addressing the logic above to ensure consistency.
MEDIUM:
"Medium: Invalid X-UHD-AUTH headers can now break otherwise public requests.
Both public base views call auth.authenticate(request) directly in base.py:47 and base.py:46. The auth backend raises on malformed bearer headers in backend.py:72, backend.py:78, and backend.py:82. So a request that should be treated as public can start returning 401 just because the client sent a bad X-UHD-AUTH header."
This needs to be addressed: if a public request is made, it now appears to require a properly formed header. Happy to dismiss this if you are OK with the behaviour.
"Medium: The same malformed-header risk exists in the private API cache decorator path.
The decorator now derives public/private status by calling auth in decorators.py:59, decorators.py:75, and decorators.py:78. That means cache bypass logic is coupled to authentication exceptions. If a caller sends an invalid token, the decorator will fail before it even reaches the normal response path."
Please review and clarify whether the approach can be improved, or the code bolstered, to ensure greater clarity, separation of concerns and reliability.
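One possible way to decouple the public path from authentication errors is sketched below. Everything here is hypothetical: `AuthenticationFailed` stands in for whatever the real backend raises, `fake_authenticate` is a toy authenticator, and the request is reduced to a mapping of headers.

```python
from typing import Callable, Mapping, Optional


class AuthenticationFailed(Exception):
    """Hypothetical stand-in for the error raised on a malformed token."""


def fake_authenticate(headers: Mapping[str, str]) -> Optional[str]:
    """Toy authenticator: None without a header, raises on a malformed one."""
    value = headers.get("X-UHD-AUTH")
    if value is None:
        return None
    prefix = "Bearer "
    if not value.startswith(prefix) or not value[len(prefix):]:
        raise AuthenticationFailed("malformed header")
    return value[len(prefix):]


def is_valid_non_public_request(
    headers: Mapping[str, str],
    authenticate: Callable[[Mapping[str, str]], Optional[str]] = fake_authenticate,
) -> bool:
    """Return True only when authentication succeeds.

    An authentication error is treated like a missing token, so the
    request falls back to the public path rather than failing outright.
    """
    try:
        return authenticate(headers) is not None
    except AuthenticationFailed:
        return False
```

Whether a malformed token should silently fall back to public behaviour or be rejected with a 401 is a design decision for the team; the sketch only shows that the choice can be made in one place instead of being a side effect of the cache logic.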
LOW:
"Low: There is still no test proving the actual access-control behavior.
The new tests mostly assert header-setting and cache bypass behavior, but they do not prove the critical end-to-end rule: unauthenticated requests must remain public-only, and authenticated requests may see non-public data. Given the V2 query change, that is the part most likely to regress."
It is worth pursuing this one to ensure that the current requirement is supported by the tests.
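If that access-control rule is in scope, a test for it could look roughly like the sketch below. `Row`, `ROWS` and `visible_values` are hypothetical; `visible_values` stands in for the real manager call that takes `restrict_to_public`, so only the shape of the assertions carries over.

```python
import unittest
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    value: str
    is_public: bool


ROWS = [Row("public-metric", True), Row("embargoed-metric", False)]


def visible_values(rows: List[Row], *, is_authenticated: bool) -> List[str]:
    """Hypothetical data-access rule: non-public rows need authentication."""
    return [row.value for row in rows if row.is_public or is_authenticated]


class AccessControlTests(unittest.TestCase):
    def test_unauthenticated_request_sees_public_data_only(self):
        self.assertEqual(
            visible_values(ROWS, is_authenticated=False), ["public-metric"]
        )

    def test_authenticated_request_may_see_non_public_data(self):
        self.assertIn(
            "embargoed-metric", visible_values(ROWS, is_authenticated=True)
        )
```

The first test is the one that guards against the regression described above: it fails as soon as the public filter is dropped for unauthenticated callers.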
) -> Response:
-    response: Response = _calculate_response_from_view(view_function, *args, **kwargs)
+    response: Response = _calculate_response_from_view(
+        view_function, *args, is_public=True, **kwargs
I have a concern about the assumption that is_public is being derived from authentication context (e.g. JWT presence).
It feels like the current logic effectively treats:
JWT present => non-public => not cached
This may be an oversimplification of the data classification model. Authentication status does not necessarily define whether data is public or private, and conflating these concepts could lead to incorrect caching behaviour or unintended data exposure patterns.
More generally, caching decisions (Cache-Control) and data sensitivity classification (public/private) should ideally be derived from the data itself (or an explicit domain-level metadata flag), rather than inferred from request/auth state.
Only non-public requests will have a JWT; public requests will not. This is the design decision made by the non-public team at the outset of this ticket. I agree that the data itself should be checked, but this work only updates the caching element; it does not fetch non-public data. As it has been agreed that only non-public requests will send a JWT, this was judged the most straightforward way to determine whether the response should be cached :)
caching/private_api/decorators.py (Outdated)
def _calculate_response_from_view(
    view_function, *args, is_public: bool = True, **kwargs
) -> Response:
    # Add is_public to response here, and then set header?
It looks like is_public is being derived from request/auth context and then used to determine caching behaviour (via response headers).
If this is the intended model then happy to proceed. I am flagging the coupling between authentication state and caching behaviour.
Yep, this is the intended model: caching should be directly tied to whether the request is from a non-public user, which can only be derived from whether or not they are authenticated.
kwargs: dict[str, str] = self.get_formatted_kwargs_from_request()
return self.api_time_series_manager.get_distinct_column_values_with_filters(
-    lookup_field=self.lookup_field, restrict_to_public=True, **kwargs
+    lookup_field=self.lookup_field, restrict_to_public=False, **kwargs
The docstring is missing an Args section—could we add one to document lookup_field, restrict_to_public, and **kwargs?
Also, what does restrict_to_public=True enforce in this context?
It seems like an important business rule (e.g. filtering out non-public data), and it may be worth documenting explicitly either here
I need to remove this, thanks for flagging. It was changed as part of my testing (to allow non-public data to come back) but shouldn't be committed, as I believe this work is being done in a later ticket.
auth = backend.JSONWebTokenAuthentication()
is_valid_non_public_request = auth.authenticate(request)
if is_valid_non_public_request:
    response["Cache-Control"] = "private, no-cache"
Evaluate whether Cache-Control: no-store is required here for sensitive/user-specific responses.
Also ensure CloudFront cache policy aligns with the intended no-caching behavior.
Setting this to "private, no-cache" was agreed after a review of the documentation, to make sure it aligned with what we are trying to achieve. @luketowell, can you confirm whether you are happy with "private, no-cache" or would prefer that we use "no-store"?
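For context on the two options: under HTTP caching semantics, `private` tells shared caches (such as a CDN) not to store the response, `no-cache` allows storage but requires revalidation with the origin before reuse, and `no-store` asks that no cache store the response at all. The sketch below encodes only that reading of the directives; the helper names are hypothetical, and it ignores other factors such as CDN-specific cache policies.

```python
from typing import Set


def parse_directives(cache_control: str) -> Set[str]:
    """Split a Cache-Control value into lower-cased directive names."""
    return {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }


def shared_cache_may_store(cache_control: str) -> bool:
    """A shared cache must not store 'private' or 'no-store' responses."""
    return not ({"private", "no-store"} & parse_directives(cache_control))


def any_cache_may_store(cache_control: str) -> bool:
    """Only 'no-store' forbids storage by every cache, including the browser."""
    return "no-store" not in parse_directives(cache_control)


def must_revalidate_before_reuse(cache_control: str) -> bool:
    """'no-cache' permits storage but requires revalidation before reuse."""
    return "no-cache" in parse_directives(cache_control)
```

Read this way, "private, no-cache" keeps the response out of shared caches but still lets the user's own browser keep a copy that it revalidates, whereas "no-store" rules out that local copy as well.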
auth = backend.JSONWebTokenAuthentication()
is_valid_non_public_request = auth.authenticate(request)
if is_valid_non_public_request:
    response["Cache-Control"] = "private, no-cache"
Consider centralising Cache-Control policy strings into a shared constants/module (or system settings if these vary by environment).
This would avoid duplication and reduce the risk of inconsistent cache behaviour across endpoints.
For example:
response["Cache-Control"] = system_settings.CACHE_POLICY_NO_CACHE
This makes it easier to audit and update caching policy centrally.
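A minimal sketch of that idea follows. The constant and helper names are hypothetical, as is their location; in the real codebase they might live in a settings or constants module. The response is modelled as a dict of headers so the snippet runs on its own.

```python
# Hypothetical shared constants, defined once and imported by every view.
CACHE_CONTROL_HEADER = "Cache-Control"
CACHE_POLICY_NO_CACHE = "private, no-cache"


def mark_response_as_non_public(response: dict) -> dict:
    """Apply the shared non-public cache policy to a response.

    The response is modelled as a dict of headers for this sketch.
    """
    response[CACHE_CONTROL_HEADER] = CACHE_POLICY_NO_CACHE
    return response
```

With a helper like this, a later move to a different policy would be a one-line change rather than an edit in each view.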
auth = backend.JSONWebTokenAuthentication()
is_valid_non_public_request = auth.authenticate(request)
if is_valid_non_public_request:
    response["Cache-Control"] = "private, no-cache"
Derive the cache policy string centrally from a shared module.
For example:
system_settings.CACHE_CONTROL_HTTP_HEADER
@@ -9,3 +9,6 @@ sonar.organization=ukhsa-internal

# Exclude other generated/auto-generated files
sonar.exclusions=**/migrations/**,**/__pycache__/**

# Exclude files from duplication check
Provide justification for this here in the comment.
The test files necessarily contain a lot of duplication: the top-level base files for both versions of the public API are very similar, so they require very similar tests. These tests are almost full duplicates of each other, but are required to reach the 100% code coverage target. It was agreed with Phill and Josh to exclude these files from the SonarQube duplication check for this reason.
OK, no probs. Adding it to the code comment on line 13 would be best, as that allows future readers to see the rationale.



Description
This PR includes the following:
Add JWT detection middleware to identify whether the request comes from a public or a non-public user.
If the request comes from a non-public user, bypass the cache and calculate the data fresh. This newly calculated data should NOT be saved in the cache.
This work only covers the private API. It has been agreed to leave the public API for now, as it does not currently return private data and it has been agreed not to use it for non-public data in the immediate future.
Fixes #CDD-3099