Skip to content

[registration-cleaning-worker] Use dedicated security group for accessing Postgres#1707

Merged
jacobwinch merged 1 commit intomainfrom
jw-reg-cleaning-worker-sg
Mar 4, 2026
Merged

[registration-cleaning-worker] Use dedicated security group for accessing Postgres#1707
jacobwinch merged 1 commit intomainfrom
jw-reg-cleaning-worker-sg

Conversation

@jacobwinch
Copy link
Contributor

@jacobwinch jacobwinch commented Feb 26, 2026

What does this change?

The Notifications VPC has a “default group”. Prior to this PR, registration-cleaning-worker used this group to gain access to the RDS database (via the RDS proxy). This approach to networking does not follow the principle of least privilege, as any network interface in the VPC can implicitly reach the CODE and PROD databases.

In #1636 a new security group was created, with the intention of services joining it only if they need database access. This will ultimately allow us to remove the “default group”, thus improving our security posture by moving towards the principle of least privilege.

This PR updates registration-cleaning-worker to use the new security group / approach1.

How has this change been tested?

I've deployed this to CODE to check that the new CFN / Riff-Raff wiring is valid.

However, testing the actual behaviour of this Lambda seems tricky (#1527), so I'd like to merge this and confirm that it works in PROD by checking the logs / SQS metrics.

How can we measure success?

This is a step towards removing the "default group", which improves our security posture and allows us to standardise on a single approach following the changes made in #1636.

Have we considered potential risks?

As mentioned above, testing this in CODE is tricky.

'Testing' this in PROD (after merging to main) should be low risk because this Lambda processes a queue (and has a dead letter queue):

Dlq:
Type: AWS::SQS::Queue
Sqs:
Type: AWS::SQS::Queue
Properties:
VisibilityTimeout: 300
MessageRetentionPeriod: 3600 # 1 hour
RedrivePolicy:
deadLetterTargetArn: !GetAtt Dlq.Arn
maxReceiveCount: 5
Tags:
- Key: Stage
Value: !Ref Stage
- Key: Stack
Value: !Ref Stack
- Key: App
Value: !Ref App

So if anything goes wrong we can just rollback and re-drive the messages.

As far as I can tell there will be no negative impact caused by taking a bit longer than normal to tidy up invalid tokens.

Footnotes

  1. Note that all of the above applies regardless of whether the application connects to the RDS instance directly or whether it goes via the RDS proxy, because the RDS instance and the RDS proxy share the same security group (created here).

@jacobwinch jacobwinch added the maintenance Departmental tracking: maintenance work, not a fix or a feature label Feb 26, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 26, 2026

@jacobwinch jacobwinch force-pushed the jw-reg-cleaning-worker-sg branch 2 times, most recently from 1c8ad70 to 146259b Compare February 26, 2026 16:28
@github-actions
Copy link
Contributor

github-actions bot commented Feb 26, 2026

@jacobwinch jacobwinch marked this pull request as ready for review February 26, 2026 16:38
@jacobwinch jacobwinch requested a review from a team as a code owner February 26, 2026 16:38
@jacobwinch jacobwinch force-pushed the jw-reg-cleaning-worker-sg branch from 146259b to 4283483 Compare March 4, 2026 11:36
@jacobwinch jacobwinch merged commit 96c9654 into main Mar 4, 2026
10 checks passed
@jacobwinch jacobwinch deleted the jw-reg-cleaning-worker-sg branch March 4, 2026 11:40
@jacobwinch
Copy link
Contributor Author

This has been deployed successfully.

These logs show that we are deleting records as expected and the SQS metrics suggest that all messages are being processed as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintenance Departmental tracking: maintenance work, not a fix or a feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants