Skip to content

feat: add MessageGroupId in queues to activate fair-queue feature#4752

Open
joybytes wants to merge 6 commits intomainfrom
feat/tasks-propagate-message-group-id
Open

feat: add MessageGroupId in queues to activate fair-queue feature#4752
joybytes wants to merge 6 commits intomainfrom
feat/tasks-propagate-message-group-id

Conversation

@joybytes
Copy link
Contributor

@joybytes joybytes commented Feb 25, 2026

Summary

  • Added env variable ENABLE_SQS_MESSAGE_GROUP_IDS to control on and off switch to fair-queue (it's set to ON on this PR)
  • Passed MessageGroupId (service id) on all relevant queue calls for tasks file (shatter job rows, save tasks, retries)
  • Entry point for defining grouping for upload jobs is in rest.py via process_job.apply_async there we set service_id, and all subsequent async operations on tasks.py inherit that field by calling getattr(self, "message_group_id", None)

Testing

  • Deployed to dev-d and triggered relevant flows (one-off notification, bulk letters, email and sms, reports) to make sure the field is being propagated ✅
Screenshot 2026-02-25 at 12 20 06 Screenshot 2026-02-25 at 12 21 48
  • Flows like nightly and scheduled-tasks were tested locally as it was easier to trigger that way

Tracker table

What to do?

  • Changes are deployed to dev-d: https://www.dev-d.notify-dev-d.space/
  • Trigger a flow below (UI or locally) and verify that MessageGroupId is being populated in AWS for the specified queues.
  • If you are testing a flow, just put it as "In Progress" and if you found that the queue doesn't have the grouping (service_id), leave a comment

Table

Commit Commit title Flow Trigger method Primary queue(s) to check Status (In progress/Pass/Fail)
0cceb044d feat: message group id for job pipeline (bulk upload, tasks, scheduled) Bulk upload job pipeline (job -> shatter -> delivery) UI job-tasks, database-tasks, send-sms-tasks/send-email-tasks/create-letters-pdf-tasks
0cceb044d feat: message group id for job pipeline (bulk upload, tasks, scheduled) Scheduled job release Locally (run-scheduled-jobs) job-tasks
8acf0884d feat: message group id for notification entry points Single API notification (SMS/email) UI send-sms-tasks or send-email-tasks
8acf0884d feat: message group id for notification entry points Letter notification (templated or precompiled) UI create-letters-pdf-tasks, antivirus-tasks or letter-tasks
f8101d08b feat: message group id for callback entry points (SES, inbound SMS) Inbound SMS callback intake (MMG/Firetext) Locally (manual callback/API endpoint call) service-callbacks
f8101d08b feat: message group id for callback entry points (SES, inbound SMS) SES callback processing (delivery/bounce/complaint) Locally (manual callback/API endpoint call) service-callbacks
02b3dff47 feat: message group id for report request and reporting tasks Report request creation UI report-requests-notifications-tasks
02b3dff47 feat: message group id for report request and reporting tasks Nightly status fan-out task Local/manual task run (create-nightly-notification-status) reporting-tasks
7ecb3ee27 feat: message group id for nightly, letters PDF and research mode Letters PDF chain UI create-letters-pdf-tasks (+ follow-on queues)
7ecb3ee27 feat: message group id for nightly, letters PDF and research mode Nightly per-service fan-out / recursive delete Locally (manual task run) (archive-unsubscribe-requests or delete-notifications-older-than-retention) reporting-tasks
7ecb3ee27 feat: message group id for nightly, letters PDF and research mode Research mode email callback path UI/local (send with test key) research-mode-tasks

Ticket

Fair queue notifications-api (tasks)

@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch 2 times, most recently from 1062ed3 to e3e1a40 Compare February 25, 2026 10:31
@joybytes joybytes changed the title feat: propagate message group id in tasks file feat: add MessageGroupId to job/save task queueing Feb 25, 2026
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch from e3e1a40 to 908e527 Compare February 25, 2026 14:41
@joybytes joybytes marked this pull request as ready for review February 25, 2026 15:41
@risicle
Copy link
Member

risicle commented Feb 25, 2026

This seems to be ignoring the entire point of adding NotifyTask.message_group_id in alphagov/notifications-utils#1318

The point is that you don't have to go digging around everywhere to find the id you want to use for MessageGroupId, you just feed it in to the first celery task you send to the queue and in other celery tasks you simply forward from self.message_group_id to MessageGroupId=....

I did all of this in https://github.com/alphagov/notifications-api/pull/4693/changes#diff-cca5917c620894fe770faa88835e9e186285d5b91a55fca992eb94d5dc87e635

@joybytes
Copy link
Contributor Author

joybytes commented Feb 25, 2026

This seems to be ignoring the entire point of adding NotifyTask.message_group_id in alphagov/notifications-utils#1318

The point is that you don't have to go digging around everywhere to find the id you want to use for MessageGroupId, you just feed it in to the first celery task you send to the queue and in other celery tasks you simply forward from self.message_group_id to MessageGroupId=....

I did all of this in https://github.com/alphagov/notifications-api/pull/4693/changes#diff-cca5917c620894fe770faa88835e9e186285d5b91a55fca992eb94d5dc87e635

@risicle the point of entry for a task needs an explicit message_group_id (in this case service_id) right?

if you're re-enqueing the task again using "self" is valid (like I did for retries)

at some point in the journey, we need an explicit grouping, otherwise where does it get it from? we need to pass on service_id

@risicle
Copy link
Member

risicle commented Feb 25, 2026

See my PR, it gets done in run_scheduled_jobs and create_job (in app/job/rest.py), because process_job itself also has a service_id context, which the shattered tasks can inherit their group id from.

@risicle
Copy link
Member

risicle commented Feb 25, 2026

We need to feed the message group id in at the point where a process becomes service_id-specific, ideally nowhere else.

@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch 5 times, most recently from f1af5aa to 91d6414 Compare February 25, 2026 17:47
@joybytes
Copy link
Contributor Author

We need to feed the message group id in at the point where a process becomes service_id-specific, ideally nowhere else.

@risicle done

@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch from bb72345 to f36f318 Compare February 25, 2026 19:58
@joybytes joybytes marked this pull request as draft February 25, 2026 19:59
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch from fff72cc to b21c3a1 Compare February 26, 2026 09:18
@joybytes joybytes marked this pull request as ready for review February 26, 2026 09:32
@risicle
Copy link
Member

risicle commented Feb 27, 2026

I think we need to inject the MessageGroupId in run_scheduled_jobs and check_for_missing_rows_in_completed_jobs too - the two other places job-related tasks get initiated (see https://github.com/alphagov/notifications-api/pull/4693/changes#diff-859b3fa7698d3dc1be32ef99742c5d8620fab88d231fb3e4078752f7dc5bc05f)

FWIW this is why I'm unsure about whether we should be adding MessageGroupId bit-by-bit rather than just doing everything all at once like I did in my demo PR. We're liable to miss bits this way, and it's not like we'll be able to use ENABLE_SQS_MESSAGE_GROUP_IDS to enable the different parts separately - it's a single global flag.

@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch from b21c3a1 to ea70df8 Compare March 10, 2026 14:33
@joybytes joybytes marked this pull request as draft March 10, 2026 14:59
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch 7 times, most recently from d7819b2 to 620a8fe Compare March 11, 2026 18:16
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch 3 times, most recently from 9305f6f to 06f7dab Compare March 12, 2026 10:41
@joybytes joybytes changed the title feat: add MessageGroupId to job/save task queueing feat: add MessageGroupId in queues to activate fair-queue feature Mar 12, 2026
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch 2 times, most recently from 4098856 to 1baebe9 Compare March 17, 2026 11:15
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch 3 times, most recently from 2287b85 to fe69570 Compare March 18, 2026 15:30
@joybytes joybytes force-pushed the feat/tasks-propagate-message-group-id branch from fe69570 to 7ecb3ee Compare March 18, 2026 15:58
@joybytes joybytes marked this pull request as ready for review March 18, 2026 16:16
Copy link
Member

@risicle risicle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right, but if we're going to make it default to ENABLE_SQS_MESSAGE_GROUP_IDS = 1 in the code we should update notifications-aws to set it to 0 in the terraform before we merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants