Skip to content

refactor: partition tasks by range and by status in OLAP repository#6

Merged
abelanger5 merged 2 commits intomainfrom
belanger/task-status-partitioning
Feb 11, 2025
Merged

refactor: partition tasks by range and by status in OLAP repository#6
abelanger5 merged 2 commits intomainfrom
belanger/task-status-partitioning

Conversation

@abelanger5
Copy link
Copy Markdown

Description

Partitions tasks by range and status to get better query results when indexes can't be loaded into memory.

Type of change

  • Refactor (non-breaking changes to code which doesn't change any behaviour)

Copy link
Copy Markdown
Collaborator

@mrkaye97 mrkaye97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple small questions, but lgtm 👍

Comment on lines +58 to +65
format('ALTER TABLE %s SET (
autovacuum_vacuum_scale_factor = ''0.1'',
autovacuum_analyze_scale_factor=''0.05'',
autovacuum_vacuum_threshold=''25'',
autovacuum_analyze_threshold=''25'',
autovacuum_vacuum_cost_delay=''10'',
autovacuum_vacuum_cost_limit=''1000''
)', targetNameWithStatus);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these magic numbers are special, might be useful for future us to have a comment here explaining how we decided on them

Comment on lines +89 to +93
PERFORM create_v2_tasks_olap_partition_with_status(newTableName, 'QUEUED');
PERFORM create_v2_tasks_olap_partition_with_status(newTableName, 'RUNNING');
PERFORM create_v2_tasks_olap_partition_with_status(newTableName, 'COMPLETED');
PERFORM create_v2_tasks_olap_partition_with_status(newTableName, 'CANCELLED');
PERFORM create_v2_tasks_olap_partition_with_status(newTableName, 'FAILED');
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just calling out that this makes me nervous if we might add new enum values in the future - seems like the kind of thing we could easily forget to update. I don't really have any suggestions here though 🤷

readable_status = n.readable_status
FROM new_rows n
WHERE t.id = n.id
AND t.inserted_at = n.inserted_at;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq - do we need this second condition?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the primary key technically has it and the id isn't an identity column, it's a bigint which is sourced from the engine, so it might be necessary

@abelanger5 abelanger5 merged commit 175edf1 into main Feb 11, 2025
17 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants