Skip to content

CC-2454: Migrate from django-tagging to django-taggit#240

Open
joshmcarthur wants to merge 12 commits intomasterfrom
CC-2454-migrate-to-django-taggit
Open

CC-2454: Migrate from django-tagging to django-taggit#240
joshmcarthur wants to merge 12 commits intomasterfrom
CC-2454-migrate-to-django-taggit

Conversation

@joshmcarthur
Copy link

@joshmcarthur joshmcarthur commented Feb 3, 2026

This PR migrates NZSL Signbank from the vendored, unmaintained django-tagging package to django-taggit, a maintained alternative. The migration:

  • Removes legacy dependency code from the codebase
  • Provides a stable adapter API for future flexibility
  • Preserves migration history via no-op migrations
  • Maintains backward compatibility for templates and existing functionality

Architecture Changes

  • Adapter Pattern: Introduced signbank.tagging.adapter as a stable API layer backed by django-taggit. Application code now uses the adapter instead of directly importing tagging.models or taggit.models, allowing future backend swaps without changing business logic.
  • No-op Migrations: Created three no-op migrations in signbank.tagging.migrations that match the original django-tagging migration names (0001_initial, 0002_on_delete, 0003_adapt_max_tag_length). These are configured via MIGRATION_MODULES to satisfy existing migration dependencies without requiring the legacy dependency code.

Database Migrations

Two migrations handle the data migration:

Migration 0051: migrate_tags_to_taggit
  • Copies all tag data from legacy tagging_tag and tagging_taggeditem tables to taggit_tag and taggit_taggeditem:
  • Detects legacy tables via database introspection
  • Handles fresh installs (no-op if legacy tables don't exist)
  • Normalizes tag names according to FORCE_LOWERCASE_TAGS
  • Maps legacy tag IDs to new taggit tags
  • Copies TaggedItem relationships preserving content_type and object_id
  • Uses bulk_create with ignore_conflicts=True for idempotency
  • Reverse migration is a no-op (can't restore deleted legacy data)
  • Complexity: Handles ContentType conversion between historical and current models
Migration 0052: migrate_allowedtags_to_taggit
  • Migrates the AllowedTags.allowed_tags ManyToMany field from tagging.Tag to taggit.Tag:
    • Step 1: Reads existing M2M relationships before field alteration
    • Step 2: Clears old M2M relationships to prevent foreign key violations
    • Step 3: Alters the field definition to point at taggit.Tag (creates new M2M table)
    • Step 4: Populates new M2M table using raw SQL to work around Django's historical model limitations
  • Complexity: Multi-step process required because Django's AlterField on M2M fields drops and recreates the through table. We preserve data by reading it before clearing, then repopulating after the new table is created. Uses raw SQL in the final step because historical models can't reliably work with the new field definition.
Code Changes

Runtime code:

Updated all imports from tagging.models to signbank.tagging.adapter across:

  • signbank/dictionary/update.py
  • signbank/dictionary/adminviews.py
  • signbank/dictionary/admin.py
  • signbank/dictionary/forms.py
  • signbank/dictionary/views.py
  • signbank/dictionary/csv_import.py
  • signbank/comments.py

Associated tests

Template tags:

  • Implemented signbank.tagging.templatetags.tagging_tags providing the same {% load tagging_tags %} and {% tags_for_object %} API, so templates require no changes.

Settings:

  • Added taggit to INSTALLED_APPS
  • Added signbank.tagging to INSTALLED_APPS (for template tag discovery)
  • Configured MIGRATION_MODULES['tagging'] = 'signbank.tagging.migrations'
  • Removed tagging from INSTALLED_APPS (migrations handled via MIGRATION_MODULES)
  • Dependencies: Added django-taggit = "^6.1.0" to pyproject.toml

Backward Compatibility

  • Templates continue to use {% load tagging_tags %} without modification
  • Tagging functionality remains identical from a user perspective
  • Migration history is preserved for existing installations
  • Fresh installs work correctly (migrations handle missing legacy tables gracefully)

Testing Considerations

  • Migrations are idempotent and can be safely re-run
  • Both existing databases (with legacy tables) and fresh installs are supported
  • Tag normalization (FORCE_LOWERCASE_TAGS) is preserved throughout migration
  • All existing tag relationships (TaggedItem and AllowedTags) are preserved

Future Work

  • Legacy tagging_tag and tagging_taggeditem tables remain in existing databases but are no longer used. They can be dropped in a future migration if desired.
  • The adapter pattern allows swapping the backend implementation without changing application code.

Note: The vendor/django-tagging/ directory still exists in the repository but is no longer referenced in pyproject.toml. It can be removed in a follow-up cleanup commit if desired. We should do this, but I advise not removing it in this PR in case we need to flip back.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Migrates Signbank’s tagging implementation from the vendored django-tagging package to django-taggit, introducing a signbank.tagging adapter layer and updating app code, migrations, and tests to use the new tagging backend while keeping template tag compatibility.

Changes:

  • Added signbank.tagging adapter, models exports, and django-tagging-compatible template tags.
  • Added data migrations to copy legacy tag data and AllowedTags M2M relationships into taggit tables.
  • Replaced legacy tagging imports/usages across runtime code and updated affected tests; switched dependencies to django-taggit.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
signbank/tagging/templatetags/tagging_tags.py Adds django-tagging-compatible template tags backed by the adapter.
signbank/tagging/templatetags/init.py Enables template tag module discovery.
signbank/tagging/models.py Provides legacy models for migration compatibility and exports taggit models for runtime.
signbank/tagging/migrations/init.py Adds migrations package for the tagging migration module override.
signbank/tagging/migrations/0001_initial.py Introduces legacy tagging schema migration (compatibility chain).
signbank/tagging/migrations/0002_on_delete.py Aligns legacy migration history with on_delete changes.
signbank/tagging/migrations/0003_adapt_max_tag_length.py Aligns legacy migration history with MAX_TAG_LENGTH behavior.
signbank/tagging/apps.py Defines app config (label tagging).
signbank/tagging/adapter.py Implements stable tagging API (add/remove/filter/usage) backed by taggit.
signbank/tagging/init.py Documents tagging module purpose.
signbank/settings/base.py Adds taggit + signbank.tagging, removes legacy app, overrides migrations via MIGRATION_MODULES.
signbank/dictionary/views.py Switches gloss creation tagging to adapter-backed tagging.
signbank/dictionary/update.py Replaces legacy TaggedItem/ContentType tagging logic with adapter calls.
signbank/dictionary/tests/test_csv_import.py Updates tagging-related test setup/assertions to use adapter/helpers.
signbank/dictionary/tests/test_adminviews.py Updates tagging-related test setup/assertions to use adapter.
signbank/dictionary/models.py Removes django-tagging registry usage; updates relation tag retrieval to adapter.
signbank/dictionary/migrations/0051_migrate_tags_to_taggit.py Copies legacy tags + tagged items into taggit tables (idempotent).
signbank/dictionary/migrations/0052_migrate_allowedtags_to_taggit.py Migrates AllowedTags M2M from legacy Tag to taggit.Tag while preserving data.
signbank/dictionary/forms.py Updates Tag import to the new tagging module export.
signbank/dictionary/csv_import.py Updates tagging imports to new exports and adapter helpers.
signbank/dictionary/adminviews.py Replaces legacy tag intersection queries with adapter-based filtering.
signbank/dictionary/admin.py Replaces legacy usage_for_model with adapter tags_usage_for_model.
signbank/comments.py Updates comment tagging flows and tag filtering to use adapter functions.
pyproject.toml Replaces vendored django-tagging dependency with django-taggit.
poetry.lock Locks django-taggit and updates dependency metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +137 to +141
tag__in=tags
).values('object_id').annotate(
tag_count=Count('tag', distinct=True)
).filter(tag_count=len(tags))

Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter_queryset_with_all_tags can return incorrect results when some requested tag names don't exist: it filters TaggedItems by the tags that do exist, then compares tag_count to len(tags) (the existing tags), so requesting ['a','missing'] would match objects tagged only with 'a'. Ensure all requested normalized tag names exist first (e.g., compare tags.count() to len(tag_names)) and use the requested tag count for the intersection filter.

Copilot uses AI. Check for mistakes.
Comment on lines +218 to +221
# Get object IDs from the queryset
object_ids = list(queryset.values_list('pk', flat=True))
if not object_ids:
return []
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags_usage_for_model materializes all PKs from a filtered queryset into Python (list(queryset.values_list(...))), which can be very expensive for large querysets and can blow memory. Prefer keeping this as a subquery (use queryset.values('pk')) and filtering tags_qs with that, avoiding full evaluation.

Suggested change
# Get object IDs from the queryset
object_ids = list(queryset.values_list('pk', flat=True))
if not object_ids:
return []
# Use a subquery for object IDs to avoid materializing them in Python
object_ids = queryset.values('pk')

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +16
migrations.CreateModel(
name='Tag',
fields=[
('id', models.AutoField(
verbose_name='ID', serialize=False,
auto_created=True, primary_key=True)),
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the signbank.tagging.migrations are "no-op" placeholders, but 0001_initial actually creates legacy tagging_tag/tagging_taggeditem tables on fresh installs. If the intent is truly no-op, these operations should be removed/replaced with empty migrations; otherwise please update the PR description to reflect that fresh installs will create (unused) legacy tables.

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +24
# Check if legacy tagging tables exist
db_table_names = connection.introspection.table_names()
legacy_tag_table = 'tagging_tag'
legacy_taggeditem_table = 'tagging_taggeditem'
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration uses the global django.db.connection for introspection; in migrations it's safer to use schema_editor.connection so the correct DB alias/connection is used (and the migration behaves correctly in multi-DB setups).

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +95
from django.conf import settings
from django.db import connection

AllowedTags = apps.get_model('dictionary', 'AllowedTags')
# Use historical taggit.Tag model to get tag IDs
TaggitTagHistorical = apps.get_model('taggit', 'Tag')
mappings = _migration_data.get('mappings', {})
force_lowercase = getattr(settings, 'FORCE_LOWERCASE_TAGS', False)

# Django creates M2M through tables with the pattern: {model_table}_{field_name}
# Get the model's table name dynamically
model_table = AllowedTags._meta.db_table
# Construct the through table name
through_table = f'{model_table}_allowed_tags'
# Field names follow: {model_name}_id and {related_model_name}_id
allowedtags_field = 'allowedtags_id'
tag_field = 'tag_id'
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

populate_allowedtags_data opens a cursor from the global django.db.connection and builds SQL with unquoted identifiers (table/column names). Use schema_editor.connection and schema_editor.quote_name(...) (or the DB ops quoting helpers) for identifiers, and consider inserting in bulk (e.g., executemany / ON CONFLICT DO NOTHING) to avoid per-row SELECT+INSERT loops.

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +143
bits = token.contents.split()
if len(bits) != 5:
raise template.TemplateSyntaxError(
_('%s tag requires exactly four arguments') % bits[0])
if bits[2] != 'in':
raise template.TemplateSyntaxError(
_("second argument to %s tag must be 'in'") % bits[0])
if bits[4] != 'as':
raise template.TemplateSyntaxError(
_("fourth argument to %s tag must be 'as'") % bits[0])
return TaggedObjectsNode(bits[1], bits[3], bits[5])
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tagged_objects template tag parsing is broken: the token for {% tagged_objects [tag] in [model] as [varname] %} splits into 6 bits, but the code checks len(bits) != 5 and then accesses bits[5], which will raise IndexError for valid usage. Update the argument count check and indices so as is validated at bits[4] and the context variable is taken from bits[5].

Copilot uses AI. Check for mistakes.
return self.name


class TaggedItem(models.Model):
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to 'TaggedItem' is unnecessary as it is redefined before this value is used.

Copilot uses AI. Check for mistakes.
from notifications.signals import notify

from .dictionary.models import AllowedTags
from signbank.tagging.adapter import add_tag, filter_queryset_with_all_tags, remove_tag
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'remove_tag' is not used.

Suggested change
from signbank.tagging.adapter import add_tag, filter_queryset_with_all_tags, remove_tag
from signbank.tagging.adapter import add_tag, filter_queryset_with_all_tags

Copilot uses AI. Check for mistakes.
from django_comments.models import Comment
from guardian.shortcuts import get_objects_for_user, get_perms
from tagging.models import Tag, TaggedItem
from signbank.tagging.adapter import add_tag, filter_queryset_with_all_tags
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'add_tag' is not used.
Import of 'filter_queryset_with_all_tags' is not used.

Suggested change
from signbank.tagging.adapter import add_tag, filter_queryset_with_all_tags

Copilot uses AI. Check for mistakes.

from tagging.models import Tag
from signbank.tagging.adapter import add_tag
from signbank.tagging.models import Tag
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Tag' is not used.

Suggested change
from signbank.tagging.models import Tag

Copilot uses AI. Check for mistakes.
@joshmcarthur joshmcarthur marked this pull request as ready for review February 16, 2026 00:06
from django.core.exceptions import ObjectDoesNotExist
from django.db.utils import OperationalError, ProgrammingError
from django.dispatch import receiver
from django.forms import ModelForm

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these imports I don't see actually used below, but I presume github knows what it's doing since it hasn't complained.

Copy link

@jonholdsworth jonholdsworth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too much code for me to notice if there are any minor gotchas.
The overall principle looks sound.
I understand the keeping legacy Tag data in the classes while copying over.
I imagine this may take a few iterations.

Great work, let's see what it does!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants