| 1 | +# Paired Field Descriptor System |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The Paired Field Descriptor is a Django model descriptor that manages a field with two stored variants: a manually curated value and a machine learning (ML) generated value. Reading the descriptor returns the manual value when one is set and falls back to the ML value otherwise. It is used here for metadata fields, with a focus on tag management and priority handling. |
| 6 | + |
| 7 | +## Core Concepts |
| 8 | + |
| 9 | +### Field Pairing Mechanism |
| 10 | +Each descriptor automatically creates two associated fields: |
| 11 | +- **Manual Field**: Manually entered or curated metadata |
| 12 | +- **ML Field**: Machine learning generated metadata |
| 13 | + |
| 14 | +### Key Characteristics |
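| 15 | +- The manual field takes precedence over the ML field whenever it holds a value |
| 16 | +- Supports various Django field types |
| 17 | +- Treats an empty array or `None` in the manual field as unset and falls back to the ML field |
| 18 | +- Assigning to the descriptor sets only the manual field; the ML field must be set explicitly |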
| 19 | + |
| 20 | +## Implementation |
| 21 | + |
| 22 | +### Creating a Paired Field Descriptor |
| 23 | + |
| 24 | +```python |
| 25 | +tdamm_tag = PairedFieldDescriptor( |
| 26 | + field_name="tdamm_tag", |
| 27 | + field_type=ArrayField(models.CharField(max_length=255, choices=TDAMMTags.choices), blank=True, null=True), |
| 28 | + verbose_name="TDAMM Tags", |
| 29 | +) |
| 30 | +``` |
| 31 | + |
| 32 | +#### Parameters |
| 33 | +- `field_name`: Base name for the descriptor |
| 34 | +- `field_type`: Django field type (supports various field types) |
| 35 | +- `verbose_name`: Optional human-readable name |
| 36 | + |
| 37 | +### Field Naming Convention |
| 38 | +When you define a descriptor, two additional fields are automatically created: |
| 39 | +- `{field_name}_manual`: For manually entered values |
| 40 | +- `{field_name}_ml`: For machine learning generated values |
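
The naming convention can be illustrated with a small helper. `paired_field_names` is a hypothetical function introduced only for this example and is not part of the descriptor's API; it simply shows how the two field names derive from the base name.

```python
def paired_field_names(field_name: str) -> tuple[str, str]:
    """Return the (manual, ml) field names derived from a base name.

    Hypothetical helper for illustration; the real descriptor may build
    these names differently internally.
    """
    return f"{field_name}_manual", f"{field_name}_ml"


manual_name, ml_name = paired_field_names("tdamm_tag")
print(manual_name)  # tdamm_tag_manual
print(ml_name)      # tdamm_tag_ml
```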
| 41 | + |
| 42 | +## Characteristics |
| 43 | + |
| 44 | +### Field Priority |
| 45 | +1. The manual field takes precedence whenever it holds a non-empty value |
| 46 | +2. The ML field serves as a fallback |
| 47 | +3. An empty or `None` manual field defers to the ML field |
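
The priority rules above can be sketched as a plain function. This is a minimal illustration, not the descriptor's actual code: `resolve_paired_value` and `is_unset` are hypothetical names, and treating empty lists, tuples, and strings as "empty" is an assumption based on the rules listed here.

```python
def is_unset(value):
    """Assumed emptiness rule: None or an empty list, tuple, or string."""
    return value is None or (isinstance(value, (list, tuple, str)) and len(value) == 0)


def resolve_paired_value(manual, ml):
    """Return the manual value unless it is unset, otherwise the ML value."""
    if not is_unset(manual):
        return manual
    return ml


# Manual value present: it wins over the ML value.
print(resolve_paired_value(["MMA_M_EM"], ["MMA_O_BH"]))
# Manual value empty or None: the ML value is used instead.
print(resolve_paired_value([], ["MMA_O_BH"]))
print(resolve_paired_value(None, ["MMA_O_BH"]))
```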
| 48 | + |
| 49 | +### Field Retrieval |
| 50 | +```python |
| 51 | +# Retrieval automatically prioritizes the manual field |
| 52 | +tags = url.tdamm_tag # Returns the manual tags if set, otherwise the ML tags |
| 53 | +``` |
| 54 | + |
| 55 | +### Field Setting |
| 56 | +```python |
| 57 | +# Sets only the manual field |
| 58 | +url.tdamm_tag = ["MMA_M_EM", "MMA_M_G"] |
| 59 | + |
| 60 | +# ML field must be set explicitly |
| 61 | +url.tdamm_tag_ml = ["MMA_O_BH"] |
| 62 | +``` |
| 63 | + |
| 64 | +### Field Deletion |
| 65 | +```python |
| 66 | +# Deletes both manual and ML fields |
| 67 | +del url.tdamm_tag |
| 68 | +``` |
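
The retrieval, setting, and deletion behavior described in this section can be reproduced with a plain-Python descriptor that needs no Django. This is a sketch under stated assumptions, not the project's `PairedFieldDescriptor`: the `PairedValue` and `Url` classes are hypothetical stand-ins, values live in ordinary instance attributes rather than model fields, and deletion is modeled as resetting both values to `None`.

```python
class PairedValue:
    """Plain-Python stand-in for a paired manual/ML descriptor."""

    def __set_name__(self, owner, name):
        # Derive the two storage attribute names from the descriptor's name.
        self.manual_attr = f"{name}_manual"
        self.ml_attr = f"{name}_ml"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        manual = getattr(instance, self.manual_attr, None)
        # Assumption: None and an empty list both count as "unset".
        if manual is None or manual == []:
            return getattr(instance, self.ml_attr, None)
        return manual

    def __set__(self, instance, value):
        # Assignment through the descriptor touches only the manual value.
        setattr(instance, self.manual_attr, value)

    def __delete__(self, instance):
        # Assumption: deletion clears both values by resetting them to None.
        setattr(instance, self.manual_attr, None)
        setattr(instance, self.ml_attr, None)


class Url:
    tdamm_tag = PairedValue()


url = Url()
url.tdamm_tag_ml = ["MMA_O_BH"]          # ML value is set explicitly
print(url.tdamm_tag)                     # falls back to the ML value
url.tdamm_tag = ["MMA_M_EM", "MMA_M_G"]  # sets only the manual value
print(url.tdamm_tag)                     # manual value now takes precedence
del url.tdamm_tag                        # clears both values
print(url.tdamm_tag)                     # nothing left to return
```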
| 69 | + |
| 70 | +### Data Preservation |
| 71 | +- Paired fields maintain their state during: |
| 72 | + - Dump to Delta migration |
| 73 | + - Delta to Curated promotion |
| 74 | +- Manual entries take precedence in all migration stages |
| 75 | + |
| 76 | +## Serializer Integration |
| 77 | + |
| 78 | +Configure the serializer with a `SerializerMethodField` that reads the descriptor. The response then contains the manual tags when present and the ML tags otherwise, following the descriptor's priority rules. |
| 79 | +```python |
| 80 | +class DeltaUrlSerializer(serializers.ModelSerializer): |
| 81 | + tdamm_tag = serializers.SerializerMethodField() |
| 82 | + |
| 83 | + class Meta: |
| 84 | + model = DeltaUrl |
| 85 | + fields = ("url", "tdamm_tag") |
| 86 | + |
| 87 | + def get_tdamm_tag(self, obj): |
| 88 | + tags = obj.tdamm_tag |
| 89 | + return tags if tags is not None else [] |
| 90 | +``` |