[Python] Create field descriptor for dataclass field metadata

## Overview

This issue outlines the implementation plan for adding field-level metadata support to Python Fory (pyfory), giving users fine-grained, per-field control over serialization behavior. It follows the pattern established by the Rust (`#[fory(...)]` attributes) and Go (`fory:"..."` struct tags) implementations.

**Related Issues:**
- GitHub Issue #3002 (Python field descriptor)
- GitHub Issue #3000 (Java @ForyField annotation)
- #3003
- #3004
- #1017 

## Design Goals

1. **Python-Idiomatic API**: Leverage dataclasses.field() with Fory-specific metadata
2. **Explicit Over Implicit**: Users explicitly control nullable/ref per field
3. **Zero Breaking Changes**: Existing code without field metadata works unchanged
4. **Performance First**: Pre-compute field info at registration time, optimize JIT codegen
5. **Cross-Language Compatibility**: Support TAG_ID encoding per xlang spec

## Key Design Decisions

### Default Values

| Option | Default | Notes |
|--------|---------|-------|
| `id` | **required** | Must specify: `-1` for field name, `>=0` for tag ID |
| `nullable` | `False` | No null flag written; exception: `Optional[T]` is always `True` |
| `ref` | `False` | No reference tracking; must explicitly enable with `ref=True` |
| `ignore` | `False` | Field is serialized |

### Nullable Rules

1. **Non-Optional fields**: `nullable=False` by default (no null flag, saves 1 byte)
2. **Optional[T] fields**: `nullable=True` is required; setting `nullable=False` raises `ValueError`
3. **Explicit override**: Use `nullable=True` for non-Optional fields that may be None
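
The three rules above can be sketched as a single resolution function. This is a minimal illustration, not the pyfory implementation: `is_optional` is a stand-in for `is_optional_type()` from `pyfory/type.py`, and `resolve_nullable` is a hypothetical helper name.

```python
from typing import Optional, Union, get_args, get_origin


def is_optional(type_hint) -> bool:
    """Stand-in for pyfory's is_optional_type(); handles typing.Optional / Union[T, None]."""
    return get_origin(type_hint) is Union and type(None) in get_args(type_hint)


def resolve_nullable(field_name: str, type_hint, nullable: bool) -> bool:
    """Hypothetical helper: return the effective nullable flag for one field."""
    if is_optional(type_hint):
        # Rule 2: Optional[T] must be declared nullable.
        if not nullable:
            raise ValueError(f"Field '{field_name}' is Optional[T] but nullable=False")
        return True
    # Rules 1 and 3: non-Optional fields use the configured value as-is.
    return nullable


print(resolve_nullable("age", int, False))
print(resolve_nullable("nickname", str, True))
print(resolve_nullable("email", Optional[str], True))
```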

### Ref Tracking Rules

1. **All fields**: `ref=False` by default (no IdentityMap overhead)
2. **Explicit enable**: Use `ref=True` for fields with circular/shared references
3. **Global override**: If `Fory(ref_tracking=False)`, ALL fields use `ref=False` regardless of field setting
4. **Hash computation**: Uses field-level `ref` setting only (stable, independent of Fory config)
5. **Serializer generation**: Combines field-level `ref` with global config (global False overrides field True)
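
Rules 3 to 5 boil down to two different views of the same field setting. The sketch below uses hypothetical function names (`hash_ref`, `runtime_ref`) purely to show the distinction; it is not taken from the pyfory code base.

```python
def hash_ref(field_ref: bool) -> bool:
    """Value used in the fingerprint: the field-level setting only."""
    return field_ref


def runtime_ref(field_ref: bool, global_ref_tracking: bool) -> bool:
    """Value used by the serializer: a global False overrides a field-level True."""
    return field_ref and global_ref_tracking


# Print all four combinations of field-level and global settings.
for field_ref in (False, True):
    for global_ref in (False, True):
        print(field_ref, global_ref, hash_ref(field_ref), runtime_ref(field_ref, global_ref))
```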

## API Design

### Core API: `field()` Function

```python
from dataclasses import MISSING
from typing import Any, Callable, Mapping, Optional

def field(
    id: int,                 # Tag ID (required): -1 = use field name, >=0 = use numeric tag ID
    *,
    # Fory-specific options
    nullable: bool = False,  # Whether null flag is written (default: False, auto-True for Optional[T])
    ref: bool = False,       # Whether ref tracking enabled (default: False, must explicitly enable)
    ignore: bool = False,    # Whether to ignore this field during serialization

    # Standard dataclass.field() options (passthrough)
    default: Any = MISSING,
    default_factory: Callable[[], Any] = MISSING,
    init: bool = True,
    repr: bool = True,
    hash: Optional[bool] = None,
    compare: bool = True,
    metadata: Optional[Mapping[str, Any]] = None,
    **kwargs,              # Forward any additional args to dataclasses.field()
) -> Any:
    """
    Create a field with Fory-specific serialization metadata.

    This wraps dataclasses.field() and stores Fory configuration in field.metadata.

    Args:
        id: Field tag ID (required positional parameter).
            - -1: Use field name with meta string encoding
            - >=0: Use numeric tag ID (more compact, stable across renames)
            Must be unique within the class (except -1).
            Required to force explicit choice about schema evolution strategy.

        nullable: Whether to write null flag for this field.
            - False (default): Skip null flag, field cannot be None
            - True: Write null flag (1 byte overhead), field can be None
            Note: For Optional[T] fields, nullable is automatically True
            regardless of this parameter.

        ref: Whether to enable reference tracking for this field.
            - False (default): No tracking, skip IdentityMap overhead
            - True: Track references (handles circular refs, shared objects)
            Note: Must be explicitly set to True when needed. Not inherited
            from Fory instance's ref_tracking config.

        ignore: Whether to ignore this field during serialization.
            - True: Field is excluded from serialization
            - False (default): Field is serialized

        default, default_factory, init, repr, hash, compare, metadata:
            Standard dataclass.field() parameters, passed through.

    Returns:
        A dataclass field descriptor with Fory metadata attached.

    Example:
        @dataclass
        class User:
            # Compact encoding with tag ID 0, non-nullable
            name: str = pyfory.field(0)

            # Tag ID 1, explicitly nullable
            email: Optional[str] = pyfory.field(1, nullable=True)

            # Tag ID 2, enable ref tracking
            friends: List["User"] = pyfory.field(2, ref=True, default_factory=list)

            # Use field name encoding (id=-1), ignore this field
            _cache: dict = pyfory.field(-1, ignore=True, default_factory=dict)
    """
```

### ForyFieldMeta Data Class

```python
from dataclasses import dataclass

from pyfory.type import is_optional_type

@dataclass(frozen=True)
class ForyFieldMeta:
    """Parsed Fory field metadata extracted from field.metadata."""

    id: int                           # Required: -1 = use field name, >=0 = use tag ID
    nullable: bool = False            # Whether null flag is written
    ref: bool = False                 # Whether ref tracking is enabled
    ignore: bool = False              # Whether to ignore this field

    def uses_tag_id(self) -> bool:
        """Returns True if this field uses tag ID encoding."""
        return self.id >= 0

    def validate_nullable(self, field_name: str, type_hint: type) -> None:
        """
        Validate nullable setting against type hint.

        Raises:
            ValueError: If Optional[T] field has nullable=False
        """
        if is_optional_type(type_hint) and not self.nullable:
            raise ValueError(
                f"Field '{field_name}' is Optional[T] but nullable=False. "
                f"Optional fields must have nullable=True (or omit the parameter)."
            )

    def effective_nullable(self, type_hint: type) -> bool:
        """
        Returns effective nullable value.

        Rules:
        - Optional[T] fields must have nullable=True (validated separately)
        - Other fields use the configured nullable value (default: False)
        """
        if is_optional_type(type_hint):
            return True  # Already validated that nullable=True
        return self.nullable

    def effective_ref(self) -> bool:
        """Returns ref tracking value (no inheritance from global config)."""
        return self.ref
```

### Metadata Storage

Fory metadata is stored in `field.metadata["__fory__"]`:

```python
import dataclasses

FORY_FIELD_METADATA_KEY = "__fory__"

def field(...) -> Any:
    # Build Fory metadata
    fory_meta = ForyFieldMeta(
        id=id,
        nullable=nullable,
        ref=ref,
        ignore=ignore,
    )

    # Merge with user-provided metadata
    combined_metadata = dict(metadata) if metadata else {}
    combined_metadata[FORY_FIELD_METADATA_KEY] = fory_meta

    # Create dataclass field with combined metadata
    return dataclasses.field(
        default=default,
        default_factory=default_factory,
        init=init,
        repr=repr,
        hash=hash,
        compare=compare,
        metadata=combined_metadata,
        **kwargs,  # Forward any additional args
    )
```
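
To make the storage scheme concrete, here is a stdlib-only sketch of the wrapper together with a reader for the stored metadata. The trimmed `field()` signature and the simplified `ForyFieldMeta` are assumptions for illustration, and `extract_field_meta` only mirrors the helper name proposed in Phase 1.

```python
import dataclasses
from dataclasses import MISSING, dataclass
from typing import Optional

FORY_FIELD_METADATA_KEY = "__fory__"


@dataclass(frozen=True)
class ForyFieldMeta:
    id: int
    nullable: bool = False
    ref: bool = False
    ignore: bool = False


def field(id, *, nullable=False, ref=False, ignore=False,
          default=MISSING, default_factory=MISSING, metadata=None, **kwargs):
    """Simplified wrapper: store Fory options next to any user-provided metadata."""
    combined = dict(metadata) if metadata else {}
    combined[FORY_FIELD_METADATA_KEY] = ForyFieldMeta(id, nullable, ref, ignore)
    return dataclasses.field(default=default, default_factory=default_factory,
                             metadata=combined, **kwargs)


def extract_field_meta(f: dataclasses.Field) -> Optional[ForyFieldMeta]:
    """Return the Fory metadata, or None for fields declared without the wrapper."""
    return f.metadata.get(FORY_FIELD_METADATA_KEY)


@dataclass
class User:
    name: str = field(0)
    email: Optional[str] = field(1, nullable=True, metadata={"doc": "contact"})
    plain: int = 0  # no Fory metadata attached


by_name = {f.name: f for f in dataclasses.fields(User)}
metas = {name: extract_field_meta(f) for name, f in by_name.items()}
print(metas)
```

Because the Fory entry lives under its own key, user metadata such as `"doc"` in the example survives untouched.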

## Type Utilities

Reuse existing utilities from `pyfory/type.py`:

```python
from pyfory.type import is_optional_type, unwrap_optional

# is_optional_type(type_) -> bool
#   Check if type is Optional[T] or Union[T, None]

# unwrap_optional(type_, field_nullable=False) -> tuple[type, bool]
#   Unwrap Optional[T] to (T, True) or return (type_, False)
```

**No new utilities needed** - `type.py` already provides these functions.

## Implementation Architecture

### Component Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           User Code                                         │
│                                                                             │
│  @dataclass                                                                 │
│  class User:                                                                │
│      name: str = pyfory.field(0)                                            │
│      email: Optional[str] = pyfory.field(1, nullable=True)                  │
│      age: int32 = pyfory.field(2)                                           │
│      _cache: dict = pyfory.field(-1, ignore=True, default_factory=dict)     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      Type Registration (Fory.register())                    │
│                                                                             │
│  1. Extract field metadata from dataclass fields                            │
│  2. Validate tag IDs are unique                                             │
│  3. Compute effective nullable/ref per field                                │
│  4. Filter out ignored fields                                               │
│  5. Build FieldInfo list with pre-computed flags                            │
│  6. Compute struct fingerprint (includes field metadata)                    │
│  7. Create DataClassSerializer with field metadata                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         DataClassSerializer                                 │
│                                                                             │
│  Fields:                                                                    │
│  - _schema_field_infos: List[FieldInfo]  # From type hints (for hash)      │
│  - _runtime_field_infos: List[FieldInfo] # With global config applied      │
│  - _hash: int32                          # Schema fingerprint hash         │
│                                                                             │
│  JIT Codegen:                                                               │
│  - Generate write/read methods with field-specific logic                    │
│  - Skip null flag for non-nullable fields                                   │
│  - Skip ref tracking for ref=False fields                                   │
│  - Use tag ID encoding when id >= 0                                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### FieldInfo Data Structure

```python
@dataclass
class FieldInfo:
    """Pre-computed field information for serialization."""

    # Identity
    name: str                        # Field name (snake_case)
    index: int                       # Field index in dataclass
    type_hint: type                  # Type annotation

    # Fory metadata (from pyfory.field()) - used for hash computation
    tag_id: int                      # Required: -1 or >=0
    nullable: bool                   # Effective nullable flag (considers Optional[T])
    ref: bool                        # Field-level ref setting (for hash computation)
    ignore: bool                     # Whether to ignore (always False in this list)

    # Runtime flags (combines field metadata with global Fory config)
    runtime_ref_tracking: bool       # Actual ref tracking: field.ref AND fory.ref_tracking

    # Derived info
    type_id: int                     # Fory TypeId
    serializer: Serializer           # Field serializer
    unwrapped_type: type             # Type with Optional unwrapped

    # Pre-computed flags for codegen (based on runtime flags)
    needs_null_flag: bool            # Whether to write null flag
    needs_ref_tracking: bool         # Whether to track references (= runtime_ref_tracking)
    uses_tag_id: bool                # Whether to use tag ID encoding
```

**Key distinction:**
- `ref`: Field-level setting from `pyfory.field()`, used for hash computation (stable)
- `runtime_ref_tracking`: `field.ref AND fory.ref_tracking`, used for actual serialization

## Implementation Steps

### Phase 1: Core API and Metadata Extraction

**Files to modify/create:**
- `python/pyfory/field.py` (NEW) - field() function and ForyFieldMeta
- `python/pyfory/__init__.py` - Export field

**Tasks:**
1. Create `field()` function wrapping `dataclasses.field()`
2. Create `ForyFieldMeta` dataclass for parsed metadata
3. Create `extract_field_meta()` helper to read metadata from fields
4. Create `validate_field_metas()` to check:
   - Tag ID uniqueness (no duplicate IDs >= 0)
   - Nullable consistency (Optional[T] with nullable=False raises ValueError)
5. Reuse `is_optional_type()` and `unwrap_optional()` from `pyfory/type.py`
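
The tag ID uniqueness check in task 4 could look roughly like the following. The function name `validate_tag_ids` and its `{field_name: tag_id}` input shape are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def validate_tag_ids(field_ids: Dict[str, int]) -> None:
    """Raise ValueError if two fields share the same tag ID >= 0.

    Fields with id == -1 use name encoding and are exempt from the check.
    """
    names_by_tag: Dict[int, List[str]] = defaultdict(list)
    for name, tag_id in field_ids.items():
        if tag_id >= 0:
            names_by_tag[tag_id].append(name)
    duplicates = {tag: names for tag, names in names_by_tag.items() if len(names) > 1}
    if duplicates:
        raise ValueError(f"Duplicate tag IDs: {duplicates}")


validate_tag_ids({"id": 0, "name": 1, "note": -1, "_cache": -1})  # passes silently
```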

### Phase 2: Serializer Integration

**Files to modify:**
- `python/pyfory/struct.py` - DataClassSerializer

**Tasks:**
1. Modify `DataClassSerializer.__init__()` to extract field metadata
2. Create `FieldInfo` dataclass for pre-computed field info
3. Update `_get_field_names()` to filter out ignored fields
4. Update `_nullable_fields` computation to use field metadata
5. Add `_ref_tracking_fields` dict for per-field ref control

### Phase 3: Fingerprint Computation

**Files to modify:**
- `python/pyfory/struct.py` - compute_struct_fingerprint()

**Tasks:**
1. Update fingerprint format to include field metadata:
   ```
   <field_id_or_name>,<type_id>,<ref>,<nullable>;
   ```
2. Use tag ID as sort key when id >= 0, otherwise use field name
3. Include ref flag in fingerprint (from field annotation, not runtime config)
4. Update `compute_struct_meta()` to pass field metadata

### Phase 4: JIT Codegen Updates

**Files to modify:**
- `python/pyfory/struct.py` - _gen_write_method(), _gen_read_method(), etc.
- `python/pyfory/codegen.py` - Code generation utilities

**Tasks:**
1. Update write codegen to skip null flag when `nullable=False`
2. Update write codegen to skip ref tracking when `ref=False`
3. Update read codegen accordingly
4. Update xwrite/xread methods for xlang mode
5. Optimize primitive field serialization for non-nullable fields
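
As a toy illustration of tasks 1 and 3, the sketch below generates a write function whose source only contains null-flag handling for nullable fields. It appends to a plain list instead of a Fory buffer, and the `NULL` / `NOT_NULL` markers are placeholders rather than Fory's real flag bytes; none of these names come from `codegen.py`.

```python
from types import SimpleNamespace

NULL, NOT_NULL = "NULL", "NOT_NULL"  # placeholder markers, not Fory's flag values


def gen_write_source(fields):
    """Build source for write(buf, obj); fields is a list of (name, nullable) pairs."""
    lines = ["def write(buf, obj):"]
    for name, nullable in fields:
        if nullable:
            # Nullable field: emit a flag before the value.
            lines += [
                f"    v = obj.{name}",
                "    if v is None:",
                "        buf.append(NULL)",
                "    else:",
                "        buf.append(NOT_NULL)",
                "        buf.append(v)",
            ]
        else:
            # Non-nullable field: the generated code contains no flag at all.
            lines.append(f"    buf.append(obj.{name})")
    lines.append("    return buf")
    return "\n".join(lines)


def compile_write(fields):
    namespace = {"NULL": NULL, "NOT_NULL": NOT_NULL}
    exec(gen_write_source(fields), namespace)
    return namespace["write"]


write = compile_write([("name", False), ("email", True)])
print(gen_write_source([("name", False), ("email", True)]))
print(write([], SimpleNamespace(name="Alice", email=None)))
```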

### Phase 5: TypeDef Encoding (Compatible Mode)

**Files to modify:**
- `python/pyfory/meta/typedef_encoder.py`
- `python/pyfory/meta/typedef_decoder.py`

**Tasks:**
1. Update TypeDef encoding to use tag ID when available
2. Write field header with TAG_ID encoding (2 bits = 0b11)
3. Write tag ID as varint instead of field name
4. Include nullable and ref flags in field header
5. Update decoder to handle TAG_ID encoded fields

### Phase 6: Cython Integration

**Files to modify:**
- `python/pyfory/serialization.pyx`

**Tasks:**
1. Add FieldInfo handling in Cython serialization code
2. Optimize field metadata access for Cython
3. Update Cython read/write paths for field metadata

### Phase 7: Testing

**Files to create:**
- `python/pyfory/tests/test_field_meta.py` (NEW)
- Update `python/pyfory/tests/xlang_test_main.py`

**Test cases:**
1. Basic pyfory.field() usage with all options
2. Tag ID uniqueness validation
3. Type-based nullable inference
4. Ref tracking per field
5. Ignored fields not serialized
6. Fingerprint computation with field metadata
7. Cross-language compatibility with Java/Rust/Go
8. Schema evolution with tag IDs
9. Mixed fields (some with metadata, some without)

## Fingerprint Format

The struct fingerprint format (matching Java/Rust/Go):

```
<field_id_or_name>,<type_id>,<ref>,<nullable>;
```

**Components:**
- `field_id_or_name`: Tag ID as string (e.g., "0", "1") if id >= 0, otherwise snake_case field name
- `type_id`: Fory TypeId as decimal string (e.g., "4" for INT32)
- `ref`: "1" if `ref=True` in field annotation, "0" otherwise (NOT affected by global Fory config)
- `nullable`: "1" if null flag is written, "0" otherwise

**Important:** The fingerprint uses field-level `ref` setting only, independent of `Fory(ref_tracking=...)`.
This ensures the hash is stable across different Fory instances with different configurations.

**Example fingerprints:**
```
# With tag IDs:
0,4,0,0;1,12,0,1;2,0,0,1;

# With field names:
age,4,0,0;email,12,0,1;name,9,0,0;
```
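
A minimal sketch of how such a string could be assembled is shown below. `FieldSpec` and `compute_fingerprint` are hypothetical names, and the ordering of mixed structs (tag-ID fields first, then name-encoded fields) is an assumption that the text above does not specify.

```python
from typing import List, NamedTuple


class FieldSpec(NamedTuple):
    tag_id: int      # -1 means "use field name"
    name: str
    type_id: int
    ref: bool        # field-level setting, not the runtime value
    nullable: bool


def compute_fingerprint(fields: List[FieldSpec]) -> str:
    # Assumed ordering: tagged fields by numeric ID, then named fields alphabetically.
    tagged = sorted((f for f in fields if f.tag_id >= 0), key=lambda f: f.tag_id)
    named = sorted((f for f in fields if f.tag_id < 0), key=lambda f: f.name)
    parts = []
    for f in tagged + named:
        ident = str(f.tag_id) if f.tag_id >= 0 else f.name
        parts.append(f"{ident},{f.type_id},{int(f.ref)},{int(f.nullable)};")
    return "".join(parts)


by_tag = [FieldSpec(1, "email", 12, False, True), FieldSpec(0, "age", 4, False, False)]
by_name = [FieldSpec(-1, "email", 12, False, True), FieldSpec(-1, "age", 4, False, False)]
print(compute_fingerprint(by_tag))
print(compute_fingerprint(by_name))
```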

**Hash computation:**
```python
fingerprint = compute_struct_fingerprint(fields)
hash_bytes = fingerprint.encode("utf-8")
full_hash = murmurhash3_x64_128(hash_bytes, seed=47)
type_hash_32 = int32(full_hash & 0xFFFFFFFF)
```

## TypeDef Field Header Encoding

Per xlang spec, field header is 8 bits:

```
2 bits encoding + 4 bits size + 1 bit nullable + 1 bit ref_tracking
```

**TAG_ID encoding (when id >= 0):**
```
| 2 bits encoding (0b11) | 4 bits tag_id | 1 bit nullable | 1 bit ref_tracking |
```

When tag ID > 15, write additional varint for (tag_id - 15).

**Field name encoding (when id < 0):**
```
| 2 bits encoding (0b00-10) | 4 bits name_size | 1 bit nullable | 1 bit ref_tracking |
| meta string encoded field name |
```
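
The TAG_ID layout can be illustrated with plain bit operations. This sketch assumes the bits are laid out most-significant first, in the order listed above, and it only covers tag IDs below 15, leaving out the varint extension for larger IDs. The function names are hypothetical.

```python
TAG_ID_ENCODING = 0b11


def pack_tag_header(tag_id: int, nullable: bool, ref: bool) -> int:
    """Pack a one-byte field header for a small tag ID (assumed MSB-first layout)."""
    if not 0 <= tag_id < 15:
        raise ValueError("this sketch only handles tag IDs that fit without a varint")
    return (TAG_ID_ENCODING << 6) | (tag_id << 2) | (int(nullable) << 1) | int(ref)


def unpack_header(header: int):
    """Split a header byte into (encoding, size_or_tag, nullable, ref)."""
    return header >> 6, (header >> 2) & 0xF, bool(header & 0b10), bool(header & 0b01)


header = pack_tag_header(3, nullable=True, ref=False)
print(f"{header:08b}")
print(unpack_header(header))
```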

## API Examples

### Basic Usage

```python
from dataclasses import dataclass
from typing import Optional, List
import pyfory
from pyfory import int32

@dataclass
class User:
    # Compact tag ID encoding, non-nullable (saves 1 byte)
    id: int32 = pyfory.field(0)

    # Tag ID 1, non-nullable string
    name: str = pyfory.field(1)

    # Tag ID 2, nullable (required for Optional)
    email: Optional[str] = pyfory.field(2, nullable=True)

    # Tag ID 3, enable ref tracking for circular references
    friends: List["User"] = pyfory.field(3, ref=True, default_factory=list)

    # Use field name encoding (-1), ignore this field (not serialized)
    _cache: dict = pyfory.field(-1, ignore=True, default_factory=dict)

# Usage
fory = pyfory.Fory(ref_tracking=True)
fory.register(User)

user = User(id=1, name="Alice", email="alice@example.com")
data = fory.serialize(user)
restored = fory.deserialize(data)
```

### Mixed Fields (Gradual Adoption)

```python
from dataclasses import dataclass
from typing import Optional

from pyfory import field, int32

@dataclass
class MixedStruct:
    # New-style with field metadata (tag IDs)
    id: int32 = field(0)
    name: str = field(1)

    # New-style with field name encoding (id=-1)
    description: Optional[str] = field(-1, nullable=True)
    count: int32 = field(-1)
```

### Schema Evolution

```python
# Version 1
@dataclass
class ConfigV1:
    timeout: int32 = pyfory.field(0)
    retries: int32 = pyfory.field(1)

# Version 2 - Added new field, renamed existing
@dataclass
class ConfigV2:
    timeout_ms: int32 = pyfory.field(0)  # Same tag ID, different name OK
    max_retries: int32 = pyfory.field(1) # Same tag ID, different name OK
    enabled: bool = pyfory.field(2)      # New field with new tag ID
```
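
Why renames are safe under tag IDs can be shown without pyfory at all. The stdlib-only sketch below keys values by tag instead of by name. The `"tag"` metadata key, the helper names, and the choice to fall back to the dataclass default for a missing field are all assumptions for illustration, not a description of pyfory's behavior.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ConfigV1:
    timeout: int = field(metadata={"tag": 0})
    retries: int = field(metadata={"tag": 1})


@dataclass
class ConfigV2:
    timeout_ms: int = field(metadata={"tag": 0})
    max_retries: int = field(metadata={"tag": 1})
    enabled: bool = field(default=True, metadata={"tag": 2})


def to_tagged(obj) -> dict:
    """Writer side: key each value by its tag ID rather than its field name."""
    return {f.metadata["tag"]: getattr(obj, f.name) for f in fields(obj)}


def from_tagged(cls, tagged: dict):
    """Reader side: match on tag ID; absent tags fall back to the dataclass default."""
    kwargs = {f.name: tagged[f.metadata["tag"]]
              for f in fields(cls) if f.metadata["tag"] in tagged}
    return cls(**kwargs)


old = ConfigV1(timeout=30, retries=3)
new = from_tagged(ConfigV2, to_tagged(old))
print(new)
```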

## Performance Considerations

1. **Pre-computation**: All field metadata is computed once at registration time
2. **JIT Codegen**: Generated methods include field-specific optimizations
3. **Skip null flags**: Non-nullable primitives save 1 byte per field
4. **Skip ref tracking**: Fields with ref=False skip IdentityMap lookups
5. **Tag ID encoding**: Numeric IDs are more compact than field name strings

