Expand data domain with 6 new validated types #10

## Summary

The data domain currently has only 3 types (`KafkaTopicName`, `SqlIdentifier`, `TableIdentifier`). This issue proposes 6 new types, bringing the total to 9. All are vendor-agnostic data engineering formats backed by formal specs or widely adopted standards.

## Proposed Types

### 1. `JsonPointer` (Pattern B — Annotated)

- **Spec**: [RFC 6901](https://datatracker.ietf.org/doc/html/rfc6901)
- **Domain**: `data/`
- **Format**: Either the empty string or a sequence of reference tokens, each prefixed by `/`. Within a token, `~` is written as `~0` and `/` as `~1`.
- **Regex**: `^(/([^~/]|~0|~1)*)*$`
- **Usage**: Central to JSON Patch (RFC 6902), OpenAPI `$ref` resolution, JSON Schema. Escape rules are a common source of bugs.
- **Examples**: `""`, `"/foo"`, `"/foo/0"`, `"/a~1b"` (refers to key `a/b`), `"/m~0n"` (refers to key `m~n`)
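
To make the escape rules concrete, here is a minimal stdlib-only sketch of the check and of token decoding. It is not the pydantypes implementation: `is_json_pointer` and `unescape_tokens` are hypothetical helper names, and the Pydantic wiring for Pattern B is left out.

```python
import re

# Regex copied from the proposal (RFC 6901 syntax).
JSON_POINTER_RE = re.compile(r"^(/([^~/]|~0|~1)*)*$")


def is_json_pointer(value: str) -> bool:
    """Return True if value is a syntactically valid JSON Pointer."""
    return JSON_POINTER_RE.fullmatch(value) is not None


def unescape_tokens(pointer: str) -> list[str]:
    """Split a valid pointer into decoded reference tokens (hypothetical helper)."""
    if not is_json_pointer(pointer):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # Drop the empty string before the leading "/"; "" yields no tokens.
    tokens = pointer.split("/")[1:]
    # RFC 6901 decodes "~1" first, then "~0", so "~01" becomes "~1" and not "/".
    return [t.replace("~1", "/").replace("~0", "~") for t in tokens]
```

The decoding order is the detail that tends to go wrong: swapping the two `replace` calls would turn `~01` into `/`.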

### 2. `HivePartitionPath` (Pattern A — str subclass)

- **Spec**: [Apache Hive partitioning convention](https://hive.apache.org/docs/latest/hcatalog-dynamicpartitions_34014006/)
- **Domain**: `data/`
- **Format**: `key=value/key=value/...` where keys are valid identifiers
- **Regex**: `^([a-zA-Z_][a-zA-Z0-9_]*=[^/]+)(/[a-zA-Z_][a-zA-Z0-9_]*=[^/]+)*$`
- **Parsed properties**: `.partitions` → `dict[str, str]` of partition key-value pairs
- **Usage**: Ubiquitous in data lake architectures: Spark, Databricks, AWS Athena, Presto/Trino, Delta Lake, Iceberg. Malformed partition paths are a common source of pipeline failures.
- **Examples**: `"year=2024"`, `"year=2024/month=01/day=15"`, `"region=us-east-1/dt=2024-01-15"`
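
A rough sketch of how the str subclass and its `.partitions` property could look, using only the standard library. The class body is an assumption for illustration; the Pydantic v2 integration (for example a `__get_pydantic_core_schema__` hook) is omitted.

```python
import re

# Regex copied from the proposal.
HIVE_PARTITION_RE = re.compile(
    r"^([a-zA-Z_][a-zA-Z0-9_]*=[^/]+)(/[a-zA-Z_][a-zA-Z0-9_]*=[^/]+)*$"
)


class HivePartitionPath(str):
    """Illustrative str subclass; not the actual pydantypes class."""

    def __new__(cls, value: str) -> "HivePartitionPath":
        if HIVE_PARTITION_RE.fullmatch(value) is None:
            raise ValueError(f"invalid Hive partition path: {value!r}")
        return super().__new__(cls, value)

    @property
    def partitions(self) -> dict[str, str]:
        # Split each segment on the first "=" only: the regex allows "=" in values.
        return dict(segment.split("=", 1) for segment in self.split("/"))
```

Note that the regex does not reject repeated keys, so in this sketch `"year=1/year=2"` validates and the later value wins in the dict. Whether duplicates should be an error is an open design question.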

### 3. `Doi` (Pattern A — str subclass)

- **Spec**: [ISO 26324](https://www.doi.org/), [Crossref regex](https://www.crossref.org/blog/dois-and-matching-regular-expressions/)
- **Domain**: `data/`
- **Format**: `10.NNNN/suffix` where NNNN is a registrant code of 4 or more digits (the regex below accepts 4 to 9)
- **Regex**: `^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$` (Crossref reports that this pattern matches about 99.3% of the DOIs it analyzed)
- **Parsed properties**: `.prefix` (registrant code), `.suffix`
- **Usage**: Universal in scientific publishing, dataset registries (Zenodo, DataCite, Figshare), ML model cards, and data catalog metadata. Increasingly common in ML experiment tracking.
- **Examples**: `"10.1000/xyz123"`, `"10.1038/nature12373"`, `"10.5281/zenodo.1234567"`
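
The parsed properties could be sketched as below, assuming `.prefix` returns the full `10.NNNN` part. Whether it should instead return only the registrant digits is not settled by this proposal. The `re.ASCII` flag is an addition here so that `\d` matches only `0-9`; Pydantic integration is omitted.

```python
import re

# Regex from the proposal; re.ASCII (an addition) keeps \d from matching
# non-ASCII Unicode digits.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$", re.ASCII)


class Doi(str):
    """Illustrative str subclass; not the actual pydantypes class."""

    def __new__(cls, value: str) -> "Doi":
        if DOI_RE.fullmatch(value) is None:
            raise ValueError(f"invalid DOI: {value!r}")
        return super().__new__(cls, value)

    @property
    def prefix(self) -> str:
        # Assumption: the prefix is everything before the first "/".
        return self.split("/", 1)[0]

    @property
    def suffix(self) -> str:
        # The suffix may itself contain "/", so split only once.
        return self.split("/", 1)[1]
```

For example, `Doi("10.5281/zenodo.1234567").prefix` gives `"10.5281"` and `.suffix` gives `"zenodo.1234567"`.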

### 4. `AvroFullName` (Pattern A — str subclass)

- **Spec**: [Apache Avro 1.11.1 Specification](https://avro.apache.org/docs/1.11.1/specification/)
- **Domain**: `data/`
- **Format**: Fully qualified name for Avro records/enums/fixed types. `namespace.name` where each component matches `[A-Za-z_][A-Za-z0-9_]*`
- **Regex**: `^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$`
- **Parsed properties**: `.namespace` (everything before last dot, or `None`), `.name` (last component)
- **Usage**: Kafka + Schema Registry deployments (Confluent, Redpanda) that serialize with Avro. Protobuf fully qualified names follow the same dotted-identifier shape. Frequently misconfigured.
- **Examples**: `"User"`, `"com.example.UserEvent"`, `"io.confluent.kafka.AvroMessage"`
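
A small sketch of the namespace/name split described above, under the assumption that splitting on the last dot is all that is needed. The class is illustrative only and leaves out the Pydantic hooks.

```python
import re
from typing import Optional

# Regex copied from the proposal.
AVRO_FULL_NAME_RE = re.compile(
    r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$"
)


class AvroFullName(str):
    """Illustrative str subclass; not the actual pydantypes class."""

    def __new__(cls, value: str) -> "AvroFullName":
        if AVRO_FULL_NAME_RE.fullmatch(value) is None:
            raise ValueError(f"invalid Avro full name: {value!r}")
        return super().__new__(cls, value)

    @property
    def namespace(self) -> Optional[str]:
        # rpartition returns an empty separator when there is no dot at all.
        head, sep, _ = self.rpartition(".")
        return head if sep else None

    @property
    def name(self) -> str:
        return self.rpartition(".")[2]
```

With this sketch, `AvroFullName("User").namespace` is `None`, while `AvroFullName("com.example.UserEvent")` splits into `"com.example"` and `"UserEvent"`.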

### 5. `SchemaRegistrySubject` (Pattern A — str subclass)

- **Spec**: [Confluent Schema Registry — Subject Name Strategy](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html)
- **Domain**: `data/`
- **Format**: Default TopicNameStrategy: `<topic>-key` or `<topic>-value`
- **Regex**: `^[a-zA-Z0-9._-]+-(?:key|value)$`
- **Parsed properties**: `.topic` (topic name), `.record_type` (`"key"` or `"value"`)
- **Usage**: Every Kafka + Schema Registry deployment. Natural companion to existing `KafkaTopicName`. The TopicNameStrategy is the default and dominant naming pattern.
- **Examples**: `"user-events-value"`, `"order.created-key"`, `"payments-value"`
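
Because the suffix is always the text after the last hyphen, the two properties can be sketched with `str.rpartition` and the regex kept exactly as proposed. This is an illustration under that assumption, not the final class, and it covers only the default TopicNameStrategy.

```python
import re

# Regex copied from the proposal (TopicNameStrategy only).
SUBJECT_RE = re.compile(r"^[a-zA-Z0-9._-]+-(?:key|value)$")


class SchemaRegistrySubject(str):
    """Illustrative str subclass; not the actual pydantypes class."""

    def __new__(cls, value: str) -> "SchemaRegistrySubject":
        if SUBJECT_RE.fullmatch(value) is None:
            raise ValueError(f"invalid subject name: {value!r}")
        return super().__new__(cls, value)

    @property
    def topic(self) -> str:
        # Everything before the final "-key" / "-value" suffix.
        return self.rpartition("-")[0]

    @property
    def record_type(self) -> str:
        # Validation guarantees this is either "key" or "value".
        return self.rpartition("-")[2]
```

A topic that itself contains hyphens is handled naturally: `"user-events-value"` yields topic `"user-events"` and record type `"value"`.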

### 6. `ElasticsearchIndexName` (Pattern B — Annotated)

- **Spec**: [Elasticsearch index naming restrictions](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)
- **Domain**: `data/`
- **Validation rules**:
  - Lowercase only
  - Max 255 bytes
  - Cannot start with `-`, `_`, or `+`
  - Cannot be `.` or `..`
  - Forbidden chars: `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, `#`, `,`, `:`, space (`:` is rejected since Elasticsearch 7.0)
- **Usage**: Every ELK/OpenSearch deployment. The rules combine case, byte-length, prefix, and character restrictions, so ad hoc validation often misses one of them. Multiple Pydantic+ES projects (esorm, pydastic) reinvent this validation.
- **Examples**: `"logs-2024.01.15"`, `"user-events"`, `"metrics-prod"`
- **Note**: Elasticsearch is not tied to a cloud provider (it can be self-hosted anywhere). The same rules apply to OpenSearch, including Amazon OpenSearch Service.
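
Since these rules do not reduce to one readable regex, a validator function is probably the clearer shape. The sketch below is stdlib-only and makes two assumptions: the function name `validate_index_name` is hypothetical, and the empty string is rejected even though the list above does not mention it.

```python
# ":" is included because Elasticsearch 7.0 and later reject it.
FORBIDDEN_CHARS = frozenset('\\/*?"<>|#,: ')
FORBIDDEN_PREFIXES = ("-", "_", "+")
MAX_BYTES = 255


def validate_index_name(name: str) -> str:
    """Return name unchanged if it satisfies the listed rules, else raise."""
    if not name:
        # Assumption: an empty name is treated as invalid.
        raise ValueError("index name must not be empty")
    if name in (".", ".."):
        raise ValueError("index name must not be '.' or '..'")
    if name != name.lower():
        raise ValueError("index name must be lowercase")
    # The limit is in bytes, so multi-byte characters count more than once.
    if len(name.encode("utf-8")) > MAX_BYTES:
        raise ValueError(f"index name exceeds {MAX_BYTES} bytes")
    if name.startswith(FORBIDDEN_PREFIXES):
        raise ValueError("index name must not start with '-', '_' or '+'")
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        raise ValueError(f"index name contains forbidden characters: {bad}")
    return name
```

For Pattern B, a function like this could plausibly be attached with Pydantic v2's `Annotated[str, AfterValidator(validate_index_name)]`; the exact wiring should follow whatever ARCHITECTURE.md prescribes.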

## What goes where

All 6 types go into `data/` — they are vendor-agnostic data engineering formats, not tied to any specific cloud provider.

`DataUri` (RFC 2397) was also considered but belongs in `web/` — tracked separately if pursued.

## Verification

- None of these overlap with Pydantic core, pydantic-extra-types, or schwifty
- All are regex/parsing-only — no external service calls
- All have formal specs or widely-adopted standards behind them
- All follow existing pydantypes patterns (Pattern A or B per ARCHITECTURE.md)

## Implementation order

Suggested file structure:
- `src/pydantypes/data/json.py` — `JsonPointer`
- `src/pydantypes/data/hive.py` — `HivePartitionPath`
- `src/pydantypes/data/doi.py` — `Doi`
- `src/pydantypes/data/avro.py` — `AvroFullName`
- `src/pydantypes/data/schema_registry.py` — `SchemaRegistrySubject`
- `src/pydantypes/data/elasticsearch.py` — `ElasticsearchIndexName`
