[Rust] Add #[fory()] field attributes for optimization metadata #3004

@chaokunyang

Feature Request

Extend the #[derive(ForyObject)] macro to support #[fory()] field attributes for performance and space optimization during xlang serialization.

Is your feature request related to a problem? Please describe

Currently, Fory's Rust xlang serialization treats all struct fields uniformly:

  1. Null checks are always performed: even for fields that are never null, Fory writes a null/ref flag (1 byte per field).
  2. Reference tracking is always applied when enabled globally: even fields that are never shared or cyclic pay for a hash lookup per object.
  3. Field names use meta string encoding: in schema evolution mode, field names are compressed as meta strings, but long names still take several bytes each.

These defaults ensure correctness but introduce unnecessary overhead when the developer has more specific knowledge about their data model.

Describe the solution you'd like

Extend the #[fory()] attribute to support field-level metadata:

use fory::ForyObject;

#[derive(ForyObject)]
struct Foo {
    // Field f1: non-nullable (default), no ref tracking (default)
    // Tag ID 0 provides compact encoding in schema evolution mode
    #[fory(id = 0)]
    f1: String,
    
    // Field f2: non-nullable (default), no ref tracking (default)
    #[fory(id = 1)]
    f2: Bar,
    
    // Field f3: nullable field that may contain null values
    #[fory(id = 2, nullable)]
    f3: Option<String>,
    
    // Field parent: shared reference that needs tracking (e.g., for circular refs)
    #[fory(id = 3, ref = true, nullable)]
    parent: Option<Rc<Node>>,
    
    // Field with long name: tag ID provides significant space savings
    #[fory(id = 4)]
    very_long_field_name_that_would_take_many_bytes: String,
    
    // Explicit opt-out: use field name encoding but get nullable optimization
    #[fory(id = -1, nullable)]
    optional_field: Option<String>,
}

Attribute Syntax

#[fory(
    id = <i32>,           // REQUIRED: Tag ID for field encoding, validated by the ForyObject derive macro
                          // >= 0: Use tag ID encoding
                          // -1: Use field name encoding (opt-out)
    
    nullable,             // Optional: Field can be None (default: false, except for Option<T>)
                          // Option<T> fields are always nullable
    
    ref,                  // Optional: Track references (default: false, except for Rc<T>/Arc<T>)
                          // Useful for Rc<T>, Arc<T>, circular references
)]

Design Decision: Required id

The id attribute is required when using #[fory()] on a field:

  • id = 0 to id = N: Use tag ID encoding (compact)
  • id = -1: Explicit opt-out, use field name encoding
  • No #[fory()] attribute on the field: use field name encoding

Rationale:

  1. Explicit control: Using #[fory()] means opting into explicit control
  2. Compile-time validation: Proc macro can check for duplicate IDs
  3. Proven pattern: Similar to protobuf field numbers
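
The id rules above can be sketched as a small mapping from the attribute value to a wire encoding. This is only an illustration: `FieldEncoding` and `resolve_encoding` are hypothetical names, not part of Fory, and `None` stands for a field that carries no #[fory()] attribute at all.

```rust
/// How a field is identified on the wire (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum FieldEncoding {
    /// Compact tag ID, chosen when `id >= 0`.
    TagId(u32),
    /// Meta-string-encoded field name, chosen for `id = -1` or no attribute.
    FieldName,
}

/// Maps the optional `id` from `#[fory(id = N)]` to an encoding.
/// `None` models a field without any `#[fory()]` attribute.
fn resolve_encoding(id: Option<i32>) -> Result<FieldEncoding, String> {
    match id {
        None | Some(-1) => Ok(FieldEncoding::FieldName),
        Some(n) if n >= 0 => Ok(FieldEncoding::TagId(n as u32)),
        // Anything below -1 has no meaning and is rejected.
        Some(n) => Err(format!("invalid field id {n}: must be >= -1")),
    }
}
```

In the derive macro the same decision would be made at compile time, so the error branch would surface as a compile error rather than a runtime `Err`.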

Optimization Details

1. Non-nullable (Default) Optimization

When nullable is NOT specified:

  • Skip writing the null flag entirely (1 byte saved per field)
  • Directly serialize the field value
  • Compile error if the field type is Option<T> and nullable = false is set explicitly
  • Only Option<T> is nullable by default; fields of other types must be marked with the nullable attribute to be treated as nullable.
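
To make the one-byte saving concrete, here is a minimal sketch that writes the same string with and without a null flag. The buffer layout, the single-byte length prefix, and the flag values are assumptions chosen for brevity; the real layout and constants are defined by the Fory spec.

```rust
// Hypothetical flag values for illustration; not the real Fory constants.
const NULL_FLAG: u8 = 0;
const NOT_NULL_FLAG: u8 = 1;

/// Non-nullable field: length prefix plus UTF-8 bytes, no flag byte.
fn write_string(buf: &mut Vec<u8>, s: &str) {
    // A single-byte length keeps the sketch short; it assumes s.len() < 256.
    buf.push(s.len() as u8);
    buf.extend_from_slice(s.as_bytes());
}

/// Nullable field: one flag byte, then the value if present.
fn write_nullable_string(buf: &mut Vec<u8>, s: Option<&str>) {
    match s {
        Some(v) => {
            buf.push(NOT_NULL_FLAG);
            write_string(buf, v);
        }
        None => buf.push(NULL_FLAG),
    }
}
```

Writing "abc" through `write_string` produces 4 bytes in this sketch, while `write_nullable_string` produces 5 for the same value: the extra byte is the flag that the non-nullable path skips.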

2. No Ref Tracking (Default) Optimization

When ref is NOT specified:

  • Skip reference tracking map operations
  • Skip ref flag when combined with non-nullable
  • For Rc<T>/Arc<T>, ref is true by default; consider adding ref = false if no shared refs are possible
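
The cost that ref = false avoids can be sketched with a toy tracker keyed by pointer address. Everything here is an assumption for illustration: `RefTracker`, the flag values, and the one-byte reference IDs are hypothetical and do not mirror Fory's actual resolver.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical flag values for illustration only.
const REF_FLAG: u8 = 0; // value was already written; a back-reference ID follows
const REF_VALUE_FLAG: u8 = 1; // first occurrence; the value itself follows

/// Remembers which allocations have been written, keyed by pointer address.
#[derive(Default)]
struct RefTracker {
    seen: HashMap<*const u8, u8>,
}

impl RefTracker {
    /// Writes `value` with tracking: one hash lookup per call, and a
    /// back-reference instead of the payload on repeat visits.
    fn write_tracked(&mut self, buf: &mut Vec<u8>, value: &Rc<String>) {
        let key = Rc::as_ptr(value) as *const u8;
        if let Some(&id) = self.seen.get(&key) {
            buf.push(REF_FLAG);
            buf.push(id);
            return;
        }
        let next_id = self.seen.len() as u8;
        self.seen.insert(key, next_id);
        buf.push(REF_VALUE_FLAG);
        write_untracked(buf, value);
    }
}

/// Writes `value` without tracking: no map, no flag byte.
fn write_untracked(buf: &mut Vec<u8>, value: &str) {
    buf.push(value.len() as u8); // single-byte length; assumes len < 256
    buf.extend_from_slice(value.as_bytes());
}
```

The tracked path pays a map lookup and a flag byte on every write, which only pays off when the same allocation really is written more than once.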

3. Tag ID Optimization

When id = N where N >= 0:

  • Field name encoded as varint instead of meta string
  • Significant space savings for long field names

Space savings:

| Field Name     | Meta String (approx) | Tag ID |
|----------------|----------------------|--------|
| f1             | ~2 bytes             | 1 byte |
| user_name      | ~6 bytes             | 1 byte |
| transaction_id | ~10 bytes            | 1 byte |
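
The 1-byte figure for tag IDs follows from how varints work: each byte carries 7 payload bits, so any ID from 0 to 127 fits in a single byte. Below is a generic unsigned LEB128-style encoder as a sketch; `encode_varuint` is a hypothetical helper, and the exact tag ID layout Fory uses is defined by the protocol spec, not by this code.

```rust
/// Encodes `value` as an unsigned LEB128-style varint: 7 payload bits per
/// byte, with the high bit set on every byte except the last.
fn encode_varuint(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: high bit clear marks the end of the varint.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}
```

With this scheme a struct would need more than 128 tagged fields before any tag ID grows to two bytes.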

Implementation Notes

  1. Proc Macro Enhancement:

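    // In fory-derive/src/lib.rs (proc macro entry points must live at the crate root)
    use proc_macro::TokenStream;

    #[proc_macro_derive(ForyObject, attributes(fory))]
    pub fn derive_fory_object(input: TokenStream) -> TokenStream {
        // Parse #[fory(id = N, nullable, ref)] attributes
        // Generate optimized serialization code based on attributes
        todo!("parse field attributes and emit the serializer impl")
    }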
  2. Code Generation:

    // Generated code for #[fory(id = 0)] (non-nullable, no ref)
    fn serialize_field_f1(&self, writer: &mut Writer) {
        // No null check, no ref tracking
        writer.write_string(&self.f1);
    }
    
    // Generated code for #[fory(id = 2, nullable)]
    fn serialize_field_f3(&self, writer: &mut Writer) {
        match &self.f3 {
            Some(v) => {
                writer.write_not_null();
                writer.write_string(v);
            }
            None => writer.write_null(),
        }
    }
  3. Compile-time Validation:

    • Error if duplicate tag IDs (>= 0) in same struct
    • Error if id < -1
    • Error if Option<T> field is marked nullable = false
    • Warning if Rc<T>/Arc<T> field is marked ref = false (potential circular ref issues)
  4. Runtime Validation:

    • Panic if non-nullable field serialized with None value (shouldn't happen in Rust)
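
The ID checks from the compile-time validation list can be sketched as a plain function over the IDs collected from one struct. This is an assumption-laden illustration: `validate_field_ids` is a hypothetical helper, and a real proc macro would report each problem as a compile error attached to the offending attribute instead of returning strings.

```rust
use std::collections::HashSet;

/// Checks the `id` values collected from one struct's fields.
/// Each entry is `(field_name, id)`; returns one message per violation.
fn validate_field_ids(fields: &[(&str, i32)]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut errors = Vec::new();
    for &(name, id) in fields {
        if id < -1 {
            errors.push(format!("field `{name}`: id {id} is invalid, must be >= -1"));
        } else if id >= 0 && !seen.insert(id) {
            // -1 opts out of tag IDs, so it may repeat; non-negative IDs may not.
            errors.push(format!("field `{name}`: duplicate tag id {id}"));
        }
    }
    errors
}
```

For example, the fields `("a", 0), ("b", 0), ("c", -2)` yield two messages: a duplicate tag ID on `b` and an invalid ID on `c`.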

Example: Generated Code

#[derive(ForyObject)]
struct Foo {
    #[fory(id = 0)]
    name: String,
    
    #[fory(id = 1, nullable)]
    nickname: Option<String>,
}

// Generates approximately:
impl ForySerialize for Foo {
    fn serialize(&self, writer: &mut Writer) -> Result<()> {
        // Field: name (id=0, non-nullable, no ref)
        writer.write_tag_id(0);
        writer.write_string(&self.name)?;
        
        // Field: nickname (id=1, nullable, no ref)
        writer.write_tag_id(1);
        match &self.nickname {
            Some(v) => {
                writer.write_byte(NOT_NULL_FLAG);
                writer.write_string(v)?;
            }
            None => writer.write_byte(NULL_FLAG),
        }
        
        Ok(())
    }
}

Performance Impact

For a struct with 10 fields using default settings (non-nullable, no ref tracking):

  • Space savings: ~10 bytes per object (one null/ref flag byte per field)
  • CPU savings: up to 10 fewer hash map operations per serialization when reference tracking is enabled globally
  • Zero runtime overhead for metadata (all compile-time via proc macro)

Additional context

This is the Rust equivalent of Java's @ForyField annotation. See Java issue #3000 for the original design discussion.

Protocol spec: https://fory.apache.org/docs/specification/fory_xlang_serialization_spec
