Schema resolution slowing down encoding

**Describe the bug**
We use Kafka, Avro, and the Schema Registry heavily with Java, and I now want to implement a service in Rust. The service runs fine, but producing a message is very slow, and I found schema resolution to be the slow part. I read about schema resolution and wonder why it is called during encoding. As far as I understand, it is only needed during decoding, when the reader schema differs from the one used for encoding.

We use a fairly large schema with many records that are used multiple times, so they become named references after their first definition. Unfortunately, I cannot attach the schema. It's already mentioned in avro-rs that this path is slow:

<img width="738" alt="image" src="https://github.com/user-attachments/assets/b1bb65a3-440c-4bf1-b799-06a0794f1bee">
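To illustrate what "named references" means here, the following is a minimal sketch using a hypothetical, heavily simplified schema model. These are not the `apache_avro` types, and the names `collect_names` and `deref` are made up for this example. It only shows the general idea that a record is defined once inline and later looked up by name, so an index built once per schema can be reused for every message. Whether avro-rs rebuilds such an index on each `resolve` call is an assumption I have not verified.

```rust
use std::collections::HashMap;

// Hypothetical, simplified schema model (NOT the apache_avro types).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Null,
    Str,
    Record { name: String, fields: Vec<(String, Schema)> },
    Union(Vec<Schema>),
    // A named reference to a record that was defined earlier in the schema.
    Ref(String),
}

// Walk the schema once and index every record definition by its name.
fn collect_names<'a>(schema: &'a Schema, names: &mut HashMap<&'a str, &'a Schema>) {
    match schema {
        Schema::Record { name, fields } => {
            names.insert(name.as_str(), schema);
            for (_, field_schema) in fields {
                collect_names(field_schema, names);
            }
        }
        Schema::Union(variants) => {
            for variant in variants {
                collect_names(variant, names);
            }
        }
        Schema::Null | Schema::Str | Schema::Ref(_) => {}
    }
}

// Follow a reference through the prebuilt index; other schemas are returned as-is.
// Returns None when a reference points at an unknown name.
fn deref<'a>(schema: &'a Schema, names: &HashMap<&'a str, &'a Schema>) -> Option<&'a Schema> {
    match schema {
        Schema::Ref(name) => names.get(name.as_str()).copied(),
        other => Some(other),
    }
}
```

With an index like this, looking up `Child` for `child2` and `child3` is a hash lookup instead of another walk over the schema.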


**To Reproduce**
Steps to reproduce the behavior:
Call `EasyAvroEncoder.encode_struct()` with a schema that has many named references.

Here the `.resolve()` method is called, and I don't understand why (see comment):

```
pub(crate) fn item_to_bytes(
    avro_schema: &AvroSchema,
    item: impl Serialize,
) -> Result<Vec<u8>, SRCError> {
    match to_value(item)
        .map_err(|e| {
            SRCError::non_retryable_with_cause(e, "Could not transform to apache_avro value")
        })
        // not sure why schema resolution should happen on serialization/writing
        .map(|r| r.resolve(&avro_schema.parsed))
    {
        Ok(Ok(v)) => to_bytes(avro_schema, v),
        Ok(Err(e)) => Err(SRCError::non_retryable_with_cause::<SRCError>(e, "Failed to resolve")),
        Err(e) => Err(e),
    }
}
```

I tried to write a test. The child struct could be duplicated to get more named references:

```
    #[test]
    fn named() {
        #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
        pub struct Parent {
            #[serde(rename = "child1")]
            pub child1: Option<Child>,
            #[serde(rename = "child2")]
            pub child2: Option<Child>,
            #[serde(rename = "child3")]
            pub child3: Option<Child>,
        }

        #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
        pub struct Child {
            #[serde(rename = "name")]
            pub name: Option<String>,
        }

        let writer_schema = r#"{
  "type": "record",
  "name": "Parent",
  "fields": [
    {
      "name": "child1",
      "type": [
        "null",
        {
          "type": "record",
          "name": "Child",
          "fields": [
            {
              "name": "name",
              "type": [
                "null",
                "string"
              ],
              "default": null
            }
          ]
        }
      ]
    },
    {
      "name": "child2",
      "type": [
        "null",
        "Child"
      ],
      "default": null
    },
    {
      "name": "child3",
      "type": [
        "null",
        "Child"
      ],
      "default": null
    }
  ]
}"#;

        let schema = AvroSchema {
            id: 6,
            raw: String::from(writer_schema),
            parsed: Schema::parse_str(writer_schema).unwrap(),
        };

        let item = Parent {
            child1: Some(Child { name: Some("child1".to_string()) }),
            child2: Some(Child { name: Some("child2".to_string()) }),
            child3: Some(Child { name: None }),
        };

        let now = Instant::now();
        let result = crate::avro_common::item_to_bytes(&schema, item);
        let elapsed = now.elapsed();
        println!("writing took: {:.2?}", elapsed);
        let bytes = result.unwrap();

        assert_eq!(bytes.len(), 25);
    }
```
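The test above times a single call, which can be dominated by one-off costs. A small helper like the following could give a steadier number by averaging over many calls after a few untimed warm-up runs. This is only a sketch; `mean_duration` is a hypothetical name, not part of any crate.

```rust
use std::time::{Duration, Instant};

// Run `f` for `warmup` untimed calls, then time `iters` calls
// and return the mean wall-clock time per call.
fn mean_duration<F: FnMut()>(warmup: u32, iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    for _ in 0..warmup {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    // Duration implements Div<u32>, so this is the average per iteration.
    start.elapsed() / iters
}
```

In the test it could wrap the encoding call, for example `mean_duration(10, 1_000, || { let _ = crate::avro_common::item_to_bytes(&schema, item.clone()); })`.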

Here is a screenshot of the running service from the IDE, with additional "Sending" and "Sent" logs around the `EasyAvroEncoder.encode_struct()` call plus `.await`.

<img width="1313" alt="image" src="https://github.com/user-attachments/assets/3a39c563-24b7-4e65-925e-3421e8ca449d">


**Expected behavior**
I expect it to be faster. When I remove all data related to named references (which is possible because I have many nullable fields), it is much faster. The following screenshot shows first the sending of a big event with many named references and then a small event with no named references:

<img width="1302" alt="image" src="https://github.com/user-attachments/assets/05a334da-673c-4bbf-ba46-050914edb61f">


**Options**
- is it possible to just avoid the schema resolution here?
- is it possible to check whether the writer and reader schemas differ, and only do schema resolution when needed?
- is it possible to speed up the schema resolution (though this would be a change in avro-rs)?
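The second option could look roughly like the sketch below. `SchemaHandle`, its `fingerprint` field, and `resolve_if_needed` are hypothetical names for illustration, and the actual resolution step is passed in as a closure so that nothing here depends on the avro-rs API. The sketch assumes a value that skips resolution already conforms to the schema, which I have not verified for values produced by `to_value`.

```rust
// Hypothetical stand-in for a parsed schema with a precomputed fingerprint.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SchemaHandle {
    fingerprint: u64,
}

// Skip the (potentially expensive) resolution step when both schemas have
// the same fingerprint; otherwise delegate to the supplied `resolve` closure.
fn resolve_if_needed<V, E>(
    value: V,
    writer: &SchemaHandle,
    reader: &SchemaHandle,
    resolve: impl FnOnce(V, &SchemaHandle) -> Result<V, E>,
) -> Result<V, E> {
    if writer.fingerprint == reader.fingerprint {
        // Assumption: identical schemas mean the value needs no adjustment.
        Ok(value)
    } else {
        resolve(value, reader)
    }
}
```

Comparing fingerprints rather than whole schemas keeps the check cheap, since the fingerprint can be computed once when the schema is parsed.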


