Commit 57ec093

Author: Jill Grant
Merge pull request #292728 from axisc/schema-registry-docs-refresh
Schema registry docs refresh
2 parents e61f9aa + 4e0b341 commit 57ec093

3 files changed: +67 -28 lines changed

articles/event-hubs/schema-registry-client-side-enforcement.md

Lines changed: 18 additions & 8 deletions
@@ -7,22 +7,32 @@ author: spelluru
77
ms.author: spelluru
88
---
99

10-
# Client-side schema enforcement
11-
The information flow when you use schema registry is the same for all protocols that you use to publish or consume events from Azure Event Hubs.
10+
# Client-side schema enforcement
1211

13-
The following diagram shows how the information flows when event producers and consumers use Schema Registry with the **Kafka** protocol using **Avro** serialization.
12+
Client-side schema enforcement validates the data that the producer application sends and the consumer application receives against the schemas defined in Schema Registry. Validation happens on the client side rather than on the broker (server) side.
13+
14+
The following diagram illustrates this flow:
1415

1516
:::image type="content" source="./media/schema-registry-overview/information-flow.svg" alt-text="Image showing the Schema Registry information flow." border="false":::
1617

18+
> [!NOTE]
19+
> The diagram shows the information flow when event producers and consumers use Schema Registry with the **Kafka** protocol and **Avro** schema. The flow is the same for other protocols and schema formats.
20+
>
21+
1722
### Producer
1823

1924
1. Kafka producer application uses `KafkaAvroSerializer` to serialize event data using the specified schema. Producer application provides details of the schema registry endpoint and other optional parameters that are required for schema validation.
20-
1. The serializer looks for the schema in the schema registry to serialize event data. If it finds the schema, then the corresponding schema ID is returned. You can configure the producer application to auto register the schema with the schema registry if it doesn't exist.
21-
1. Then the serializer prepends the schema ID to the serialized data that is published to the Event Hubs.
25+
26+
2. The serializer looks for the schema in the schema registry to serialize event data. If it finds the schema, then the corresponding schema ID is returned. You can configure the producer application to auto register the schema with the schema registry if it doesn't exist.
27+
28+
3. Then the serializer prepends the schema ID to the serialized data that is published to the Event Hubs.
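
The producer steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the `KafkaAvroSerializer` implementation: `InMemoryRegistry`, `serialize`, the hash-derived 32-character schema ID, and the use of JSON in place of Avro binary encoding are all hypothetical stand-ins.

```python
import hashlib
import json


def _schema_id(schema_text):
    # Assumption: the ID is derived from a hash of the schema text.
    # A real registry assigns its own IDs.
    return hashlib.sha256(schema_text.encode("utf-8")).hexdigest()[:32]


class InMemoryRegistry:
    """Hypothetical stand-in for a schema registry (not the Azure SDK)."""

    def __init__(self):
        self._schemas = {}

    def get_id(self, schema_text):
        # Return the schema ID if the schema is registered, else None.
        schema_id = _schema_id(schema_text)
        return schema_id if schema_id in self._schemas else None

    def register(self, schema_text):
        schema_id = _schema_id(schema_text)
        self._schemas[schema_id] = schema_text
        return schema_id


def serialize(registry, schema_text, event, auto_register=False):
    # Step 2: look up the schema; optionally register it if it is missing.
    schema_id = registry.get_id(schema_text)
    if schema_id is None:
        if not auto_register:
            raise LookupError("schema not found in registry")
        schema_id = registry.register(schema_text)
    # Step 3: prepend the schema ID to the serialized event data.
    # JSON stands in for Avro binary encoding here.
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    return schema_id.encode("ascii") + body
```

Calling `serialize(registry, schema_text, event, auto_register=True)` on an empty registry registers the schema first; with `auto_register=False` the same call raises `LookupError`.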
2229

2330
### Consumer
2431

2532
1. Kafka consumer application uses `KafkaAvroDeserializer` to deserialize data that it receives from the event hub.
26-
1. The deserializer uses the schema ID (prepended by the producer) to retrieve schema from the schema registry.
27-
1. The deserializer uses the schema to deserialize event data that it receives from the event hub.
28-
1. The schema registry client uses caching to prevent redundant schema registry lookups in the future.
33+
34+
2. The deserializer uses the schema ID (prepended by the producer) to retrieve schema from the schema registry.
35+
36+
3. The deserializer uses the schema to deserialize event data that it receives from the event hub.
37+
38+
4. The schema registry client uses caching to prevent redundant schema registry lookups in the future.
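
The consumer steps above, including the caching in step 4, might look like the following sketch. It is not the `KafkaAvroDeserializer` implementation: `CachingSchemaClient`, `deserialize`, the fixed 32-character ID prefix, and the JSON payload body are assumptions made for illustration.

```python
import json

ID_LENGTH = 32  # assumed fixed-width schema ID prefix, for illustration only


class CachingSchemaClient:
    """Hypothetical registry client that caches schemas by ID."""

    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema  # callable: schema ID -> schema text
        self._cache = {}
        self.lookups = 0  # number of calls that actually reached the registry

    def get_schema(self, schema_id):
        if schema_id not in self._cache:
            self.lookups += 1
            self._cache[schema_id] = self._fetch_schema(schema_id)
        return self._cache[schema_id]


def deserialize(client, payload):
    # Step 2: read the schema ID that the producer prepended.
    schema_id = payload[:ID_LENGTH].decode("ascii")
    schema = json.loads(client.get_schema(schema_id))
    # Step 3: decode the body and check it against the schema's field names.
    event = json.loads(payload[ID_LENGTH:].decode("utf-8"))
    expected = {field["name"] for field in schema["fields"]}
    if set(event) != expected:
        raise ValueError("event fields do not match the schema")
    return event
```

Deserializing two payloads that carry the same schema ID reaches the registry only once; the second call is served from the cache.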

articles/event-hubs/schema-registry-concepts.md

Lines changed: 39 additions & 14 deletions
@@ -8,41 +8,66 @@ ms.author: spelluru
88
---
99

1010
# Schema Registry in Azure Event Hubs
11-
Schema Registry in Azure Event Hubs provides you with a repository to use and manage schemas in schema-driven event streaming scenarios.
1211

13-
> [!NOTE]
14-
> Schema Registry is not supported on Basic tier.
12+
Schema Registry is crucial in loosely coupled and event streaming workflows for maintaining data consistency, simplifying schema evolution, enhancing interoperability, and reducing development effort. By providing a centralized repository for schemas, it supports reliable data processing and governance with little operational overhead in large distributed organizations.
1513

16-
## Schema Registry components
14+
Schema Registry in Azure Event Hubs fulfills multiple roles in schema-driven event streaming scenarios:
15+
* Provides a repository where multiple schemas can be registered, managed, and evolved.
16+
* Manages schema evolution with multiple compatibility rules.
17+
* Performs data validation for all schematized data.
18+
* Provides client-side libraries (serializers and deserializers) for producers and consumers.
19+
* Improves network throughput efficiency by passing schema ID instead of the schema definition for every payload.
1720

18-
An Event Hubs namespace can host schema groups alongside event hubs (or Kafka topics). It hosts a schema registry and can have multiple schema groups. In spite of being hosted in Azure Event Hubs, the schema registry can be used universally with all Azure messaging services and any other message or events broker. Each of these schema groups is a separately securable repository for a set of schemas. Groups can be aligned with a particular application or an organizational unit.
21+
> [!NOTE]
22+
> Schema Registry is supported on Standard, Premium, and Dedicated tiers.
23+
>
1924
20-
:::image type="content" source="./media/schema-registry-overview/elements.png" alt-text="Diagram that shows the components of Schema Registry in Azure Event Hubs." border="false":::
25+
## Schema Registry components
2126

22-
### Schema groups
23-
Schema group is a logical group of similar schemas based on your business criteria. A schema group can hold multiple versions of a schema. The compatibility enforcement setting on a schema group can help ensure that newer schema versions are backwards compatible.
27+
The Schema Registry lives in the context of the Event Hubs namespace, but it can be used with any Azure messaging service or any other message or event broker. It comprises multiple schema groups, which act as logical groupings of schemas and can be managed independently of other schema groups.
2428

25-
The security boundary imposed by the grouping mechanism help ensures that trade secrets don't inadvertently leak through metadata in situations where the namespace is shared among multiple partners. It also allows for application owners to manage schemas independent of other applications that share the same namespace.
29+
:::image type="content" source="./media/schema-registry-overview/elements.png" alt-text="Diagram that shows the components of Schema Registry in Azure Event Hubs." border="false":::
2630

2731
### Schemas
28-
Schemas define the contract between producers and consumers. A schema defined in an Event Hubs schema registry helps manage the contract outside of event data, thus removing the payload overhead. A schema has a name, type (example: record, array, and so on.), compatibility mode (none, forward, backward, full), and serialization type (both Avro and JSON). You can create multiple versions of a schema and retrieve and use a specific version of a schema.
2932

30-
### Schema formats
33+
In any loosely coupled system, there are multiple applications communicating with each other, primarily through data. Schemas act as a declarative way to define the structure of the data so that the contract between these producer and consumer applications is well defined, ensuring reliable processing at scale.
34+
35+
A schema definition includes:
36+
* Fields - names of the individual data elements (for example, first/last name, book title, address).
37+
* Data types - the kind of data that can be stored in each field (for example, string, date-time, array).
38+
* Structure - the organization of the different fields (for example, nested structures or arrays).
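
As an illustration of these three parts, the sketch below builds a schema in Avro's JSON schema syntax. The `Book` record, its namespace, and its field names are hypothetical examples, not schemas from the Event Hubs documentation.

```python
import json

# Hypothetical "Book" schema, written in Avro's JSON schema syntax.
book_schema = {
    "type": "record",
    "name": "Book",
    "namespace": "com.example.library",
    "fields": [
        # Fields and data types: each entry names a field and gives its type.
        {"name": "title", "type": "string"},
        {"name": "publishedAt",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
        # Structure: an array of strings and a nested record.
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "author",
         "type": {
             "type": "record",
             "name": "Author",
             "fields": [
                 {"name": "firstName", "type": "string"},
                 {"name": "lastName", "type": "string"},
             ],
         }},
    ],
}

# The schema text is what would be registered in a schema group.
schema_text = json.dumps(book_schema, indent=2)
field_names = [field["name"] for field in book_schema["fields"]]
```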
39+
40+
Schemas define the contract between producers and consumers. A schema defined in an Event Hubs schema registry helps manage the contract outside of event data, thus removing the payload overhead.
41+
42+
#### Schema formats
3143
Schema formats are used to determine the manner in which a schema is structured and defined, with each format outlining specific guidelines and syntax for defining the structure of the events that will be used for event streaming.
3244

33-
#### Avro schema
45+
##### Avro schema
3446
[Avro](https://avro.apache.org/) is a popular data serialization system that uses a compact binary format and provides schema evolution capabilities.
3547

3648
To learn more about using Avro schema format with Event Hubs Schema Registry, see:
3749
- [How to use schema registry with Kafka and Avro](schema-registry-kafka-java-send-receive-quickstart.md)
3850
- [How to use Schema registry with Event Hubs .NET SDK (AMQP) and Avro.](schema-registry-dotnet-send-receive-quickstart.md)
3951

40-
#### JSON Schema
52+
##### JSON Schema
4153
[JSON Schema](https://json-schema.org/) is a standardized way of defining the structure and data types of the events. JSON Schema enables the confident and reliable use of the JSON data format in event streaming.
4254

4355
To learn more about using JSON schema format with Event Hubs Schema Registry, see:
4456
- [How to use schema registry with Kafka and JSON Schema](schema-registry-json-schema-kafka.md)
4557

58+
##### Protobuf
59+
60+
[Protocol Buffers](https://protobuf.dev/) is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It's used for efficiently defining data structures and serializing them into a compact binary format.
61+
62+
### Schema groups
63+
64+
Schema groups are logical groups of similar schemas based on your business criteria. A schema group holds:
65+
* multiple schema definitions,
66+
* multiple versions of a specific schema, and
67+
* metadata regarding the schema type and compatibility for all schemas in the group.
68+
69+
A schema group can be thought of as a subset of the schema registry, aligned with a particular application or organizational unit, with a separate authorization model. This extra security boundary ensures that metadata and trade secrets aren't leaked in a shared services model. It also allows application owners to manage schemas independently of other applications that share the same namespace.
70+
4671
## Schema evolution
4772
Schemas need to evolve with the business requirements of producers and consumers. Azure Schema Registry supports schema evolution by introducing compatibility modes at the schema group level. When you create a schema group, you can specify the compatibility mode of the schemas that you include in that schema group. When you update a schema, the change must comply with the assigned compatibility mode; only then is a new version of the schema created.
4873

@@ -85,7 +110,7 @@ For limits (for example: number of schema groups in a namespace) of Event Hubs,
85110
To access a schema registry programmatically, follow these steps:
86111

87112
1. [Register your application in Microsoft Entra ID](../active-directory/develop/quickstart-register-app.md)
88-
1. Add the security principal of the application to one of the following Azure role-based access control (Azure RBAC) roles at the **namespace** level.
113+
1. Add the security principal of the application to one of the following Azure role-based access control (Azure RBAC) roles at the **namespace** level.
89114

90115
| Role | Description |
91116
| ---- | ----------- |

articles/event-hubs/schema-registry-overview.md

Lines changed: 10 additions & 6 deletions
@@ -8,23 +8,27 @@ ms.custom: references_regions
88
---
99

1010
# Azure Schema Registry in Event Hubs
11-
In many event streaming and messaging scenarios, the event or message payload contains structured data. Schema-driven formats such as [Apache Avro](https://avro.apache.org/) are often used to serialize or deserialize such structured data.
1211

13-
An event producer uses a schema to serialize event payload and publish it to an event broker such as Event Hubs. Event consumers read event payload from the broker and deserialize it using the same schema. So, both producers and consumers can validate the integrity of the data with the same schema.
12+
Event streaming and messaging scenarios often deal with structured data in the event or message payload. However, the structured data is of little value to the event broker, which only deals with bytes. Schema-driven formats such as [Apache Avro](https://avro.apache.org/), [JSON Schema](https://json-schema.org/), or [Protobuf](https://protobuf.dev/) are often used to serialize such structured data to binary and deserialize it from binary.
13+
14+
An event producer uses a schema definition to serialize event payload and publish it to an event broker such as Event Hubs. Event consumers read event payload from the broker and deserialize it using the same schema definition.
15+
16+
So, both producers and consumers can validate the integrity of the data with the same schema.
1417

1518
:::image type="content" source="./media/schema-registry-overview/schema-driven-ser-de.svg" alt-text="Image showing producers and consumers serializing and deserializing event payload using schemas from the Schema Registry. ":::
1619

1720
## What is Azure Schema Registry?
18-
**Azure Schema Registry** is a feature of Event Hubs, which provides a central repository for schemas for event-driven and messaging-centric applications. It provides the flexibility for your producer and consumer applications to **exchange data without having to manage and share the schema**. It also provides a simple governance framework for reusable schemas and defines relationship between schemas through a grouping construct (schema groups).
21+
**Azure Schema Registry** is a feature of Event Hubs, which provides a central repository for schemas for event-driven and messaging-centric applications. It provides the flexibility for your producer and consumer applications to **exchange data without having to manage and share the schema**. It also provides a simple governance framework for reusable schemas and defines relationship between schemas through a logical grouping construct (schema groups).
1922

2023
:::image type="content" source="./media/schema-registry-overview/schema-registry.svg" alt-text="Image showing a producer and a consumer serializing and deserializing event payload using a schema from the Schema Registry." border="false":::
2124

22-
With schema-driven serialization frameworks like Apache Avro, moving serialization metadata into shared schemas can also help with **reducing the per-message overhead**. It's because each message doesn't need to have the metadata (type information and field names) as it's the case with tagged formats such as JSON.
25+
With schema-driven serialization frameworks like Apache Avro, JSON Schema, and Protobuf, moving serialization metadata into shared schemas can also help with **reducing the per-message overhead**. That's because each message doesn't need to carry the metadata (type information and field names), as is the case with tagged formats such as JSON.
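
The overhead difference can be seen in a rough sketch that compares a tagged JSON encoding with a fixed binary layout. The event, the `SCHEMA_LAYOUT` format string, and the use of Python's `struct` module are assumptions for illustration; this isn't Avro or Protobuf wire encoding, and it only shows the effect of keeping field names out of each message.

```python
import json
import struct

event = {"deviceId": 17, "temperature": 21.5}

# Tagged encoding: the field names travel inside every message.
tagged = json.dumps(event).encode("utf-8")

# Schema-driven encoding: an (assumed) shared schema fixes the field order
# and types as a 4-byte integer followed by an 8-byte double, so only the
# values are sent.
SCHEMA_LAYOUT = "<id"
compact = struct.pack(SCHEMA_LAYOUT, event["deviceId"], event["temperature"])

# A consumer that knows the same layout recovers the values.
device_id, temperature = struct.unpack(SCHEMA_LAYOUT, compact)
```

Here `compact` is 12 bytes long, while `tagged` also carries both field names and the JSON punctuation in every message.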
2326

2427
> [!NOTE]
25-
> The feature isn't available in the **basic** tier.
28+
> The feature is available in the **Standard**, **Premium**, and **Dedicated** tiers.
29+
>
2630
27-
Having schemas stored alongside the events and inside the eventing infrastructure ensures that the metadata that's required for serialization or deserialization is always in reach and schemas can't be misplaced.
31+
Having schemas stored alongside the events and inside the eventing infrastructure ensures that the metadata required for serialization or deserialization is always in reach and schemas can't be misplaced.
2832

2933
## Related content
3034
