Commit 7c53649

Named schema resolution and docs updates (#4)
* chore: update public and internal docs
* Use order for field equality
* chore: writer docs
* bugfix: named schema resolution
1 parent e79a420 commit 7c53649

11 files changed, +136 -90 lines changed

CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -4,7 +4,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-# [Unreleased]
+## [Unreleased]
+
+### Fixed
+- Named schema resolution outside of union variants.
+
+### Updated
+- Documentation.
 
 ## 0.2.0 - 2020-10-10

CONTRIBUTING.md

Lines changed: 23 additions & 1 deletion
@@ -1,7 +1,29 @@
 
 # Contributing
 
-When contributing to this repository, please first discuss the change you wish to make via issue,
+Some of the features of avrow are feature gated.
+While making changes it's a good idea to build and
+test with `--all-features` flag.
+
+## Building the project
+
+```
+cargo build --all-features
+```
+
+## Running test cases
+
+```
+cargo test --all-features
+```
+
+## Generating and opening documentation locally
+
+```
+BROWSER=firefox cargo doc --no-deps --open
+```
+
+When contributing to this repository, please discuss the change you wish to make via issue,
 email, or any other method with the owners of this repository before making a change.
 
 Please note we have a [code of conduct](./CODE_OF_CONDUCT.md), please follow it in all your interactions with the project.

README.md

Lines changed: 3 additions & 3 deletions
@@ -64,7 +64,7 @@ The Avro specification provides two kinds of encoding:
 
 This crate implements only the binary encoding as that's the format practically used for performance and storage reasons.
 
-## Features.
+## Features
 
 * Full support for recursive self-referential schemas with Serde serialization/deserialization.
 * All compressions codecs (`deflate`, `bzip2`, `snappy`, `xz`, `zstd`) supported as per spec.
@@ -139,7 +139,7 @@ fn main() -> Result<(), Error> {
 
 ```
 
-A more involved self-referential recursive schema example:
+### Self-referential recursive schema example
 
 ```rust
 use anyhow::Error;
@@ -202,7 +202,7 @@ fn main() -> Result<(), Error> {
 
 ```
 
-An example of writing a json object with a confirming schema. The json object maps to an `avrow::Record` type.
+### An example of writing a json object with a confirming schema. The json object maps to the `avrow::Record` type.
 
 ```rust
 use anyhow::Error;

src/lib.rs

Lines changed: 4 additions & 5 deletions
@@ -1,16 +1,16 @@
 //! Avrow is a pure Rust implementation of the [Apache Avro specification](https://avro.apache.org/docs/current/spec.html).
 //!
-//! Pleaes refer to the [README](https://github.com/creativcoder/avrow/blob/main/README.md) for an overview.
+//! Please refer to the [README](https://github.com/creativcoder/avrow/blob/main/README.md) for an overview.
 //! For more details on the spec, head over to the [FAQ](https://cwiki.apache.org/confluence/display/AVRO/FAQ).
 //!
 //! ## Using the library
 //!
-//! Add to your `Cargo.toml`:
+//! Add avrow to your `Cargo.toml`:
 //!```toml
 //! [dependencies]
 //! avrow = "0.2.0"
 //!```
-//! ### A hello world example of reading and writing avro data files
+//! ## A hello world example of reading and writing avro data files
 
 //!```rust
 //! use avrow::{Reader, Schema, Writer, from_value};
@@ -26,7 +26,7 @@
 //! let mut writer = Writer::new(&schema, vec![])?;
 //! // Write data using write
 //! writer.write(())?;
-//! // or serialize
+//! // or serialize via serde
 //! writer.serialize(())?;
 //! // retrieve the underlying buffer using the into_inner method.
 //! let buf = writer.into_inner()?;
@@ -49,7 +49,6 @@
 
 //!```
 
-// TODO update logo
 #![doc(
     html_favicon_url = "https://raw.githubusercontent.com/creativcoder/avrow/main/assets/avrow_logo.png"
 )]

src/reader.rs

Lines changed: 14 additions & 7 deletions
@@ -485,7 +485,7 @@ pub(crate) fn decode_with_resolution<R: Read>(
 pub(crate) fn decode<R: Read>(
     schema: &Variant,
     reader: &mut R,
-    r_cxt: &Registry,
+    w_cxt: &Registry,
 ) -> Result<Value, AvrowErr> {
     let value = match schema {
         Variant::Null => Value::Null,
@@ -538,7 +538,7 @@ pub(crate) fn decode<R: Read>(
 
             let mut it = Vec::with_capacity(block_count as usize);
             for _ in 0..block_count {
-                let decoded = decode(&**items, reader, r_cxt)?;
+                let decoded = decode(&**items, reader, w_cxt)?;
                 it.push(decoded);
             }
 
@@ -550,7 +550,7 @@ pub(crate) fn decode<R: Read>(
             let mut hm = HashMap::new();
             for _ in 0..block_count {
                 let key = decode_string(reader)?;
-                let value = decode(values, reader, r_cxt)?;
+                let value = decode(values, reader, w_cxt)?;
                 hm.insert(key, value);
             }
 
@@ -560,7 +560,7 @@ pub(crate) fn decode<R: Read>(
             let mut v = IndexMap::with_capacity(fields.len());
             for (field_name, field) in fields {
                 let field_name = field_name.to_string();
-                let field_value = decode(&field.ty, reader, r_cxt)?;
+                let field_value = decode(&field.ty, reader, w_cxt)?;
                 let field_value = FieldValue::new(field_value);
                 v.insert(field_name, field_value);
             }
@@ -571,15 +571,22 @@ pub(crate) fn decode<R: Read>(
             };
             Value::Record(rec)
         }
+        Variant::Fixed { size, .. } => {
+            let mut buf = vec![0; *size];
+            reader
+                .read_exact(&mut buf)
+                .map_err(AvrowErr::DecodeFailed)?;
+            Value::Fixed(buf)
+        }
         Variant::Union { variants } => {
             let variant_idx: i64 = reader.read_varint().map_err(AvrowErr::DecodeFailed)?;
-            decode(&variants[variant_idx as usize], reader, r_cxt)?
+            decode(&variants[variant_idx as usize], reader, w_cxt)?
         }
         Variant::Named(schema_name) => {
-            let schema_variant = r_cxt
+            let schema_variant = w_cxt
                 .get(schema_name)
                 .ok_or(AvrowErr::NamedSchemaNotFound)?;
-            decode(schema_variant, reader, r_cxt)?
+            decode(schema_variant, reader, w_cxt)?
         }
         a => {
             return Err(AvrowErr::DecodeFailed(Error::new(
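
The two behaviours this hunk touches, reading a fixed value and resolving a named schema against the writer's registry, can be sketched in isolation. This is a minimal sketch, not avrow's implementation: the `Variant`, `Value` and `Registry` types below are hypothetical, simplified stand-ins, and errors are reported as plain `std::io::Error` instead of `AvrowErr`.

```rust
use std::collections::HashMap;
use std::io::{self, Read};

// Hypothetical, simplified stand-ins for avrow's internal `Variant` and `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Fixed { size: usize },
    Named(String),
}

#[derive(Debug, PartialEq)]
enum Value {
    Fixed(Vec<u8>),
}

// Hypothetical registry: maps the name of a named schema to its definition.
type Registry = HashMap<String, Variant>;

fn decode<R: Read>(schema: &Variant, reader: &mut R, w_cxt: &Registry) -> io::Result<Value> {
    match schema {
        // A fixed value is exactly `size` raw bytes with no length prefix.
        Variant::Fixed { size } => {
            let mut buf = vec![0u8; *size];
            reader.read_exact(&mut buf)?;
            Ok(Value::Fixed(buf))
        }
        // A named reference is looked up in the writer's registry and the data
        // is then decoded with the schema the name points to.
        Variant::Named(name) => {
            let resolved = w_cxt.get(name).ok_or_else(|| {
                io::Error::new(
                    io::ErrorKind::NotFound,
                    format!("named schema not found: {}", name),
                )
            })?;
            decode(resolved, reader, w_cxt)
        }
    }
}
```

Calling `decode` with `Variant::Named("md5".to_string())`, a registry that maps `"md5"` to `Variant::Fixed { size: 4 }`, and a reader over five bytes consumes the first four bytes and leaves the fifth unread. The sketch does not guard against a name that refers back to itself.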

src/schema/canonical.rs

Lines changed: 1 addition & 2 deletions
@@ -27,8 +27,7 @@ const RELEVANT_FIELDS: [&str; 7] = [
     "name", "type", "fields", "symbols", "items", "values", "size",
 ];
 /// Represents canonical form of an avro schema. This representation removes irrelevant fields
-/// such as docs and aliases in the schema.
-/// Fingerprinting methods are available on this instance.
+/// such as docs and aliases in the schema. Fingerprinting methods are available on this instance.
 #[derive(Debug, PartialEq)]
 pub struct CanonicalSchema(pub(crate) JsonValue);

src/schema/common.rs

Lines changed: 7 additions & 36 deletions
@@ -1,5 +1,5 @@
 // This module contains definition of types that are common across a subset of
-// avro schemas.
+// avro Schema implementation.
 
 use crate::error::AvrowErr;
 use crate::schema::Variant;
@@ -33,16 +33,16 @@ pub(crate) fn validate_namespace(s: &str) -> Result<(), AvrowErr> {
     Ok(())
 }
 
-/// Represents `fullname` attribute and its constituents
-/// of a named avro type i.e, Record, Fixed and Enum
+/// Represents the `fullname` attribute
+/// of a named avro type i.e, Record, Fixed and Enum.
 #[derive(Debug, Clone, Eq, PartialOrd, Ord)]
 pub struct Name {
     pub(crate) name: String,
     pub(crate) namespace: Option<String>,
 }
 
 impl Name {
-    // Creates an validates the name. This will also extract the namespace if a dot is present in `name`
+    // Creates a new name with validation. This will extract the namespace if a dot is present in `name`
     // Any further calls to set_namespace, will be a noop if the name already contains a dot.
     pub(crate) fn new(name: &str) -> Result<Self, AvrowErr> {
         let mut namespace = None;
@@ -56,10 +56,6 @@ impl Name {
             validate_name(0, name)?;
             name
         } else {
-            // TODO perform namespace lookups from enclosing schema if any
-            // This will require us to pass context to this method.
-            // Update: this is now handled by from_json method as that's called from places
-            // where we have context on most tightly enclosing schema.
             validate_name(0, name)?;
             name
         };
@@ -70,7 +66,6 @@ impl Name {
         })
     }
 
-    // TODO also parse namespace from json value
     pub(crate) fn from_json(
         json: &serde_json::map::Map<String, JsonValue>,
         enclosing_namespace: Option<&str>,
@@ -105,7 +100,6 @@ impl Name {
     }
 
     // receives a mutable json and parses a Name and removes namespace. Used for canonicalization.
-    // TODO change as above from_json method, should take enclosing namespace.
     pub(crate) fn from_json_mut(
         json: &mut serde_json::map::Map<String, JsonValue>,
         enclosing_namespace: Option<&str>,
@@ -129,13 +123,6 @@ impl Name {
             }
         }
 
-        // if let Some(namespace) = json.get("namespace") {
-        // if let JsonValue::String(s) = namespace {
-        // name.set_namespace(s)?;
-        // json.remove("namespace");
-        // }
-        // }
-
         Ok(name)
     }
 
@@ -156,24 +143,8 @@ impl Name {
     }
 
     // TODO according to Rust convention, item path separators are :: instead of .
-    // TODO should we add a configurable separator.
-    // TODO should do namespace lookup from enclosing name schema if applicable. (pass enclosing schema as a context)
+    // should we add a configurable separator?
     pub(crate) fn fullname(&self) -> String {
-        // if self.name.contains(".") {
-        // self.name.to_string()
-        // } else if let Some(n) = &self.namespace {
-        // if n.is_empty() {
-        // // According to spec, it's fine to put "" as a namespace, which becomes a null namespace
-        // format!("{}", self.name)
-        // } else {
-        // format!("{}.{}", n, self.name)
-        // }
-        // } else {
-        // // The case when only name exists.
-        // // TODO As of now we just return without any enclosing namespace.
-        // // TODO pass the most tightly enclosing namespace here when only name is provided.
-        // self.name.to_string()
-        // }
         if let Some(n) = &self.namespace {
             if n.is_empty() {
                 // According to spec, it's fine to put "" as a namespace, which becomes a null namespace
@@ -255,10 +226,9 @@ pub struct Field {
     pub(crate) aliases: Option<Vec<String>>,
 }
 
-// TODO do we also use order for equality?
 impl std::cmp::PartialEq for Field {
     fn eq(&self, other: &Self) -> bool {
-        self.name == other.name && self.ty == other.ty
+        self.name == other.name && self.ty == other.ty && self.order == other.order
     }
 }
 
@@ -270,6 +240,7 @@ impl Field {
         order: Order,
         aliases: Option<Vec<String>>,
     ) -> Result<Self, AvrowErr> {
+        // According to spec, field names also must adhere to a valid nane.
         validate_name(0, name)?;
         Ok(Field {
             name: name.to_string(),
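
The effect of the `PartialEq` change for `Field` can be seen with a small standalone sketch. The types below are hypothetical, reduced stand-ins for avrow's `Field` and `Order` (the field type is modelled as a plain `String`). Only the comparison expression is taken from the diff above.

```rust
// Hypothetical, simplified stand-ins for avrow's `Order` and `Field` types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Order {
    Ascending,
    Descending,
    Ignore,
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    ty: String, // stands in for the field's schema variant
    order: Order,
    aliases: Option<Vec<String>>, // not part of the comparison
}

impl PartialEq for Field {
    fn eq(&self, other: &Self) -> bool {
        // Two fields are equal only if name, type and sort order all match.
        self.name == other.name && self.ty == other.ty && self.order == other.order
    }
}
```

With this comparison, two fields that differ only in `order` are no longer equal, while fields that differ only in `aliases` still are. That is presumably why the parser test further down switches its expected field to `Order::Descending`.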

src/schema/mod.rs

Lines changed: 8 additions & 6 deletions
@@ -22,7 +22,6 @@ use std::fmt::Debug;
 use std::fs::OpenOptions;
 use std::path::Path;
 
-/// A schema parsed from json value
 #[derive(Debug, Clone, PartialEq)]
 pub(crate) enum Variant {
     Null,
@@ -59,7 +58,7 @@ pub(crate) enum Variant {
     Named(String),
 }
 
-/// Represents the avro schema used to write encoded avro data
+/// Represents the avro schema used to write encoded avro data.
 #[derive(Debug)]
 pub struct Schema {
     // TODO can remove this if not needed
@@ -80,7 +79,7 @@ impl PartialEq for Schema {
 
 impl std::str::FromStr for Schema {
     type Err = AvrowErr;
-    /// Parse an avro schema from a json string
+    /// Parse an avro schema from a JSON string
     /// One can use Rust's raw string syntax (r##""##) to pass schema.
     fn from_str(schema: &str) -> Result<Self, Self::Err> {
         let schema_json =
@@ -90,8 +89,9 @@ impl std::str::FromStr for Schema {
 }
 
 impl Schema {
-    /// Parses an avro schema from a json description of schema in a file.
-    /// Alternatively, one can use the `FromStr` impl to create a `Schema` from a JSON string:
+    /// Parses an avro schema from a JSON schema in a file.
+    /// Alternatively, one can use the [`FromStr`](https://doc.rust-lang.org/std/str/trait.FromStr.html)
+    /// impl to create the Schema from a JSON string:
     /// ```
     /// use std::str::FromStr;
     /// use avrow::Schema;
@@ -134,7 +134,8 @@ impl Schema {
         self.variant.validate(value, &self.cxt)
     }
 
-    /// Returns the canonical form of an Avro schema
+    /// Returns the canonical form of an Avro schema.
+    /// Example:
     /// ```rust
     /// use avrow::Schema;
     /// use std::str::FromStr;
@@ -150,6 +151,7 @@ impl Schema {
     /// }]
     /// }
     /// "##).unwrap();
+    ///
     /// let canonical = schema.canonical_form();
     /// ```
     pub fn canonical_form(&self) -> &CanonicalSchema {

src/schema/parser.rs

Lines changed: 3 additions & 3 deletions
@@ -304,8 +304,8 @@ impl Registry {
 
 // Parses the `order` of a field, defaults to `ascending` order
 pub(crate) fn parse_field_order(order: &JsonValue) -> AvrowResult<Order> {
-    match *order {
-        JsonValue::String(ref s) => match &**s {
+    match order {
+        JsonValue::String(s) => match s.as_ref() {
             "ascending" => Ok(Order::Ascending),
             "descending" => Ok(Order::Descending),
             "ignore" => Ok(Order::Ignore),
@@ -439,7 +439,7 @@ mod tests {
             "value",
             Variant::Long,
             Some(Value::Long(1)),
-            Order::Ascending,
+            Order::Descending,
             None,
         )
         .unwrap();
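
The `parse_field_order` cleanup relies on Rust's default binding modes: matching on a reference binds the inner `String` by reference, so neither `*order` nor `ref s` is needed. A minimal sketch of the same shape follows. The `JsonValue` enum is a hypothetical two-case stand-in for `serde_json::Value`, `String` is used as the error type, and the error branches are an assumption, since the diff does not show how avrow handles unknown values.

```rust
// Hypothetical stand-in for `serde_json::Value`, reduced to two cases.
#[derive(Debug)]
enum JsonValue {
    Null,
    String(String),
}

#[derive(Debug, PartialEq)]
enum Order {
    Ascending,
    Descending,
    Ignore,
}

fn parse_field_order(order: &JsonValue) -> Result<Order, String> {
    // `order` is a `&JsonValue`, so `s` is bound as `&String` automatically.
    match order {
        JsonValue::String(s) => match s.as_str() {
            "ascending" => Ok(Order::Ascending),
            "descending" => Ok(Order::Descending),
            "ignore" => Ok(Order::Ignore),
            // Assumption: unknown strings are rejected in this sketch.
            other => Err(format!("unknown field order: {}", other)),
        },
        // Assumption: non-string values are rejected in this sketch.
        other => Err(format!("field order must be a string, got {:?}", other)),
    }
}
```

The sketch uses `as_str()` where the commit uses `as_ref()`. Both yield a `&str` here, and `as_str()` avoids relying on type inference to pick the `AsRef<str>` implementation.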
