ramon-garcia
diff --git a/docs/legacy_serialisation.md b/docs/legacy_serialisation.md
Lines changed: 85 additions & 0 deletions
diff --git a/docs/serialisation.md b/docs/serialisation.md
Lines changed: 21 additions & 39 deletions
diff --git a/src/Parquet.Test/Serialisation/CILProgramTest.cs b/src/Parquet.Test/Serialisation/CILProgramTest.cs
Lines changed: 0 additions & 36 deletions
diff --git a/src/Parquet.Test/Serialisation/ParquetSerializerTest.cs b/src/Parquet.Test/Serialisation/ParquetSerializerTest.cs
Lines changed: 17 additions & 1 deletion
diff --git a/src/Parquet.Test/Serialisation/SchemaReflectorTest.cs b/src/Parquet.Test/Serialisation/SchemaReflectorTest.cs
Lines changed: 18 additions & 0 deletions
diff --git a/src/Parquet/Schema/MapField.cs b/src/Parquet/Schema/MapField.cs
Lines changed: 1 addition & 1 deletion
diff --git a/src/Parquet/Serialization/CILProgram.cs b/src/Parquet/Serialization/CILProgram.cs
Lines changed: 0 additions & 65 deletions
diff --git a/src/Parquet/Serialization/MSILGenerator.cs b/src/Parquet/Serialization/MSILGenerator.cs
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,85 @@
+# Class Serialisation
+
+The Parquet library is generally extremely flexible in supporting the internals of the Apache Parquet format and lets you do whatever the low-level API allows. However, writing boilerplate code is often unsuitable when you are working with business objects and just want to serialise them into a parquet file.
+
+Class serialisation is **really fast** because it generates [MSIL](https://en.wikipedia.org/wiki/Common_Intermediate_Language) on the fly. That means there is a small delay when serialising the first entity, which in most cases is negligible. Once a class has been serialised at least once, further operations become blazingly fast (around *x40* faster than reflection on relatively large amounts of data, ~5 million records).
+
+> At the moment class serialisation supports only simple first-level class *properties* (having a getter and a setter). None of the complex types such as arrays etc. are supported. This is mostly due to lack of time rather than technical limitations.
+
+## Quick Start
+
+Both the serialiser and the deserialiser work with arrays of classes. Let's say you have the following class definition:
+
+```csharp
+class Record {
+    public DateTime Timestamp { get; set; }
+    public string EventName { get; set; }
+    public double MeterValue { get; set; }
+}
+```
+
+Let's generate a few instances of those for a test:
+
+```csharp
+var data = Enumerable.Range(0, 1_000_000).Select(i => new Record {
+    Timestamp = DateTime.UtcNow.AddSeconds(i),
+    EventName = i % 2 == 0 ? "on" : "off",
+    MeterValue = i 
+}).ToList();
+```
+
+Here is what you can do to write out those classes in a single file:
+
+```csharp
+await ParquetConvert.SerializeAsync(data, "/mnt/storage/data.parquet");
+```
+
+That's it! Of course the `.SerializeAsync()` method also has overloads and optional parameters allowing you to control the serialization process slightly, such as selecting compression method, row group size etc.
+
+Parquet.Net will automatically figure out file schema by reflecting class structure, types, nullability and other parameters for you.
+
+To deserialise this file back into an array of classes, write the following:
+
+```csharp
+Record[] data = await ParquetConvert.DeserializeAsync<Record>("/mnt/storage/data.parquet");
+```
+### Retrieve and Deserialize Records by Row Group
+
+If you have a huge parquet file (~10 million records), you can also retrieve records by row group index. This helps keep the memory footprint low, because you don't load the whole file into memory at once.
+
+```csharp
+// stream: an open stream over a parquet file written from Record instances
+// rowGroupIndex: zero-based index of the row group to read
+Record[] records = ParquetConvert.Deserialize<Record>(stream, rowGroupIndex);
+```
+### Deserialize Only a Few Properties
+
+If you have a parquet file with a huge number of columns but only need a few of them for processing, deserialise into a smaller class that declares just the properties you need, as in the snippet below.
+
+```csharp
+class MyClass
+{
+   public int Id { get; set; }
+   public string Name { get; set; }
+   public string Address { get; set; }
+   public int Age { get; set; }
+}
+
+// declares only the column we want to read back
+class MyClassV1
+{
+   public string Name { get; set; }
+}
+
+MyClass[] structures = Enumerable
+   .Range(0, 1000)
+   .Select(i => new MyClass
+   {
+      Id = i,
+      Name = $"row {i}",
+   })
+   .ToArray();
+
+using var stream = new MemoryStream();
+ParquetConvert.Serialize(structures, stream);
+
+// rewind, then read the first row group back into the narrower class
+stream.Position = 0;
+MyClassV1[] v1structures = ParquetConvert.Deserialize<MyClassV1>(stream, 0);
+```
+
+## Customising Serialisation
+
+Serialisation tries to fit into the C# ecosystem like a ninja 🥷, including customisations. It supports the following attributes from [`System.Text.Json.Serialization` Namespace](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization?view=net-7.0):
+
+- [`JsonPropertyName`](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonpropertynameattribute?view=net-7.0) - changes mapping of column name to property name.
+- [`JsonIgnore`](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonignoreattribute?view=net-7.0) - ignores property when reading or writing.
@@ -1,14 +1,16 @@
 # Class Serialisation
 
+> For legacy serialisation, refer to [this doc](legacy_serialisation.md).
+
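-Parquet library is generally extremely flexible in terms of supporting internals of the Apache Parquet format and allows you to do whatever the low level API allow to. However, in many cases writing boilerplate code is not suitable if you are working with business objects and just want to serialise them into a parquet file. 
+The Parquet library is generally extremely flexible in supporting the internals of the Apache Parquet format and lets you do whatever the low-level API allows. However, writing boilerplate code is often unsuitable when you are working with business objects and just want to serialise them into a parquet file.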
 
-Class serialisation is **really fast** as it generates [MSIL](https://en.wikipedia.org/wiki/Common_Intermediate_Language) on the fly. That means there is a tiny bit of delay when serialising a first entity, which in most cases is negligible. Once the class is serialised at least once, further operations become blazingly fast (around *x40* speed improvement comparing to reflection on relatively large amounts of data (~5 million records)).
+Class serialisation is **really fast** because internally it generates [compiled expression trees](https://learn.microsoft.com/en-US/dotnet/csharp/programming-guide/concepts/expression-trees/) on the fly. That means there is a small delay when serialising the first entity, which in most cases is negligible. Once a class has been serialised at least once, further operations become blazingly fast (around *x40* faster than reflection on relatively large amounts of data, ~5 million records).
 
-> At the moment class serialisation supports only simple first-level class *properties* (having a getter and a setter). None of the complex types such as arrays etc. are supported. This is mostly due to lack of time rather than technical limitations.
+The class serialisation philosophy is to mimic .NET's built-in **JSON** serialisation infrastructure, in order to ease the learning path and to reuse as much existing code as possible.
 
 ## Quick Start
 
-Both serialiser and deserialiser works with array of classes. Let's say you have the following class definition:
+Both the serialiser and the deserialiser work with collections of classes. Let's say you have the following class definition:
 
 ```csharp
 class Record {
@@ -31,7 +33,7 @@ var data = Enumerable.Range(0, 1_000_000).Select(i => new Record {
 Here is what you can do to write out those classes in a single file:
 
 ```csharp
-await ParquetConvert.SerializeAsync(data, "/mnt/storage/data.parquet");
+await ParquetSerializer.SerializeAsync(data, "/mnt/storage/data.parquet");
 ```
 
 That's it! Of course the `.SerializeAsync()` method also has overloads and optional parameters allowing you to control the serialization process slightly, such as selecting compression method, row group size etc.
@@ -41,45 +43,25 @@ Parquet.Net will automatically figure out file schema by reflecting class struct
-In order to deserialise this file back to array of classes you would write the following:
+To deserialise this file back into a list of classes, write the following:
 
 ```csharp
-Record[] data = await ParquetConvert.DeserializeAsync<Record>("/mnt/storage/data.parquet");
-```
-### Retrieve and Deserialize records by RowGroup:
-
-If you have a huge parquet file(~10million records), you can also retrieve records by rowgroup index (which could help to keep low memory footprint as you don't load everything into memory).
-```csharp
-SimpleStructure[] structures = ParquetConvert.Deserialize<SimpleStructure>(stream,rowGroupIndex);
-```
-### Deserialize only few properties:
-
-If you have a parquet file with huge number of columns and you only need few columns for processing, you can retrieve required columns only as described in the below code snippet.
-```csharp
-class MyClass
-{
-   public int Id { get; set; }
-   public string Name{get;set;}
-   public string Address{get;set;}
-   public int Age{get;set;}
-}
-class MyClassV1
-{
-   public string Name { get; set; }
-}
-SimpleStructure[] structures = Enumerable
-   .Range(0, 1000)
-   .Select(i => new SimpleStructure
-   {
-      Id = i,
-      Name = $"row {i}",
-   })
-   .ToArray();
-ParquetConvert.Serialize(structures, stream);
-
-MyClassV1[] v1structures = ParquetConvert.Deserialize<MyClassV1>(stream,rowGroupIndex);
+IList<Record> data = await ParquetSerializer.DeserializeAsync<Record>("/mnt/storage/data.parquet");
 ```
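+
+You can also serialise to and deserialise from a `Stream`. The minimal sketch below mirrors the round trip in `ParquetSerializerTest.SerializeDeserializeRecord`: write the records into a `MemoryStream`, rewind it, then read them back. The `Record` class from above is repeated at the bottom so the snippet compiles on its own as a top-level program; the in-memory stream is only for illustration.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.IO;
+using System.Linq;
+using Parquet.Serialization;
+
+List<Record> data = Enumerable.Range(0, 1_000).Select(i => new Record {
+    Timestamp = DateTime.UtcNow.AddSeconds(i),
+    EventName = i % 2 == 0 ? "on" : "off",
+    MeterValue = i
+}).ToList();
+
+// write all records into an in-memory parquet file
+using var ms = new MemoryStream();
+await ParquetSerializer.SerializeAsync(data, ms);
+
+// rewind the stream before reading the same bytes back
+ms.Position = 0;
+IList<Record> data2 = await ParquetSerializer.DeserializeAsync<Record>(ms);
+
+Console.WriteLine($"read back {data2.Count} records");
+
+class Record {
+    public DateTime Timestamp { get; set; }
+    public string? EventName { get; set; }
+    public double MeterValue { get; set; }
+}
+```
+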
-
 ## Customising Serialisation
 
 Serialisation tries to fit into C# ecosystem like a ninja 🥷, including customisations. It supports the following attributes from [`System.Text.Json.Serialization` Namespace](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization?view=net-7.0):
 
 - [`JsonPropertyName`](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonpropertynameattribute?view=net-7.0) - changes mapping of column name to property name.
 - [`JsonIgnore`](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonignoreattribute?view=net-7.0) - ignores property when reading or writing.
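+
+As an illustration, the hypothetical `Reading` class below (not part of the library) renames one property and ignores another. The sketch inspects the schema reflected from it with `GetParquetSchema`, the same extension method the library's `SchemaReflectorTest` uses; passing `true` simply follows those tests. Going by the attribute descriptions above, the only column listed should be `meter_value`.
+
+```csharp
+using System;
+using System.Text.Json.Serialization;
+using Parquet.Schema;
+using Parquet.Serialization;
+
+// reflect the class into a parquet schema
+ParquetSchema schema = typeof(Reading).GetParquetSchema(true);
+
+// print the column names the serialiser will use
+foreach(Field field in schema.Fields) {
+    Console.WriteLine(field.Name);
+}
+
+// hypothetical class used only for this example
+class Reading {
+    // mapped to a column named "meter_value" instead of "MeterValue"
+    [JsonPropertyName("meter_value")]
+    public int MeterValue { get; set; }
+
+    // left out of the schema, so it is never written or read
+    [JsonIgnore]
+    public int InternalCounter { get; set; }
+}
+```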
+
+## Non-Trivial Types
+
+You can also serialize more complex types supported by the Parquet format.
+
+### Maps (Dictionaries)
+
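+The library's `SchemaReflectorTest.SimpleMap` test shows how a `Dictionary<string, int>` property is reflected: it becomes a `MapField` with `Key` and `Value` children. The sketch below reproduces that case and compares the reflected schema with the expected one. It only illustrates schema reflection and does not read or write any map data.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using Parquet.Schema;
+using Parquet.Serialization;
+
+// reflect the class into a parquet schema
+ParquetSchema schema = typeof(SimpleMapPoco).GetParquetSchema(true);
+
+// the schema asserted in SchemaReflectorTest.SimpleMap
+var expected = new ParquetSchema(
+    new DataField<int?>("Id"),
+    new MapField("Tags",
+        new DataField<string>("Key"),
+        new DataField<int>("Value")));
+
+Console.WriteLine(schema.Equals(expected));
+
+class SimpleMapPoco {
+    public int? Id { get; set; }
+
+    // reflected as a map column with string keys and int values
+    public Dictionary<string, int> Tags { get; set; } = new Dictionary<string, int>();
+}
+```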
+
+
+## FAQ
+
+**Q.** Can I specify a schema for serialisation/deserialisation?
+
+**A.** No. Your class definition is the schema, so you don't need to supply it separately.
@@ -1,5 +1,6 @@
 using System;
 using System.Collections.Generic;
+using System.Diagnostics;
 using System.IO;
 using System.Linq;
 using System.Text;
@@ -10,14 +11,24 @@
 namespace Parquet.Test.Serialisation {
     public class ParquetSerializerTest {
 
-        class Record {
+        class Record : IEquatable<Record> {
             public DateTime Timestamp { get; set; }
             public string? EventName { get; set; }
             public double MeterValue { get; set; }
+
+            public bool Equals(Record? other) {
+                if(other == null)
+                    return false;
+
+                return Timestamp == other.Timestamp &&
+                    EventName == other.EventName &&
+                    MeterValue == other.MeterValue;
+            }
         }
 
         [Fact]
         public async Task SerializeDeserializeRecord() {
+
             var data = Enumerable.Range(0, 1_000_000).Select(i => new Record {
                 Timestamp = DateTime.UtcNow.AddSeconds(i),
                 EventName = i % 2 == 0 ? "on" : "off",
@@ -26,6 +37,11 @@ public async Task SerializeDeserializeRecord() {
 
             using var ms = new MemoryStream();
             await ParquetSerializer.SerializeAsync(data, ms);
+
+            ms.Position = 0;
+            IList<Record> data2 = await ParquetSerializer.DeserializeAsync<Record>(ms);
+
+            Assert.Equal(data, data2);
         }
     }
 }
@@ -1,3 +1,4 @@
+using System.Collections.Generic;
 using System.Text.Json.Serialization;
 using Parquet.Schema;
 using Parquet.Serialization;
@@ -136,5 +137,22 @@ public void IgnoredProperties() {
             Assert.Equal(new ParquetSchema(
                 new DataField<int>("NotIgnored")), schema);
         }
+
+        class SimpleMapPoco {
+            public int? Id { get; set; }
+
+            public Dictionary<string, int> Tags { get; set; } = new Dictionary<string, int>();
+        }
+
+        [Fact]
+        public void SimpleMap() {
+            ParquetSchema schema = typeof(SimpleMapPoco).GetParquetSchema(true);
+
+            Assert.Equal(new ParquetSchema(
+                new DataField<int?>("Id"),
+                new MapField("Tags", 
+                    new DataField<string>("Key"),
+                    new DataField<int>("Value"))), schema);
+        }
     }
 }
@@ -26,7 +26,7 @@ public class MapField : Field {
         /// <summary>
         /// Declares a map field
         /// </summary>
-        public MapField(string name, DataField keyField, DataField valueField)
+        public MapField(string name, Field keyField, Field valueField)
            : base(name, SchemaType.Map) {
             Key = keyField;
             Value = valueField;
 
@@ -355,7 +355,7 @@ static class PropertyHelpers {
             PropertyInfo? prop = classType.GetTypeInfo().GetDeclaredProperty(fieldName);
 
             // TODO: trying to get build, probably not the best solution
-            var baseType = classType.BaseType;
+            Type? baseType = classType.BaseType;
             while(prop == null && baseType != null) {
                 // if pi is null, try the base class
                 prop = baseType?.GetTypeInfo()?.GetDeclaredProperty(fieldName);