Commit c53049b

committed
Expression Tree Serializer
- documentation update
1 parent 244dd0e commit c53049b

11 files changed: +144 -206 lines changed

docs/complex-types.md

Lines changed: 15 additions & 53 deletions
@@ -11,71 +11,33 @@ Arrays *aka repeatable fields* are the basis for understanding how more complex data
 ```csharp
 var field = new DataField<IEnumerable<int>>("items");
 ```
-or
-```csharp
-var field= new DataField("items", DataType.Int32, isArray: true);
-```
-
-Apparently to check if the field is repeated you can always check `.IsArray` Boolean flag.
+To check if a field is repeated you can always test the `.IsArray` Boolean flag.
 
-Array column is also a usual instance of the `DataColumm` class, however in order to populate it you need to pass **repetition levels**. Repetition levels specify *at which level array starts* (please read more details on this in the link above).
+Parquet columns are flat, and a column can only keep simple elements, not other arrays, so to store an array of arrays you have to *flatten* it. For instance, to store the two arrays
 
-### Example
-
-Let's say you have a following array of integer arrays:
-
-```
-[1 2 3]
-[4 5]
-[6 7 8 9]
-```
+- `[1, 2, 3]`
+- `[4, 5]`
 
-This can be represented as:
+in a flat array, they become `[1, 2, 3, 4, 5]`, and that's exactly how parquet stores them. The problem starts when you want to read the values back: is this `[1, 2]` and `[3, 4, 5]`, or `[1]` and `[2, 3, 4, 5]`? There is no way to know without extra information. Therefore, parquet also stores that extra information as an extra column per data column, called *repetition levels*. In the previous example, our array of arrays expands into the following two columns:
 
-```
-values: [1 2 3 4 5 6 7 8 9]
-repetition levels: [0 1 1 0 1 0 1 1 1]
-```
+| #    | Data Column | Repetition Levels Column |
+| ---- | ----------- | ------------------------ |
+| 0    | 1           | 0                        |
+| 1    | 2           | 1                        |
+| 2    | 3           | 1                        |
+| 3    | 4           | 0                        |
+| 4    | 5           | 1                        |
 
-Where `0` means that this is a start of an array and `1` - it's a value continuation.
+In other words, the repetition level marks the level at which we have to create a new list for the current value: it tells you when to start a new list, and at which nesting level.
 
 To represent this in C# code:
 
 ```csharp
 var field = new DataField<IEnumerable<int>>("items");
 var column = new DataColumn(
     field,
-    new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 },
-    new int[] { 0, 1, 1, 0, 1, 0, 1, 1, 1 });
+    new int[] { 1, 2, 3, 4, 5 },
+    new int[] { 0, 1, 1, 0, 1 });
 ```
 
-### Empty Arrays
-
-Empty arrays can be represented by simply having no element in them. For instance
-
-```
-[1 2]
-[]
-[3 4]
-```
-
-Goes into following:
-
-```
-values: [1 2 null 3 4]
-repetition levels: [0 1 0 0 1]
-```
-
-> Note that anything other than plain columns add a performance overhead due to obvious reasons for the need to pack and unpack data structures.
-
-## Structures
-
-todo
-
-## Lists
-
-todo
-
-## Maps
 
-##
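
To see the repetition-level rule from the updated page in action, here is a small self-contained sketch (illustrative only, not part of this commit or of the library API) that rebuilds the nested lists from the flat values and repetition levels above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RepetitionLevelDemo {
    // repetition level 0 starts a new list, 1 continues the current one
    static List<List<int>> Unflatten(int[] values, int[] repetitionLevels) {
        var lists = new List<List<int>>();
        for(int i = 0; i < values.Length; i++) {
            if(repetitionLevels[i] == 0)
                lists.Add(new List<int>());   // level 0: open a new list
            lists[^1].Add(values[i]);         // append to the most recent list
        }
        return lists;
    }

    static void Main() {
        List<List<int>> lists = Unflatten(
            new int[] { 1, 2, 3, 4, 5 },
            new int[] { 0, 1, 1, 0, 1 });
        // prints: [1, 2, 3] [4, 5]
        Console.WriteLine(string.Join(" ", lists.Select(l => $"[{string.Join(", ", l)}]")));
    }
}
```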

docs/serialisation.md

Lines changed: 4 additions & 0 deletions
@@ -56,6 +56,10 @@ Serialisation tries to fit into C# ecosystem like a ninja 🥷, including custom
 
 You can also serialize more complex types supported by the Parquet format.
 
+### Lists
+
+
+
 ### Maps (Dictionaries)
 
 
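
The Lists section is added empty in this commit. As a sketch of what list serialization looks like with the class-based `ParquetSerializer` covered by this document (the `MovementHistory` class and its properties are hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Parquet.Serialization;

class MovementHistory {
    public int PersonId { get; set; }
    // a List<T> property is serialized as a parquet list
    public List<int> ParentIds { get; set; } = new();
}

static class ListSerialisationDemo {
    static async Task Main() {
        var data = new List<MovementHistory> {
            new() { PersonId = 1, ParentIds = new List<int> { 10, 20 } },
            new() { PersonId = 2, ParentIds = new List<int> { 30 } }
        };
        using var ms = new MemoryStream();
        // the schema, including the list field, is inferred from the class
        await ParquetSerializer.SerializeAsync(data, ms);
    }
}
```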

docs/writing.md

Lines changed: 7 additions & 3 deletions
@@ -33,7 +33,7 @@ using(Stream fileStream = System.IO.File.OpenWrite("c:\\test.parquet")) {
 }
 ```
 
-## Specifying Compression Method and Level
+# Specifying Compression Method and Level
 
 After constructing `ParquetWriter` you can optionally set compression method ([`CompressionMethod`](../src/Parquet/CompressionMethod.cs)), which defaults to `Snappy`, and/or compression level ([`CompressionLevel`](https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.compressionlevel?view=net-7.0)). Unless you have specific needs to override compression, the defaults are very reasonable.
 
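
As a quick illustration of that paragraph (a sketch only; the wrapper method is hypothetical, and the document's own example appears in the next hunk):

```csharp
using System.Threading.Tasks;
using Parquet;
using Parquet.Schema;

static class CompressionDemo {
    // sketch: override the default compression after constructing the writer
    static async Task WriteCompressedAsync(ParquetSchema schema, System.IO.Stream stream) {
        using ParquetWriter writer = await ParquetWriter.CreateAsync(schema, stream);
        writer.CompressionMethod = CompressionMethod.Gzip;                         // default is Snappy
        writer.CompressionLevel = System.IO.Compression.CompressionLevel.Optimal;  // .NET enum
        // ...create row groups and write columns as usual
    }
}
```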

@@ -48,7 +48,7 @@ using(ParquetWriter parquetWriter = await ParquetWriter.CreateAsync(schema, file
4848
```
4949

5050

51-
## Appending to Files
51+
# Appending to Files
5252

5353
This lib supports pseudo appending to files, however it's worth keeping in mind that *row groups are immutable* by design, therefore the only way to append is to create a new row group at the end of the file. It's worth mentioning that small row groups make data compression and reading extremely ineffective, therefore the larger your row group the better.
5454

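
The append example itself is outside the changed lines; as a rough sketch, appending amounts to reopening the file and passing the append flag when creating the writer (`AppendBatchAsync` is a hypothetical helper):

```csharp
using System.IO;
using System.Threading.Tasks;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

static class AppendDemo {
    // hypothetical helper: adds one more row group to an existing file
    static async Task AppendBatchAsync(string path, ParquetSchema schema, DataColumn[] batch) {
        using Stream fs = System.IO.File.Open(path, FileMode.Open, FileAccess.ReadWrite);
        // append: true adds a new row group instead of overwriting the file
        using(ParquetWriter writer = await ParquetWriter.CreateAsync(schema, fs, append: true)) {
            using ParquetRowGroupWriter rg = writer.CreateRowGroup();
            foreach(DataColumn c in batch)
                await rg.WriteColumnAsync(c);
        }
    }
}
```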
@@ -96,7 +96,7 @@ Note that you have to specify that you are opening `ParquetWriter` in **append**
 
 Please keep in mind that row groups are designed to hold a large amount of data (50'000 rows on average), therefore try to find a large enough batch to append to the file. Do not treat a parquet file as a row stream by creating a row group and placing 1-2 rows in it, because this will both increase file size massively and cause a huge performance degradation for a client reading such a file.
 
-### Custom Metadata
+# Custom Metadata
 
 To read and write custom file metadata, you can use the `CustomMetadata` property on `ParquetFileReader` and `ParquetFileWriter`, i.e.
 
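
The hunk below shows only the read side of that example; the write side sets the same property on the writer. A sketch (the wrapper method is hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Parquet;
using Parquet.Schema;

static class MetadataDemo {
    // sketch: attach custom key/value metadata to the file being written
    static async Task WriteWithMetadataAsync(ParquetSchema schema, Stream output) {
        using ParquetWriter writer = await ParquetWriter.CreateAsync(schema, output);
        writer.CustomMetadata = new Dictionary<string, string> {
            ["key1"] = "value1",
            ["key2"] = "value2"
        };
        // ...write row groups as usual; metadata is stored in the file footer
    }
}
```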
@@ -122,3 +122,7 @@ using(ParquetReader reader = await ParquetReader.CreateAsync(ms)) {
     Assert.Equal("value2", reader.CustomMetadata["key2"]);
 }
 ```
+
+# Complex Types
+
+To write complex types (arrays, lists, maps, structs) read [this guide](complex-types.md).
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+using System;
+using System.Collections.Generic;
+using System.Text;
+using Parquet.File;
+using Xunit;
+
+namespace Parquet.Test.Extensions {
+    public class TypeExtensionsTest {
+        [Fact]
+        public void String_array_is_enumerable() {
+            Assert.True(typeof(string[]).TryExtractEnumerableType(out Type? et));
+            Assert.Equal(typeof(string), et);
+        }
+
+        [Fact]
+        public void String_is_not_enumerable() {
+            Assert.False(typeof(string).TryExtractEnumerableType(out Type? et));
+        }
+
+        [Fact]
+        public void StringIenumerable_is_enumerable() {
+            Assert.True(typeof(IEnumerable<string>).TryExtractEnumerableType(out Type? et));
+            Assert.Equal(typeof(string), et);
+        }
+
+        [Fact]
+        public void Nullable_element_is_not_stripped() {
+            Assert.True(typeof(IEnumerable<int?>).TryExtractEnumerableType(out Type? et));
+            Assert.Equal(typeof(int?), et);
+        }
+
+        [Fact]
+        public void ListOfT_is_ienumerable() {
+            Assert.True(typeof(List<int>).TryExtractEnumerableType(out Type? baseType));
+            Assert.Equal(typeof(int), baseType);
+        }
+    }
+}

src/Parquet.Test/RepeatableFieldsTest.cs

Lines changed: 0 additions & 34 deletions
This file was deleted.

src/Parquet.Test/SchemaTest.cs renamed to src/Parquet.Test/Schema/SchemaTest.cs

Lines changed: 31 additions & 39 deletions
@@ -11,7 +11,7 @@
 using System.Numerics;
 using Parquet.Encodings;
 
-namespace Parquet.Test {
+namespace Parquet.Test.Schema {
     public class SchemaTest : TestBase {
         [Fact]
         public void Creating_element_with_unsupported_type_throws_exception() {
@@ -30,7 +30,7 @@ public void SchemaElement_different_names_not_equal() {
 
         [Fact]
         public void SchemaElement_different_types_not_equal() {
-            Assert.NotEqual((Field)(new DataField<int>("id")), (Field)(new DataField<double>("id")));
+            Assert.NotEqual(new DataField<int>("id"), (Field)new DataField<double>("id"));
         }
 
         [Fact]
@@ -96,10 +96,10 @@ public void But_i_can_declare_a_dictionary() {
         [Fact]
         public void Map_fields_with_same_types_are_equal() {
             Assert.Equal(
-                new MapField("dictionary",
-                    new DataField<int>("key"),
+                new MapField("dictionary",
+                    new DataField<int>("key"),
                     new DataField<string>("value")),
-                new MapField("dictionary",
+                new MapField("dictionary",
                     new DataField<int>("key"),
                     new DataField<string>("value")));
         }
@@ -242,20 +242,19 @@ public void List_of_structures_valid_levels() {
         [InlineData("legacy-list-onearray.parquet")]
         [InlineData("legacy-list-onearray.v2.parquet")]
         public async Task BackwardCompat_list_with_one_array(string parquetFile) {
-            using(Stream input = OpenTestFile(parquetFile)) {
-                using(ParquetReader reader = await ParquetReader.CreateAsync(input)) {
-                    ParquetSchema schema = reader.Schema;
-
-                    //validate schema
-                    Assert.Equal("impurityStats", schema[3].Name);
-                    Assert.Equal(SchemaType.List, schema[3].SchemaType);
-                    Assert.Equal("gain", schema[4].Name);
-                    Assert.Equal(SchemaType.Data, schema[4].SchemaType);
-
-                    //smoke test we can read it
-                    using(ParquetRowGroupReader rg = reader.OpenRowGroupReader(0)) {
-                        DataColumn values4 = await rg.ReadColumnAsync((DataField)schema[4]);
-                    }
+            using(Stream input = OpenTestFile(parquetFile))
+            using(ParquetReader reader = await ParquetReader.CreateAsync(input)) {
+                ParquetSchema schema = reader.Schema;
+
+                //validate schema
+                Assert.Equal("impurityStats", schema[3].Name);
+                Assert.Equal(SchemaType.List, schema[3].SchemaType);
+                Assert.Equal("gain", schema[4].Name);
+                Assert.Equal(SchemaType.Data, schema[4].SchemaType);
+
+                //smoke test we can read it
+                using(ParquetRowGroupReader rg = reader.OpenRowGroupReader(0)) {
+                    DataColumn values4 = await rg.ReadColumnAsync((DataField)schema[4]);
                 }
             }
         }
@@ -266,32 +265,26 @@ public async Task Column_called_root() {
             var columns = new List<DataColumn>();
            columns.Add(new DataColumn(new DataField<string>("root"), new string[] { "AAA" }));
            columns.Add(new DataColumn(new DataField<string>("other"), new string[] { "BBB" }));
-            List<Field> fields = new List<Field>();
-            foreach(DataColumn column in columns) {
+            var fields = new List<Field>();
+            foreach(DataColumn column in columns)
                 fields.Add(column.Field);
-            }
 
             // the writer used to create structure type under "root" (https://github.com/aloneguid/parquet-dotnet/issues/143)
             var schema = new ParquetSchema(fields);
             var ms = new MemoryStream();
-            using(ParquetWriter parquetWriter = await ParquetWriter.CreateAsync(schema, ms)) {
-                using(ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup()) {
-                    foreach(DataColumn column in columns) {
-                        await groupWriter.WriteColumnAsync(column);
-                    }
-                }
-            }
+            using(ParquetWriter parquetWriter = await ParquetWriter.CreateAsync(schema, ms))
+            using(ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup())
+                foreach(DataColumn column in columns)
+                    await groupWriter.WriteColumnAsync(column);
 
             ms.Position = 0;
             using(ParquetReader parquetReader = await ParquetReader.CreateAsync(ms)) {
                 DataField[] dataFields = parquetReader.Schema.GetDataFields();
-                for(int i = 0; i < parquetReader.RowGroupCount; i++) {
-                    using(ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i)) {
+                for(int i = 0; i < parquetReader.RowGroupCount; i++)
+                    using(ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                     foreach(DataColumn column in columns) {
                         DataColumn c = await groupReader.ReadColumnAsync(column.Field);
                     }
-                }
-            }
             }
         }
 
@@ -301,14 +294,13 @@ public async Task ReadSchemaActuallyEqualToWriteSchema() {
             var schema = new ParquetSchema(field);
 
             using(var memoryStream = new MemoryStream()) {
-                using(var parquetWriter = await ParquetWriter.CreateAsync(schema, memoryStream)) {
-                    using(var groupWriter = parquetWriter.CreateRowGroup()) {
-                        var dataColumn = new DataColumn(field, new List<DateTime>() { DateTime.Now }.ToArray());
-                        await groupWriter.WriteColumnAsync(dataColumn);
-                    }
+                using(ParquetWriter parquetWriter = await ParquetWriter.CreateAsync(schema, memoryStream))
+                using(ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup()) {
+                    var dataColumn = new DataColumn(field, new List<DateTime>() { DateTime.Now }.ToArray());
+                    await groupWriter.WriteColumnAsync(dataColumn);
                 }
 
-                using(var parquetReader = await ParquetReader.CreateAsync(memoryStream)) {
+                using(ParquetReader parquetReader = await ParquetReader.CreateAsync(memoryStream)) {
                     parquetReader.Schema.Fields.ToString();
 
                     Assert.Single(schema.Fields);

src/Parquet.Test/TypeExtensionsTest.cs

Lines changed: 0 additions & 45 deletions
This file was deleted.
