Commit 107318b

Update README.md
1 parent 8888b95 commit 107318b

1 file changed, +6 -4 lines changed

README.md

Lines changed: 6 additions & 4 deletions
@@ -59,6 +59,7 @@ The following columnar databases use a [shared-nothing architecture](https://en.
 - [Apache Pinot](https://pinot.apache.org/)
 - [Clickhouse](https://clickhouse.com)
 - [StarRocks](https://www.starrocks.io/)
+- [Dremio](https://www.dremio.com/)

 ### Search engines

@@ -68,7 +69,7 @@ The following columnar databases use a [shared-nothing architecture](https://en.
 - [Quickwit](https://quickwit.io/) - Search engine on top of object storage, using shared-everything architecture.
 - [Typesense](https://typesense.org/) - Оpen-source, typo-tolerant search engine optimized for instant search-as-you-type experiences and developer productivity.

-### NewSQL
+### Hybrid OLAP/OLTP NewSQL (aka HTAP)

 - [Citus](https://www.citusdata.com/) - PostgreSQL compatible distributed table.
 - [TiDB](https://github.com/pingcap/tidb) - MySQL compatible SQL database that supports hybrid transactional and analytical processing workloads.
@@ -89,15 +90,15 @@ The following columnar databases use a [shared-nothing architecture](https://en.

 ## Data lake

-The data lake approach (or "lakehouse") is a semi-structured schema that sit on top of object storage in the cloud.
+The data lake approach (or "lakehouse") is a semi-structured schema that sits on top of object storage in the cloud.

 It is composed of a few layers (from lower to higher level): codec, file format, table format + metastore, and the ingestion/query layer.

 ### File formats and serialization

-These formats are popular for shared-everything databases, using object storage as persistence layer. The data is organized in row or column, with strict schema definition. These files are immutable and offer partial reads (only headers, metadata, data page, etc). Mutation require a new upload. Most formats support nested schema, codecs, compression and data encryption. Index can be added to file metadata for faster processing.
+These formats are popular for shared-everything databases, using object storage as a persistence layer. The data is organized in row or column, with strict schema definition. These files are immutable and offer partial reads (only headers, metadata, data page, etc). Mutation requires a new upload. Most formats support nested schema, codecs, compression, and data encryption. Index can be added to file metadata for faster processing.

-A single file can weight between tens of MB to a few GB. Lot of small files require more merge operation. Larger files can be costly to update.
+A single file can weight between tens of MB to a few GB. Lots of small files require more merge operation. Larger files can be costly to update.

 - [Apache Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html) - Columnar format for in-memory Apache Arrow processing.
 - [Apache Avro](https://avro.apache.org/) - Row-oriented serialization for data streaming purpose.
@@ -237,6 +238,7 @@ The popular acronym for Extracting, Transforming and Loading data. ELT performs
 - [Apache Parquet format](https://github.com/apache/parquet-format/)
 - [Dremel paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf)
 - [RDD](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf)
+- [RocksDB](https://research.facebook.com/publications/rocksdb-evolution-of-development-priorities-in-a-key-value-store-serving-large-scale-applications/)
 - [Spanner paper](https://static.googleusercontent.com/media/research.google.com/en/us/archive/spanner-osdi2012.pdf)

 ### Architecture
