Commit 8e09ecc

Create deltalake_optimizations
1 parent b17b266 commit 8e09ecc

1 file changed

+22
-0
lines changed

@@ -0,0 +1,22 @@
# Delta Lake Optimization

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets, with no infrastructure to deploy or manage.
Developers can also use Spark Streaming to perform cloud ETL on continuously produced streaming data.
However, a Spark Structured Streaming application can produce thousands of small files (depending on the micro-batch interval and the number of executors), which leads to performance degradation.
This makes the choice of file format the most crucial decision for your data lake.
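
To get a feel for the scale of the small-files problem, here is a back-of-the-envelope sketch. It assumes that every micro-batch writes roughly one file per output partition; the function name, the 60-second trigger interval, and the partition count are hypothetical values chosen for illustration, not figures from OCI Data Flow.

```python
def estimate_files_per_day(trigger_interval_seconds: int, output_partitions: int) -> int:
    """Rough estimate of files written per day by one streaming query.

    Assumption: each micro-batch writes one file per output partition.
    """
    if trigger_interval_seconds <= 0 or output_partitions <= 0:
        raise ValueError("both arguments must be positive")
    seconds_per_day = 24 * 60 * 60
    batches_per_day = seconds_per_day // trigger_interval_seconds
    return batches_per_day * output_partitions


# Hypothetical job: a 60-second trigger writing 8 partitions per batch.
print(estimate_files_per_day(trigger_interval_seconds=60, output_partitions=8))
```

Even with these modest numbers the estimate runs into the thousands of files per day, and every reader of the table then has to list and open each of them.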

Delta Lake enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes.
For Spark Streaming applications and real-time processing, Delta Lake has one significant advantage: [built-in optimization](https://delta.io/blog/delta-lake-optimize/).
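
As an illustration of that built-in optimization, the sketch below builds and runs Delta Lake's `OPTIMIZE` SQL command, which compacts many small files into fewer, larger ones. The helper names, the object storage path, and the partition column are hypothetical. It also assumes that the Delta Lake version bundled with your Spark runtime supports `OPTIMIZE` and that `spark` is an existing SparkSession with Delta Lake enabled.

```python
from typing import Optional


def build_optimize_sql(table_path: str, where: Optional[str] = None) -> str:
    """Build an OPTIMIZE statement for a path-based Delta table."""
    if "`" in table_path:
        raise ValueError("table path must not contain backticks")
    statement = f"OPTIMIZE delta.`{table_path}`"
    if where:
        # Optional partition predicate to limit compaction to recent data.
        statement += f" WHERE {where}"
    return statement


def compact_table(spark, table_path: str, where: Optional[str] = None):
    """Run the compaction; `spark` is assumed to be a Delta-enabled SparkSession."""
    return spark.sql(build_optimize_sql(table_path, where))


# Hypothetical bucket, namespace, and partition column, for illustration only.
print(build_optimize_sql(
    "oci://my-bucket@my-namespace/delta/events",
    "event_date = '2024-01-01'",
))
```

Keeping the statement builder separate from the Spark call makes it easy to schedule compaction as a periodic batch job next to the streaming application, restricted to the partitions that received new data.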

OCI Data Flow supports Delta Lake by default when your applications run Spark 3.2.1 or later (see the [documentation](https://docs.oracle.com/en-us/iaas/data-flow/using/delta-lake-about.htm)).

# License
Copyright (c) 2024 Oracle and/or its affiliates.
Licensed under the Universal Permissive License (UPL), Version 1.0.
See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
