Commit 608f1a0

Update deltalake_optimizations with image
1 parent a26a392 commit 608f1a0

File tree

1 file changed

+2
-0
lines changed

data-platform/open-source-data-platforms/oci-data-flow/code-examples/DeltaLake_Optimize/deltalake_optimizations

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets—without infrastructure to deploy or manage.
Developers can also use Spark Streaming to perform cloud ETL on their continuously produced streaming data.
However, a Spark Structured Streaming application can produce thousands of small files (depending on the micro-batch interval and the number of executors), which leads to performance degradation.
![small files in datalake](https://github.com/oracle-devrel/technology-engineering/blob/sylwesterdec-patch-6/data-platform/open-source-data-platforms/oci-data-flow/code-examples/DeltaLake_Optimize/files_in_datalake.png)
That's why choosing the file format for your data lake is the most crucial decision.
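
To get a feel for how quickly small files accumulate, here is a rough back-of-the-envelope sketch. It is not taken from the original example: the function name, the parameters, and the assumption that every micro-batch writes exactly one file per write task are all illustrative simplifications.

```python
def estimate_small_files(trigger_interval_s: float, write_tasks: int, hours: float = 24.0) -> int:
    """Rough upper bound on files written by an append-only streaming job.

    Assumption (hypothetical model): each micro-batch writes one file per
    write task, which only holds if every task receives data in every batch.
    """
    micro_batches = int(hours * 3600 / trigger_interval_s)
    return micro_batches * write_tasks


# Hypothetical workload: a 10-second trigger and 8 write tasks, running for one day.
print(estimate_small_files(trigger_interval_s=10, write_tasks=8))  # prints 69120
```

Even this modest, made-up configuration yields tens of thousands of files per day, which is why the number of write tasks and the trigger interval matter so much.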

Delta Lake enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes.
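
As an illustration of the unified streaming and batch model, the following minimal PySpark sketch appends a stream to a Delta table and reads the same table back as a batch DataFrame. It assumes a Spark session with the Delta Lake package already configured. The helper names and the table and checkpoint paths are hypothetical placeholders, not part of the original example.

```python
def append_stream_to_delta(stream_df, table_path, checkpoint_path):
    """Start a streaming query that appends micro-batches to a Delta table.

    stream_df is expected to be a streaming DataFrame; the paths are
    placeholders supplied by the caller (for example, object storage URIs).
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(table_path)
    )


def read_delta_batch(spark, table_path):
    """Read the same Delta table as a regular batch DataFrame."""
    return spark.read.format("delta").load(table_path)
```

Because both functions point at the same table path, a batch job can query data that a streaming job is still appending, with Delta Lake's transaction log keeping the two views consistent.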
