However, a Spark Structured Streaming application can produce thousands of small files (depending on the micro-batch frequency and the number of executors), which degrades performance.
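To see how quickly this adds up, here is a rough back-of-the-envelope estimate. It assumes an unpartitioned table where every write task emits exactly one non-empty file per micro-batch. The helper `estimate_small_files` and the example numbers are hypothetical. The value 200 matches Spark's default `spark.sql.shuffle.partitions`, which sets the task count when the query shuffles before the sink.

```python
def estimate_small_files(run_seconds: int, trigger_seconds: int, write_tasks: int) -> int:
    """Rough estimate of files written by a streaming sink.

    Assumes each micro-batch runs `write_tasks` tasks and each task writes
    exactly one non-empty file (unpartitioned table, no compaction).
    """
    micro_batches = run_seconds // trigger_seconds
    return micro_batches * write_tasks


if __name__ == "__main__":
    # One hour of streaming, a 10-second trigger, 200 write tasks per micro-batch.
    print(estimate_small_files(3600, 10, 200))  # 72000
```

Under these assumptions, a single hour of streaming already leaves 72,000 files behind.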

That's why the most crucial decision is the file format for your data lake. Small files are a problem because they slow down query reads: listing, opening and closing many small files incurs expensive overhead. This is called “the Small File Problem”.
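A simplified cost model makes this overhead concrete. Total read time is a fixed per-file cost multiplied by the number of files, plus the time to transfer the data itself. This sketch assumes a single sequential reader. The 10 ms per-file overhead and 100 MiB/s throughput are made-up illustrative values, not measurements, and `sequential_read_seconds` is a hypothetical helper.

```python
def sequential_read_seconds(total_bytes: int, n_files: int,
                            per_file_overhead_s: float,
                            throughput_bytes_per_s: float) -> float:
    """Single-reader model: fixed cost per file plus data transfer time."""
    return n_files * per_file_overhead_s + total_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Same 1 GiB of data, stored either as many small files or a few large ones.
    # Hypothetical costs: 10 ms to list/open/close a file, 100 MiB/s throughput.
    for files in (10_000, 8):
        t = sequential_read_seconds(GIB, files, 0.01, 100 * MIB)
        print(f"{files:>6} files: {t:.2f} s")
```

With these assumptions, both layouts spend the same 10.24 s transferring data. The 10,000-file layout also spends 100 s on per-file overhead alone, compared with 0.08 s for 8 files. Spark parallelizes reads across executors, so real numbers differ, but the per-file term still grows with the file count.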
You can reduce the Small File Problem overhead by combining the data into bigger, more efficient files. Instead of doing this manually, pick a data lake table format (Delta Lake, Apache Iceberg) and use its built-in functions.

Delta Lake enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes.
For Spark streaming applications and real-time processing, Delta Lake has one significant advantage: [built-in optimization](https://delta.io/blog/delta-lake-optimize/).
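Conceptually, compaction groups many small files into bins close to a target size and rewrites each bin as one larger file. The sketch below illustrates that idea with simple first-fit-decreasing bin packing. It is not Delta Lake's actual implementation, and the function name `plan_compaction` and the file sizes are hypothetical.

```python
from typing import List


def plan_compaction(file_sizes: List[int], target_size: int) -> List[List[int]]:
    """Group small files into bins whose total size stays <= target_size.

    First-fit-decreasing bin packing; files already at or above
    target_size are left untouched.
    """
    small = sorted((s for s in file_sizes if s < target_size), reverse=True)
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        # Put the file into the first bin that still has room for it.
        for i, total in enumerate(totals):
            if total + size <= target_size:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([size])
            totals.append(size)
    # A bin with a single file would just rewrite that file, so skip it.
    return [b for b in bins if len(b) > 1]


if __name__ == "__main__":
    # Hypothetical file sizes in MiB with a 100 MiB target.
    print(plan_compaction([40, 30, 30, 20, 10, 150], 100))
```

Here the five small files collapse into two rewritten files, while the 150 MiB file is already large enough and is left alone.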
OCI Data Flow supports Delta Lake by default when your Applications run Spark 3.2.1 or later ([doc](https://docs.oracle.com/en-us/iaas/data-flow/using/delta-lake-about.htm)).

How to optimize a data lake using Delta Lake functions:
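The central function here is the `OPTIMIZE` command, which compacts small files in a Delta table. Its optional `WHERE` clause may only reference partition columns, and `ZORDER BY` additionally co-locates related data. The helper below only assembles the SQL string. `build_optimize_sql`, the bucket path and the column names are hypothetical. Check which Delta Lake version your Data Flow Application uses, since `OPTIMIZE` is available from Delta Lake 1.2 and `ZORDER BY` from 2.0.

```python
from typing import Optional, Sequence


def build_optimize_sql(table_path: str,
                       where: Optional[str] = None,
                       zorder_by: Sequence[str] = ()) -> str:
    """Assemble an OPTIMIZE statement for a path-based Delta table."""
    statement = f"OPTIMIZE delta.`{table_path}`"
    if where:
        # Only predicates on partition columns are accepted here.
        statement += f" WHERE {where}"
    if zorder_by:
        # Z-ordering requires Delta Lake 2.0 or later.
        statement += " ZORDER BY (" + ", ".join(zorder_by) + ")"
    return statement


if __name__ == "__main__":
    # Hypothetical bucket, namespace, partition column and Z-order column.
    print(build_optimize_sql(
        "oci://my-bucket@my-namespace/events",
        where="event_date >= '2024-01-01'",
        zorder_by=["device_id"],
    ))
```

Inside a Spark session, you would pass the resulting string to `spark.sql(...)`.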
Configure your preferences (please check the Delta Lake documentation):
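As an illustration, these preferences can be expressed as Spark configuration properties. You can set them in the Data Flow Application's Spark configuration or pass them as `--conf` options. The values are hypothetical starting points, and you should verify the key names against the Delta Lake documentation for your version. `spark.databricks.delta.optimize.maxFileSize` sets the target file size for `OPTIMIZE` (1 GiB by default). Optimized writes and auto compaction exist only in newer Delta Lake releases. The helper `to_conf_args` is hypothetical.

```python
from typing import Dict, List

# Hypothetical starting values; verify key names for your Delta Lake version.
DELTA_OPTIMIZE_CONF: Dict[str, str] = {
    # Target size in bytes for files rewritten by OPTIMIZE (here 256 MiB).
    "spark.databricks.delta.optimize.maxFileSize": str(256 * 1024 * 1024),
    # Newer releases only: shuffle before writing to produce fewer, larger files.
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    # Newer releases only: compact small files automatically after writes.
    "spark.databricks.delta.autoCompact.enabled": "true",
}


def to_conf_args(conf: Dict[str, str]) -> List[str]:
    """Render a config dict as spark-submit style --conf arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_conf_args(DELTA_OPTIMIZE_CONF)))
```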