
Commit 816622f: Include images

1 parent c453131 commit 816622f

6 files changed (+15, -10 lines)

content/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond.md

Lines changed: 15 additions & 10 deletions
@@ -57,8 +57,9 @@ Let’s break down our description:
 
 ## Spark Components
 
-![](<https://cdn.hashnode.com/res/hashnode/image/upload/v1732356652608/fe1bec32-214f-4784-aab3-8c2d1798c01a.png> align="center")
-
+<p align="center">
+<img width="500px" src="/images/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond/spark-components.png" alt="Spark Components">
+</p>
 ## High-Level Components (Spark Applications)
 
 At a high level, Spark provides several libraries that extend its functionality and are used in specialized data processing tasks.
@@ -95,8 +96,9 @@ Spark's ability to interact with these diverse storage systems allows users to w
 
 ## Spark’s Basic Architecture
 
-![](<https://cdn.hashnode.com/res/hashnode/image/upload/v1732357077877/d3ac6b43-9b95-48ed-8720-8cedc9c6550b.png> align="center")
-
+<p align="center">
+<img width="500px" src="/images/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond/spark-architecture.png" alt="Spark Basic Architecture">
+</p>
 ### 1\. The Spark Driver
 
 The Spark driver (process) is the “brain” of your Spark application: it is responsible for controlling everything. It decides which tasks to run, tracks the application’s progress, and talks to the cluster manager to obtain the computing resources it needs. It also monitors the tasks being handled by the worker nodes (executors). In short, the driver manages the lifecycle of the Spark application.
@@ -166,8 +168,9 @@ While the DataFrame concept is not unique to Spark; R and Python also include Da
 
 Below is a comparison of distributed versus single-machine analysis.
 
-![](<https://cdn.hashnode.com/res/hashnode/image/upload/v1732358269819/a38be331-b107-4065-bdda-ab40b0bcbff9.png> align="center")
-
+<p align="center">
+<img width="500px" src="/images/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond/spark-dataframe.png" alt="Spark DataFrame">
+</p>
 > Note: Spark also provides the Dataset API, which combines the benefits of RDDs and DataFrames by offering both compile-time type safety and query optimization. However, the Dataset API is only supported in Scala and Java, not in Python.
 >
 ## Partitions
@@ -194,5 +197,6 @@ In a **narrow transformation**, each partition of the parent RDD/DataFrame contr
 
 Examples: `map` `filter`
 
-![](<https://cdn.hashnode.com/res/hashnode/image/upload/v1732344376978/13ba890a-95c4-4f21-b35f-ac19570daff1.png> align="center")
-
+<p align="center">
+<img width="500px" src="/images/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond/narrow-transformation.png" alt="Spark Narrow Transformation">
+</p>
@@ -199,8 +203,9 @@
 ### Wide Transformations
 
 In a **wide transformation**, data from multiple parent RDD/DataFrame partitions must be shuffled (redistributed) to form new partitions. These operations involve **network communication**, making them more expensive.
 
 Examples: `groupByKey` `reduceByKey` `join`
 
-![](<https://cdn.hashnode.com/res/hashnode/image/upload/v1732348673655/115212c0-c124-47fa-8c5b-de1f659c0866.png> align="center")
-
+<p align="center">
+<img width="500px" src="/images/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond/wide-transformation.png" alt="Spark Wide Transformation">
+</p>
@@ -207,3 +212,3 @@
 ## Actions
 
 Actions are operations that trigger the execution of transformations and return results to the driver program. An action is the point at which Spark evaluates the lazy transformations applied to an RDD, DataFrame, or Dataset.
Five binary files changed (previews omitted): 40.7 KB, 31.6 KB, 195 KB, 97.3 KB, 75.5 KB.
