Commit 365e77c
Fix markdown lint
1 parent ea031c8

content/blog/apache-spark-unleashing-big-data-with-rdds-dataframes-and-beyond.md

Lines changed: 6 additions & 6 deletions
@@ -23,11 +23,11 @@ Let’s break down our description:
 
 `Computing Engine`: It focuses on computation rather than storage, allowing it to work with various storage systems like Hadoop, Amazon S3, and Apache Cassandra. This flexibility makes Spark suitable for diverse environments, including cloud and streaming applications.
 
-`Libraries`: It provides a unified API for common data analysis tasks. It supports both standard libraries that ship with the engine as well as external libraries published as third-party packages by the open-source communities. The standard libraries includes libraries for SQL (Spark SQL), machine learning (MLlib), stream processing (Structured Streaming), and graph analytics (GraphX).
+`Libraries`: It provides a unified API for common data analysis tasks. It supports both standard libraries that ship with the engine as well as external libraries published as third-party packages by the open-source communities. The standard libraries include libraries for SQL (Spark SQL), machine learning (MLlib), stream processing (Structured Streaming), and graph analytics (GraphX).
 
 ## Where to Run Spark ?
 
-1. **Run Spark Locally**
+### Run Spark Locally
 
 * Install Java (required as Spark is written in Scala and runs on the JVM) and Python (if using the Python API).
 
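The standard libraries listed in the hunk above all share a single entry point, `SparkSession`. A minimal sketch of the Spark SQL piece, assuming `pyspark` is installed and a local JVM is available; the app name, table name, and rows are illustrative:

```python
# Minimal Spark SQL sketch; assumes `pip install pyspark` and a local JVM.
from pyspark.sql import SparkSession

# local[*] runs Spark inside this process, using all available cores.
spark = SparkSession.builder.master("local[*]").appName("libs-demo").getOrCreate()

# Build a tiny DataFrame and query it through the Spark SQL library.
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "label"])
df.createOrReplaceTempView("items")
spark.sql("SELECT label, count(*) AS n FROM items GROUP BY label").show()

spark.stop()
```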

@@ -43,13 +43,13 @@ Let’s break down our description:
 
 * SQL: `./bin/spark-sql`
 
-2. **Run Spark in the Cloud**
+### Run Spark in the Cloud
 
 * No installation required; provides a web-based interactive notebook environment.
 
 * **Option**: Use [Databricks Community Edition \[free\]](https://www.databricks.com/try-databricks#account)
 
-3. **Building Spark from Source**
+### Building Spark from Source
 
 * **Source**: Download the source code from the [Apache Spark download page](http://spark.apache.org/downloads.html).
 
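Whichever route above is taken, a quick way to confirm a setup works is a short script against a local master. A sketch, assuming `pyspark` is importable; the thread count in `local[2]` is an arbitrary choice:

```python
# Smoke test for a local Spark installation; assumes `pyspark` is importable.
from pyspark.sql import SparkSession

# `local[2]` = run on this machine with two worker threads (arbitrary choice).
spark = SparkSession.builder.master("local[2]").appName("smoke-test").getOrCreate()

print(spark.version)            # which Spark build answered
print(spark.range(10).count())  # tiny distributed job; should print 10

spark.stop()
```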

@@ -127,13 +127,13 @@ They are the fundamental building block of Spark's older API, introduced in the
 
 An RDD represents a distributed collection of immutable records that can be processed in parallel across a cluster. Unlike DataFrames(High-Level API), where records are structured and organized into rows with known schemas, RDDs are more flexible. They allow developers to store and manipulate data in any format—whether Java, Scala, or Python objects. This flexibility gives you a lot of control but requires more manual effort compared to using higher-level APIs like DataFrames.
 
-**Key properties of RDDS**
+### Key properties of RDDS
 
 * **Fault Tolerance:** RDDs maintain a lineage graph that tracks the transformations applied to the data. If a partition is lost due to a node failure, Spark can recompute that partition by reapplying the transformations from the original dataset.
 
 * **In-Memory Computation:** RDDs are designed for in-memory computation, which allows Spark to process data much faster than traditional disk-based systems. By keeping data in memory, Spark minimizes disk I/O and reduces latency.
 
-**Creating RDDs**
+### Creating RDDs
 
 Now that we discussed some key RDD properties, let’s begin applying them so that you can better understand how to use them.
 