Commit 1c5ab4a (1 parent: d2d2849)
Commit message: index-cleanup-complete

1 file changed: 34 additions, 20 deletions

modules/use-cases-architectures/pages/real-time-data-pipeline/index.adoc
@@ -6,17 +6,34 @@ David Dieruf <[email protected]>
 :title: Real-time data pipelines with DataStax Astra and Decodable
 :navtitle: Data pipeline with Astra and Decodable

-This guide takes a hands-on approach to defining the objects that make up a real-time data processing pipeline. You'll first create and configure the needed thing in Astra (both DB and Streaming) and then create and configuration the needed things in Decodable. The final step will be to start everything up and see it all work!
+This guide presents a hands-on approach to defining the objects that make up a real-time data processing pipeline.
+You'll create and configure an Astra streaming tenant and an Astra database, connect them with data processing pipelines in Decodable, and send a single data record through to validate your real-time data pipeline. +
+For extra credit, we'll demonstrate processing under load with a bulk of data.

-All work being done uses Astra and Decodables’ UI in your web browser (no terminal or scripting). You will need a safe place to temporarily hold tokens.
+This guide uses the Astra and Decodable UIs in your web browser, so no terminal or scripting is required!
+You just need a safe place to temporarily store access tokens.

 == Architecture

-Before we get started in this journey, let’s talk about what we’re going to create and why we need these things. Our requirements are to create a pipeline that takes in raw web click data, breaks it into queryable values, saves it, and filters for certain values. Both the parsed click data and the filtered data should be saved. We are going to use Decodable’s real-time stream processing powered by Flink as well as DataStax’s Astra platform powered by Pulsar and Cassandra. This pipeline is meant to be production ready because we’re using cloud based services that are automatic (scaling, low latency, etc). Below is a diagram of the components involved followed by a brief description of each.
+Before we get started on our journey, let’s discuss the objects we’re creating and why we need to create them. +
+We want to build a pipeline that takes in raw web click data, breaks it into queryable values, saves the data, and filters for certain values. Both the parsed click data and the filtered data will be saved. We will use Decodable’s real-time stream processing (powered by Apache Flink) as well as DataStax’s Astra platform (powered by Apache Pulsar and Apache Cassandra).
+This pipeline is intended to be production ready, because we’re using cloud-based services that are automatically setting values for scaling, latency, and security. +
+
+The pipeline components are outlined below.

 image:decodable-data-pipeline/real-time-data-pipeline.png[Real-time data pipelines with DataStax Astra and Decodable]

-*Ecommerce Site Clicks*: where the data comes from
+*E-Commerce Site Clicks*
+
+- Where the data comes from
+
+*DataStax Astra*
+
+- All Clicks Topic: a collection of messages with normalized click data
+- Product Clicks Topic: a collection of messages with normalized and filtered click data
+- All Clicks Sink: a function that writes message data to a certain DB table
+- Product Clicks Sink: a function that writes message data to a certain DB table
+- Cassandra: data store

 *Decodable*

@@ -28,45 +45,42 @@ image:decodable-data-pipeline/real-time-data-pipeline.png[Real-time data pipelines with DataStax Astra and Decodable]
 - Product Clicks Pipeline: a SQL based pipeline that takes normalized data and filters for only clicks associated with a product
 - Product Clicks Stream: the flow of filtered product click data that other objects can “listen” to

-*DataStax Astra*
-
-- All Clicks Topic: a collection of messages with normalized click data
-- Product Clicks Topic: a collection of messages with normalized and filtered click data
-- All Clicks Sink: a function that writes message data to a certain DB table
-- Product Clicks Sink: a function that writes message data to a certain DB table
-- Cassandra: data store
-
-== Pre-req’s
+== Prerequisites

-To complete this guide you will need the following in place.
+You will need the following prerequisites in place to complete this guide:

 - Astra (free) account - https://astra.datastax.com/signupstreaming[Sign up now^]{external-link-icon}
 - Decodable (free) account - https://app.decodable.co/-/accounts/create[Sign up now^]{external-link-icon}

 [NOTE]
 ====
-We are staying within the free tier of both Astra and Decodable. You won’t need a credit card for any of this guide.
+This guide stays within the free tiers of both Astra and Decodable.
+You won’t need a credit card for any of this guide.
 ====

 == Getting Started

-The guide is broken into a few milestones. You'll want to follow these in order for everything to work.
+The guide is broken into a few milestones. You'll want to follow these milestones in order for everything to work.

 . xref:use-cases-architectures:real-time-data-pipeline/01-create-astra-objects.adoc[]
 +
-In this guide you will be creating a new streaming tenant in Astra Streaming and then creating a namespace with topics. Then you’ll create a database in Astra DB and hook the topics and database together with a Sink Connector.
+In this guide, you will create a new streaming tenant in Astra Streaming with a namespace and topics.
+Then, you’ll create a database in Astra DB, and hook the streaming topics and database together with a sink connector.

 . xref:use-cases-architectures:real-time-data-pipeline/02-create-decodable-objects.adoc[]
 +
-In this guide you will create a few pipelines that will process incoming data and connectors that bond a Decodable stream of data with the previously created Astra Streaming topics.
+In this guide, you will create pipelines for processing incoming data and connectors that bond a Decodable stream of data with the Astra Streaming topics created in step 1.

 . xref:use-cases-architectures:real-time-data-pipeline/03-put-it-all-together.adoc[]
 +
-This is were the magic will happen. This guide will start the processing pipelines, send a single record of data through, and then validate everything happened as expected. As extra credit, you are also given the opportunity to put the processing under load with a bulk of data.
+This is where the magic happens!
+In this guide, you will start the processing pipelines, send a single record of data through them, and then validate everything happened as expected.
+For extra credit, you are also given the opportunity to put the processing under load with a bulk of data.

 . xref:use-cases-architectures:real-time-data-pipeline/04-debugging-and-clean-up.adoc[]
 +
-This final milestone helps you with debugging the pipelines, in case something isn’t going quite right. You are also given instructions on how to tear down all the objects previously created.
+This final milestone helps with debugging the pipelines in case something doesn't go quite right.
+You are also given instructions on how to tear down and clean up all the objects previously created, because we're all about being good citizens of the cloud.

 [cols=^,frame=none,grid=none]
 |===
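The architecture section above describes the Product Clicks Pipeline as "a SQL based pipeline that takes normalized data and filters for only clicks associated with a product." To make that concrete, here is a minimal sketch of what such a Decodable pipeline can look like in its SQL form; the stream and column names (all_clicks, product_clicks, product_id, and so on) are illustrative assumptions, not taken from this commit.

[source,sql]
----
-- Sketch of a "Product Clicks Pipeline": read the normalized click
-- stream and keep only clicks that are associated with a product.
-- Stream and column names are assumptions for illustration.
INSERT INTO product_clicks
SELECT
    click_timestamp,
    visitor_id,
    page_url,
    product_id
FROM all_clicks
WHERE product_id IS NOT NULL;
----

A pipeline of this shape reads from one stream (all_clicks) and writes its filtered output to another (product_clicks), which a connector can then deliver to the matching Astra Streaming topic.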

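Similarly, the two sinks are each described as "a function that writes message data to a certain DB table." A hypothetical CQL definition for such a target table in Astra DB is sketched below; the keyspace, table, and column names are assumptions chosen to line up with the pipeline sketch above, not taken from the commit.

[source,sql]
----
-- Hypothetical Cassandra (CQL) target table for the "All Clicks Sink".
-- Keyspace, table, and column names are illustrative assumptions.
CREATE TABLE IF NOT EXISTS webclicks.all_clicks (
    visitor_id      text,
    click_timestamp timestamp,
    page_url        text,
    product_id      text,
    PRIMARY KEY ((visitor_id), click_timestamp)
);
----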