Commit 61a51b7
03-cleanup-done
1 parent 6c3111e commit 61a51b7

5 files changed: +119 -39 lines changed
[510 KB file: diff preview not loaded]
modules/use-cases-architectures/pages/real-time-data-pipeline/01-create-astra-objects.adoc
3 additions & 0 deletions

@@ -6,7 +6,10 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
+[NOTE]
+====
 This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+====
 
 == Creating message topics to capture the stream of click data
 

modules/use-cases-architectures/pages/real-time-data-pipeline/02-create-decodable-objects.adoc
3 additions & 0 deletions

@@ -6,7 +6,10 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
+[NOTE]
+====
 This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+====
 
 == The Astra Streaming connection info
 

modules/use-cases-architectures/pages/real-time-data-pipeline/03-put-it-all-together.adoc
110 additions & 39 deletions

@@ -6,32 +6,39 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
+[NOTE]
+====
 This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+====
 
-Now we have all the pieces of our data processing pipeline in place. It’s time to start the connection and pipelines up and input some test data.
+Now that we have all the pieces of our data processing pipeline in place, it’s time to start the connection and pipelines up and input some test data.
 
 == Starting the processing
 
-. Navigate to the “Connections” area and click the 3 dots at the right for each connection. Click the “Start” option on all 3 connections.
+. Navigate to the “Connections” area and click the three dots at the right for each connection.
+Click the “Start” option on all 3 connections.
 +
 image:decodable-data-pipeline/03/image9.png[]
 
-. It might take a minute or so but each connection should refresh with a state of “Running”.
+. Be patient.
+It might take a minute or so, but each connection should refresh with a state of “Running”.
 +
 image:decodable-data-pipeline/03/image1.png[]
 +
-TIP: If one of the connections has an issue startup up (like a wrong setting or expired token) you can click on that connection to get more information.
+TIP: If one of the connections has an issue starting up (like an incorrect setting or expired token), click on that connection for more information.
 
-. Navigate to the “Pipelines” area and use the same menu on each pipeline to start. Same as the connections, they might take a minute or so to get going. Grab a coffee while you wait - you’ve earned it.
+. Navigate to the “Pipelines” area and use the same three-dot menu on each pipeline to start.
+As with the connections, they might take a minute or so to get going.
+Grab a coffee while you wait - you’ve earned it.
 +
 image:decodable-data-pipeline/03/image3.png[]
 
-Let’s make sure we have all the pieces in order
+Before ingesting data, let’s make sure we have all the pieces in order...
 
-* REST connection running: CHECK
-* Astra Streaming connections running : CHECK
-* Normalization pipeline running: CHECK
-* Product clicks filter pipeline running: CHECK
+* REST connection running? **CHECK!**
+* Astra Streaming connections running? **CHECK!**
+* Normalization pipeline running? **CHECK!**
+* Product clicks filter pipeline running? **CHECK!**
 
 == Your first ingested data
 
@@ -52,82 +59,146 @@ Let’s make sure we have all the pieces in order…
 ]
 ----
 
-. Click “Upload” to simulate data being posted to the endpoint. You should receive a confirmation that data has been received.
+. Click “Upload” to simulate data being posted to the endpoint. You will receive a confirmation that data has been received.
 
-NOTE: NO, this was not a big moment with cheers and balloons. The celebration is at the end of the next area.
+No, this was not the big moment with cheers and balloons - the celebration is at the end of the next area.
 
 == Following the flow
 
 For this first record of data, let’s look at each step along the way and confirm processing is working.
 
-. After the data was ingested, the “Webstore-Raw-Clicks-Normalize-Pipeline” should have received it. You can confirm this by navigating the “Pipelines” area and clicking that pipeline to see its metrics. Notice in the “Input Metrics” area 1 record has been received.
+. After the data was ingested, the “Webstore-Raw-Clicks-Normalize-Pipeline” received it.
+You can confirm this by inspecting the “Webstore-Raw-Clicks-Normalize-Pipeline” pipeline metrics.
+The “Input Metrics” and "Output Metrics" areas report that one record has been received.
+This confirms that the data passed successfully through this pipeline.
 +
 image:decodable-data-pipeline/03/image2.png[]
 
-. Also notice in the “Output Metrics” 1 record has been written. This confirms that the data passed successfully through this pipeline.
-
-. Next we can go to the “Connectors” area and click the “Astra-Streaming-All-Webclicks-Connector”. In the “Input Metrics” we see that 1 record has been received.
+. In the “Connections” area, click the “Astra-Streaming-All-Webclicks-Connector”.
+In “Input Metrics”, we see that 1 record has been received.
 +
 image:decodable-data-pipeline/03/image4.png[]
 
-. Now we can go to our Astra Streaming tenant “webstore-clicks” and navigate to the “Namespace and Topics” area. Expand the “Production” namespace and click the “all-clicks” topic. Notice that “Data In” has 1 message and “Data Out” has 1 message. That means the topic took the data in and a consumer acknowledged receipt of the message.
+. Return to your Astra Streaming tenant “webstore-clicks” and navigate to the “Namespace and Topics” area.
+Expand the “production” namespace and select the “all-clicks” topic.
+Confirm that “Data In” has 1 message and “Data Out” has 1 message. This means the topic took the data in and a consumer acknowledged receipt of the message.
 +
 image:decodable-data-pipeline/03/image6.png[]
 
-. On to the “Sinks” tab in Astra and click the “all-clicks” sink. In “Instance Stats” you see “Reads” has a value of 1 and “Writes” has a value of 1. This confirms the Sink consumed a message from the topic and wrote the data to the store.
+. In the “Sinks” tab in Astra, select the “all-clicks” sink. In “Instance Stats” you see “Reads” has a value of 1 and “Writes” has a value of 1. This means the sink consumed a message from the topic and wrote the data to the store.
 +
 image:decodable-data-pipeline/03/image5.png[]
 
-. Let’s look at the final data in Astra DB. Navigate to the Astra home and click the “webstore-clicks” Serverless Database. Choose the “CQL Console” tab and copy/paste the following command in the terminal.
+. Finally, let’s look at the final data in your Astra database. Navigate to the Astra home and click the “webstore-clicks” Serverless Database. Choose the “CQL Console” tab and copy/paste the following command in the terminal.
 +
-[source,sql]
+[tabs]
+====
+CQL::
++
+--
+[source,sql,subs="attributes+"]
 ----
 select * from click_data.all_clicks;
 ----
+--
+
+Result::
 +
-Your should see a single record output.
-+
-image:decodable-data-pipeline/03/image8.png[]
+--
+[source,sql]
+----
+token@cqlsh> EXPAND ON; //this cleans up the output
+Now Expanded output is enabled
+token@cqlsh> select * from click_data.all_clicks;
+@ Row 1
+------------------+----------------------------------------
+ operating_system | Windows
+ browser_type | Chrome/102.0.0.0
+ url_host | somedomain.com
+ url_path | /catalog/area1/yetanother-cool-product
+ click_timestamp | 1675286722000
+ url_protocol | https
+ url_query | a=b&c=d
+ visitor_id | b56afbf3-321f-49c1-919c-b2ea3e550b07
+
+(1 rows)
+----
+--
+====
+
+This confirms that the data was successfully written to the database.
 
-{emoji-tada}{emoji-tada} Queue the loud cheers and high-fives! Our pipeline ingested raw web click data, normalized it, and persisted the parsed data to the database! Woot woot!!
+{emoji-tada}{emoji-tada} Cue the cheers and high-fives! Our pipeline ingested raw web click data, normalized it, and persisted the parsed data to the database! Woot woot!!
 
 == Follow the flow of the product clicks data
 
-Similar to how you followed the above flow follow this flow to confirm the filtered messages were stored.
+Similar to how you followed the above flow of raw click data, follow this flow to confirm the filtered messages were stored.
 
 . Navigate to your Decodable pipeline named “Webstore-Product-Clicks-Pipeline”.
 .. The “Input Metrics” should be 1 record and the “Output Metrics” should be 1 record.
 
 . Navigate to your Decodable connection named “Astra-Streaming-Product-Webclicks-Connection”.
 .. The “Input Metrics” should be 1 record.
 
-. Head over to your Astra tenant to the production/product-clicks topic.
+. Navigate to your Astra tenant and check the production/product-clicks topic.
 .. There should be 1 message in “Data In” and 1 message in “Data Out”.
 
-. Finally, to your Astra database CQL Console.
-.. Query the product clicks table
+. Finally, navigate to your Astra database CQL Console.
+.. Query the product_clicks table:
 +
-[source,sql]
+[tabs]
+====
+CQL::
++
+--
+[source,sql,subs="attributes+"]
 ----
 select * from click_data.product_clicks;
 ----
-+
-image:decodable-data-pipeline/03/image7.png[]
+--
 
-{emoji-rocket}{emoji-rocket} Yesssss! The first web click data we entered happened to be a product click. So the data was filtered in the pipeline and processed into the correct table!
+Result::
++
+--
+[source,sql]
+----
+@ Row 1
+-------------------+---------------------------------
+ catalog_area_name | area1
+ product_name | yetanother cool product
+ click_timestamp | 2023-02-01 21:25:22.000000+0000
+----
+--
+====
+{emoji-rocket}{emoji-rocket} Yesssss! The first web click data we entered happened to be a product click, so the data was filtered in the pipeline and processed into the correct table!
 
 == Example real-time site data
 
-Let’s see what this can do! To put a load on the pipeline we’ll need a way to continuously post data to our endpoint. Below are a few examples. Use the download button below to download a zip of a static(html) site ecommerce catalog, that silently posts click data to an endpoint. The site is a copy of https://www.blazemeter.com/[BlazeMeter^]’s{external-link-icon} https://www.demoblaze.com/[Demoblaze site^]{external-link-icon}.
+Let’s see what this can do! To put a load on the pipeline, we’ll need a way to continuously post data to our endpoint. Below are a few examples.
 
-You’ll need 2 pieces of information the Endpoint URL and an authorization token. Learn more about retrieving both of those in https://docs.decodable.co/docs/connector-reference-rest#authentication[Decodable documentation^]{external-link-icon}.
-
-Once you extract the zip, open the folder in your text editor of IDE of choice and look in the script.js file. There are 2 placeholders for the data retrieved above.
+. Use the download button below to download a zip of a static HTML e-commerce catalog that silently posts click data to an endpoint.
+The site is a copy of https://www.blazemeter.com/[BlazeMeter^]’s{external-link-icon} https://www.demoblaze.com/[Demoblaze site^]{external-link-icon}.
++
+[.button]#xref:attachment$web-clicks-website.zip[*Download Now*]#
 
-Open the phones.html file in your browser (yes, as a local file) and begin clicking on products. Each click should be a new post to your Decodable endpoint.
+. Extract the zip, open the folder in your text editor or IDE of choice, and open the "script.js" file.
+There are 2 placeholders for data you'll need to retrieve from Decodable: the Endpoint URL and an authorization token.
++
+[source,bash]
+----
+function post_click(url){
+  let decodable_token = "access token: <value retrieved from access_token in .decodable/auth>";
+  let endpoint_url = "https://ddieruf.api.decodable.co/v1alpha2/connections/4f003544/events";
+----
++
+Learn more about retrieving the Endpoint URL and auth token in the https://docs.decodable.co/docs/connector-reference-rest#authentication[Decodable documentation^]{external-link-icon}.
 
-[.button]#xref:attachment$web-clicks-website.zip[*Download Now*]#
+. Replace the placeholders with your retrieved values and save "script.js".
+. Open the "phones.html" file in your browser (yes, as a local file) and begin clicking on products.
+Each click should be a new post to your Decodable endpoint.
++
+image:decodable-data-pipeline/03/image10.png[]
 
 == Next step
 
-Continue on with cleaning up your environments and debugging tips! xref:real-time-data-pipeline/04-debugging-and-clean-up.adoc[Next >>]
+Continue on to the last step for debugging and cleanup! xref:real-time-data-pipeline/04-debugging-and-clean-up.adoc[Next >>]

modules/use-cases-architectures/pages/real-time-data-pipeline/04-debugging-and-clean-up.adoc
3 additions & 0 deletions

@@ -6,7 +6,10 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
+[NOTE]
+====
 This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+====
 
 == Debugging the pipeline
 
