
Commit 72bd0fb

01-cleanup-complete
1 parent 1c5ab4a commit 72bd0fb

4 files changed: +112 −23 lines

4 files changed

+112
-23
lines changed

modules/use-cases-architectures/pages/real-time-data-pipeline/01-create-astra-objects.adoc

Lines changed: 108 additions & 21 deletions
@@ -6,53 +6,70 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
-This guide is part of a series of guides that creates a real-time data pipeline. Read more about the series xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
 
 == Creating message topics to capture the stream of click data
 
-. Navigate to your Astra portal home and choose "Create a stream"
+. Navigate to your Astra portal home and click “Create a Stream”.
 +
 image:decodable-data-pipeline/01/image4.png[]
 
-. Name the new tenant “webstore-clicks”. You can choose any cloud provider and region. Click “Create Tenant”.
+. Name the new streaming tenant “webstore-clicks”.
+Choose any cloud provider and region.
+Click “Create Tenant”.
 +
 image:decodable-data-pipeline/01/image6.png[]
 
-. You will be redirected to your new tenant’s quickstart. Navigate to the “Namespace and Topics” tab at the top.
+. You will be redirected to your new tenant’s quickstart. Navigate to the “Namespace and Topics” tab at the top of the screen.
 +
 image:decodable-data-pipeline/01/image16.png[]
 
-. Create a new namespace with the name “production”. We are treating namespaces as logical development environments to illustrate how you could create a continuous delivery flow. You could also have namespaces for “development” and “staging”.
+. Create a new namespace with the name “production”.
+We are treating namespaces as logical development environments to illustrate how you could create a continuous delivery flow.
+You could also have namespaces for “development” and “staging”.
 +
 image:decodable-data-pipeline/01/image11.png[]
 
-. Your namespaces view should refresh with the new namespace. Click the “Add Topic” button associated with that namespace. Name it “all-clicks” and leave it as “Persistent”. Click the “Add Topic” button to create the topic.
+. The namespaces view will refresh to display your new “production” namespace.
+Click the “Add Topic” button associated with the “production” namespace.
+Name your new topic “all-clicks” and leave it as a “Persistent” topic.
+Click the “Add Topic” button to finish creating the topic.
 +
 image:decodable-data-pipeline/01/image15.png[]
 
-. Click the “Add Topic” button (again) associated with that namespace. Name it “product-clicks” and leave as “Persistent”. Click the “Add Topic” button to create the topic.
+. Create a second topic.
+Click the “Add Topic” button associated with the “production” namespace.
+Name this topic “product-clicks” and leave it as a “Persistent” topic.
+Click the “Add Topic” button to finish creating the topic.
 +
 image:decodable-data-pipeline/01/image8.png[]
 
-. You should have 2 namespaces. The “Production” namespace should have 2 topics.
+. You should have 2 namespaces.
+The “production” namespace should contain the “all-clicks” and “product-clicks” topics you created.
+The “default” namespace is automatically created by Pulsar within each new streaming tenant.
 +
 image:decodable-data-pipeline/01/image13.png[]
 
 == Storing the stream of click data
 
-. From the Astra portal home click “Create a Database”.
+. From the Astra portal home, click “Create a Database”.
 +
 image:decodable-data-pipeline/01/image18.png[]
 
-. Name the database “webstore-clicks” and the keyspace “click_data”. Choose any cloud provider and region. Click “Create Database”.
+. Name the database “webstore-clicks” and the keyspace “click_data”.
+Choose any cloud provider and region.
+Click “Create Database”.
 +
 image:decodable-data-pipeline/01/image5.png[]
 
-. The page will refresh with your new token details. Don’t worry about saving them, we will come back to retrieve this later. You can “Esc” or just head back to home. In the “Recent Resources” area of your Astra portal home you should see 2 new items.
+. The page will refresh with your new token details.
+Don’t worry about saving the tokens; we will retrieve these later.
+You can press “Esc” or just return to your Astra portal home, where you will see your new streaming tenant and database.
 +
 image:decodable-data-pipeline/01/image1.png[]
 
-. Copy/paste the following CQL statement into the terminal and hit “Enter”. This will create a table in the database that will hold our all web click data (ie: the raw data).
+. Copy and paste the following CQL statement into the CQL console and press “Enter”.
+This will create a table in the database to hold our “all-clicks” web click data (i.e., the raw data).
 +
 [source, sql]
 ----
@@ -69,7 +86,9 @@ CREATE TABLE IF NOT EXISTS click_data.all_clicks (
 );
 ----
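
As a quick sanity check after this statement runs, you could write one test row and read it back. A minimal sketch, assuming the full all_clicks schema shown in the describe output later in this commit; the sample values are hypothetical and not part of the commit:

[source, sql]
----
-- Hypothetical smoke test: insert one click, then read it back.
-- click_timestamp is a bigint, so epoch milliseconds are assumed here.
INSERT INTO click_data.all_clicks (
    operating_system, browser_type, url_host, url_path,
    click_timestamp, url_protocol, url_query, visitor_id
) VALUES (
    'macOS', 'Chrome', 'shop.example.com', '/products/sneakers',
    1672531200000, 'https', 'utm_source=email', uuid()
);

-- The partition key spans four columns, so all four must appear in WHERE.
SELECT * FROM click_data.all_clicks
WHERE operating_system = 'macOS'
  AND browser_type = 'Chrome'
  AND url_host = 'shop.example.com'
  AND url_path = '/products/sneakers';
----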
 
-. Then copy/paste the following CQL statement (again) into the terminal and hit “Enter”. This will create a second table in the database that will hold our filtered product web clicks.
+. Create a second table in the database.
+Copy and paste the following CQL statement into the CQL console and press “Enter”.
+This table will hold our “product-clicks” web click data (i.e., the filtered data).
 +
 [source, sql]
 ----
@@ -79,17 +98,81 @@ CREATE TABLE click_data.product_clicks (
     click_timestamp timestamp,
     PRIMARY KEY ((catalog_area_name), product_name, click_timestamp)
 ) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC);
-
 ----
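
The primary key here does the heavy lifting: rows are partitioned by catalog_area_name and clustered by product_name (ascending), then click_timestamp (newest first), so a per-area query returns pre-sorted results. A minimal sketch, with a hypothetical catalog area value:

[source, sql]
----
-- Hypothetical query: clicks for one catalog area come back grouped by
-- product_name (ASC), with the newest click first within each product.
SELECT product_name, click_timestamp
FROM click_data.product_clicks
WHERE catalog_area_name = 'footwear';
----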
 
 . You can confirm everything was created correctly by describing the keyspace in the CQL terminal.
 +
-[source, sql]
+[tabs]
+====
+CQL::
++
+--
+[source,sql,subs="attributes+"]
 ----
 describe click_data;
 ----
+--
+
+Result::
 +
-The output will be 3 “create” CQL statements for the keyspace, the click_data.all_clicks table, and the click_data.product_clicks table
+--
+[source,sql,subs="attributes+"]
+----
+token@cqlsh> describe click_data;
+
+CREATE KEYSPACE click_data WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3'} AND durable_writes = true;
+
+CREATE TABLE click_data.all_clicks (
+    operating_system text,
+    browser_type text,
+    url_host text,
+    url_path text,
+    click_timestamp bigint,
+    url_protocol text,
+    url_query text,
+    visitor_id uuid,
+    PRIMARY KEY ((operating_system, browser_type, url_host, url_path), click_timestamp)
+) WITH CLUSTERING ORDER BY (click_timestamp ASC)
+    AND additional_write_policy = '99PERCENTILE'
+    AND bloom_filter_fp_chance = 0.01
+    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+    AND comment = ''
+    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
+    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
+    AND crc_check_chance = 1.0
+    AND default_time_to_live = 0
+    AND gc_grace_seconds = 864000
+    AND max_index_interval = 2048
+    AND memtable_flush_period_in_ms = 0
+    AND min_index_interval = 128
+    AND read_repair = 'BLOCKING'
+    AND speculative_retry = '99PERCENTILE';
+
+CREATE TABLE click_data.product_clicks (
+    catalog_area_name text,
+    product_name text,
+    click_timestamp timestamp,
+    PRIMARY KEY (catalog_area_name, product_name, click_timestamp)
+) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC)
+    AND additional_write_policy = '99PERCENTILE'
+    AND bloom_filter_fp_chance = 0.01
+    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+    AND comment = ''
+    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
+    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
+    AND crc_check_chance = 1.0
+    AND default_time_to_live = 0
+    AND gc_grace_seconds = 864000
+    AND max_index_interval = 2048
+    AND memtable_flush_period_in_ms = 0
+    AND min_index_interval = 128
+    AND read_repair = 'BLOCKING'
+    AND speculative_retry = '99PERCENTILE';
+----
+--
+====
+
+The output displays three “create” CQL statements for the “click_data” keyspace, the `click_data.all_clicks` table, and the `click_data.product_clicks` table.
 
 == Connecting the topics to the store
 
@@ -142,9 +225,12 @@ image:decodable-data-pipeline/01/image17.png[]
 |===
 
 +
-WARNING: You are going to need the token again while creating a second sink. Either paste it in notepad (or some temp safe place) or keep the browser tab open.
+WARNING: You will need the token again when creating a second sink.
+Either paste it into a notepad (or another temporary safe place) or keep the browser tab open.
 
-. Click “Create” to create the sink. You will be directed back to the Sinks listing where your new sink should be initializing. Once it’s ready the status will automatically change to “Running”.
+. Click “Create” to create the sink.
+You will be directed back to the Sinks listing, where your new sink is initializing.
+When your new sink is ready, its status will change to “Running”.
 +
 image:decodable-data-pipeline/01/image14.png[]
 
@@ -180,15 +266,16 @@ image:decodable-data-pipeline/01/image14.png[]
 |(leave alone)
 |===
 
-. If everything goes smooth you should have 2 sinks “Running”.
+. If everything goes smoothly, you should have 2 sinks in a “Running” state.
 +
 image:decodable-data-pipeline/01/image9.png[]
 +
 [NOTE]
 ====
-To debug you can click the sink name, scroll to the bottom terminal output area to view deployment logs. This is a semi-verbose log of the sink starting, validating, and running.
+To debug, click the sink name and scroll to the bottom of the sink's page, where there is a terminal output area to view deployment logs.
+This is a semi-verbose log of the sink starting, validating, and running.
 ====
 
 == Next step
 
-With the Astra objects in place, now it's time to get the Decodable processing set up. xref:real-time-data-pipeline/03-put-it-all-together.adoc[Setup Decodable >>]
+Great work! With the Astra objects in place, let's move on to setting up the Decodable processing. xref:real-time-data-pipeline/03-put-it-all-together.adoc[Setup Decodable >>]

modules/use-cases-architectures/pages/real-time-data-pipeline/02-create-decodable-objects.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
-This guide is part of a series of guides that creates a real-time data pipeline. Read more about the series xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
 
 == The Astra Streaming connection info
 
modules/use-cases-architectures/pages/real-time-data-pipeline/03-put-it-all-together.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
-This guide is part of a series of guides that creates a real-time data pipeline. Read more about the series xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
 
 Now we have all the pieces of our data processing pipeline in place. It’s time to start the connection and pipelines up and input some test data.
 
modules/use-cases-architectures/pages/real-time-data-pipeline/04-debugging-and-clean-up.adoc

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+
 == Debugging the pipeline
 
 Maybe things didn’t go so smoothly for you. Or maybe you're just having a bad day :(. Whatever the case use the flow above to find where in the pipeline things are broken. Then get a bit deeper into that object. You’ll want to test input data at the point of failure. The first question to answer is if the input data is malformed or if the object itself is erring. Decodable’s UI gives you the ability in each pipeline to “Preview” the processing. This is a very powerful debugging tool. Your Astra Tenant has the “Try Me” area where you can set up producing and consuming messages in specific topics. These tools can help you recreate each step of the pipeline and debug issues.
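
On the Astra side, one concrete check is whether rows are actually landing in each table. A minimal sketch using the CQL console, with the table names created earlier in this series:

[source, sql]
----
-- Hypothetical spot check: if all_clicks has rows but product_clicks is
-- empty, the problem is in the filtering pipeline or the second sink.
SELECT * FROM click_data.all_clicks LIMIT 5;
SELECT * FROM click_data.product_clicks LIMIT 5;
----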
