This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].

== Creating message topics to capture the stream of click data

. Navigate to your Astra portal home and click "Create a Stream".
+
image:decodable-data-pipeline/01/image4.png[]

. Name the new streaming tenant "webstore-clicks".
+
Choose any cloud provider and region.
+
Click "Create Tenant".
+
image:decodable-data-pipeline/01/image6.png[]

. You will be redirected to your new tenant's quickstart. Navigate to the "Namespace and Topics" tab at the top of the screen.
+
image:decodable-data-pipeline/01/image16.png[]

. Create a new namespace with the name "production".
+
We are treating namespaces as logical development environments to illustrate how you could create a continuous delivery flow.
+
You could also have namespaces for "development" and "staging".
+
image:decodable-data-pipeline/01/image11.png[]

. The namespaces view will refresh to display your new "production" namespace.
+
Click the "Add Topic" button associated with the "production" namespace.
+
Name your new topic "all-clicks" and leave it as a "Persistent" topic.
+
Click the "Add Topic" button to finish creating the topic.
+
image:decodable-data-pipeline/01/image15.png[]

. Create a second topic.
+
Click the "Add Topic" button associated with the "production" namespace.
+
Name your second topic "product-clicks" and leave it as a "Persistent" topic.
+
Click the "Add Topic" button to finish creating the topic.
+
image:decodable-data-pipeline/01/image8.png[]

. You should now have 2 namespaces.
+
The "production" namespace should contain the "all-clicks" and "product-clicks" topics you created.
+
The "default" namespace is automatically created by Pulsar within each new streaming tenant.
+
image:decodable-data-pipeline/01/image13.png[]
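
For orientation, a click event flowing through these topics might look something like the following sketch. The exact event schema is defined later in the series; the field names and values here are illustrative assumptions only.

[source, json]
----
{
  "product_name": "red-wagon",
  "click_timestamp": 1700000000000
}
----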

== Storing the stream of click data

. From the Astra portal home, click "Create a Database".
+
image:decodable-data-pipeline/01/image18.png[]

. Name the database "webstore-clicks" and the keyspace "click_data".
+
Choose any cloud provider and region.
+
Click "Create Database".
+
image:decodable-data-pipeline/01/image5.png[]

. The page will refresh with your new token details.
+
Don't worry about saving the tokens; we will retrieve them later.
+
You can press "Esc" or just return to your Astra portal home, where you will see your new streaming tenant and database.
+
image:decodable-data-pipeline/01/image1.png[]

. Copy and paste the following CQL statement into the CQL console and press "Enter".
+
This will create a table in the database to hold our "all-clicks" web click data (i.e., the raw data).
+
[source, sql]
----
CREATE TABLE IF NOT EXISTS click_data.all_clicks (
  ...
);
----
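+
If you want to confirm the table exists before moving on, you can describe it from the same CQL console. This is an optional check, not part of the original steps:
+
[source, sql]
----
-- Print the schema of the newly created table.
DESCRIBE TABLE click_data.all_clicks;
----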

. Create a second table in the database.
+
Copy and paste the following CQL statement into the CQL console and press "Enter".
+
This will create a second table in the database to hold our "product-clicks" web click data (i.e., the filtered data).
) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';
----
--
====

The output displays three "create" CQL statements for the "click_data" keyspace, the `click_data.all_clicks` table, and the `click_data.product_clicks` table.
WARNING: You will need the token again when creating a second sink. Either paste it into a notepad (or some other temporary safe place) or keep this browser tab open.

. Click "Create" to create the sink.
+
You will be directed back to the Sinks listing, where your new sink will be initializing.
+
When the sink is ready, its status will automatically change to "Running".
. If everything goes smoothly, you should have 2 sinks in a "Running" state.
+
image:decodable-data-pipeline/01/image9.png[]
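
Once data starts flowing (in the next guide), you can spot-check that the sinks are actually writing to Astra with a quick query in the CQL console. Note that both tables will be empty until the pipeline runs:

[source, sql]
----
-- Returns up to 10 rows once the sinks begin writing click data.
SELECT * FROM click_data.product_clicks LIMIT 10;
----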

[NOTE]
====
To debug, click the sink name and scroll to the bottom of the sink's page, where a terminal output area shows the deployment logs. This is a semi-verbose log of the sink starting, validating, and running.
====

== Next step

Great work! With the Astra objects in place, let's move on to setting up the Decodable processing. xref:real-time-data-pipeline/03-put-it-all-together.adoc[Set up Decodable >>]
Now we have all the pieces of our data processing pipeline in place. It's time to start up the connection and pipelines and input some test data.

== Debugging the pipeline

Maybe things didn't go so smoothly for you. Or maybe you're just having a bad day :(. Whatever the case, use the flow above to find where in the pipeline things are broken, then dig a bit deeper into that object. You'll want to test input data at the point of failure. The first question to answer is whether the input data is malformed or the object itself is erring. Decodable's UI gives you the ability to "Preview" the processing in each pipeline, which is a very powerful debugging tool. Your Astra tenant has the "Try Me" area, where you can set up producing and consuming messages in specific topics. These tools can help you recreate each step of the pipeline and debug issues.
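
One way to produce a test message from the command line is Pulsar's stock CLI client, sketched below. This assumes you have a `client.conf` configured with your Astra Streaming broker URL and token; the topic URL reuses the tenant, namespace, and topic created earlier in this series, and the message body is an illustrative placeholder, not the schema the pipeline expects.

[source, shell]
----
# Publish one test message to the raw click topic.
pulsar-client produce \
  "persistent://webstore-clicks/production/all-clicks" \
  --messages '{"product_name": "test-product"}'
----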