
Commit 72bd0fb

01-cleanup-complete
1 parent 1c5ab4a commit 72bd0fb

4 files changed: +112 −23 lines

4 files changed

+112
-23
lines changed

modules/use-cases-architectures/pages/real-time-data-pipeline/01-create-astra-objects.adoc

Lines changed: 108 additions & 21 deletions
@@ -6,53 +6,70 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
-This guide is part of a series of guides that creates a real-time data pipeline. Read more about the series xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
 
 == Creating message topics to capture the stream of click data
 
-. Navigate to your Astra portal home and choose "Create a stream"
+. Navigate to your Astra portal home and click “Create a Stream”.
 +
 image:decodable-data-pipeline/01/image4.png[]
 
-. Name the new tenant “webstore-clicks”. You can choose any cloud provider and region. Click “Create Tenant”.
+. Name the new streaming tenant “webstore-clicks”.
+Choose any cloud provider and region.
+Click “Create Tenant”.
 +
 image:decodable-data-pipeline/01/image6.png[]
 
-. You will be redirected to your new tenant’s quickstart. Navigate to the “Namespace and Topics” tab at the top.
+. You will be redirected to your new tenant’s quickstart. Navigate to the “Namespace and Topics” tab at the top of the screen.
 +
 image:decodable-data-pipeline/01/image16.png[]
 
-. Create a new namespace with the name “production”. We are treating namespaces as logical development environments to illustrate how you could create a continuous delivery flow. You could also have namespaces for “development” and “staging”.
+. Create a new namespace with the name “production”.
+We are treating namespaces as logical development environments to illustrate how you could create a continuous delivery flow.
+You could also have namespaces for “development” and “staging”.
 +
 image:decodable-data-pipeline/01/image11.png[]
 
-. Your namespaces view should refresh with the new namespace. Click the “Add Topic” button associated with that namespace. Name it “all-clicks” and leave it as “Persistent”. Click the “Add Topic” button to create the topic.
+. The namespaces view will refresh to display your new “production” namespace.
+Click the “Add Topic” button associated with the “production” namespace.
+Name your new topic “all-clicks” and leave it as a “Persistent” topic.
+Click the “Add Topic” button to finish creating the topic.
 +
 image:decodable-data-pipeline/01/image15.png[]
 
-. Click the “Add Topic” button (again) associated with that namespace. Name it “product-clicks” and leave as “Persistent”. Click the “Add Topic” button to create the topic.
+. Create a second topic.
+Click the “Add Topic” button associated with the “production” namespace.
+Name this topic “product-clicks” and leave it as a “Persistent” topic.
+Click the “Add Topic” button to finish creating the topic.
 +
 image:decodable-data-pipeline/01/image8.png[]
 
-. You should have 2 namespaces. The “Production” namespace should have 2 topics.
+. You should have 2 namespaces.
+The “production” namespace should contain the “all-clicks” and “product-clicks” topics you created.
+The “default” namespace is automatically created by Pulsar within each new streaming tenant.
 +
 image:decodable-data-pipeline/01/image13.png[]
 
 == Storing the stream of click data
 
-. From the Astra portal home click “Create a Database”.
+. From the Astra portal home, click “Create a Database”.
 +
 image:decodable-data-pipeline/01/image18.png[]
 
-. Name the database “webstore-clicks” and the keyspace “click_data”. Choose any cloud provider and region. Click “Create Database”.
+. Name the database “webstore-clicks” and the keyspace “click_data”.
+Choose any cloud provider and region.
+Click “Create Database”.
 +
 image:decodable-data-pipeline/01/image5.png[]
 
-. The page will refresh with your new token details. Don’t worry about saving them, we will come back to retrieve this later. You can “Esc” or just head back to home. In the “Recent Resources” area of your Astra portal home you should see 2 new items.
+. The page will refresh with your new token details.
+Don’t worry about saving the tokens; we will retrieve these later.
+You can press “Esc” or just return to your Astra portal home, where you will see your new streaming tenant and database.
 +
 image:decodable-data-pipeline/01/image1.png[]
 
-. Copy/paste the following CQL statement into the terminal and hit “Enter”. This will create a table in the database that will hold our all web click data (ie: the raw data).
+. Copy and paste the following CQL statement into the CQL console and press “Enter”.
+This will create a table in the database to hold our “all-clicks” web click data (i.e., the raw data).
 +
 [source, sql]
 ----
@@ -69,7 +86,9 @@ CREATE TABLE IF NOT EXISTS click_data.all_clicks (
 );
 ----
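
As a quick sanity check after this statement runs, you could write one test row and read it back. A minimal sketch, assuming the full all_clicks schema shown in the describe output later in this commit; the sample values are hypothetical and not part of the commit:

[source, sql]
----
-- Hypothetical smoke test: insert one click, then read it back.
-- click_timestamp is a bigint, so epoch milliseconds are assumed here.
INSERT INTO click_data.all_clicks (
    operating_system, browser_type, url_host, url_path,
    click_timestamp, url_protocol, url_query, visitor_id
) VALUES (
    'macOS', 'Chrome', 'shop.example.com', '/products/sneakers',
    1672531200000, 'https', 'utm_source=email', uuid()
);

-- The partition key spans four columns, so all four must appear in WHERE.
SELECT * FROM click_data.all_clicks
WHERE operating_system = 'macOS'
  AND browser_type = 'Chrome'
  AND url_host = 'shop.example.com'
  AND url_path = '/products/sneakers';
----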
 
-. Then copy/paste the following CQL statement (again) into the terminal and hit “Enter”. This will create a second table in the database that will hold our filtered product web clicks.
+. Create a second table in the database.
+Copy and paste the following CQL statement into the CQL console and press “Enter”.
+This table will hold our “product-clicks” web click data (i.e., the filtered data).
 +
 [source, sql]
 ----
@@ -79,17 +98,81 @@ CREATE TABLE click_data.product_clicks (
     click_timestamp timestamp,
     PRIMARY KEY ((catalog_area_name), product_name, click_timestamp)
 ) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC);
-
 ----
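
The primary key here does the heavy lifting: rows are partitioned by catalog_area_name and clustered by product_name (ascending), then click_timestamp (newest first), so a per-area query returns pre-sorted results. A minimal sketch, with a hypothetical catalog area value:

[source, sql]
----
-- Hypothetical query: clicks for one catalog area come back grouped by
-- product_name (ASC), with the newest click first within each product.
SELECT product_name, click_timestamp
FROM click_data.product_clicks
WHERE catalog_area_name = 'footwear';
----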
 
 . You can confirm everything was created correctly by describing the keyspace in the CQL terminal.
 +
-[source, sql]
+[tabs]
+====
+CQL::
++
+--
+[source,sql,subs="attributes+"]
 ----
 describe click_data;
 ----
+--
+
+Result::
 +
-The output will be 3 “create” CQL statements for the keyspace, the click_data.all_clicks table, and the click_data.product_clicks table
+--
+[source,sql,subs="attributes+"]
+----
+token@cqlsh> describe click_data;
+
+CREATE KEYSPACE click_data WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3'} AND durable_writes = true;
+
+CREATE TABLE click_data.all_clicks (
+    operating_system text,
+    browser_type text,
+    url_host text,
+    url_path text,
+    click_timestamp bigint,
+    url_protocol text,
+    url_query text,
+    visitor_id uuid,
+    PRIMARY KEY ((operating_system, browser_type, url_host, url_path), click_timestamp)
+) WITH CLUSTERING ORDER BY (click_timestamp ASC)
+    AND additional_write_policy = '99PERCENTILE'
+    AND bloom_filter_fp_chance = 0.01
+    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+    AND comment = ''
+    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
+    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
+    AND crc_check_chance = 1.0
+    AND default_time_to_live = 0
+    AND gc_grace_seconds = 864000
+    AND max_index_interval = 2048
+    AND memtable_flush_period_in_ms = 0
+    AND min_index_interval = 128
+    AND read_repair = 'BLOCKING'
+    AND speculative_retry = '99PERCENTILE';
+
+CREATE TABLE click_data.product_clicks (
+    catalog_area_name text,
+    product_name text,
+    click_timestamp timestamp,
+    PRIMARY KEY (catalog_area_name, product_name, click_timestamp)
+) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC)
+    AND additional_write_policy = '99PERCENTILE'
+    AND bloom_filter_fp_chance = 0.01
+    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+    AND comment = ''
+    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
+    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
+    AND crc_check_chance = 1.0
+    AND default_time_to_live = 0
+    AND gc_grace_seconds = 864000
+    AND max_index_interval = 2048
+    AND memtable_flush_period_in_ms = 0
+    AND min_index_interval = 128
+    AND read_repair = 'BLOCKING'
+    AND speculative_retry = '99PERCENTILE';
+----
+--
+====
+
+The output displays three “create” CQL statements for the “click_data” keyspace, the `click_data.all_clicks` table, and the `click_data.product_clicks` table.
 
 == Connecting the topics to the store
 
@@ -142,9 +225,12 @@ image:decodable-data-pipeline/01/image17.png[]
 |===
 
 +
-WARNING: You are going to need the token again while creating a second sink. Either paste it in notepad (or some temp safe place) or keep the browser tab open.
+WARNING: You will need the token again when creating a second sink.
+Either paste it into a notepad (or another temporary safe place) or keep the browser tab open.
 
-. Click “Create” to create the sink. You will be directed back to the Sinks listing where your new sink should be initializing. Once it’s ready the status will automatically change to “Running”.
+. Click “Create” to create the sink.
+You will be directed back to the Sinks listing, where your new sink is initializing.
+When your new sink is ready, its status will change to “Running”.
 +
 image:decodable-data-pipeline/01/image14.png[]
 
@@ -180,15 +266,16 @@ image:decodable-data-pipeline/01/image14.png[]
 |(leave alone)
 |===
 
-. If everything goes smooth you should have 2 sinks “Running”.
+. If everything goes smoothly, you should have 2 sinks in a “Running” state.
 +
 image:decodable-data-pipeline/01/image9.png[]
 +
 [NOTE]
 ====
-To debug you can click the sink name, scroll to the bottom terminal output area to view deployment logs. This is a semi-verbose log of the sink starting, validating, and running.
+To debug, click the sink name and scroll to the bottom of the sink's page, where there is a terminal output area to view deployment logs.
+This is a semi-verbose log of the sink starting, validating, and running.
 ====
 
 == Next step
 
-With the Astra objects in place, now it's time to get the Decodable processing set up. xref:real-time-data-pipeline/03-put-it-all-together.adoc[Setup Decodable >>]
+Great work! With the Astra objects in place, let's move on to setting up the Decodable processing. xref:real-time-data-pipeline/03-put-it-all-together.adoc[Setup Decodable >>]

modules/use-cases-architectures/pages/real-time-data-pipeline/02-create-decodable-objects.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
-This guide is part of a series of guides that creates a real-time data pipeline. Read more about the series xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
 
 == The Astra Streaming connection info
 
modules/use-cases-architectures/pages/real-time-data-pipeline/03-put-it-all-together.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
-This guide is part of a series of guides that creates a real-time data pipeline. Read more about the series xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
 
 Now we have all the pieces of our data processing pipeline in place. It’s time to start the connection and pipelines up and input some test data.
 
modules/use-cases-architectures/pages/real-time-data-pipeline/04-debugging-and-clean-up.adoc

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ David Dieruf <[email protected]>
 :title:
 :navtitle:
 
+This guide is part of a series that creates a real-time data pipeline with Astra and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here].
+
 == Debugging the pipeline
 
 Maybe things didn’t go so smoothly for you. Or maybe you're just having a bad day :(. Whatever the case use the flow above to find where in the pipeline things are broken. Then get a bit deeper into that object. You’ll want to test input data at the point of failure. The first question to answer is if the input data is malformed or if the object itself is erring. Decodable’s UI gives you the ability in each pipeline to “Preview” the processing. This is a very powerful debugging tool. Your Astra Tenant has the “Try Me” area where you can set up producing and consuming messages in specific topics. These tools can help you recreate each step of the pipeline and debug issues.
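
On the Astra side, one concrete check is whether rows are actually landing in each table. A minimal sketch using the CQL console, with the table names created earlier in this series:

[source, sql]
----
-- Hypothetical spot check: if all_clicks has rows but product_clicks is
-- empty, the problem is in the filtering pipeline or the second sink.
SELECT * FROM click_data.all_clicks LIMIT 5;
SELECT * FROM click_data.product_clicks LIMIT 5;
----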
