
Commit 01bc6b0

Making sure links open in a new tab, link descriptions are descriptive

1 parent fa3cd65 commit 01bc6b0

4 files changed: +13 -13 lines changed

src/connections/storage/data-lakes/comparison.md

Lines changed: 3 additions & 3 deletions

@@ -91,8 +91,8 @@ If a bad data type is seen, such as text in place of a number or an incorrectly
 
 Tables between Warehouses and Data Lakes will be the same, except in the following cases:
 
-- `tracks` - Warehouses provide one table per specific event (`track_button_clicked`) in addition to a summary table listing all `track` method calls. Data Lakes also creates one table per specific event, but does not provide a summary table. Learn more about the `tracks` table [here](/docs/connections/storage/warehouses/schema/).
-- `users` - Both Warehouses and Data Lakes create an `identifies` table (as seen [here](/docs/connections/storage/warehouses/schema/)); however, Warehouses also creates a `users` table just for user data. Data Lakes does not create this, since it does not support object data. The `users` table is a materialized view of users in a source, constructed from data inferred about users from `identify` calls.
+- `tracks` - Warehouses provide one table per specific event (`track_button_clicked`) in addition to a summary table listing all `track` method calls. Data Lakes also creates one table per specific event, but does not provide a summary table. Learn more about the `tracks` table [in the Warehouses schema docs](/docs/connections/storage/warehouses/schema/).
+- `users` - Both Warehouses and Data Lakes create an `identifies` table (as seen [in the Warehouses schema docs](/docs/connections/storage/warehouses/schema/)); however, Warehouses also creates a `users` table just for user data. Data Lakes does not create this, since it does not support object data. The `users` table is a materialized view of users in a source, constructed from data inferred about users from `identify` calls.
 - `accounts` - Group calls generate the `accounts` table in Warehouses. However, because Data Lakes does not support object data (groups are objects, not events), there is no `accounts` table in Data Lakes.
 - *(Redshift only)* **Table names which begin with numbers** - Table names are not allowed to begin with numbers in the Redshift Warehouse, so they are automatically given an underscore (`_`) prefix. Glue Data Catalog does not have this restriction, so Data Lakes doesn't assign this prefix. For example, in Redshift a table may be named `_101_account_update`, while in Data Lakes it would be named `101_account_update`. While this nuance is specific to Redshift, other warehouses may show similar behavior for other reserved words.

@@ -105,4 +105,4 @@ Similar to tables, columns between Warehouses and Data Lakes will be the same, e
 - `channel`, `metadata_*`, `project_id`, `type`, `version` - These columns are internal Segment data which are not found in Warehouses, but are found in Data Lakes. Warehouses is intentionally very detailed about its transformation logic and does not include these. Data Lakes does include them because of its more straightforward approach of flattening the whole event.
 - (Redshift only) `uuid`, `uuid_ts` - Redshift customers will see columns for `uuid` and `uuid_ts`, which are used for de-duplication in Redshift; other warehouses may have similar columns. These aren't relevant for Data Lakes, so the columns won't appear there.
 - `sent_at` - Warehouses computes the `sent_at` value based on timestamps found in the original event to account for clock skew and timestamps in the future. This was needed when the Segment pipeline didn't correct for this on its own; the pipeline now does, so Data Lakes needs no additional computation and sends the value as computed at ingestion.
-- `integrations` - Warehouses does not include the integrations object. Data Lakes flattens and includes the integrations object. You can read more about the `integrations` object [here](/docs/guides/filtering-data/#filtering-with-the-integrations-object).
+- `integrations` - Warehouses does not include the integrations object. Data Lakes flattens and includes the integrations object. You can read more about the `integrations` object [in the filtering data documentation](/docs/guides/filtering-data/#filtering-with-the-integrations-object).
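The Redshift naming nuance in the first hunk above can be illustrated with a short sketch. These helpers are hypothetical and are not Segment's implementation; they only mirror the rule the doc describes (digit-leading table names get an underscore prefix in Redshift, but not in Glue Data Catalog):

```python
def redshift_table_name(event_name: str) -> str:
    """Redshift table names can't begin with a number, so a
    digit-leading name gets an underscore prefix (hypothetical helper)."""
    return f"_{event_name}" if event_name[:1].isdigit() else event_name

def data_lakes_table_name(event_name: str) -> str:
    """Glue Data Catalog has no such restriction; the name is unchanged."""
    return event_name

# Example from the doc: the `101_account_update` event
print(redshift_table_name("101_account_update"))    # _101_account_update
print(data_lakes_table_name("101_account_update"))  # 101_account_update
```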

src/connections/storage/data-lakes/data-lakes-manual-setup.md

Lines changed: 1 addition & 1 deletion

@@ -272,7 +272,7 @@ When you update an EMR cluster to 5.33.0, you can participate in [AWS Lake Forma
 
 ## Procedure
 1. Open your Segment app workspace and select the Data Lakes destination.
-2. On the Settings tab, select the EMR Cluster ID field and replace the existing ID with the ID of your v5.33.0 EMR cluster. For help finding the cluster ID in AWS, see Amazon's [View cluster status and details](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-clusters.html). You don't need to update the Glue Catalog ID, IAM Role ARN, or S3 Bucket name fields.
+2. On the Settings tab, select the EMR Cluster ID field and replace the existing ID with the ID of your v5.33.0 EMR cluster. For help finding the cluster ID in AWS, see Amazon's [View cluster status and details](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-clusters.html){:target="_blank"}. You don't need to update the Glue Catalog ID, IAM Role ARN, or S3 Bucket name fields.
 3. Click **Save**.
 4. In the AWS EMR console, view the Events tab for your cluster to verify it is receiving data.

src/connections/storage/data-lakes/index.md

Lines changed: 5 additions & 5 deletions

@@ -10,7 +10,7 @@ Segment Data Lakes sends Segment data to a cloud data store (for example AWS S3)
 > info ""
 > Segment Data Lakes is available to Business tier customers only.
 
-To learn more, check out the blog post, [Introducing Segment Data Lakes](https://segment.com/blog/introducing-segment-data-lakes/){:target="_blank"}.
+To learn more, check out the blog post [Introducing Segment Data Lakes](https://segment.com/blog/introducing-segment-data-lakes/){:target="_blank"}.
 
 
 ## How Segment Data Lakes work

@@ -38,7 +38,7 @@ When you use Data Lakes, you can either use Data Lakes as your _only_ source of
 
 ## Set up Segment Data Lakes
 
-For detailed instructions on how to configure Segment Data Lakes, see the [Data Lakes catalog page](/docs/connections/storage/catalog/data-lakes/). Be sure to consider the EMR and AWS IAM components listed below."
+For detailed instructions on how to configure Segment Data Lakes, see the [Data Lakes catalog page](/docs/connections/storage/catalog/data-lakes/). Be sure to consider the EMR and AWS IAM components listed below.
 
 ### EMR
 

@@ -85,7 +85,7 @@ By default, the date partition structure is `day=<YYYY-MM-DD>/hr=<HH>` to give y
 
 Data Lakes stores the inferred schema and associated metadata of the S3 data in AWS Glue Data Catalog. This metadata includes the location of the S3 file, data converted into Parquet format, column names inferred from the Segment event, nested properties and traits which are now flattened, and the inferred data type.
 
-![A screenshot of the AWS ios_prod_identify table, containing the schema for the table, information about the table, and the table version](images/dl_gluecatalog.png)
+![A screenshot of the AWS ios_prod_identify table, displaying the schema for the table, information about the table, and the table version](images/dl_gluecatalog.png)
 <!--
 TODO:
 add annotated glue image calling out different parts of inferred schema)

@@ -158,7 +158,7 @@ Data types and labels available in Protocols aren't supported by Data Lakes.
 {% endfaqitem %}
 
 {% faqitem What is the cost to use AWS Glue? %}
-You can find details on Amazon's [pricing for Glue page](https://aws.amazon.com/glue/pricing/){:target="_blank"}. For reference, Data Lakes creates 1 table per event type in your source, and adds 1 partition per hour to the event table.
+You can find details on Amazon's [pricing for Glue](https://aws.amazon.com/glue/pricing/){:target="_blank"} page. For reference, Data Lakes creates 1 table per event type in your source, and adds 1 partition per hour to the event table.
 {% endfaqitem %}
 
 {% faqitem What limits does AWS Glue have? %}

@@ -171,7 +171,7 @@ The most common limits to keep in mind are:
 
 Segment stops creating new tables for the events after you exceed this limit. However, you can contact your AWS account representative to increase these limits.
 
-You should also read the [additional considerations](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html){:target="_blank"} when using AWS Glue Data Catalog.
+You should also read the [additional considerations in Amazon's documentation](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html){:target="_blank"} when using AWS Glue Data Catalog.
 
 {% endfaqitem %}
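The `@@ -85` hunk header above names the default date partition structure, `day=<YYYY-MM-DD>/hr=<HH>`. A minimal sketch of how an event timestamp maps onto that structure (the helper is hypothetical, not Segment code):

```python
from datetime import datetime

def partition_path(ts: datetime) -> str:
    """Formats a timestamp into the default Data Lakes date partition
    structure, day=<YYYY-MM-DD>/hr=<HH> (hypothetical helper)."""
    return ts.strftime("day=%Y-%m-%d/hr=%H")

# An event ingested at 14:00 UTC on 2021-05-03 would land in:
print(partition_path(datetime(2021, 5, 3, 14, 0)))  # day=2021-05-03/hr=14
```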

src/connections/storage/data-lakes/sync-reports.md

Lines changed: 4 additions & 4 deletions

@@ -13,7 +13,7 @@ The table has the following columns in its schema:
 
 | **Sync Metric** | **Description** |
 | ----------------- | ------------------- |
-| `workspace_id` | Distinct ID assigned to each Segment workspace and [found in the workspace settings](https://app.segment.com/goto-my-workspace/settings/basic). |
+| `workspace_id` | Distinct ID assigned to each Segment workspace and [found in the workspace settings](https://app.segment.com/goto-my-workspace/settings/basic){:target="_blank"}. |
 | `source_id` | Distinct ID assigned to each Segment source, found in the Source Settings > API Keys > Source ID. |
 | `database` | Name of the Glue Database used to store sync report tables. Segment automatically creates this database during the Data Lakes set up process. |
 | `emr_cluster_id` | ID of the EMR cluster which Data Lakes uses, found in the [Data Lakes Settings page](). |

@@ -223,7 +223,7 @@ WHERE source_id='9IP56Shn6' AND status='failed' AND date(day) >= (CURRENT_DATE -
 The following error types can cause your data lake syncs to fail:
 - **[Insufficient permissions](#insufficient-permissions)** - Segment does not have the permissions necessary to perform a critical operation. You must grant Segment additional permissions.
 - **[Invalid settings](#invalid-settings)** - The settings are invalid. This could be caused by a missing required field, or a validation check that fails. The invalid setting must be corrected before the sync can succeed.
-- **[Internal error](#internal-error)** - An error occurred in Segment's internal systems. This should resolve on its own. [Contact the Segment Support team](https://segment.com/help/contact/) if the sync failure persists.
+- **[Internal error](#internal-error)** - An error occurred in Segment's internal systems. This should resolve on its own. [Contact the Segment Support team](https://segment.com/help/contact/){:target="_blank"} if the sync failure persists.
 
 ### Insufficient permissions
 

@@ -253,11 +253,11 @@ If you have invalid settings, you might see one of the error messages below:
 - "External ID is invalid. Please ensure the external ID in the IAM role used to connect to your Data Lake matches the source ID."
 - "External ID is not set. Please ensure that the IAM role used to connect to your Data Lake has the source ID in the list of external IDs."
 
-The most common error occurs when you do not list all Source IDs in the External ID section of the IAM role. You can find your Source IDs in the Segment workspace, and you must add each one to the list of [External IDs](https://github.com/segmentio/terraform-aws-data-lake/tree/master/modules/iam#external_ids) in the IAM policy. You can either update the IAM policy from the AWS Console, or re-run the [Data Lakes set up Terraform job](https://github.com/segmentio/terraform-aws-data-lake).
+The most common error occurs when you do not list all Source IDs in the External ID section of the IAM role. You can find your Source IDs in the Segment workspace, and you must add each one to the list of [External IDs](https://github.com/segmentio/terraform-aws-data-lake/tree/master/modules/iam#external_ids){:target="_blank"} in the IAM policy. You can either update the IAM policy from the AWS Console, or re-run the [Data Lakes set up Terraform job](https://github.com/segmentio/terraform-aws-data-lake){:target="_blank"}.
 
 ### Internal error
 
-Internal errors occur in Segment's internal systems, and should resolve on their own. If sync failures persist, [contact the Segment Support team](https://segment.com/help/contact/).
+Internal errors occur in Segment's internal systems, and should resolve on their own. If sync failures persist, [contact the Segment Support team](https://segment.com/help/contact/){:target="_blank"}.
 
 ## FAQ
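The `@@ -223` hunk header above shows the filter used to find failed syncs: `WHERE source_id='9IP56Shn6' AND status='failed' AND date(day) >= (CURRENT_DATE - interval '7' day)`. A small sketch that assembles that filter for any source. The helper, and the `sync_report` table name in the SELECT, are illustrative assumptions, not part of this commit:

```python
def failed_sync_query(source_id: str, days: int = 7) -> str:
    """Builds the failed-sync query whose WHERE clause appears in the
    hunk header above. The table name `sync_report` is an assumption."""
    return (
        "SELECT * FROM sync_report "
        f"WHERE source_id='{source_id}' AND status='failed' "
        f"AND date(day) >= (CURRENT_DATE - interval '{days}' day)"
    )

# The query for the source ID used in the doc's example:
print(failed_sync_query("9IP56Shn6"))
```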
