Commit fad7bd8

Updating FAQ formatting [netlify-build]
1 parent 5961b34 commit fad7bd8

4 files changed: +48, -83 lines

src/connections/storage/catalog/data-lakes/index.md

Lines changed: 19 additions & 39 deletions
@@ -384,16 +384,14 @@ Running the `plan` command gives you an output that creates 19 new objects, unle
 
 ### Segment Data Lakes
 
-{% faq %}
-{% faqitem Do I need to create Glue databases? %}
+
+#### Do I need to create Glue databases?
 No, Data Lakes automatically creates one Glue database per source. This database uses the source slug as its name.
-{% endfaqitem %}
 
-{% faqitem What IAM role do I use in the Settings page? %}
+#### What IAM role do I use in the Settings page?
 Four roles are created when you set up Data Lakes using Terraform. You add the `arn:aws:iam::$ACCOUNT_ID:role/segment-data-lake-iam-role` role to the Data Lakes Settings page in the Segment web app.
-{% endfaqitem %}
 
-{% faqitem What level of access do the AWS roles have? %}
+#### What level of access do the AWS roles have?
 The roles which Data Lakes assigns during set up are:
 
 - **`segment-datalake-iam-role`** - This is the role that Segment assumes to access S3, Glue and the EMR cluster. It allows Segment access to:
@@ -408,54 +406,46 @@ The roles which Data Lakes assigns during set up are:
 - Access only to the specific S3 bucket used for Data Lakes.
 
 - **`segment_emr_autoscaling_role`** - Restricted role that can only be assumed by EMR and EC2. This is set up based on [AWS best practices](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-role-automatic-scaling.html).
-{% endfaqitem %}
 
-{% faqitem Why doesn't the Data Lakes Terraform module create an S3 bucket? %}
+
+#### Why doesn't the Data Lakes Terraform module create an S3 bucket?
 The module doesn't create a new S3 bucket so you can re-use an existing bucket for your Data Lakes.
-{% endfaqitem %}
 
-{% faqitem Does my S3 bucket need to be in the same region as the other infrastructure? %}
+#### Does my S3 bucket need to be in the same region as the other infrastructure?
 Yes, the S3 bucket and the EMR cluster must be in the same region.
-{% endfaqitem %}
 
-{% faqitem How do I connect a new source to Data Lakes? %}
+#### How do I connect a new source to Data Lakes?
 To connect a new source to Data Lakes:
 
 1. Ensure that the `workspace_id` of the Segment workspace is in the list of [external ids](https://github.com/segmentio/terraform-aws-data-lake/tree/master/modules/iam#external_ids) in the IAM policy. You can either update this from the AWS console, or re-run the [Terraform](https://github.com/segmentio/terraform-aws-data-lake) job.
 2. From your Segment workspace, connect the source to the Data Lakes destination.
-{% endfaqitem %}
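
For illustration only, and not part of this commit: a minimal boto3 sketch of step 1 above, checking whether a workspace ID already appears among the `sts:ExternalId` values on the role's trust policy. The role name follows the ARN quoted earlier in this file; the workspace ID is a placeholder.

```py
# Hypothetical sketch, not from the commit: verify a Segment workspace ID is
# listed as an external ID on the Data Lakes IAM role's trust policy.
import boto3

iam = boto3.client("iam")
role = iam.get_role(RoleName="segment-data-lake-iam-role")["Role"]
trust_policy = role["AssumeRolePolicyDocument"]  # boto3 returns this as a parsed dict

workspace_id = "your-workspace-id"  # placeholder
external_ids = []
for stmt in trust_policy.get("Statement", []):
    ids = stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId", [])
    external_ids.extend(ids if isinstance(ids, list) else [ids])

print(workspace_id in external_ids)
```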
 
-{% faqitem Can I configure multiple sources to use the same EMR cluster? %}
+#### Can I configure multiple sources to use the same EMR cluster?
 Yes, you can configure multiple sources to use the same EMR cluster. Segment recommends that the EMR cluster only be used for Data Lakes to ensure there aren't interruptions from non-Data Lakes jobs.
-{% endfaqitem %}
 
-{% faqitem Why don't I see any data in S3 or Glue after enabling a source? %}
+#### Why don't I see any data in S3 or Glue after enabling a source?
 If you don't see data after enabling a source, check the following:
 - Does the IAM role have the Segment account ID and workspace ID as the external ID?
 - Is the EMR cluster running?
 - Is the correct IAM role and S3 bucket configured in the settings?
 
 If all of these look correct and you're still not seeing any data, please [contact the Support team](https://segment.com/help/contact/).
-{% endfaqitem %}
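
As a companion to the checklist above (not part of this commit), a rough boto3 sketch of the EMR and Glue checks. The source slug is a placeholder; the database-per-source-slug convention comes from the first FAQ in this file.

```py
# Hypothetical diagnostics, not from the commit: is the EMR cluster up, and has
# Glue been populated for the source? "my-source-slug" is a placeholder.
import boto3

emr = boto3.client("emr")
clusters = emr.list_clusters(ClusterStates=["STARTING", "RUNNING", "WAITING"])
print([c["Name"] for c in clusters["Clusters"]])  # the Data Lakes cluster should be listed

glue = boto3.client("glue")
tables = glue.get_tables(DatabaseName="my-source-slug")  # Glue database named after the source slug
print([t["Name"] for t in tables["TableList"]])  # one table per event type, once data has synced
```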
 
-{% faqitem What are "Segment Output" tables in S3? %}
+#### What are "Segment Output" tables in S3?
 The `output` tables are temporary tables Segment creates when loading data. They are deleted after each sync.
-{% endfaqitem %}
 
-{% faqitem Can I make additional directories in the S3 bucket Data Lakes is using? %}
+#### Can I make additional directories in the S3 bucket Data Lakes is using?
 Yes, you can create new directories in S3 without interfering with Segment data.
 Do not modify or create additional directories with the following names:
 - `logs/`
 - `segment-stage/`
 - `segment-data/`
 - `segment-logs/`
-{% endfaqitem %}
 
-{% faqitem What does "partitioned" mean in the table name? %}
+#### What does "partitioned" mean in the table name?
 `Partitioned` just means that the table has partition columns (day and hour). All tables are partitioned, so you should see this on all table names.
-{% endfaqitem %}
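
To make the partition columns concrete (again, not part of this commit), a small Athena sketch that scans a single day/hour partition. The database, table, and output location are placeholders, and the partition column names assume a `day`/`hr` layout; check your Glue table for the exact names and value formats.

```py
# Hypothetical example, not from the commit: query one partition of a Data Lakes
# table through Athena. Database, table, and output location are placeholders;
# the partition column names assume a `day`/`hr` layout -- check your Glue table.
import boto3

athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString="SELECT count(*) FROM pages WHERE day = '2021-05-01' AND hr = '06'",
    QueryExecutionContext={"Database": "my-source-slug"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution with this ID for status
```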
 
-{% faqitem How can I use AWS Spectrum to access Data Lakes tables in Glue, and join it with Redshift data? %}
+#### How can I use AWS Spectrum to access Data Lakes tables in Glue, and join it with Redshift data?
 You can use the following command to create external tables in Spectrum to access tables in Glue and join the data with Redshift:
 
 Run the `CREATE EXTERNAL SCHEMA` command:
@@ -471,35 +461,25 @@ create external database if not exists;
 Replace:
 - [glue_db_name] = The Glue database created by Data Lakes which is named after the source slug
 - [spectrum_schema_name] = The schema name in Redshift you want to map to
-{% endfaqitem %}
-{% endfaq %}
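
For reference, and not part of this commit: one way the full command might be run from Python. The `CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG` form completes the `create external database if not exists;` fragment visible in the hunk header above; the connection details, IAM role ARN, and all names are placeholders.

```py
# Hypothetical sketch, not from the commit: create the Spectrum schema that maps
# to the Glue database, then join a Glue-backed table with a Redshift table.
# All connection details and names below are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.us-west-2.redshift.amazonaws.com",
                        port=5439, dbname="analytics", user="awsuser", password="...")
conn.autocommit = True  # run the DDL outside an explicit transaction
cur = conn.cursor()

cur.execute("""
    CREATE EXTERNAL SCHEMA my_spectrum_schema          -- [spectrum_schema_name]
    FROM DATA CATALOG
    DATABASE 'my-source-slug'                          -- [glue_db_name]
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# Join the external (Glue) table with a native Redshift table.
cur.execute("""
    SELECT p.user_id, u.email
    FROM my_spectrum_schema.pages AS p
    JOIN public.users AS u ON u.user_id = p.user_id
    LIMIT 10;
""")
print(cur.fetchall())
```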
 
 ### Azure Data Lakes
 
-{% faq %}
-
-{% faqitem Does my ADLS-enabled storage account need to be in the same region as the other infrastructure? %}
+#### Does my ADLS-enabled storage account need to be in the same region as the other infrastructure?
 Yes, your storage account and Databricks instance should be in the same region.
-{% endfaqitem %}
 
-{% faqitem What analytics tools are available to use with my Azure Data Lake? %}
+#### What analytics tools are available to use with my Azure Data Lake?
 Azure Data Lakes supports the following post-processing tools:
 - PowerBI
 - Azure HDInsight
 - Azure Synapse Analytics
 - Databricks
-{% endfaqitem %}
 
-{% faqitem What can I do to troubleshoot my Databricks database? %}
+#### What can I do to troubleshoot my Databricks database?
 If you encounter errors related to your Databricks database, try adding the following line to the config: <br/>
 ```py
 spark.sql.hive.metastore.schema.verification.record.version false
 ```
 <br/>After you've added this line to your config, restart your cluster so that your changes can take effect. If you continue to encounter errors, [contact Segment Support](https://segment.com/help/contact/){:target="_blank"}.
-{% endfaqitem %}
-
-{% faqitem What do I do if I get a "Version table does not exist" error when setting up the Azure MySQL database? %}
-Check your Spark configs to ensure that the information you entered about the database is correct, then restart the cluster. The Databricks cluster automatically initializes the Hive Metastore, so an issue with your config file will stop the table from being created. If you continue to encounter errors, [contact Segment Support](https://segment.com/help/contact/){:target="_blank"}.
-{% endfaqitem %}
 
-{% endfaq %}
+#### What do I do if I get a "Version table does not exist" error when setting up the Azure MySQL database?
+Check your Spark configs to ensure that the information you entered about the database is correct, then restart the cluster. The Databricks cluster automatically initializes the Hive Metastore, so an issue with your config file will stop the table from being created. If you continue to encounter errors, [contact Segment Support](https://segment.com/help/contact/){:target="_blank"}.

src/connections/storage/data-lakes/index.md

Lines changed: 18 additions & 24 deletions
@@ -22,7 +22,7 @@ Segment Data Lakes sends Segment data to a cloud data store, either AWS S3 or Az
 
 To learn more about Segment Data Lakes, check out the Segment blog post [Introducing Segment Data Lakes](https://segment.com/blog/introducing-segment-data-lakes/){:target="_blank"}.
 
-## How Segment Data Lakes work
+## How Data Lakes work
 
 Segment currently supports Data Lakes hosted on two cloud providers: Amazon Web Services (AWS) and Microsoft Azure. Each cloud provider has a similar system for managing data, but offers different query engines, post-processing systems, and analytics options.
 
@@ -170,28 +170,27 @@ The Data Lakes and Warehouses products are compatible using a mapping, but do no
 When you use Data Lakes, you can either use Data Lakes as your _only_ source of data and query all of your data directly from S3 or ADLS, or you can use Data Lakes in addition to a data warehouse.
 
 ## FAQ
-{% faq %}
 
-{% faqitem What AWS Data Lake features are not supported in the Azure Data Lakes public beta? %}
+### What AWS Data Lake features are not supported in the Azure Data Lakes public beta?
 The following capabilities are supported by Segment Data Lakes but not by the Azure Data Lakes public beta:
 - EU region support
 - Deduplication
 - Sync History and Sync Health in Segment app
-{% endfaqitem %}
 
-{% faqitem Can I send all of my Segment data into Data Lakes? %}
+
+### Can I send all of my Segment data into Data Lakes?
 Data Lakes supports data from all event sources, including website libraries, mobile, server and event cloud sources. Data Lakes doesn't support loading [object cloud source data](/docs/connections/sources/#object-cloud-sources) or the users and accounts tables from event cloud sources.
-{% endfaqitem %}
 
-{% faqitem Are user deletions and suppression supported? %}
+
+### Are user deletions and suppression supported?
 Segment doesn't support user deletions in Data Lakes, but supports [user suppression](/docs/privacy/user-deletion-and-suppression/#suppressed-users).
-{% endfaqitem %}
 
-{% faqitem How does Data Lakes handle schema evolution? %}
+
+### How does Data Lakes handle schema evolution?
 As the data schema evolves and new columns are added, Segment Data Lakes will detect any new columns. New columns will be appended to the end of the table in the Glue Data Catalog.
-{% endfaqitem %}
 
-{% faqitem How does Data Lakes work with Protocols? %}
+
+### How does Data Lakes work with Protocols?
 Data Lakes doesn't have a direct integration with [Protocols](/docs/protocols/).
 
 Any changes to events at the source level made with Protocols also change the data for all downstream destinations, including Data Lakes.
@@ -204,21 +203,20 @@ Data types and labels available in Protocols aren't supported by Data Lakes.
 
 - **Data Types** - Data Lakes infers the data type for each event using its own schema inference systems instead of using a data type set for an event in Protocols. This might lead to the data type set in a data lake being different from the data type in the tracking plan. For example, if you set `product_id` to be an integer in the Protocols Tracking Plan, but the event is sent into Segment as a string, then Data Lakes may infer this data type as a string in the Glue Data Catalog.
 - **Labels** - Labels set in Protocols aren't sent to Data Lakes.
-{% endfaqitem %}
 
-{% faqitem How frequently does my Data Lake sync? %}
+
+### How frequently does my Data Lake sync?
 Data Lakes offers 12 syncs in a 24-hour period and doesn't offer a custom sync schedule or selective sync.
-{% endfaqitem %}
 
-{% faqitem What is the cost to use AWS Glue? %}
+
+### What is the cost to use AWS Glue?
 You can find details on Amazon's [pricing for Glue](https://aws.amazon.com/glue/pricing/){:target="_blank"} page. For reference, Data Lakes creates 1 table per event type in your source, and adds 1 partition per hour to the event table.
-{% endfaqitem %}
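
A quick back-of-the-envelope, not part of this commit, for the object counts that Glue pricing depends on, using the one-table-per-event-type, one-partition-per-hour rule just stated. The event-type count is a placeholder.

```py
# Hypothetical back-of-the-envelope, not from the commit, using the rule above:
# 1 table per event type, 1 new partition per hour per event table.
event_types = 50                                 # placeholder for your source
tables = event_types                             # 50 Glue tables
partitions_per_day = tables * 24                 # 1,200 new partitions per day
partitions_per_month = partitions_per_day * 30   # 36,000 per month
print(tables, partitions_per_day, partitions_per_month)
```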
 
-{% faqitem What is the cost to use Microsoft Azure? %}
+### What is the cost to use Microsoft Azure?
 You can find details on Microsoft's [pricing for Azure](https://azure.microsoft.com/en-us/pricing/){:target="_blank"} page. For reference, Data Lakes creates 1 table per event type in your source, and adds 1 partition per hour to the event table.
-{% endfaqitem %}
 
-{% faqitem What limits does AWS Glue have? %}
+
+### What limits does AWS Glue have?
 AWS Glue has limits across various factors, such as number of databases per account, tables per account, and so on. See the [full list of Glue limits](https://docs.aws.amazon.com/general/latest/gr/glue.html#limits_glue){:target="_blank"} for more information.
 
 The most common limits to keep in mind are:
@@ -230,14 +228,10 @@ Segment stops creating new tables for the events after you exceed this limit. Ho
 
 You should also read the [additional considerations in Amazon's documentation](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html){:target="_blank"} when using AWS Glue Data Catalog.
 
-{% endfaqitem %}
-
-{% faqitem What analytics tools are available to use with my Azure Data Lake? %}
+### What analytics tools are available to use with my Azure Data Lake?
 Azure Data Lakes supports the following analytics tools:
 - PowerBI
 - Azure HDInsight
 - Azure Synapse Analytics
 - Databricks
-{% endfaqitem %}
 
-{% endfaq %}

src/connections/storage/data-lakes/sync-history.md

Lines changed: 7 additions & 13 deletions
@@ -35,24 +35,18 @@ Above the Daily Row Volume table is an overview of the total syncs for the curre
 To access the Sync history page from the Segment app, open the **My Destinations** page and select the data lake. On the data lakes settings page, select the **Health** tab.
 
 ## Data Lakes Reports FAQ
-{% faq %}
-{% faqitem How long is a data point available? %}
+
+### How long is a data point available?
 The health tab shows an aggregate view of the last 30 days' worth of data, while the sync history retains the last 100 syncs.
-{% endfaqitem %}
 
-{% faqitem How do sync history and health compare? %}
+### How do sync history and health compare?
 The sync history feature shows detailed information about the most recent 100 syncs to a data lake, while the health tab shows just the number of rows synced to the data lake over the last 30 days.
-{% endfaqitem %}
 
-{% faqitem What timezone is the time and date information in? %}
+### What timezone is the time and date information in?
 All dates and times on the sync history and health pages are in the user's local time.
-{% endfaqitem %}
 
-{% faqitem When does the data update? %}
+### When does the data update?
 The sync data for both reports updates in real time.
-{% endfaqitem %}
 
-{% faqitem When do syncs occur? %}
-Syncs occur approximately every two hours. Users cannot choose how frequently the data lake syncs.
-{% endfaqitem %}
-{% endfaq %}
+### When do syncs occur?
+Syncs occur approximately every two hours. Users cannot choose how frequently the data lake syncs.

src/connections/storage/data-lakes/sync-reports.md

Lines changed: 4 additions & 7 deletions
@@ -264,13 +264,10 @@ Internal errors occur in Segment's internal systems, and should resolve on their
 
 ## FAQ
 
-{% faq %}
-{% faqitem How are Data Lakes sync reports different from the sync data for Segment Warehouses? %}
+### How are Data Lakes sync reports different from the sync data for Segment Warehouses?
 Both Warehouses and Data Lakes provide similar information about syncs, including the start and finish time, rows synced, and errors.
 
 However, Warehouse sync information is only available in the Segment app: on the Sync History and Warehouse Health pages. With Data Lakes sync reports, the raw sync information is sent directly to your data lake. This means you can query the raw data and answer your own questions about syncs, and use the data to power alerting and monitoring tools.
-{% endfaqitem %}
-{% faqitem What happens if a sync is partly successful? %}
-Sync reports are generated only when a sync completes, or when it fails. Partial failure reporting is not currently supported.
-{% endfaqitem %}
-{% endfaq %}
+
+### What happens if a sync is partly successful?
+Sync reports are generated only when a sync completes, or when it fails. Partial failure reporting is not currently supported.
