|
| 1 | +--- |
| 2 | +title: AWS Lake Formation |
| 3 | +--- |
| 4 | +AWS Lake Formation is a fully managed service built on top of the AWS Glue Data Catalog that provides one central set of tools to securely build and manage a data lake. The tools fall into two categories: setup and data management, and security management. Setup and data management tools help you import, catalog, transform, and deduplicate data, and optimize your storage and security. Security management tools help you define and enforce encryption and access controls and implement audit logging. |
| 5 | + |
| 6 | +> note "Learn more about AWS Lake Formation features" |
| 7 | +> To learn more about AWS Lake Formation features, refer to the [Amazon Web Services documentation](https://aws.amazon.com/lake-formation/features/). |
| 8 | +
|
| 9 | +<!---add description of how the security works, because the secure aspect is a big selling point--> |
| 10 | + |
| 11 | +## Configuring Lake Formation |
| 12 | +You can configure Lake Formation using the [`IAMAllowedPrincipals` group](#configuring-lake-formation-using-the-iamallowedprincipals-group) or by [using IAM policies for access control](#configuring-lake-formation-using-iam-policies). With the `IAMAllowedPrincipals` group, |
| 13 | +<!--add use case explanation, finish sentence here--> |
| 14 | + |
| 15 | +> info "Permissions required to configure Data Lakes" |
| 16 | +> To configure Lake Formation, you must be logged in to AWS with data lake administrator or database creator permissions. |
| 17 | +
|
| 18 | +### Configuring Lake Formation using the IAMAllowedPrincipals group |
| 19 | + |
| 20 | +#### Existing databases |
| 21 | +1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/). |
| 22 | +2. Under **Data catalog**, select the **Settings** tab. Ensure that the check boxes under **Default permissions for newly created databases and tables** are not checked. |
| 23 | +3. Under **Permissions**, select the **Admins and database creators** section and add your EMR instance profile role (`EMR_EC2_DefaultRole` if you created your EMR cluster manually, or `segment_emr_instance_profile` if you set it up using Terraform) to the **Database creators** section. |
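
The console steps above can also be scripted. The following is a minimal sketch, not a confirmed part of this setup: it assumes boto3 and its Lake Formation client (`get_data_lake_settings`, `put_data_lake_settings`, `grant_permissions`), and it assumes that membership in **Database creators** corresponds to the `CREATE_DATABASE` permission on the Data Catalog. The helper names and the role ARN are hypothetical.

```python
import copy


def clear_default_permissions(settings):
    """Return a copy of a DataLakeSettings dict with both default-permission
    lists emptied (assumed equivalent of unchecking the two check boxes)."""
    updated = copy.deepcopy(settings)
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def database_creator_grant(role_arn):
    """Build grant_permissions arguments that give a role CREATE_DATABASE
    on the Data Catalog."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Catalog": {}},
        "Permissions": ["CREATE_DATABASE"],
    }


def apply_settings(role_arn):
    """Apply both changes. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("lakeformation")
    current = client.get_data_lake_settings()["DataLakeSettings"]
    client.put_data_lake_settings(
        DataLakeSettings=clear_default_permissions(current)
    )
    client.grant_permissions(**database_creator_grant(role_arn))
```

Calling `apply_settings("arn:aws:iam::123456789012:role/EMR_EC2_DefaultRole")` (a placeholder ARN) would perform both changes against the account your credentials point to; the two pure helpers can be inspected or logged first to review the request before anything is sent.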
| 24 | + |
| 25 | +#### New databases |
| 26 | +1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/). |
| 27 | +2. Under **Data catalog**, select the **Settings** tab. Ensure that the check boxes under **Default permissions for newly created databases and tables** are not checked. |
| 28 | +3. Select the Databases tab. Click the **Create database** button, and create your database: |
| 29 | + 1. Select the **Database** button. |
| 30 | + 2. Name your database. |
| 31 | + 3. Set the location to `s3://$datalake_bucket/segment-data/`. <br/> **Optional:** Add a description to your database. |
| 32 | + 4. Select the **Use only IAM access control for new tables in this database** check box. |
| 33 | + 5. Click **Create database**. |
| 34 | +4. |
| 35 | +<!---asked Udit where the next step lives for the new databases section: doc isn't super clear?--> |
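
The database creation steps above can be sketched with the AWS Glue API. This is an illustration under assumptions rather than a documented procedure: it assumes boto3's Glue client `create_database` call, and it assumes that the **Use only IAM access control for new tables in this database** option corresponds to granting `ALL` to the `IAM_ALLOWED_PRINCIPALS` group in `CreateTableDefaultPermissions`. The function names, database name, and bucket name are hypothetical.

```python
def build_database_input(name, bucket, description=None, iam_only=True):
    """Build the DatabaseInput dict for glue.create_database."""
    database = {
        "Name": name,
        # Same location pattern as the console step
        "LocationUri": f"s3://{bucket}/segment-data/",
    }
    if description:
        database["Description"] = description
    if iam_only:
        # Assumed equivalent of the "Use only IAM access control" check box
        database["CreateTableDefaultPermissions"] = [
            {
                "Principal": {
                    "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                },
                "Permissions": ["ALL"],
            }
        ]
    else:
        # No default grants on new tables
        database["CreateTableDefaultPermissions"] = []
    return database


def create_database(name, bucket, description=None, iam_only=True):
    """Create the database. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so build_database_input works without it

    glue = boto3.client("glue")
    glue.create_database(
        DatabaseInput=build_database_input(name, bucket, description, iam_only)
    )
```

Passing `iam_only=False` produces a database with no default table permissions, which is closer to the IAM policy approach, where the check box is left unselected.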
| 36 | + |
| 37 | +### Configuring Lake Formation using IAM policies |
| 38 | + |
| 39 | +#### Existing databases |
| 40 | +1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/). |
| 41 | + |
| 42 | +#### New databases |
| 43 | +1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/). |
| 44 | +2. Under **Data catalog**, select the **Settings** tab. Ensure that the check boxes under **Default permissions for newly created databases and tables** are not checked. |
| 45 | +3. Select the Databases tab. Click the **Create database** button, and create your database: |
| 46 | + 1. Select the **Database** button. |
| 47 | + 2. Name your database. |
| 48 | + 3. Set the location to `s3://$datalake_bucket/segment-data/`. <br/> **Optional:** Add a description to your database. |
| 49 | + 4. Click **Create database**. |
| 50 | +4. |
| 51 | +<!---same as note above: not sure where next step lives for either new/existing databases--> |