
Commit 3267609

Self-hosting: 'do it all for me' and 'bring my own infrastructure' setup guidance (#574)
1 parent 5c58bf2 commit 3267609

3 files changed

+255
-12
lines changed

self-hosted/aws/onboard.mdx

Lines changed: 91 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -12,8 +12,10 @@ sidebarTitle: Onboarding
1212
</Note>
1313

1414
After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
15-
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you
16-
must first set up your AWS account as follows.
15+
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:
16+
17+
- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure.
18+
- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your AWS account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.
1719

1820
## Questions? Need help?
1921

@@ -22,9 +24,94 @@ email Unstructured Sales at [[email protected]](mailto:[email protected]
2224
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams
2325
will get back to you as soon as possible.
2426

25-
## Onboarding checklist
27+
## Do it all for me
28+
29+
If you want Unstructured to set up the required infrastructure for you in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
30+
the access credentials for an IAM user or service principal in your AWS account that has the following required permissions.
31+
32+
### Core networking permissions
33+
34+
For VPC and subnet management:
35+
36+
- `ec2:CreateVpc`
37+
- `ec2:CreateSubnet`
38+
- `ec2:CreateRouteTable`
39+
- `ec2:CreateInternetGateway`
40+
- `ec2:CreateNatGateway`
41+
- `ec2:ModifyVpcAttribute` (for DNS settings)
42+
- `ec2:AssociateRouteTable`, `ec2:CreateRoute` (for public and private route tables)
43+
- `ec2:AllocateAddress` (for Elastic IP assignment to the NAT Gateway)
44+
45+
For security group rules:
46+
47+
- `ec2:AuthorizeSecurityGroupIngress`, `ec2:AuthorizeSecurityGroupEgress` (to configure cluster and node security groups to allow VPC CIDR traffic)
48+
49+
### EKS permissions
50+
51+
For the cluster role:
52+
53+
- Attach the managed policies `AmazonEKSClusterPolicy` and `AmazonEKSVPCResourceController` to a role with `sts:AssumeRole` trust for `eks.amazonaws.com`
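
As a minimal sketch of the cluster role described above, the following Python builds the trust policy document and lists the two managed policies as ARNs. The function and constant names are illustrative assumptions, not part of the Unstructured documentation; the JSON shape follows the standard IAM trust policy format.

```python
import json


def eks_cluster_trust_policy() -> dict:
    """Trust policy that lets the EKS service assume the cluster role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "eks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# The two managed policies named in the text, written as AWS-managed policy ARNs.
CLUSTER_MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
    "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController",
]

if __name__ == "__main__":
    # Print the document so it can be reviewed or saved to a file.
    print(json.dumps(eks_cluster_trust_policy(), indent=2))
```

The printed JSON could be saved to a file and supplied when creating the role; the managed policies would then be attached to that role separately.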
54+
55+
For the node group role:
56+
57+
Attach these managed policies:
58+
59+
- `AmazonEKSWorkerNodePolicy` (for node operations)
60+
- `AmazonEKS_CNI_Policy` (for networking)
61+
- `AmazonEC2ContainerRegistryReadOnly` (for ECR access)
62+
63+
For OIDC integration:
64+
65+
- `iam:CreateOpenIDConnectProvider` (to associate the EKS cluster with IAM OIDC)
66+
- `iam:CreateRole` + `iam:AttachRolePolicy` (for service accounts in the `recommender`, `etl-operator`, and `data-broker` namespaces)
67+
68+
### Storage and database
69+
70+
These permissions:
71+
72+
- `s3:CreateBucket`
73+
- `s3:PutBucketVersioning`
74+
- `s3:PutBucketEncryption`
75+
76+
For these S3 buckets:
77+
78+
- `u10d-*-etl-blob-cache`
79+
- `u10d-*-etl-job-db`
80+
- `u10d-*-etl-job-status`
81+
- `u10d-*-job-files`
82+
83+
For RDS:
84+
85+
- `rds:CreateDBInstance`
86+
- `rds:CreateDBSubnetGroup`
87+
- `rds:CreateDBSecurityGroup` + `ec2:AuthorizeSecurityGroupIngress` (to allow VPC CIDR access)
88+
89+
### Add-ons and utilities
90+
91+
For the EBS CSI Driver:
92+
93+
- `eks:CreateAddon` with IAM role attachment permissions for the `ebs.csi.aws.com` service account
94+
95+
For the SSH Key:
96+
97+
- `ec2:CreateKeyPair` + `ec2:ExportKeyPair` (for node group remote access)
98+
99+
### Cross-service requirements
100+
101+
- For IAM: `iam:PassRole` (to assign roles to EKS, RDS, and S3)
102+
- For KMS: `kms:CreateKey` (if using CMK for S3 and RDS encryption)
103+
- For CloudFormation: `cloudformation:*`
104+
105+
For least privilege, scope resource ARNs in policies (for example, restrict S3 bucket names with wildcards such as `u10d-*-etl*`).
106+
The EKS Pod Identity Agent requires `eks-auth:AssumeRoleForPodIdentity` permission on node roles when used with IRSA.
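
The least-privilege advice above can be sketched as a small Python helper that scopes the S3 actions to the listed bucket name patterns. The helper names and the `Sid` value are hypothetical, and the action names are copied verbatim from the list in this section, so they are worth checking against the AWS service authorization reference before use.

```python
import json

# Action names copied verbatim from the "Storage and database" list (unverified).
S3_ACTIONS = ["s3:CreateBucket", "s3:PutBucketVersioning", "s3:PutBucketEncryption"]

# Bucket name patterns from the same section.
BUCKET_PATTERNS = [
    "u10d-*-etl-blob-cache",
    "u10d-*-etl-job-db",
    "u10d-*-etl-job-status",
    "u10d-*-job-files",
]


def bucket_arn(pattern: str) -> str:
    """S3 bucket ARNs carry no region or account ID."""
    return f"arn:aws:s3:::{pattern}"


def scoped_s3_statement(actions, patterns) -> dict:
    """Allow the given actions only on buckets matching the given patterns."""
    return {
        "Sid": "ScopedUnstructuredBuckets",  # hypothetical statement ID
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": [bucket_arn(p) for p in patterns],
    }


def policy_document(statements) -> dict:
    """Wrap statements in a standard IAM policy envelope."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


if __name__ == "__main__":
    doc = policy_document([scoped_s3_statement(S3_ACTIONS, BUCKET_PATTERNS)])
    print(json.dumps(doc, indent=2))
```

Listing each bucket pattern explicitly is slightly narrower than the single `u10d-*-etl*` wildcard, which would not match the `job-files` bucket.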
107+
108+
## Bring my own infrastructure
109+
110+
If you want to set up the required infrastructure yourself, set things up as follows within your AWS account for Unstructured to deploy the Unstructured UI and API into.
26111

27-
Set up the following infrastructure within your AWS account for Unstructured to deploy the Unstructured UI and API into.
112+
You must also provide your Unstructured sales representative or technical enablement contact with
113+
the access credentials for an IAM user or service principal in your AWS account that has access to the target Amazon Elastic Kubernetes Service (EKS) cluster to deploy the
114+
Unstructured UI and API into.
28115

29116
### VPC and networking
30117

self-hosted/azure/onboard.mdx

Lines changed: 65 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -12,8 +12,10 @@ sidebarTitle: Onboarding
1212
</Note>
1313

1414
After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
15-
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you
16-
must first set up your Azure account as follows.
15+
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:
16+
17+
- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure.
18+
- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your Azure account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.
1719

1820
## Questions? Need help?
1921

@@ -22,9 +24,68 @@ email Unstructured Sales at [[email protected]](mailto:[email protected]
2224
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams
2325
will get back to you as soon as possible.
2426

25-
## Onboarding checklist
27+
## Do it all for me
28+
29+
If you want Unstructured to set up the required infrastructure for you in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
30+
the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has the following required permissions.
31+
32+
### Subscription and resource group
33+
34+
- `Microsoft.Resources/subscriptions/resourceGroups/write` (to create the resource group)
35+
- `Microsoft.Resources/subscriptions/resourceGroups/read` (to read the resource group)
36+
37+
### VNet and networking
38+
39+
- `Microsoft.Network/virtualNetworks/write` (to create the VNet)
40+
- `Microsoft.Network/virtualNetworks/read` (to read the VNet)
41+
- `Microsoft.Network/publicIPAddresses/write` (to create the public IPs)
42+
- `Microsoft.Network/publicIPAddresses/read` (to read the public IPs)
43+
- `Microsoft.Network/natGateways/write` (to create the NAT Gateway)
44+
- `Microsoft.Network/natGateways/read` (to read the NAT Gateway)
45+
- `Microsoft.Network/routeTables/write` (to create the route tables)
46+
- `Microsoft.Network/routeTables/read` (to read the route tables)
47+
- `Microsoft.Network/networkSecurityGroups/write` (to create the NSGs)
48+
- `Microsoft.Network/networkSecurityGroups/read` (to read the NSGs)
49+
50+
### AKS cluster
51+
52+
- `Microsoft.ContainerService/managedClusters/write` (to create the AKS cluster)
53+
- `Microsoft.ContainerService/managedClusters/read` (to read the AKS cluster)
54+
- `Microsoft.ContainerService/agentPools/write` (to create the node pools)
55+
- `Microsoft.ContainerService/agentPools/read` (to read the node pools)
56+
57+
### Managed identities and RBAC
58+
59+
- `Microsoft.ManagedIdentity/userAssignedIdentities/write` (to create the managed identities)
60+
- `Microsoft.ManagedIdentity/userAssignedIdentities/read` (to read managed identities)
61+
- Assign built-in roles such as:
62+
63+
- **Contributor** or scoped **Network Contributor** for the AKS cluster identity
64+
- **Monitoring Metrics Publisher**, **AcrPull**, and **Storage Blob Data Reader** for the node pool identity
65+
- **Storage Blob Data Contributor** for workload identities
66+
67+
### Kubernetes add-ons
68+
69+
Permissions depend on the Helm/YAML installation, but Azure RBAC integration requires `Microsoft.ContainerService/managedClusters/accessProfiles/*/read` (to access kubeconfig).
70+
71+
### Storage class
72+
73+
- `Microsoft.Storage/storageAccounts/write` (to create the storage account for CSI driver provisioning)
74+
- `Microsoft.Storage/storageAccounts/read`
75+
76+
### PostgreSQL database
77+
78+
- `Microsoft.DBforPostgreSQL/flexibleServers/write` (to create the PostgreSQL server)
79+
- `Microsoft.DBforPostgreSQL/flexibleServers/read`
80+
- NSG permissions for database access: allow traffic from the VNet CIDR
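
One way to bundle the read and write permissions listed above is an Azure custom role definition. The sketch below assembles one in Python; the role name, the placeholder subscription ID, and the helper names are assumptions for illustration, and the resource types are copied verbatim from the lists in this section, so they should be confirmed against the Azure resource provider operations reference.

```python
import json

# Resource types copied verbatim from the lists above (unverified).
RESOURCE_TYPES = [
    "Microsoft.Resources/subscriptions/resourceGroups",
    "Microsoft.Network/virtualNetworks",
    "Microsoft.Network/publicIPAddresses",
    "Microsoft.Network/natGateways",
    "Microsoft.Network/routeTables",
    "Microsoft.Network/networkSecurityGroups",
    "Microsoft.ContainerService/managedClusters",
    "Microsoft.ContainerService/agentPools",
    "Microsoft.ManagedIdentity/userAssignedIdentities",
    "Microsoft.Storage/storageAccounts",
    "Microsoft.DBforPostgreSQL/flexibleServers",
]


def read_write_actions(resource_types) -> list:
    """Expand each resource type into its write and read actions."""
    actions = []
    for resource_type in resource_types:
        actions.append(f"{resource_type}/write")
        actions.append(f"{resource_type}/read")
    return actions


def custom_role_definition(name: str, subscription_id: str) -> dict:
    """Build a custom role definition scoped to a single subscription."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": "Provisioning permissions for a self-hosted deployment.",
        "Actions": read_write_actions(RESOURCE_TYPES),
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    # Placeholder subscription ID; replace with a real one before use.
    role = custom_role_definition(
        "Unstructured Provisioner", "00000000-0000-0000-0000-000000000000"
    )
    print(json.dumps(role, indent=2))
```

The built-in role assignments and the `accessProfiles` permission mentioned in this section are not covered by this sketch and would need to be handled separately.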
81+
82+
## Bring my own infrastructure
83+
84+
If you want to set up the required infrastructure yourself, set things up as follows within your Azure account for Unstructured to deploy the Unstructured UI and API into.
2685

27-
Set up the following infrastructure within your Azure account for Unstructured to deploy the Unstructured UI and API into.
86+
You must also provide your Unstructured sales representative or technical enablement contact with
87+
the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has access to the target Azure Kubernetes Service (AKS) cluster to deploy the
88+
Unstructured UI and API into.
2889

2990
### **Azure subscription and resource group**
3091

self-hosted/gcp/onboard.mdx

Lines changed: 99 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -12,8 +12,10 @@ sidebarTitle: Onboarding
1212
</Note>
1313

1414
After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
15-
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you
16-
must first set up your GCP account as follows.
15+
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:
16+
17+
- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your GCP account and then deploy the Unstructured UI and API into that newly created infrastructure.
18+
- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your GCP account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.
1719

1820
## Questions? Need help?
1921

@@ -22,9 +24,102 @@ email Unstructured Sales at [[email protected]](mailto:[email protected]
2224
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams
2325
will get back to you as soon as possible.
2426

25-
## Onboarding checklist
27+
## Do it all for me
28+
29+
If you want Unstructured to set up the required infrastructure for you in your GCP account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
30+
the access credentials for an IAM user or service account in your GCP account that has the following required permissions:
31+
32+
### Core networking permissions
33+
34+
VPC/subnet management:
35+
36+
- `compute.networks.create`
37+
- `compute.subnetworks.create`
38+
- `compute.routers.create` (for Cloud NAT)
39+
- `compute.addresses.create` (for NAT IPs)
40+
- `compute.firewalls.create` (for intra-cluster traffic rules)
41+
42+
Shared VPC (if used):
43+
44+
- `compute.organizations.admin` (for the host project)
45+
- `compute.networks.use` (for the service project)
46+
47+
### GKE cluster permissions
48+
49+
Control plane:
50+
51+
- `container.clusters.create`
52+
- `container.clusters.update` (for private cluster settings)
53+
- `compute.networks.useExternalIp` (for public endpoint access)
54+
55+
Node pools:
56+
57+
- `compute.instances.create`
58+
- `compute.disks.create` (for node disks)
59+
- `compute.instanceGroups.create` (for autoscaling)
60+
61+
IAM roles:
62+
63+
- For the GKE cluster service account: `roles/container.hostServiceAgentUser`
64+
- For the node service account: `roles/container.nodeServiceAccount`
65+
- For the workload identity service account: `roles/iam.workloadIdentityUser`
66+
67+
### Storage and database
68+
69+
GCS buckets:
70+
71+
- `storage.buckets.create`
72+
- `storage.objects.create` (for versioning)
73+
- `storage.buckets.update` (for encryption/lifecycle rules)
74+
75+
Cloud SQL:
76+
77+
- `cloudsql.instances.create`
78+
- `cloudsql.instances.connect` (for private IPs)
79+
- `vpcaccess.connectors.use` (if using Serverless VPC Access)
80+
81+
Persistent disks (CSI):
82+
83+
- `compute.disks.create` (for `pd.csi.storage.gke.io`)
84+
- `compute.subnetworks.use` (for regional disks)
85+
86+
### Advanced configurations
87+
88+
Workload identity:
89+
90+
- `iam.serviceAccounts.getAccessToken` (for federated access)
91+
- `iam.serviceAccounts.setIamPolicy` (to bind Kubernetes SAs to GCP SAs)
92+
93+
Cloud NAT:
94+
95+
- `compute.routers.update` (for NAT configuration)
96+
- `compute.addresses.use` (for NAT IP allocation)
97+
98+
OS login/SSH:
99+
100+
- `compute.projects.setCommonInstanceMetadata` (for SSH key upload)
101+
- `compute.instances.osAdminLogin`
102+
103+
### Minimum required roles
104+
105+
Project level:
106+
107+
- `roles/editor` (broad access, or scope with custom roles)
108+
109+
Scoped roles:
110+
111+
- `roles/compute.networkAdmin` (for VPC and subnets)
112+
- `roles/container.admin` (for GKE)
113+
- `roles/storage.admin` (for GCS)
114+
- `roles/cloudsql.admin` (for Postgres)
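
Where `roles/editor` is too broad, the "scope with custom roles" option could look something like the following Python sketch, which merges permissions from selected areas of this section into one custom role definition. The area keys, role title, and helper name are hypothetical; the permission strings are taken from the lists above, and only a subset is shown.

```python
import json

# A subset of the permissions listed above, grouped by area (keys are illustrative).
PERMISSIONS_BY_AREA = {
    "networking": [
        "compute.networks.create",
        "compute.subnetworks.create",
        "compute.routers.create",
        "compute.addresses.create",
        "compute.firewalls.create",
    ],
    "gke_control_plane": [
        "container.clusters.create",
        "container.clusters.update",
    ],
    "node_pools": [
        "compute.instances.create",
        "compute.disks.create",
        "compute.instanceGroups.create",
    ],
    "csi_disks": [
        "compute.disks.create",
        "compute.subnetworks.use",
    ],
}


def custom_role(title: str, description: str, areas) -> dict:
    """Merge the permissions of the chosen areas, dropping duplicates."""
    merged = sorted({perm for area in areas for perm in PERMISSIONS_BY_AREA[area]})
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        "includedPermissions": merged,
    }


if __name__ == "__main__":
    role = custom_role(
        "Unstructured Provisioner",  # hypothetical role title
        "Scoped alternative to roles/editor.",
        ["networking", "gke_control_plane", "node_pools", "csi_disks"],
    )
    print(json.dumps(role, indent=2))
```

Using a set for the merge means a permission that appears in more than one area, such as `compute.disks.create`, is included only once.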
115+
116+
## Bring my own infrastructure
117+
118+
If you want to set up the required infrastructure yourself, set things up as follows within your GCP account for Unstructured to deploy the Unstructured UI and API into.
26119

27-
Set up the following infrastructure within your GCP account for Unstructured to deploy the Unstructured UI and API into.
120+
You must also provide your Unstructured sales representative or technical enablement contact with
121+
the access credentials for an IAM user or service account in your GCP account that has access to the target Google Kubernetes Engine (GKE) cluster to deploy the
122+
Unstructured UI and API into.
28123

29124
### **VPC and networking (GCP equivalent)**
30125
