Merged
Changes from 11 commits
6 changes: 4 additions & 2 deletions README.md
Expand Up @@ -34,7 +34,7 @@

## Overview

This project contains Ansible code that creates a baseline cluster in an existing Kubernetes environment for use with the SAS Viya platform, generates the manifest for a SAS Viya platform software order, and then deploys that order into the specified Kubernetes environment. Here is a list of tasks that this tool can perform:
This project contains Ansible code that creates a baseline cluster in an existing Kubernetes environment for use with the SAS Viya platform, generates the manifest for a SAS Viya platform software order, and then deploys that order into the specified Kubernetes environment. Here is a list of tasks that this tool can perform (also see [playbook overview](./playbooks/README.md) for info on the default tasks):

- Prepare Kubernetes cluster
- Deploy [ingress-nginx](https://kubernetes.github.io/ingress-nginx)
Expand All @@ -43,7 +43,9 @@ This project contains Ansible code that creates a baseline cluster in an existin
- Deploy [metrics-server](https://github.com/bitnami/charts/tree/master/bitnami/metrics-server/) (AWS only)
- Deploy [aws-ebs-csi-driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) (AWS only)
- Manage storageClass settings


*NOTE*: See the list of [supported third-party components](./docs/user/ThirdPartyComponents.md) for more information. For information on networking considerations for these and other components, see [networking considerations](./docs/user/NetworkingConsiderations.md).

- Deploy the SAS Viya Platform
- Retrieve the deployment assets using [SAS Viya Orders CLI](https://github.com/sassoftware/viya4-orders-cli)
- Retrieve cloud configuration from tfstate (if using a SAS Viya 4 IaC project)
2 changes: 1 addition & 1 deletion docker-entrypoint.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

# Copyright © 2020-2024, SAS Institute Inc., Cary, NC, USA. All Rights Reserved.
# Copyright © 2020-2025, SAS Institute Inc., Cary, NC, USA. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

set -e
143 changes: 143 additions & 0 deletions docs/user/NetworkingConsiderations.md
@@ -0,0 +1,143 @@
## Baseline Components and Networking Considerations

### 1. **Ingress Controllers**

- **ingress-nginx** is deployed as the ingress controller.
- It exposes internal Kubernetes services to external clients, typically through a cloud load balancer such as an AWS Network Load Balancer (NLB).
- It manages routing of HTTP/HTTPS traffic into the cluster, enforcing TLS, host/path rules, and sometimes source IP restrictions.
- The choice of ingress controller and its configuration (e.g., annotations, load balancer type) directly affects how external traffic enters your cluster.
- **Note:**
- Ensure that your firewall and security groups allow inbound traffic to the load balancer and that DNS is configured to point to the ingress endpoint.
- **Common endpoints that may need to be allowed:**
- Ingress controller endpoints (HTTP/HTTPS)
- Cloud load balancer endpoints (varies by provider)
- **Container registries that may need to be allowlisted:**
- quay.io

### 2. **Load Balancers**

- The baseline roles configure and deploy cloud-native load balancers (e.g., AWS NLB) via ingress controllers and service annotations.
- These load balancers provide public or private endpoints for accessing cluster services.
- Service annotations specify whether the load balancer is public or private.
- **Note:**
- Exposing services via public load balancers may have security implications. Restrict access as needed using security groups, firewall rules, or Kubernetes network policies.
- **Common endpoints that may need to be allowed:**
- Load balancer endpoints (public/private IPs, HTTP/HTTPS)
- Health check endpoints (varies by provider)
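
As a rough illustration of the annotation mechanism described above, the sketch below maps a provider name to a Service annotation that is commonly used to request an internal (private) load balancer. The function name `internal_lb_annotation` is hypothetical, and which annotations this project's baseline role actually sets is not shown here, so treat the values as assumptions to verify against the role defaults and your cloud controller version.

```sh
# Print a Service annotation that commonly requests an internal (private)
# load balancer on each provider. Illustrative only: confirm the annotation
# against your cloud controller documentation before relying on it.
internal_lb_annotation() {
  case "$1" in
    aws)   echo 'service.beta.kubernetes.io/aws-load-balancer-internal: "true"' ;;
    azure) echo 'service.beta.kubernetes.io/azure-load-balancer-internal: "true"' ;;
    gcp)   echo 'networking.gke.io/load-balancer-type: "Internal"' ;;
    *)     echo "unknown provider: $1" >&2; return 1 ;;
  esac
}
```

Calling `internal_lb_annotation azure` prints the annotation line, which would go under `metadata.annotations` of the ingress controller Service.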

### 3. **Cluster Autoscaler**

- The **cluster-autoscaler** scales nodes up or down, which can affect the availability of network endpoints and the distribution of pods across subnets.
- **Note:**
- Ensure that your cloud provider IAM roles and API access allow autoscaler operations, and that new nodes can join the cluster network without manual intervention.
- **Common endpoints that may need to be allowed:**
- Cloud provider APIs (for scaling, usually outbound HTTPS)
- Kubernetes API server (internal cluster traffic)

### 4. **Metrics Server, Cert-Manager, and Storage CSI Drivers**

- **metrics-server** and **cert-manager** may require network access to the Kubernetes API and external endpoints (for certificate validation).
- **CSI drivers** for file storage (such as NFS or EFS) require network connectivity from the cluster nodes to the storage backend (e.g., EFS or NFS servers).
- **ebs-csi-driver** (AWS only) does not expose services externally, but requires network connectivity to AWS APIs and EBS endpoints for dynamic volume provisioning. This enables persistent storage for pods on AWS and may require outbound access to AWS services.
- **Note:**
- For NFS/EFS: Ensure that all cluster nodes have network access (NFS/TCP 2049) to the NFS or EFS server. Firewalls and security groups must allow this traffic.
- For AWS EBS: Nodes must have outbound access to AWS APIs and the correct IAM permissions.
- For cert-manager: If using ACME (Let's Encrypt), ensure outbound HTTPS access to the internet.
- **Common endpoints that may need to be allowed:**
- Kubernetes API server (internal cluster traffic)
- NFS/EFS storage (TCP 2049)
- AWS APIs (for EBS, outbound HTTPS)
- Certificate authorities (e.g., Let's Encrypt ACME, outbound HTTPS)
- **Container registries that may need to be allowlisted:**
- metrics-server: registry.k8s.io
- cert-manager: quay.io
- csi-driver-nfs: registry.k8s.io, gcr.io
- ebs-csi-driver: public.ecr.aws

### 5. **Namespace and Resource Management**

- The baseline roles create namespaces and manage resources, which can include network policies or service accounts that affect pod-to-pod communication and access to external resources.
- **Note:**
- If network policies are enabled, ensure that required inter-pod and pod-to-service communications are allowed. Review any default deny policies.
- **Common endpoints that may need to be allowed:**
- Pod-to-pod and pod-to-service communication (internal cluster traffic)
- External services as required by workloads
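
To make the network policy note concrete, here is a minimal sketch that prints a standard `networking.k8s.io/v1` NetworkPolicy allowing ingress between pods in the same namespace. The function name `allow_same_namespace_policy` and the namespace used in the example are hypothetical; the baseline roles may or may not create a policy like this.

```sh
# Print a NetworkPolicy manifest that allows ingress from pods in the same
# namespace. Policies are additive, so applying this alongside a default-deny
# policy re-allows intra-namespace traffic. Note that it also isolates the
# selected pods from ingress originating outside the namespace (including an
# ingress controller in another namespace) unless another policy allows it.
allow_same_namespace_policy() {
  ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
EOF
}

# Example (hypothetical namespace); review the output before applying:
#   allow_same_namespace_policy viya | kubectl apply -f -
```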

### 6. **Jump Server (Bastion Host) and SSH Access**

- If a jump server is used, SSH access is required from the Ansible control node to the jump server, and from the jump server to the NFS server (if managing NFS exports or directories).
- **Note:**
- Ensure that SSH keys are properly configured and distributed.
- Security groups/firewalls must allow SSH (typically TCP 22) from the control node to the jump server, and from the jump server to the NFS server.
- The jump server must have the NFS share mounted and accessible at the configured path.
- **Common endpoints that may need to be allowed:**
- SSH (TCP 22) from control node to jump server
- SSH (TCP 22) from jump server to NFS server
- NFS (TCP 2049) from jump server to NFS server
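
The three paths listed above can be written out as a small checklist. This is a sketch under stated assumptions: the function names and host names are hypothetical, and `probe_tcp` assumes an `nc` build that supports `-z` (connect without sending data) and `-w` (timeout in seconds).

```sh
# Print the TCP paths listed above as "source destination port" lines.
# Arguments: jump server host, NFS server host.
jump_server_paths() {
  jump="$1"
  nfs="$2"
  echo "control-node ${jump} 22"
  echo "${jump} ${nfs} 22"
  echo "${jump} ${nfs} 2049"
}

# Probe one TCP port from the current host; returns non-zero when the
# connection cannot be established within 5 seconds.
probe_tcp() {
  nc -z -w 5 "$1" "$2"
}
```

Each probe has to run on the source host of its path: for example, `probe_tcp jump.example.com 22` from the control node, and the two NFS probes from a shell on the jump server.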

### 7. **Viya Deployment Manager (VDM)**

- The Viya Deployment Manager (VDM) role orchestrates the deployment of core SAS Viya services and supporting infrastructure. It may create internal or external services (such as Postgres or Elasticsearch), configure ingress and TLS, expose endpoints (e.g., SAS/CONNECT, Consul UI), and manage storage overlays. VDM can also affect namespace isolation and network policies, especially in multi-tenant environments. Review VDM configuration and deployment options to ensure all required network access is permitted.
- **Note:**
- VDM may expose new endpoints or require connectivity to internal/external databases, storage, or certificate authorities. Ensure that firewalls, security groups, and network policies allow required traffic for all VDM-managed services and integrations, especially in multi-tenant or restricted environments.
- **Common endpoints that may need to be allowed:**
- Ingress controller endpoints (HTTP/HTTPS)
- SAS/CONNECT load balancer endpoints
- Consul UI (port 8500, if enabled)
- Internal/external Postgres (default port 5432)
- Internal Elasticsearch (default port 9200)
- NFS/EFS storage (TCP 2049)
- AWS APIs (for EBS, outbound HTTPS)
- Certificate authorities (e.g., Let's Encrypt ACME, outbound HTTPS)
- Container registries (for pulling images, outbound HTTPS)
- **Container registries that may need to be allowlisted:**
- quay.io
- registry.k8s.io
- gcr.io
- mcr.microsoft.com
- public.ecr.aws
- (plus any additional registries for SAS Viya images and other required workloads)

---

### Container Registries to Allowlist by Cloud Provider

#### AWS
- quay.io (ingress-nginx, cert-manager)
- registry.k8s.io (metrics-server, csi-driver-nfs)
- gcr.io (csi-driver-nfs)
- public.ecr.aws (ebs-csi-driver)

#### Azure
- quay.io (ingress-nginx, cert-manager)
- registry.k8s.io (csi-driver-nfs)
- gcr.io (csi-driver-nfs)

#### GCP
- quay.io (ingress-nginx, cert-manager)
- registry.k8s.io (csi-driver-nfs)
- gcr.io (csi-driver-nfs)

#### Generic K8s / NFS
- quay.io (ingress-nginx, cert-manager)
- registry.k8s.io (csi-driver-nfs)
- gcr.io (csi-driver-nfs)

**Notes:**
- `metrics-server` and `ebs-csi-driver` are AWS only, so `public.ecr.aws` is not needed for Azure or GCP. `registry.k8s.io` is still required on those providers for `csi-driver-nfs`.
- If you are using only a specific cloud provider, you only need to allowlist the registries listed for that provider.
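
For scripting firewall or proxy allowlists, the per-provider lists above can be captured in a small lookup. This is only a sketch: the function name `registries_for_provider` and the provider keys are hypothetical, and the output does not include the additional registries needed for SAS Viya images.

```sh
# Print the registries to allowlist for a provider, one per line,
# transcribed from the per-provider lists above.
registries_for_provider() {
  case "$1" in
    aws)
      printf '%s\n' quay.io registry.k8s.io gcr.io public.ecr.aws ;;
    azure|gcp|generic)
      printf '%s\n' quay.io registry.k8s.io gcr.io ;;
    *)
      echo "unknown provider: $1" >&2
      return 1 ;;
  esac
}
```

For example, `registries_for_provider aws` prints four lines, while `azure`, `gcp`, and `generic` each print three.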

---

## **Summary Table**

|Component|Networking Considerations|
|---|---|
|ingress-nginx|Exposes services externally, manages HTTP/S routing, uses cloud load balancers|
|Cluster Autoscaler|Indirectly affects networking by scaling nodes, which changes pod placement across subnets|
|metrics-server|Minimal, requires API access|
|cert-manager|Minimal, may require outbound access for ACME|
|CSI Drivers (NFS, EFS, etc.)|May require network access to storage backends|
|ebs-csi-driver|Requires network connectivity to AWS APIs and EBS endpoints for dynamic volume provisioning; does not expose services externally but enables persistent storage for pods on AWS|
|Jump Server|Requires SSH access from control node and to NFS server; must have NFS share mounted|
|VDM (Viya Deployment Manager)|May create internal/external services (e.g., Postgres, Elasticsearch), configure ingress/TLS, expose endpoints (e.g., SAS/CONNECT, Consul UI), and require network access to storage backends and certificate authorities. Multi-tenancy may affect namespace isolation and network policies.|
17 changes: 17 additions & 0 deletions docs/user/ThirdPartyComponents.md
@@ -0,0 +1,17 @@
### Third-Party Components
The following third-party components are currently supported and used by the Viya4 DaC. The chart repo URLs are also referenced in the repo [README](../../README.md).

| Component | Chart Name | Chart Source URL | Container Registry | Purpose | Cloud Provider Support |
|------------------|-------------------|----------------------------------------------------------------------------------------------------------|-------------------------------|---------------------------------------------------------------------------------------------------------|----------------------------------|
| ingress-nginx | ingress-nginx | https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx | quay.io | Provides ingress controller for Kubernetes. | AWS, Azure, GCP, generic K8s |
| cert-manager | cert-manager | https://github.com/cert-manager/cert-manager/tree/master/deploy/charts/cert-manager | quay.io | Manages TLS certificates in Kubernetes. | AWS, Azure, GCP, generic K8s |
| metrics-server | metrics-server | https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server | registry.k8s.io | Collects resource metrics from K8s nodes and pods. | AWS only |
| csi-driver-nfs | csi-driver-nfs | https://github.com/kubernetes-csi/csi-driver-nfs/tree/master/charts | registry.k8s.io, gcr.io | Provides NFS storage provisioning for Kubernetes (supports AWS EFS, GCP Filestore, Azure Files, NFS). | AWS, Azure, GCP, generic NFS |
| ebs-csi-driver | aws-ebs-csi-driver| https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/charts/aws-ebs-csi-driver | public.ecr.aws | Provides dynamic provisioning of AWS EBS volumes for persistent storage in Kubernetes. | AWS only |

**Notes:**
- These are the only third-party components installed by default by the DaC.
- All components are installed and managed via Ansible playbooks and Helm charts.
- Chart versions are managed in the Ansible variables and can be overridden by the user if needed.
- All components are compatible with AWS, Azure, and GCP unless otherwise noted. `metrics-server` and `ebs-csi-driver` are for AWS only.
- For more details on how to add or update these components, see the main playbook in `playbooks/playbook.yaml` and the role documentation in the `roles/` directory.
64 changes: 64 additions & 0 deletions playbooks/README.md
@@ -0,0 +1,64 @@
### Viya4 DaC Playbook Overview
1. **Create Global Temporary Directory**

- Task: `Global tmp dir`
- Action: Creates a temporary directory for use during the playbook run.
- Tags: `install`, `uninstall`, `update`, `onboard`, `cas-onboard`, `offboard`
- **Networking Considerations:** No anticipated network impact.
2. **Run Task Validations from Common Role**

- Task: `Common role - task validations`
- Action: Includes `common` role’s `task-validations` tasks.
- Tags: `always` (runs every time)
- **Networking Considerations:** No anticipated network impact.
3. **Include Main Tasks from Common Role**

- Task: `Common role`
- Action: Includes the main tasks from the `common` role, making its variables public.
- Tags: `install`, `uninstall`, `update`, `onboard`, `cas-onboard`, `offboard`
- **Networking Considerations:** No anticipated network impact.
4. **Optionally Include Jump-Server Role**

- Task: `jump-server role`
- Action: Includes the `jump-server` role.
- Condition: Runs only if `JUMP_SVR_HOST`, `JUMP_SVR_USER`, `JUMP_SVR_PRIVATE_KEY`, and `V4_CFG_MANAGE_STORAGE` are all defined and `V4_CFG_MANAGE_STORAGE` is `true`.
- Tags: `viya`
- **Networking Considerations:** May require SSH access to the jump server and NFS server. See [NetworkingConsiderations.md](../docs/user/NetworkingConsiderations.md)
5. **Optionally Include Baseline Role for Install**

- Task: `baseline role install`
- Action: Includes the `baseline` role for install actions.
- Condition: Runs only if both `'baseline'` and `'install'` are in `ansible_run_tags`.
- Tags: `baseline`
- **Networking Considerations:** Deploys core components (ingress-nginx, cert-manager, metrics-server, csi-driver-nfs, ebs-csi-driver, etc.) that impact cluster networking, ingress, and storage. See [NetworkingConsiderations.md](../docs/user/NetworkingConsiderations.md)
6. **Optionally Include Multi-Tenancy Role**

- Task: `Multi-tenancy role`
- Action: Includes the `multi-tenancy` role.
- Condition: Runs only if `V4MT_ENABLE` is defined.
- Tags: `multi-tenancy`
- **Networking Considerations:** May create namespaces and network policies. See [NetworkingConsiderations.md](../docs/user/NetworkingConsiderations.md)
7. **Include VDM Role**

- Task: `vdm role`
- Action: Includes the `vdm` role.
- Tags: `viya`, `multi-tenancy`
- **Networking Considerations:** May create services and resources that affect networking within the cluster.
8. **Optionally Include Baseline Role for Uninstall**

- Task: `baseline role uninstall`
- Action: Includes the `baseline` role for uninstall actions.
- Condition: Runs only if both `'baseline'` and `'uninstall'` are in `ansible_run_tags`.
- Tags: `baseline`
- **Networking Considerations:** Removes core components and may affect networking and storage resources. See [NetworkingConsiderations.md](../docs/user/NetworkingConsiderations.md)
9. **Delete Temporary Directory**

- Task: `Delete tmpdir`
- Action: Removes the temporary directory created at the start.
- Tags: `install`, `uninstall`, `update`
- **Networking Considerations:** No anticipated network impact.

**Summary:**
- Tasks are executed in the order listed above.
- Some tasks/roles are conditionally included based on variables or tags.
- The playbook is designed to be flexible for different deployment scenarios by using tags and conditions.
- For a detailed summary of networking considerations, see [NetworkingConsiderations.md](../docs/user/NetworkingConsiderations.md)
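
The conditions in steps 4, 5, and 8 can be modeled outside Ansible to reason about what a given invocation would include. The sketch below is not the playbook's implementation: the function names are hypothetical, environment variables stand in for Ansible variables, "defined" is approximated as "non-empty", and only the stated conditions are modeled (not Ansible's own tag filtering).

```sh
# Return success if the word $1 appears in the comma-separated list $2.
has_tag() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Steps 5 and 8: print each baseline action whose condition holds for a
# comma-separated list of run tags (install and/or uninstall), or "none".
baseline_actions() {
  tags="$1"
  found=no
  if has_tag baseline "$tags"; then
    if has_tag install "$tags"; then echo install; found=yes; fi
    if has_tag uninstall "$tags"; then echo uninstall; found=yes; fi
  fi
  if [ "$found" = no ]; then echo none; fi
}

# Step 4: the jump-server role is included only when all four variables
# are set and V4_CFG_MANAGE_STORAGE is true.
jump_server_role_included() {
  [ -n "${JUMP_SVR_HOST:-}" ] &&
  [ -n "${JUMP_SVR_USER:-}" ] &&
  [ -n "${JUMP_SVR_PRIVATE_KEY:-}" ] &&
  [ "${V4_CFG_MANAGE_STORAGE:-}" = "true" ]
}
```

For instance, `baseline_actions 'baseline,install'` prints `install`, and `baseline_actions 'viya,install'` prints `none` because the `baseline` tag is missing.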