---
title: "Production environment"
description: Create a production-quality Kubernetes cluster
weight: 30
no_list: true
---

<!-- overview -->

A production-quality Kubernetes cluster requires planning and preparation.
If your Kubernetes cluster is to run critical workloads, it must be configured to be resilient.
This page explains steps you can take to set up a production-ready cluster,
or to promote an existing cluster for production use.
If you're already familiar with production setup and want the links, skip to
[What's next](#what-s-next).

<!-- body -->

## Production considerations

Typically, a production Kubernetes cluster environment has more requirements than a
personal learning, development, or test environment. A production environment may require
secure access by many users, consistent availability, and the resources to adapt
to changing demands.

As you decide where you want your production Kubernetes environment to live
(on premises or in a cloud) and the amount of management you want to take
on or hand to others, consider how your requirements for a Kubernetes cluster
are influenced by the following issues:

- *Availability*: A single-machine Kubernetes [learning environment](/docs/setup/#learning-environment)
has a single point of failure. Creating a highly available cluster means considering:
  - Separating the control plane from the worker nodes.
  - Replicating the control plane components on multiple nodes.
  - Load balancing traffic to the cluster’s {{< glossary_tooltip term_id="kube-apiserver" text="API server" >}}.
  - Having enough worker nodes available, or able to quickly become available, as changing workloads warrant it.

- *Scale*: If you expect your production Kubernetes environment to receive a stable amount of
demand, you might be able to set up for the capacity you need and be done. However,
if you expect demand to grow over time or change dramatically based on things like
season or special events, you need to plan how to scale to relieve increased
pressure from more requests to the control plane and worker nodes or scale down to reduce unused
resources.

- *Security and access management*: You have full admin privileges on your own
Kubernetes learning cluster. But shared clusters with important workloads, and
more than one or two users, require a more refined approach to who and what can
access cluster resources. You can use role-based access control
([RBAC](/docs/reference/access-authn-authz/rbac/)) and other
security mechanisms to make sure that users and workloads can get access to the
resources they need, while keeping workloads, and the cluster itself, secure.
You can set limits on the resources that users and workloads can access
by managing [policies](https://kubernetes.io/docs/concepts/policy/) and
[container resources](/docs/concepts/configuration/manage-resources-containers/).

Before building a Kubernetes production environment on your own, consider
handing off some or all of this job to
[Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
providers or other [Kubernetes Partners](https://kubernetes.io/partners/).
Options include:

- *Serverless*: Just run workloads on third-party equipment without managing
a cluster at all. You will be charged for things like CPU usage, memory, and
disk requests.
- *Managed control plane*: Let the provider manage the scale and availability
of the cluster's control plane, as well as handle patches and upgrades.
- *Managed worker nodes*: Configure pools of nodes to meet your needs,
then the provider makes sure those nodes are available and ready to implement
upgrades when needed.
- *Integration*: There are providers that integrate Kubernetes with other
services you may need, such as storage, container registries, authentication
methods, and development tools.

Whether you build a production Kubernetes cluster yourself or work with
partners, review the following sections to evaluate your needs as they relate
to your cluster’s *control plane*, *worker nodes*, *user access*, and
*workload resources*.

## Production cluster setup

In a production-quality Kubernetes cluster, the control plane manages the
cluster from services that can be spread across multiple computers
in different ways. Each worker node, however, represents a single entity that
is configured to run Kubernetes pods.

### Production control plane

The simplest Kubernetes cluster has the entire control plane and worker node
services running on the same machine. You can grow that environment by adding
worker nodes, as illustrated in the diagram in
[Kubernetes Components](/docs/concepts/overview/components/).
If the cluster is meant to be available for a short period of time, or can be
discarded if something goes seriously wrong, this might meet your needs.

If you need a more permanent, highly available cluster, however, you should
consider ways of extending the control plane. By design, control plane
services running on a single machine are not highly available.
If keeping the cluster up and running
and ensuring that it can be repaired if something goes wrong is important,
consider these steps:

- *Choose deployment tools*: You can deploy a control plane using tools such
as kubeadm, kops, and kubespray. See
[Installing Kubernetes with deployment tools](/docs/setup/production-environment/tools/)
to learn tips for production-quality deployments using each of those deployment
methods. Different [Container Runtimes](/docs/setup/production-environment/container-runtimes/)
are available to use with your deployments.
- *Manage certificates*: Secure communications between control plane services
are implemented using certificates. Certificates are automatically generated
during deployment or you can generate them using your own certificate authority.
See [PKI certificates and requirements](/docs/setup/best-practices/certificates/) for details.
- *Configure load balancer for apiserver*: Configure a load balancer
to distribute external API requests to the apiserver service instances running on different nodes. See
[Create an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
for details, and see the configuration sketch after this list.
- *Separate and backup etcd service*: The etcd services can either run on the
same machines as other control plane services or run on separate machines, for
extra security and availability. Because etcd stores cluster configuration data,
backing up the etcd database should be done regularly to ensure that you can
repair that database if needed; a sample backup sketch appears at the end of this section.
See the [etcd FAQ](https://etcd.io/docs/v3.4/faq/) for details on configuring and using etcd.
See [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/)
and [Set up a High Availability etcd cluster with kubeadm](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
for details.
- *Create multiple control plane systems*: For high availability, the
control plane should not be limited to a single machine. If the control plane
services are run by an init service (such as systemd), each service should run on at
least three machines. However, running control plane services as pods in
Kubernetes ensures that the replicated number of services that you request
will always be available.
The scheduler should be fault tolerant,
but not highly available. Some deployment tools set up the [Raft](https://raft.github.io/)
consensus algorithm to do leader election of Kubernetes services. If the
primary goes away, another service elects itself and takes over.
- *Span multiple zones*: If keeping your cluster available at all times is
critical, consider creating a cluster that runs across multiple data centers,
referred to as zones in cloud environments. Groups of zones are referred to as regions.
Spreading a cluster across
multiple zones in the same region improves the chances that your
cluster will continue to function even if one zone becomes unavailable.
See [Running in multiple zones](/docs/setup/best-practices/multiple-zones/) for details.
- *Manage on-going features*: If you plan to keep your cluster over time,
there are tasks you need to do to maintain its health and security. For example,
if you installed with kubeadm, there are instructions to help you with
[Certificate Management](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/)
and [Upgrading kubeadm clusters](/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/).
See [Administer a Cluster](/docs/tasks/administer-cluster/)
for a longer list of Kubernetes administrative tasks.

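As a concrete illustration of the load balancer and multiple control plane items above, here is a minimal sketch of a kubeadm configuration that points every control plane member at a shared, load-balanced endpoint. The host name, port, and version shown are assumptions for illustration, not values prescribed by this page:

```yaml
# Minimal sketch: kubeadm init configuration for an HA control plane.
# Assumes a load balancer already answers on k8s-api.example.com:6443
# (a hypothetical name) and forwards to every control plane node.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0          # assumed version; match your deployment
# Certificates and kubeconfigs are issued for the load-balanced endpoint,
# so the loss of any single control plane node does not break API access.
controlPlaneEndpoint: "k8s-api.example.com:6443"
etcd:
  local:
    dataDir: /var/lib/etcd          # default location for stacked etcd
```

Additional control plane nodes would then join through the same endpoint (`kubeadm join k8s-api.example.com:6443 --control-plane ...`); the kubeadm high-availability guides linked below describe the authoritative procedure.
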
To learn about available options when you run control plane services, see
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/),
[kube-controller-manager](/docs/reference/command-line-tools-reference/kube-controller-manager/),
and [kube-scheduler](/docs/reference/command-line-tools-reference/kube-scheduler/)
component pages. For highly available control plane examples, see
[Options for Highly Available topology](/docs/setup/production-environment/tools/kubeadm/ha-topology/),
[Creating Highly Available clusters with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/),
and [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/).
See [Backing up an etcd cluster](/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster)
for information on making an etcd backup plan.
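
One way to put that backup plan on a schedule is a CronJob that runs `etcdctl snapshot save` on a control plane node. The sketch below assumes a kubeadm-style layout (etcd serving on the node's localhost, certificates under */etc/kubernetes/pki/etcd*) and an invented host path for snapshots; treat it as a starting point rather than a drop-in manifest:

```yaml
# Sketch: periodic etcd snapshots taken on a control plane node.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"            # every six hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true          # reach etcd on the node's localhost
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""  # assumed label
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          containers:
          - name: snapshot
            image: registry.k8s.io/etcd:3.5.9-0        # assumed image tag
            command: ["/bin/sh", "-c"]
            args:
            - >-
              etcdctl --endpoints=https://127.0.0.1:2379
              --cacert=/etc/kubernetes/pki/etcd/ca.crt
              --cert=/etc/kubernetes/pki/etcd/server.crt
              --key=/etc/kubernetes/pki/etcd/server.key
              snapshot save /backup/etcd-snapshot-$(date +%Y%m%d%H%M).db
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /var/backups/etcd  # invented path; use durable storage
```

A snapshot written this way can later be restored with `etcdctl snapshot restore`, as described in the etcd guides linked above.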

### Production worker nodes

Production-quality workloads need to be resilient and anything they rely
on needs to be resilient (such as CoreDNS). Whether you manage your own
control plane or have a cloud provider do it for you, you still need to
consider how you want to manage your worker nodes (also referred to
simply as *nodes*).

- *Configure nodes*: Nodes can be physical or virtual machines. If you want to
create and manage your own nodes, you can install a supported operating system,
then add and run the appropriate
[Node services](/docs/concepts/overview/components/#node-components). Consider:
  - The demands of your workloads when you set up nodes by having appropriate memory, CPU, and disk speed and storage capacity available.
  - Whether generic computer systems will do or you have workloads that need GPU processors, Windows nodes, or VM isolation.
- *Validate nodes*: See [Valid node setup](/docs/setup/best-practices/node-conformance/)
for information on how to ensure that a node meets the requirements to join
a Kubernetes cluster.
- *Add nodes to the cluster*: If you are managing your own cluster you can
add nodes by setting up your own machines and either adding them manually or
having them register themselves to the cluster’s apiserver. See the
[Nodes](/docs/concepts/architecture/nodes/) section for information on how to set up Kubernetes to add nodes in these ways, and see the join configuration sketch after this list.
- *Add Windows nodes to the cluster*: Kubernetes offers support for Windows
worker nodes, allowing you to run workloads implemented in Windows containers. See
[Windows in Kubernetes](/docs/setup/production-environment/windows/) for details.
- *Scale nodes*: Have a plan for expanding the capacity your cluster will
eventually need. See [Considerations for large clusters](/docs/setup/best-practices/cluster-large/)
to help determine how many nodes you need, based on the number of pods and
containers you need to run. If you are managing nodes yourself, this can mean
purchasing and installing your own physical equipment.
- *Autoscale nodes*: Most cloud providers support
[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#readme)
to replace unhealthy nodes or grow and shrink the number of nodes as demand requires. See the
[Frequently Asked Questions](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
for how the autoscaler works and
[Deployment](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment)
for how it is implemented by different cloud providers. For on-premises, there
are some virtualization platforms that can be scripted to spin up new nodes
based on demand.
- *Set up node health checks*: For important workloads, you want to make sure
that the nodes and pods running on those nodes are healthy. Using the
[Node Problem Detector](/docs/tasks/debug-application-cluster/monitor-node-health/)
daemon, you can monitor node health.

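For the node-adding step above, a kubeadm-managed cluster typically registers a worker through a join configuration. The following is a minimal sketch; the endpoint, token, and hash are placeholders for values you would generate yourself (for example with `kubeadm token create --print-join-command` on a control plane node):

```yaml
# Minimal sketch of a kubeadm join configuration for a worker node.
# All values shown are placeholders, not real cluster credentials.
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "k8s-api.example.com:6443"   # hypothetical endpoint
    token: "abcdef.0123456789abcdef"                # placeholder token
    caCertHashes:
      - "sha256:<hash-of-the-cluster-ca-certificate>"
nodeRegistration:
  name: "worker-01"                                 # assumed node name
  kubeletExtraArgs:
    node-labels: "workload-type=general"            # optional example label
```

Passing this file to `kubeadm join --config <file>` on the new machine registers it with the cluster's apiserver; nodes can also register themselves if the kubelet is started with appropriate credentials, as described in the Nodes page linked above.
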
## Production user management

In production, you may be moving from a model where you or a small group of
people are accessing the cluster to where there may potentially be dozens or
hundreds of people. In a learning environment or platform prototype, you might have a single
administrative account for everything you do. In production, you will want
more accounts with different levels of access to different namespaces.

Taking on a production-quality cluster means deciding how you
want to selectively allow access by other users. In particular, you need to
select strategies for validating the identities of those who try to access your
cluster (authentication) and deciding if they have permissions to do what they
are asking (authorization):

- *Authentication*: The apiserver can authenticate users using client
certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
You can choose which authentication methods you want to use.
Using plugins, the apiserver can leverage your organization’s existing
authentication methods, such as LDAP or Kerberos. See
[Authentication](/docs/reference/access-authn-authz/authentication/)
for a description of these different methods of authenticating Kubernetes users.
- *Authorization*: When you set out to authorize your regular users, you will probably choose between RBAC and ABAC authorization. See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) to review different modes for authorizing user accounts (as well as service account access to your cluster):
  - *Role-based access control* ([RBAC](/docs/reference/access-authn-authz/rbac/)): Lets you assign access to your cluster by allowing specific sets of permissions to authenticated users. Permissions can be assigned for a specific namespace (Role) or across the entire cluster (ClusterRole). Then using RoleBindings and ClusterRoleBindings, those permissions can be attached to particular users. A small sketch follows this list.
  - *Attribute-based access control* ([ABAC](/docs/reference/access-authn-authz/abac/)): Lets you create policies based on resource attributes in the cluster and will allow or deny access based on those attributes. Each line of a policy file identifies versioning properties (apiVersion and kind) and a map of spec properties to match the subject (user or group), resource property, non-resource property (/version or /apis), and readonly. See [Examples](/docs/reference/access-authn-authz/abac/#examples) for details.

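As a minimal sketch of the RBAC pattern described above, a Role grants a set of permissions within one namespace, and a RoleBinding attaches those permissions to a user. The namespace and user names here are invented for illustration:

```yaml
# Sketch: grant a hypothetical user read-only access to Pods in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: web-team              # assumed namespace
  name: pod-reader
rules:
- apiGroups: [""]                  # "" refers to the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: web-team
subjects:
- kind: User
  name: jane                       # hypothetical authenticated user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
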
As someone setting up authentication and authorization on your production Kubernetes cluster, here are some things to consider:

- *Set the authorization mode*: When the Kubernetes API server
([kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/))
starts, the supported authorization modes must be set using the *--authorization-mode*
flag. For example, that flag in the *kube-apiserver.yaml* file (in */etc/kubernetes/manifests*)
could be set to Node,RBAC. This would allow Node and RBAC authorization for authenticated requests; see the manifest excerpt after this list.
- *Create user certificates and role bindings (RBAC)*: If you are using RBAC
authorization, users can create a CertificateSigningRequest (CSR) that can be
signed by the cluster CA. Then you can bind Roles and ClusterRoles to each user.
See [Certificate Signing Requests](/docs/reference/access-authn-authz/certificate-signing-requests/)
for details, and see the CSR sketch after this list.
- *Create policies that combine attributes (ABAC)*: If you are using ABAC
authorization, you can assign combinations of attributes to form policies to
authorize selected users or groups to access particular resources (such as a
pod), namespace, or apiGroup. For more information, see
[Examples](/docs/reference/access-authn-authz/abac/#examples).
- *Consider Admission Controllers*: Additional forms of authorization for
requests that can come in through the API server include
[Webhook Token Authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication).
Webhooks and other special authorization types need to be enabled by adding
[Admission Controllers](/docs/reference/access-authn-authz/admission-controllers/)
to the API server.
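
To make the *--authorization-mode* item concrete, here is an abbreviated sketch of what a kubeadm-generated static Pod manifest for the API server might contain. Real manifests carry many more flags; only the settings discussed here are shown, and the image version is an assumption:

```yaml
# Abbreviated sketch of /etc/kubernetes/manifests/kube-apiserver.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.0    # assumed version
    command:
    - kube-apiserver
    - --authorization-mode=Node,RBAC                 # authorization modes, in order
    - --client-ca-file=/etc/kubernetes/pki/ca.crt    # client certificate authentication
    - --enable-admission-plugins=NodeRestriction     # example admission controller
```

Because the kubelet watches the manifests directory, saving a change to this file restarts the API server with the new flags.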
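
For the user certificate step, the request itself is a CertificateSigningRequest object. This sketch uses a hypothetical user `jane` and a placeholder for the base64-encoded CSR data:

```yaml
# Sketch: ask the cluster CA to sign a client certificate for a user.
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane-client-cert                   # hypothetical name
spec:
  request: <base64-encoded-PKCS10-CSR>     # placeholder; generate with openssl
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
```

After an administrator approves the request (`kubectl certificate approve jane-client-cert`), the signed certificate appears in the object's `status.certificate` field, and Roles or ClusterRoles can be bound to `jane` as shown earlier.
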
## Set limits on workload resources

Demands from production workloads can cause pressure both inside and outside
of the Kubernetes control plane. Consider these items when setting up for the
needs of your cluster's workloads:

- *Set namespace limits*: Set per-namespace quotas on things like memory and CPU. See
[Manage Memory, CPU, and API Resources](/docs/tasks/administer-cluster/manage-resources/)
for details. You can also set
[Hierarchical Namespaces](/blog/2020/08/14/introducing-hierarchical-namespaces/)
for inheriting limits. A quota sketch follows this list.
- *Prepare for DNS demand*: If you expect workloads to massively scale up,
your DNS service must be ready to scale up as well. See
[Autoscale the DNS service in a Cluster](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/).
- *Create additional service accounts*: User accounts determine what users can
do on a cluster, while a service account defines pod access within a particular
namespace. By default, a pod takes on the default service account from its namespace.
See [Managing Service Accounts](/docs/reference/access-authn-authz/service-accounts-admin/)
for information on creating a new service account. For example, you might want to:
  - Add secrets that a pod could use to pull images from a particular container registry. See [Configure Service Accounts for Pods](/docs/tasks/configure-pod-container/configure-service-account/) for an example.
  - Assign RBAC permissions to a service account. See [ServiceAccount permissions](/docs/reference/access-authn-authz/rbac/#service-account-permissions) for details.

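As a sketch of the namespace limits and service account items above (names and numbers are invented for illustration), a ResourceQuota caps what one namespace can consume, and a ServiceAccount carrying an image pull secret lets pods in that namespace pull from a private registry:

```yaml
# Sketch: cap compute consumption in one namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: web-team                # assumed namespace
spec:
  hard:
    requests.cpu: "10"               # total CPU requests across the namespace
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"                       # cap on the number of Pods
---
# Sketch: a service account that pulls from a private registry.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: web-team
imagePullSecrets:
- name: private-registry-creds       # hypothetical docker-registry Secret
```

Pods that set `serviceAccountName: ci-deployer` then pull images with that secret automatically, and any resource requests they make count against the quota.
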
## What's next {#what-s-next}

- Decide if you want to build your own production Kubernetes or obtain one from
available [Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
or [Kubernetes Partners](https://kubernetes.io/partners/).
- If you choose to build your own cluster, plan how you want to
handle [certificates](/docs/setup/best-practices/certificates/)
and set up high availability for features such as
[etcd](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
and the
[API server](/docs/setup/production-environment/tools/kubeadm/ha-topology/).
- Choose from [kubeadm](/docs/setup/production-environment/tools/kubeadm/), [kops](/docs/setup/production-environment/tools/kops/), or [Kubespray](/docs/setup/production-environment/tools/kubespray/)
deployment methods.
- Configure user management by determining your
[Authentication](/docs/reference/access-authn-authz/authentication/) and
[Authorization](/docs/reference/access-authn-authz/authorization/) methods.
- Prepare for application workloads by setting up
[resource limits](/docs/tasks/administer-cluster/manage-resources/),
[DNS autoscaling](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/),
and [service accounts](/docs/reference/access-authn-authz/service-accounts-admin/).