|
1 | | -# oci-kubernetes-monitoring |
| 1 | +# Monitoring Solution for Kubernetes |
| 2 | + |
| 3 | +## About |
| 4 | + |
| 5 | +This provides an end-to-end monitoring solution for Oracle Container Engine for Kubernetes (OKE) and other forms of Kubernetes Clusters using Logging Analytics, Monitoring and other Oracle Cloud Infrastructure (OCI) Services. |
| 6 | + |
| 7 | +## Logs |
| 8 | + |
| 9 | +This solution collects various logs from a Kubernetes cluster into OCI Logging Analytics and offers rich analytics on top of the collected logs. Users may customise the log collection by modifying the out-of-the-box configuration that it provides. |
| 10 | + |
| 11 | +### Kubernetes System/Service Logs |
| 12 | + |
| 13 | +OKE and Kubernetes come with built-in services, each with its own responsibilities, that run on one or more nodes in the cluster as either Deployments or DaemonSets. |
| 14 | + |
| 15 | +The following service logs are configured to be collected out of the box: |
| 16 | +- Kube Proxy |
| 17 | +- Kube Flannel |
| 18 | +- Kubelet |
| 19 | +- CoreDNS |
| 20 | +- CSI Node Driver |
| 21 | +- DNS Autoscaler |
| 22 | +- Cluster Autoscaler |
| 23 | +- Proxymux Client |
| 24 | + |
| 25 | +### Linux System Logs |
| 26 | + |
| 27 | +The following Linux system logs are configured to be collected out of the box: |
| 28 | +- Syslog |
| 29 | +- Secure logs |
| 30 | +- Cron logs |
| 31 | +- Mail logs |
| 32 | +- Audit logs |
| 33 | +- Ksplice Uptrack logs |
| 34 | +- Yum logs |
| 35 | + |
| 36 | +### Control Plane Logs |
| 37 | + |
| 38 | +The Control Plane in OKE/Kubernetes includes the following components: |
| 39 | +- Kube API Server |
| 40 | +- Kube Scheduler |
| 41 | +- Kube Controller Manager |
| 42 | +- Cloud Controller Manager |
| 43 | +- etcd |
| 44 | + |
| 45 | +At present, control plane logs are not covered as part of out of the box collection, as these logs are not exposed to OKE customers. |
| 46 | +The out of the box collection for these logs will be available soon for generic Kubernetes clusters and for OKE (when OKE makes these logs accessible to end users). |
| 47 | + |
| 48 | +### Application Pod/Container Logs |
| 49 | +All logs from application pods that write to STDOUT/STDERR are typically available under /var/log/containers/. |
| 50 | +Applications that use custom log handlers (for example, log4j) may route their logs differently, but those logs are generally still available on the node (through a volume). |
| 51 | + |
| 52 | +## Kubernetes Objects |
| 53 | + |
| 54 | +"Kubernetes objects are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe: |
| 55 | +- What containerized applications are running (and on which nodes) |
| 56 | +- The resources available to those applications |
| 57 | +- The policies around how those applications behave, such as restart policies, upgrades, and fault-tolerance" |
| 58 | + |
| 59 | +*Reference* : [Kubernetes Objects](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/) |
| 60 | + |
| 61 | +The following objects are supported at present: |
| 62 | +- Nodes |
| 63 | +- Namespaces |
| 64 | +- Pods |
| 65 | +- DaemonSets |
| 66 | +- Deployments |
| 67 | +- ReplicaSets |
| 68 | +- Events |
| 69 | + |
| 70 | +## Installation Instructions |
| 71 | + |
| 72 | +### Pre-requisites |
| 73 | + |
| 74 | +- Logging Analytics Service must be enabled in the given OCI region before trying out this solution. Refer to [Logging Analytics Quick Start](https://docs.oracle.com/en-us/iaas/logging-analytics/doc/quick-start.html) for details. |
| 75 | +- Create one or more Logging Analytics LogGroups if you have not done so already. Refer to [Create Log Group](https://docs.oracle.com/en-us/iaas/logging-analytics/doc/create-logging-analytics-resources.html#GUID-D1758CFB-861F-420D-B12F-34D1CC5E3E0E). |
| 76 | +- Enable access to the log group(s) so that logs can be uploaded from the Kubernetes environment: |
| 77 | + - For InstancePrincipal based AuthZ (recommended for OKE and Kubernetes clusters running on OCI): |
| 78 | + - Create a dynamic group including relevant OCI Instances. Refer to [this](https://docs.oracle.com/en-us/iaas/Content/Identity/Tasks/managingdynamicgroups.htm) for details about managing dynamic groups. |
| 79 | + - Add an IAM policy like, |
| 80 | + ``` |
| 81 | + Allow dynamic-group <dynamic_group_name> to {LOG_ANALYTICS_LOG_GROUP_UPLOAD_LOGS} in compartment <Logging Analytics LogGroup's compartment_name> |
| 82 | + ``` |
| 83 | + - For Config file based (user principal) AuthZ: |
| 84 | + - Add an IAM policy like, |
| 85 | + ``` |
| 86 | + Allow group <user_group_name> to {LOG_ANALYTICS_LOG_GROUP_UPLOAD_LOGS} in compartment <Logging Analytics LogGroup's compartment_name> |
| 87 | + ``` |
| 88 | + |
| 89 | +### Docker Image |
| 90 | +
|
| 91 | +We are in the process of building a Docker image based on Oracle Linux 8 that includes Fluentd, the OCI Logging Analytics Output Plugin and all the required dependencies. |
| 92 | +All the dependencies will be built from source and installed into the image. This image will soon be available for use either as-is or as a base image for a custom image. |
| 93 | +For now, for testing purposes, follow the steps below to build an image using the official Fluentd Docker image (Debian-based) as the base image. |
| 94 | +- Download all the files from [this dir](/logan/docker-images/v1.0/debian/) to a local machine with internet access. |
| 95 | +- Run the following command to build the docker image. |
| 96 | + - *docker build -t fluentd_oci_la -f Dockerfile .* |
| 97 | +- The Docker image built in the previous step can be pushed to Docker Hub, OCI Container Registry (OCIR) or a local Docker registry, depending on your requirements. |
| 98 | + - [How to push the image to Docker Hub](https://docs.docker.com/docker-hub/repos/#pushing-a-docker-container-image-to-docker-hub) |
| 99 | + - [How to push the image to OCIR](https://www.oracle.com/webfolder/technetwork/tutorials/obe/oci/registry/index.html). |
| 100 | + - [How to push the image to Local Registry](https://docs.docker.com/registry/deploying/). |
| 101 | + |
| 102 | +### Deploying Kubernetes resources using Kubectl |
| 103 | +
|
| 104 | +#### Pre-requisites |
| 105 | +
|
| 106 | +- A machine with kubectl installed and set up to point to your Kubernetes environment. |
| 107 | +
|
| 108 | +#### To enable Logs collection |
| 109 | +
|
| 110 | +Download all the yaml files from [this dir](/logan/kubernetes-resources/logs-collection/). |
| 111 | +These yaml files need to be applied using kubectl to create the resources that enable log collection into Logging Analytics through a Fluentd-based DaemonSet. |
| 112 | +
|
| 113 | +##### configmap-docker.yaml | configmap-cri.yaml |
| 114 | +
|
| 115 | +- This file contains the necessary out of the box fluentd configuration to collect Kubernetes System/Service Logs, Linux System Logs and Application Pod/Container Logs. |
| 116 | +- Some log locations may differ for Kubernetes clusters other than OKE and EKS, and the configuration may need to be modified accordingly. |
| 117 | +- Use configmap-docker.yaml for Kubernetes clusters based off Docker runtime (e.g., OKE < 1.20) and configmap-cri.yaml for Kubernetes clusters based off CRI-O. |
| 118 | +- Inline comments are available in the file for each of the source/filter/match blocks for easy reference for making any changes to the configuration. |
| 119 | +- Refer to [this](https://docs.oracle.com/en/learn/oci_logging_analytics_fluentd/) to learn about each of the Logging Analytics Fluentd Output plugin configuration parameters. |
| 120 | +- *Note*: Out of the box, a generic source with a time-only parser is configured to collect all application pod logs from /var/log/containers/. |
| 121 | + It is recommended to define and use a LogSource/LogParser in Logging Analytics for a given log type and then modify the configuration accordingly. |
| 122 | + When adding a configuration (Source and Filter sections) for any new container log, also exclude its log path from the generic log collection |
| 123 | + by adding the log path to the *exclude_path* field in the *in_tail_containerlogs* source block. This avoids collecting the same logs twice through the generic log collection. |
| 124 | +
|
| 125 | +##### fluentd-daemonset.yaml |
| 126 | +
|
| 127 | +- This file has all the resources required to deploy and run the Fluentd Docker image as a DaemonSet. |
| 128 | +- Inline comments are available in the file describing each of the fields/sections. |
| 129 | +- Make sure to replace the placeholder fields with actual values before deploying. |
| 130 | +- At minimum, <IMAGE_URL>, <OCI_LOGGING_ANALYTICS_LOG_GROUP_ID> and <OCI_TENANCY_NAMESPACE> need to be updated. |
| 131 | +- It is recommended to also update <KUBERNETES_CLUSTER_OCID> and <KUBERNETES_CLUSTER_NAME>, so that all processed logs are tagged with the corresponding Kubernetes cluster in Logging Analytics. |
| 132 | +
|
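
The placeholder replacement described above can be scripted. The following is a minimal sketch, assuming the three mandatory placeholders appear literally in the file as `<IMAGE_URL>`, `<OCI_LOGGING_ANALYTICS_LOG_GROUP_ID>` and `<OCI_TENANCY_NAMESPACE>`. The function name `fill_placeholders` is hypothetical and not part of this repository, and the values passed in must not contain `|` or `&`, since `sed` treats those specially here.

```shell
# fill_placeholders FILE IMAGE_URL LOG_GROUP_ID TENANCY_NAMESPACE
# Hypothetical helper: prints FILE to stdout with the three mandatory
# placeholders replaced. The input file itself is left unmodified.
fill_placeholders() {
  file="$1"
  image_url="$2"
  log_group_id="$3"
  tenancy_namespace="$4"
  # '|' is used as the sed delimiter because image URLs contain '/'.
  sed -e "s|<IMAGE_URL>|${image_url}|g" \
      -e "s|<OCI_LOGGING_ANALYTICS_LOG_GROUP_ID>|${log_group_id}|g" \
      -e "s|<OCI_TENANCY_NAMESPACE>|${tenancy_namespace}|g" \
      "$file"
}
```

One possible usage is to redirect the output to a new file, for example `fill_placeholders fluentd-daemonset.yaml "$IMAGE" "$LOG_GROUP" "$NAMESPACE" > fluentd-daemonset.filled.yaml`, and review that file before applying it. The inline comments in fluentd-daemonset.yaml remain the authoritative description of each field.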
| 133 | +##### secrets.yaml (Optional) |
| 134 | +
|
| 135 | +- At present, InstancePrincipal and OCI Config File (UserPrincipal) based Auth/AuthZ are supported for Fluentd to talk to OCI Logging Analytics APIs. |
| 136 | +- We recommend using InstancePrincipal based AuthZ for OKE and all clusters running on OCI VMs; it is the default auth type configured. |
| 137 | +- Applying this file is not required when using InstancePrincipal based auth type. |
| 138 | +- When config file based AuthZ is used, fill in the values under the config section of this file. |
| 139 | +
|
| 140 | +##### Commands Reference |
| 141 | +
|
| 142 | +Apply the yaml files in the following order: configmap-docker.yaml (or configmap-cri.yaml), secrets.yaml (not required for the default auth type) and fluentd-daemonset.yaml. |
| 143 | +
|
| 144 | +``` |
| 145 | +$ kubectl apply -f configmap-docker.yaml |
| 146 | +configmap/oci-la-fluentd-logs-configmap created |
| 147 | + |
| 148 | +$ kubectl apply -f secrets.yaml |
| 149 | +secret/oci-la-credentials-secret created |
| 150 | + |
| 151 | +$ kubectl apply -f fluentd-daemonset.yaml |
| 152 | +serviceaccount/oci-la-fluentd-serviceaccount created |
| 153 | +clusterrole.rbac.authorization.k8s.io/oci-la-fluentd-logs-clusterrole created |
| 154 | +clusterrolebinding.rbac.authorization.k8s.io/oci-la-fluentd-logs-clusterrolebinding created |
| 155 | +daemonset.apps/oci-la-fluentd-daemonset created |
| 156 | +``` |
| 157 | +
|
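
The ordering rule above (configmap first, secrets only for config file based auth, DaemonSet last) can be captured in a small helper. This is a sketch under stated assumptions: the function name `apply_order` and its `docker`/`cri` and `instance`/`config` arguments are hypothetical conveniences, not part of this repository. It only prints file names and does not call kubectl itself.

```shell
# apply_order RUNTIME AUTH
#   RUNTIME: "docker" or "cri"   (selects configmap-docker.yaml or configmap-cri.yaml)
#   AUTH:    "instance" or "config" (secrets.yaml is only needed for "config")
# Hypothetical helper: prints the yaml files in the order they should be applied.
apply_order() {
  runtime="$1"
  auth="$2"
  case "$runtime" in
    docker|cri) echo "configmap-${runtime}.yaml" ;;
    *) echo "unknown runtime: ${runtime}" >&2; return 1 ;;
  esac
  # secrets.yaml is skipped for the default InstancePrincipal auth type.
  if [ "$auth" = "config" ]; then
    echo "secrets.yaml"
  fi
  echo "fluentd-daemonset.yaml"
}
```

The output can be fed to kubectl, for example `apply_order cri instance | while read -r f; do kubectl apply -f "$f"; done`.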
| 158 | +Use the following command to restart the DaemonSet after modifying the configmap or secrets, so that Fluentd picks up the changes. |
| 159 | +
|
| 160 | +``` |
| 161 | +kubectl rollout restart daemonset oci-la-fluentd-daemonset -n=kube-system |
| 162 | +``` |
| 163 | +
|
| 164 | +#### To enable Kubernetes Objects collection |
| 165 | +
|
| 166 | +Download all the yaml files from [this dir](/logan/kubernetes-resources/objects-collection/). |
| 167 | +These yaml files need to be applied using kubectl to create the resources that enable Kubernetes Objects collection into Logging Analytics. |
| 168 | +
|
| 169 | +##### configMap-objects.yaml |
| 170 | +
|
| 171 | +- This file contains the necessary out of the box fluentd configuration to collect Kubernetes Objects. |
| 172 | +- Refer to [this](https://docs.oracle.com/en/learn/oci_logging_analytics_fluentd/) to learn about each of the Logging Analytics Fluentd Output plugin configuration parameters. |
| 173 | +
|
| 174 | +##### fluentd-deployment.yaml |
| 175 | +
|
| 176 | +Refer to [this](#fluentd-daemonsetyaml) section. |
| 177 | +
|
| 178 | +##### secrets.yaml (Optional) |
| 179 | +
|
| 180 | +Refer to [this](#secretsyaml-optional) section. |
| 181 | +
|
| 182 | +##### Commands Reference |
| 183 | +
|
| 184 | +Apply the yaml files in the following order: configmap-objects.yaml, secrets.yaml (not required for the default auth type) and fluentd-deployment.yaml. |
| 185 | +
|
| 186 | +``` |
| 187 | +$ kubectl apply -f configmap-objects.yaml |
| 188 | +configmap/oci-la-fluentd-objects-configmap configured |
| 189 | + |
| 190 | +$ kubectl apply -f fluentd-deployment.yaml |
| 191 | +serviceaccount/oci-la-fluentd-serviceaccount unchanged |
| 192 | +clusterrole.rbac.authorization.k8s.io/oci-la-fluentd-objects-clusterrole created |
| 193 | +clusterrolebinding.rbac.authorization.k8s.io/oci-la-fluentd-objects-clusterrolebinding created |
| 194 | +deployment.apps/oci-la-fluentd-deployment created |
| 195 | +``` |
| 196 | +
|
| 197 | +Use the following command to restart the Deployment after modifying the configmap or secrets, so that Fluentd picks up the changes. |
| 198 | +
|
| 199 | +``` |
| 200 | +kubectl rollout restart deployment oci-la-fluentd-deployment -n=kube-system |
| 201 | +``` |
| 202 | +
|
| 203 | +### Deploying Kubernetes resources using Helm |
| 204 | +
|
| 205 | +Coming soon ... |
| 206 | +
|
| 207 | +
|
| 208 | +
|
| 209 | +
|
| 210 | +
|
| 211 | +
|
| 212 | +
|
| 213 | +
|
| 214 | +
|
| 215 | +
|
| 216 | + |