|
| 1 | + |
| 2 | +# Using OCI Functions to create OCI Monitoring custom metric namespace: Services Limit monitoring example use case |
| 3 | + |
| 4 | +## 1. INTRODUCTION |
| 5 | + |
| 6 | +This document describes how to create an OCI Monitoring ***custom metric namespace*** that extends the default service metric namespaces. To do so, we rely on an OCI Function written with the Python SDK. As a worked example, we create a custom metric namespace that monitors OCI Service Limits usage. With this custom metric namespace, OCI alarms can be created and the OCI Notifications service can deliver the alarm information through different channels, so that a Service Request to increase a Service Limit can be raised proactively, before the limit disrupts running services or services about to be provisioned. |
| 7 | + |
| 8 | +## 2. SOLUTION |
| 9 | +We can see the overall architecture in the following logical diagram: |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +The OCI Functions service (Functions as a Service) has the following structure: |
| 14 | + |
| 15 | + |
| 16 | + |
| 17 | +The function gathers the OCI Service Limits information and posts it to a custom metric namespace. The data is posted to the following metrics: |
| 18 | + |
| 19 | +1. **used**: The amount of the Service Limit that is currently in use |
| 20 | +2. **max_limit**: The maximum amount of the Service Limit that we can use (it can be increased with an SR) |
| 21 | +3. **available**: The remaining amount of the Service Limit that we can use before reaching max_limit |
| 22 | + |
| 23 | +Not all Service Limits are alike, as they depend on their scope (Global, Regional or Availability Domain). We expose the characteristics of a Service Limit as **Dimensions** of the metric, so that we can filter a specific limit by service, limit name and scope. Thus, we have the following **Dimensions**: |
| 24 | + |
| 25 | +1. **service_name**: The name of the OCI Service that the Service Limit belongs to (e.g.: Compute) |
| 26 | +2. **limit_name**: The name of the Service Limit (e.g.: bm-standard2-52-count) |
| 27 | +3. **AD**: If the Service Limit has Availability Domain scope, the AD by which we would like to filter the metric |
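
The metrics and dimensions listed above can be sketched as plain Python dictionaries shaped like the entries of a PostMetricData request body. This is a minimal illustration, not the code shipped in func.py: the helper name `build_metric_entries` and its parameters are hypothetical, and the namespace name is the one mentioned in the OUTPUT section.

```python
from datetime import datetime, timezone

NAMESPACE = "limits_metric"  # custom metric namespace name used in this document


def build_metric_entries(compartment_id, service_name, limit_name,
                         used, available, max_limit, ad=None, timestamp=None):
    """Return one metric entry per metric name (used, max_limit, available).

    Hypothetical helper: the dictionary keys follow the PostMetricData
    request body (namespace, compartmentId, name, dimensions, datapoints).
    """
    ts = timestamp or datetime.now(timezone.utc)
    stamp = ts.strftime("%Y-%m-%dT%H:%M:%S.000Z")

    # Dimensions shared by the three metrics of one Service Limit.
    dimensions = {"service_name": service_name, "limit_name": limit_name}
    if ad is not None:
        # Only AD-scoped limits carry the AD dimension.
        dimensions["AD"] = ad

    values = {"used": used, "max_limit": max_limit, "available": available}
    return [
        {
            "namespace": NAMESPACE,
            "compartmentId": compartment_id,
            "name": name,
            "dimensions": dict(dimensions),
            "datapoints": [{"timestamp": stamp, "value": float(value)}],
        }
        for name, value in values.items()
    ]
```

Each Service Limit therefore produces three entries that differ only in the metric name and value, which is what allows a single alarm query to select one limit through its dimensions.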
| 28 | + |
| 29 | +With this raw data, we can build Alarm Definitions on the specific Service Limits that we want to monitor, since customers usually do not use all the available OCI services. |
| 30 | + |
| 31 | +Optionally, the OCI Notifications service can be used to get notified when the alarm fires, delivering the notification message through any of the supported protocols. Some of them enable integration with third-party services. |
| 32 | + |
| 33 | +## 3. FUNCTION'S LOGIC |
| 34 | + |
| 35 | +Here we review the logic that the function follows, using the OCI Python SDK, to monitor the Service Limits with OCI Monitoring. |
| 36 | + |
| 37 | +We call the APIs of three OCI services to gather the information needed to create the custom metric namespace for Service Limits monitoring: |
| 38 | + |
| 39 | +1. [Identity and Access Management Service API](https://docs.oracle.com/en-us/iaas/api/#/en/identity/20160918/): To use the [ListAvailabilityDomains](https://docs.oracle.com/en-us/iaas/api/#/en/identity/20160918/AvailabilityDomain/ListAvailabilityDomains) API call to get the list of availability domains in the input tenancy. |
| 40 | +2. [Monitoring API](https://docs.oracle.com/en-us/iaas/api/#/en/monitoring/20180401/): To use [PostMetricData](https://docs.oracle.com/en-us/iaas/api/#/en/monitoring/20180401/MetricData/PostMetricData) to post the metric data to the custom metric namespace. If the namespace does not exist yet, it is created on the first post, so we don't need to create it explicitly beforehand. |
| 41 | +3. [Service Limits API](https://docs.oracle.com/en-us/iaas/api/#/en/limits/20181025/): To use [ListLimitDefinitions](https://docs.oracle.com/en-us/iaas/api/#/en/limits/20181025/LimitDefinitionSummary/ListLimitDefinitions) to get the full list of Service Limits in a given compartment, and [GetResourceAvailability](https://docs.oracle.com/en-us/iaas/api/#/en/limits/20181025/ResourceAvailability/GetResourceAvailability) to gather the used and available amounts, depending on the scope (AD, regional, whole tenancy), for the Service Limits from the previous list. |
| 42 | + |
| 43 | +Basically the **logic** is: |
| 44 | + |
| 45 | +```` |
| 46 | +Start |
| 47 | +Parse input arguments (compartment_ocid, region) |
| 48 | +Setup the resource principals authentication signer |
| 49 | +Initialise the clients for the different API calls (IAM, Monitoring, Service Limits) |
| 50 | +Gather the full list of Service Limits Definitions sorted by Service Limit name |
| 51 | +For the list of Service Limit names |
| 52 | +    If the scope is Availability Domain |
| 53 | +        For every AD |
| 54 | +            Get the limits and usage for the Service Limit name within this AD |
| 55 | +            Post in the custom metric namespace the metric with the dimension of the Service Limit, name and AD with the resources used, available and the service limit maximum |
| 56 | +    For the non-AD Service Limit names (Global or Regional) |
| 57 | +        Get the limits and usage for the Service Limit name |
| 58 | +        Post in the custom metric namespace the metric with the dimension of the Service Limit and name with the resources used, available and the service limit maximum |
| 59 | +End |
| 60 | +```` |
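
The control flow above can be sketched in Python with the API calls injected as plain callables, so that the loop can be read and exercised without an OCI tenancy. This is only an outline under stated assumptions: `collect_limits`, `get_availability` and `post_metric` are hypothetical names, the scope values `"AD"`, `"REGION"` and `"GLOBAL"` are assumed, and deriving the maximum as `used + available` is an assumption of this sketch rather than something taken from func.py.

```python
from typing import Callable, Iterable, Optional, Tuple

# (service_name, limit_name, scope_type); scope_type is assumed to be
# one of "AD", "REGION" or "GLOBAL".
LimitDef = Tuple[str, str, str]
# Hypothetical lookup: (service_name, limit_name, ad_or_None) -> (used, available)
GetAvailability = Callable[[str, str, Optional[str]], Tuple[int, int]]
# Hypothetical sink: (service_name, limit_name, ad_or_None, used, available, max_limit)
PostMetric = Callable[[str, str, Optional[str], int, int, int], None]


def collect_limits(definitions: Iterable[LimitDef],
                   ads: Iterable[str],
                   get_availability: GetAvailability,
                   post_metric: PostMetric) -> int:
    """Walk the limit definitions and post one measurement per limit and scope.

    Returns the number of measurements handed to post_metric.
    """
    ad_names = list(ads)
    posted = 0
    # Process the definitions sorted by Service Limit name.
    for service, limit, scope in sorted(definitions, key=lambda d: d[1]):
        # AD-scoped limits are queried once per AD; the rest only once.
        targets = ad_names if scope == "AD" else [None]
        for ad in targets:
            used, available = get_availability(service, limit, ad)
            # Assumption of this sketch: maximum = used + available.
            post_metric(service, limit, ad, used, available, used + available)
            posted += 1
    return posted
```

In a real function, `get_availability` would wrap the GetResourceAvailability call and `post_metric` would wrap PostMetricData; keeping them as parameters makes the loop easy to test with fakes.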
| 61 | + |
| 62 | +## 4. REQUIREMENTS |
| 63 | + |
| 64 | +We have different requirements depending on the variant of this asset that we would use: |
| 65 | + |
| 66 | +1. A **compartment** in which to place the Application: this depends on your landing zone design. Typically, monitoring applications, functions or integrations that affect the whole tenancy are placed in the Security compartment. |
| 67 | + |
| 68 | +2. A **VCN with a private subnet with Oracle Services Network (OSN) / Internet connectivity**: when you create the Application, you'll need to choose a VCN and subnet with the proper egress rule and route to reach the Oracle Services Network of your region through a Service Gateway. If you're going to run the function in a given region X to gather the service limits of region Y, you'll also need access to the Internet through a NAT Gateway, because the Service Gateway only gives you access to the API endpoints of region X, not to those of region Y. |
| 69 | + |
| 70 | +3. The user must have the **proper permissions in a policy** to work with Cloud Shell, Container Registry, the Logging service and Functions, such as: |
| 71 | + * Allow group <group-name> to use cloud-shell in tenancy |
| 72 | + * Allow group <group-name> to manage repos in compartment <your application compartment> |
| 73 | + * Allow group <group-name> to manage logging-family in compartment <your application compartment> |
| 74 | + * Allow group <group-name> to read metrics in tenancy |
| 75 | + * Allow group <group-name> to manage functions-family in compartment <your application compartment> |
| 76 | + * Allow group <group-name> to use virtual-network-family in tenancy |
| 77 | + |
| 78 | + |
| 79 | +## 5. INPUT |
| 80 | + |
| 81 | +The required function configuration parameters are: |
| 82 | + |
| 83 | +* **compartment_ocid**: the OCID of your tenancy root compartment. |
| 84 | +* **region**: the region whose regional-scope Service Limits you want to collect and where the metrics are published. |
| 85 | + |
| 86 | +Other requirements: |
| 87 | +* The **function's Timeout** must be set to 120s, instead of the default of 30s, to avoid timeout errors under certain circumstances. |
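
As an illustration of how the two configuration parameters could be validated at the start of an invocation, the sketch below checks a plain mapping. In a Python function built with the Fn FDK the configuration is typically available as a dictionary through `ctx.Config()`; the helper name `read_config` and its error message are hypothetical and not taken from func.py.

```python
from typing import Mapping, Tuple

REQUIRED_KEYS = ("compartment_ocid", "region")


def read_config(config: Mapping[str, str]) -> Tuple[str, str]:
    """Return (compartment_ocid, region) or raise if a key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key, "").strip()]
    if missing:
        raise ValueError("Missing function configuration: " + ", ".join(missing))
    return config["compartment_ocid"].strip(), config["region"].strip()
```

Failing early with a clear message makes a missing key visible in the function logs instead of surfacing later as an API error.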
| 88 | + |
| 89 | +## 6. OUTPUT |
| 90 | + |
| 91 | +Every time the function is invoked, it will feed a custom metric namespace called "**limits_metric**" in the tenancy's root compartment with the information of the Services Limits usage. |
| 92 | + |
| 93 | +You can check the custom metric namespace from the OCI Metrics Explorer, where you can also create an alarm from a specific metric query. |
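
To give an idea of what such a metric query might look like, the sketch below composes a Monitoring Query Language (MQL) expression from a metric name, its dimensions and an optional alarm condition. The helper `build_mql` is hypothetical, and the interval, statistic and threshold shown are example values only.

```python
def build_mql(metric, interval="5m", statistic="max", dimensions=None, condition=""):
    """Compose an MQL expression: metric[interval]{dims}.statistic() condition."""
    dims = ""
    if dimensions:
        pairs = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        dims = "{" + pairs + "}"
    query = f"{metric}[{interval}]{dims}.{statistic}()"
    # The condition (for example "< 1") is only needed for alarm rules.
    return f"{query} {condition}".strip()


# Example: an alarm expression for when no bare metal instance of this shape is left.
print(build_mql(
    "available",
    statistic="min",
    dimensions={"service_name": "compute", "limit_name": "bm-standard2-52-count"},
    condition="< 1",
))
```

The resulting string can be pasted into the advanced mode of the Metrics Explorer or of an alarm definition, after selecting the custom metric namespace.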
| 94 | + |
| 95 | +## 7. GETTING STARTED |
| 96 | + |
| 97 | +### 7.1 Create the application |
| 98 | + |
| 99 | +Log in to your tenancy and navigate from the menu to **Developer Services → Functions → Applications**. |
| 100 | + |
| 101 | +Select the **region** where you want to create the OCI Function. |
| 102 | + |
| 103 | +1. Click **Create Application**. |
| 104 | +2. Specify: |
| 105 | + 1. *app-monitoring* (or the one you wish) as the name for the new application. |
| 106 | + 2. The VCN and subnet that the function will use. |
| 107 | + 3. Click **Create**. |
| 108 | + |
| 109 | +### 7.2 Set up your Cloud Shell dev environment |
| 110 | +1. Click your newly created application *app-monitoring* to display the application details |
| 111 | +2. Click **Getting Started → Cloud Shell Setup → Launch Cloud Shell**. |
| 112 | +3. Set up Fn Project CLI context from the Cloud Shell Terminal: |
| 113 | + * Find the name of the pre-created Fn Project context of the region where you're creating the application: |
| 114 | + ```` |
| 115 | + fn list context |
| 116 | + ```` |
| 117 | +
|
| 118 | + * Setup the Fn Project context: |
| 119 | + ```` |
| 120 | + fn use context <region-context> |
| 121 | + e.g.: |
| 122 | + fn use context eu-amsterdam-1 |
| 123 | + ```` |
| 124 | +
|
| 125 | + * Configure the Fn Project context with the OCID of the current compartment where we'll deploy the function: |
| 126 | + ```` |
| 127 | + fn update context oracle.compartment-id <compartment-ocid> |
| 128 | + e.g.: |
| 129 | + fn update context oracle.compartment-id ocid1.compartment.oc1..aaaaaaaarvdfa72n... |
| 130 | + ```` |
| 131 | +
|
| 132 | + * Configure the Fn Project context with the OCI Registry address in the current region: |
| 133 | + ```` |
| 134 | + fn update context registry <region-key>.ocir.io/<tenancy-namespace>/<repo-name-prefix> |
| 135 | + e.g.: |
| 136 | + fn update context registry ams.ocir.io/frxfz3gchXXX/app-monitoring |
| 137 | + ```` |
| 138 | + |
| 139 | + * Configure the Fn Project context with the OCID of the compartment for repository of images: |
| 140 | + ```` |
| 141 | + fn update context oracle.image-compartment-id <compartment-ocid> |
| 142 | + e.g.: |
| 143 | + fn update context oracle.image-compartment-id ocid1.compartment.oc1..aaaaaaaaquqe______z2q |
| 144 | + ```` |
| 145 | +
|
| 146 | +4. Generate the auth token: |
| 147 | + 1. Click **Generate an Auth Token**; you'll be taken to the **Auth Tokens** page, where you click **Generate Token**. |
| 148 | + 2. Enter a name for the token and click **Generate Token**. Copy the newly generated token to a safe location from which you can retrieve it later (the token will never be shown again in the console). |
| 149 | + 3. Close the **Generate Token** dialog. |
| 150 | +
|
| 151 | +5. Log in to Registry: |
| 152 | + 1. On the Getting Started page, log in to the Container Registry with the docker CLI command: |
| 153 | + ```` |
| 154 | + docker login -u '<tenancy-namespace>/<user-name>' <region-key>.ocir.io |
| 155 | + e.g.: |
| 156 | + docker login -u 'frxfz3gchXXX/oracleidentitycloudservice/[email protected]' ams.ocir.io |
| 157 | + ```` |
| 158 | + 2. When prompted for a password, enter the OCI auth token that you created earlier. |
| 159 | +
|
| 160 | +### 7.3 Deploy the function |
| 161 | +
|
| 162 | +1. Create the skeleton of the function: |
| 163 | + ```` |
| 164 | + fn init --runtime python servicelimits |
| 165 | + ```` |
| 166 | + * A directory called *servicelimits* is created |
| 167 | + * In the directory you'll find the following files: requirements.txt, func.py and func.yaml |
| 168 | +
|
| 169 | +2. Replace the code with the code given: |
| 170 | + 1. Click on the **Cloud Shell settings → Upload** → Drag and drop the **func.py** and **requirements.txt** files onto the upload window → Upload |
| 171 | + * The files are available here [func.py](./files/Function/func.py) and [requirements.txt](./files/Function/requirements.txt) |
| 172 | + 2. Move the files from your Cloud Shell home directory to the servicelimits directory, replacing the generated **func.py** and **requirements.txt** files |
| 173 | +
|
| 174 | +3. Deploy the function: |
| 175 | + 1. In the Cloud Shell terminal, move to the servicelimits directory and execute: |
| 176 | + ```` |
| 177 | + fn -v deploy --app app-monitoring |
| 178 | + ```` |
| 179 | + 2. List the functions deployed in the application: |
| 180 | + ```` |
| 181 | + fn list functions app-monitoring |
| 182 | + ```` |
| 183 | +
|
| 184 | + 3. Configure the function: |
| 185 | + 1. Increase default timeout: **Developer Services → Applications** → Click on *app-monitoring* → **Functions** → Click on *servicelimits* → **Edit** → Change Timeout from 30s to 120s → **Save Changes**. |
| 186 | + 2. In the **Resources** section → **Configuration**: |
| 187 | + 1. Set as **key: compartment_ocid → value**: <your root tenancy OCID> → **+** |
| 188 | + 2. Set as **key: region → value**: <region where you want to extract its services limits> → **+** |
| 189 | +
|
| 190 | +### 7.4 Invoke the function |
| 191 | +
|
| 192 | +#### 1. Cloud Shell |
| 193 | +
|
| 194 | +```` |
| 195 | +fn invoke app-monitoring servicelimits |
| 196 | +e.g.: |
| 197 | +fn invoke app-monitoring servicelimits |
| 198 | +{'Result': 'OK'} |
| 199 | +```` |
| 200 | +
|
| 201 | +#### 2. OCI CLI |
| 202 | +```` |
| 203 | +oci fn function invoke --function-id ocid1.fnfunc.oc1.eu-amsterdam-1.aaaaa... --body "" --file - |
| 204 | +```` |
| 205 | +
|
| 206 | +#### 3. Periodic execution using an "always true" alarm |
| 207 | +1. **Create the Notification topic with OCI Function subscription: Menu → Developer Services → Notifications → Create Topic** |
| 208 | + 1. **Name**: *tp-mon-functions* |
| 209 | + 2. **Create** |
| 210 | + 3. Select *tp-mon-functions* → **Create Subscription** |
| 211 | + 1. **Protocol**: Function |
| 212 | + 2. **Function compartment**: <your application compartment (usually the shared security compartment)> |
| 213 | + 3. **Oracle Functions application**: *app-monitoring* |
| 214 | + 4. **Function**: *servicelimits* |
| 215 | +2. **Create the alarm definition: Menu → Observability & Management → Alarm Definitions → Create Alarm**: |
| 216 | + 1. **Alarm name**: *al-runServicesLimitsFn* |
| 217 | + 2. **Alarm severity**: Info |
| 218 | + 3. **Alarm Body**: *Running serviceslimits function* |
| 219 | + 4. **Compartment**: Select a compartment where the VCN that uses the function is created |
| 220 | + 5. **Metric namespace**: oci_vcn |
| 221 | + 6. **Metric name**: VnicConntrackUtilPercent |
| 222 | + 7. **Interval**: 5 minutes |
| 223 | + 8. **Statistic**: Count |
| 224 | + 9. **Trigger rule**: |
| 225 | + 1. **Operator**: greater than or equal to |
| 226 | + 2. **Value**: 1 |
| 227 | + 3. **Trigger delay minutes**: 5 |
| 228 | + 10. **Destination service**: Notifications |
| 229 | + 11. **Compartment**: <compartment where you've created the notification topic with functions subscription> |
| 230 | + 12. **Topic**: *tp-mon-functions* |
| 231 | + 13. **Repeat notification?** → Enabled |
| 232 | + 14. **Notification frequency**: 5 minutes |
| 233 | + 15. **Save alarm** |
| 234 | +
|
| 235 | +## 8. KNOWN PROBLEMS |
| 236 | +
|
| 237 | +None at this point. |
| 238 | +
|
| 239 | +## 9. RELEASE NOTES |
| 240 | +
|
| 241 | +2023-08-03 (version 0.1). Initial public release. |
| 242 | + |
| 243 | +# LICENSE |
| 244 | +
|
| 245 | +Copyright (c) 2023 Oracle and/or its affiliates. |
| 246 | +
|
| 247 | +Licensed under the Universal Permissive License (UPL), Version 1.0. |
| 248 | +
|
| 249 | +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/folder-structure/LICENSE) for more details. |