Commit 629d33d

Merge pull request #345 from oracle-devrel/custom-metric-FN-services-limit-monitoring
initial release of the OCI FN custom metric
2 parents 7d273f0 + c33574e commit 629d33d

7 files changed: +532 additions, -0 deletions


manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ In this section will bring some examples about how to create OCI Monitoring cust
# Team Publications

- [Using python SDK to create OCI Monitoring custom metric namespace: Services Limit monitoring example use case](./custom-metric-python-SDK-services-limit-monitoring/README.md)
+- [Using OCI Functions to create OCI Monitoring custom metric namespace: Services Limit monitoring example use case](./custom-metric-FN-services-limit-monitoring/README.md)

# License

@@ -0,0 +1,35 @@
Copyright (c) 2023 Oracle and/or its affiliates.

The Universal Permissive License (UPL), Version 1.0

Subject to the condition set forth below, permission is hereby granted to any
person obtaining a copy of this software, associated documentation and/or data
(collectively the "Software"), free of charge and under any and all copyright
rights in the Software, and any and all patent rights owned or freely
licensable by each licensor hereunder covering either (i) the unmodified
Software as contributed to or provided by such licensor, or (ii) the Larger
Works (as defined below), to deal in both

(a) the Software, and
(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
one is included with the Software (each a "Larger Work" to which the Software
is contributed by such licensors),

without restriction, including without limitation the rights to copy, create
derivative works of, display, perform, and distribute the Software and make,
use, sell, offer for sale, import, export, have made, and have sold the
Software and the Larger Work(s), and to sublicense the foregoing rights on
either these or other terms.

This license is subject to the following condition:
The above copyright notice and either this complete permission notice or at
a minimum a reference to the UPL must be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@

# Using OCI Functions to create OCI Monitoring custom metric namespace: Services Limit monitoring example use case

## 1. INTRODUCTION

This asset describes how to create an OCI Monitoring ***custom metric namespace*** to extend the default service metric namespaces. To do so, we'll use an OCI Function written with the OCI Python SDK. As an educational example, we'll create a custom metric namespace that monitors OCI Service Limits usage. With this custom metric namespace you can create OCI Alarms and use the OCI Notifications service to deliver the alarm information through different channels, so that you can proactively raise a Service Request to increase a Service Limit before it disrupts running services or services yet to be provisioned.

## 2. SOLUTION
The overall architecture is shown in the following logical diagram:

![Logical diagram](./files/Diagrams/services-limit_FN-solution.png)

The OCI Functions service (Functions as a Service) has the following structure:

![FaaS diagram](./files/Diagrams/oci-functions-arq.png)

The function gathers the OCI Service Limits information and posts it to a custom metric namespace, using the following metrics:

1. **used**: the amount of the Service Limit currently in use
2. **max_limit**: the maximum amount of the Service Limit that can be used (it can be increased with an SR)
3. **available**: the remaining amount of the Service Limit that can still be used before reaching max_limit

Not all Service Limits are alike: they differ in scope (Global, Regional or Availability Domain). The characteristics of a Service Limit are therefore posted as metric **Dimensions**, so you can filter a specific limit by service, limit name and scope. The **Dimensions** are:

1. **service_name**: the name of the OCI service the Service Limit belongs to (e.g. Compute)
2. **limit_name**: the name of the Service Limit (e.g. bm-standard2-52-count)
3. **AD**: for Service Limits with Availability Domain scope, the AD the values refer to, so you can filter the metric by AD

With this raw data you can build Alarm Definitions for the specific Service Limits you want to monitor, since customers usually do not use every OCI service.

Optionally, you can use the OCI Notifications service to be notified when an alarm fires, receiving the message through any of the supported subscription protocols. Some of them enable integration with third-party services.
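The metric and dimension layout above corresponds to the `MetricDataDetails` entries accepted by the Monitoring PostMetricData API (`namespace`, `compartmentId`, `name`, `dimensions`, and `datapoints` with `timestamp` and `value`). The following minimal sketch builds the three entries posted for one Service Limit, assuming plain dicts instead of the OCI Python SDK model classes (`oci.monitoring.models.MetricDataDetails` and `Datapoint`). The helper `build_metric_data` is hypothetical and is not taken from the provided func.py.

```python
from datetime import datetime, timezone
from typing import Optional

NAMESPACE = "limits_metric"  # custom namespace named in the OUTPUT section


def build_metric_data(compartment_id: str, service_name: str, limit_name: str,
                      used: float, available: float, max_limit: float,
                      ad: Optional[str] = None,
                      timestamp: Optional[datetime] = None) -> list:
    """Hypothetical helper: one MetricDataDetails-like dict per metric."""
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    # Dimension values are strings; AD is only set for AD-scoped limits.
    dimensions = {"service_name": service_name, "limit_name": limit_name}
    if ad is not None:
        dimensions["AD"] = ad
    values = {"used": used, "max_limit": max_limit, "available": available}
    return [
        {
            "namespace": NAMESPACE,
            "compartmentId": compartment_id,
            "name": metric_name,
            "dimensions": dict(dimensions),  # copy so entries stay independent
            "datapoints": [{"timestamp": ts, "value": value}],
        }
        for metric_name, value in values.items()
    ]


entries = build_metric_data("ocid1.tenancy.oc1..example", "compute",
                            "bm-standard2-52-count", used=2, available=8,
                            max_limit=10, ad="AD-1")
for entry in entries:
    print(entry["name"], entry["dimensions"], entry["datapoints"][0]["value"])
```

With the SDK, such entries would be wrapped in a `PostMetricDataDetails` object and sent with `MonitoringClient.post_metric_data`, which has to target the region's telemetry-ingestion endpoint rather than the default telemetry endpoint.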

## 3. FUNCTION'S LOGIC

This section reviews the logic the OCI Function follows, using the OCI Python SDK, to monitor the Service Limits with OCI Monitoring.

The function calls three OCI service APIs to gather the information needed to feed the Service Limits custom metric namespace:

1. [Identity and Access Management Service API](https://docs.oracle.com/en-us/iaas/api/#/en/identity/20160918/): the [ListAvailabilityDomains](https://docs.oracle.com/en-us/iaas/api/#/en/identity/20160918/AvailabilityDomain/ListAvailabilityDomains) call returns the list of availability domains in the input tenancy.
2. [Monitoring API](https://docs.oracle.com/en-us/iaas/api/#/en/monitoring/20180401/): the [PostMetricData](https://docs.oracle.com/en-us/iaas/api/#/en/monitoring/20180401/MetricData/PostMetricData) call posts the metrics to the custom metric namespace. If the namespace doesn't exist yet, PostMetricData creates it, so it doesn't need to be created beforehand.
3. [Service Limits API](https://docs.oracle.com/en-us/iaas/api/#/en/limits/20181025/): the [ListLimitDefinitions](https://docs.oracle.com/en-us/iaas/api/#/en/limits/20181025/LimitDefinitionSummary/ListLimitDefinitions) call returns the full list of Service Limits for the given compartment, and the [GetResourceAvailability](https://docs.oracle.com/en-us/iaas/api/#/en/limits/20181025/ResourceAvailability/GetResourceAvailability) call returns, for each of those Service Limits, the used and available amounts according to its scope (AD, regional or global).

In short, the **logic** is:

````
Start
  Parse input arguments (compartment_ocid, region)
  Set up the resource principals authentication signer
  Initialise the clients for the different API calls (IAM, Monitoring, Service Limits)
  Gather the full list of Service Limit definitions, sorted by Service Limit name
  For each Service Limit name in the list
    If the scope is Availability Domain
      For every AD
        Get the limit and usage for the Service Limit name within this AD
        Post to the custom metric namespace the used, available and max_limit metrics, with the service, limit name and AD as dimensions
    Else (Global or Regional scope)
      Get the limit and usage for the Service Limit name
      Post to the custom metric namespace the used, available and max_limit metrics, with the service and limit name as dimensions
End
````
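The loop above can be sketched in Python as follows. The sketch assumes each limit definition is a dict carrying the `name`, `service_name` and `scope_type` attributes that ListLimitDefinitions returns (`scope_type` being `GLOBAL`, `REGION` or `AD`), and that a caller-supplied, hypothetical `get_availability` function wraps GetResourceAvailability and returns `(used, available)`. Deriving `max_limit` as `used + available` is a simplification of this sketch; the provided func.py may obtain the maximum differently.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical signature: (service_name, limit_name, ad) -> (used, available)
AvailabilityFn = Callable[[str, str, Optional[str]], Tuple[float, float]]


def collect_limits(definitions: Iterable[dict], ads: List[str],
                   get_availability: AvailabilityFn) -> List[dict]:
    """Walk limit definitions in name order and query usage per scope."""
    results = []
    for definition in sorted(definitions, key=lambda d: d["name"]):
        # AD-scoped limits are queried once per availability domain;
        # GLOBAL and REGION limits are queried once, without an AD.
        targets = ads if definition["scope_type"] == "AD" else [None]
        for ad in targets:
            used, available = get_availability(
                definition["service_name"], definition["name"], ad)
            results.append({
                "service_name": definition["service_name"],
                "limit_name": definition["name"],
                "AD": ad,
                "used": used,
                "available": available,
                "max_limit": used + available,  # simplification, see lead-in
            })
    return results


demo = collect_limits(
    [{"name": "vcn-count", "service_name": "vcn", "scope_type": "REGION"}],
    ["AD-1", "AD-2"],
    lambda service, limit, ad: (3.0, 47.0),  # stand-in for GetResourceAvailability
)
print(demo)
```

Injecting `get_availability` keeps the scope handling testable without credentials; each returned row then maps onto one PostMetricData call.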

## 4. REQUIREMENTS

The following requirements must be met before deploying the function:

1. A **compartment** in which to place the Application. Its choice depends on your landing zone design; monitoring applications, functions or integrations that affect the whole tenancy are typically placed in the Security compartment.

2. A **VCN with a private subnet with Oracle Services Network (OSN) / Internet connectivity**. When creating the Application you need to choose a VCN and subnet with the egress rules and routes required to reach the Oracle Services Network of your region through a Service Gateway. If you use the function in a region X to gather the Service Limits of a region Y, you also need Internet access through a NAT Gateway, because a Service Gateway only gives access to the API endpoints of its own region (X), not those of region Y.

3. The user must have the **proper permissions in a policy** to work with Cloud Shell, Container Registry, Logging, Metrics, Functions and networking, for example:
   * Allow group <group-name> to use cloud-shell in tenancy
   * Allow group <group-name> to manage repos in compartment <your application compartment>
   * Allow group <group-name> to manage logging-family in compartment <your application compartment>
   * Allow group <group-name> to read metrics in tenancy
   * Allow group <group-name> to manage functions-family in compartment <your application compartment>
   * Allow group <group-name> to use virtual-network-family in tenancy

## 5. INPUT

The required function configuration parameters are:

* **compartment_ocid**: the OCID of your tenancy's root compartment.
* **region**: the region whose regionally scoped Service Limits you want to gather, and where the metrics are published.

Other requirements:
* The **function's Timeout** must be set to 120s instead of the default 30s. Because the function calls GetResourceAvailability for every Service Limit (once per AD for AD-scoped limits), a run can exceed 30s and fail with a timeout error.
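A minimal sketch of how the two configuration parameters might be validated inside the function. In the Fn Python FDK the configuration set in the console is exposed through the invocation context (e.g. `ctx.Config()`); here it is passed in as a plain dict so the sketch runs on its own. The helper `parse_config` and its tenancy-OCID prefix check are assumptions, not part of the provided func.py.

```python
def parse_config(cfg: dict) -> tuple:
    """Hypothetical helper: return (compartment_ocid, region) or raise."""
    missing = [key for key in ("compartment_ocid", "region")
               if not str(cfg.get(key) or "").strip()]
    if missing:
        raise ValueError("missing function configuration: " + ", ".join(missing))
    compartment_ocid = cfg["compartment_ocid"].strip()
    region = cfg["region"].strip()
    # Assumed check: the root compartment OCID is the tenancy OCID.
    if not compartment_ocid.startswith("ocid1.tenancy."):
        raise ValueError("compartment_ocid should be the tenancy (root compartment) OCID")
    return compartment_ocid, region


print(parse_config({"compartment_ocid": "ocid1.tenancy.oc1..example",
                    "region": "eu-amsterdam-1"}))
```

Failing fast on a missing key gives a clear error in the function logs instead of an API error later in the run.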

## 6. OUTPUT

Each time the function is invoked, it posts the Service Limits usage information to a custom metric namespace called "**limits_metric**" in the tenancy's root compartment.

You can check the custom metrics from the OCI Metrics Explorer, where you can also create an alarm from a specific metric query.
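Alarms on this namespace are defined with a Monitoring Query Language (MQL) expression of the form `metric[interval]{dimension = "value"}.statistic() operator threshold`. The sketch below builds such a query to fire when the `available` amount of one Service Limit drops to a threshold. The helper `low_availability_query`, the `min()` statistic and the 5-minute default interval are illustrative choices, not values prescribed by this asset.

```python
from typing import Optional


def low_availability_query(service_name: str, limit_name: str, threshold: float,
                           ad: Optional[str] = None, interval: str = "5m") -> str:
    """Hypothetical helper: MQL alarm query on the 'available' metric."""
    dims = {"service_name": service_name, "limit_name": limit_name}
    if ad is not None:
        dims["AD"] = ad  # only meaningful for AD-scoped limits
    selector = ", ".join(f'{key} = "{value}"' for key, value in dims.items())
    # Doubled braces emit literal { } around the dimension selector.
    return f"available[{interval}]{{{selector}}}.min() <= {threshold:g}"


print(low_availability_query("compute", "bm-standard2-52-count", 2))
# available[5m]{service_name = "compute", limit_name = "bm-standard2-52-count"}.min() <= 2
```

The dimension values must match what the function actually posts, so check them in the Metrics Explorer before pasting the query into an alarm definition.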

## 7. GETTING STARTED

### 7.1 Create the application

Log in to your tenancy and navigate from the menu to **Developer Services → Functions → Applications**.

Select the **region** where you want to create the OCI Function.

1. Click **Create Application**.
2. Specify:
   1. *app-monitoring* (or a name of your choice) as the name of the new application.
   2. The VCN and subnet that the function will use.
3. Click **Create**.

### 7.2 Set up your Cloud Shell dev environment
1. Click your newly created application *app-monitoring* to display the application details.
2. Click **Getting Started → Cloud Shell Setup → Launch Cloud Shell**.
3. Set up the Fn Project CLI context from the Cloud Shell terminal:
   * Find the name of the pre-created Fn Project context for the region where you're creating the application:
     ````
     fn list context
     ````

   * Set up the Fn Project context:
     ````
     fn use context <region-context>
     e.g.:
     fn use context eu-amsterdam-1
     ````

   * Configure the Fn Project context with the OCID of the compartment where the function will be deployed:
     ````
     fn update context oracle.compartment-id <compartment-ocid>
     e.g.:
     fn update context oracle.compartment-id ocid1.compartment.oc1..aaaaaaaarvdfa72n...
     ````

   * Configure the Fn Project context with the OCI Registry (OCIR) address for the current region:
     ````
     fn update context registry <region-key>.ocir.io/<tenancy-namespace>/<repo-name-prefix>
     e.g.:
     fn update context registry ams.ocir.io/frxfz3gchXXX/app-monitoring
     ````

   * Configure the Fn Project context with the OCID of the compartment that holds the image repository:
     ````
     fn update context oracle.image-compartment-id <compartment-ocid>
     e.g.:
     fn update context oracle.image-compartment-id ocid1.compartment.oc1..aaaaaaaaquqe______z2q
     ````

4. Generate an auth token:
   1. Click **Generate an Auth Token**. This takes you to the **Auth Tokens** page, where you click **Generate Token**.
   2. Enter a name for the token and click **Generate Token**. Copy the newly generated token secret to a safe location from which you can retrieve it later (the token is never shown again in the console).
   3. Close the **Generate Token** dialog.

5. Log in to the Registry:
   1. As shown on the Getting Started page, log in to the Container Registry with the docker CLI:
      ````
      docker login -u '<tenancy-namespace>/<user-name>' <region-key>.ocir.io
      e.g.:
      docker login -u 'frxfz3gchXXX/oracleidentitycloudservice/[email protected]' ams.ocir.io
      ````
   2. When prompted for a password, enter the OCI auth token you created earlier.
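The registry address and the docker user name share the same region key and tenancy namespace. The sketch below renders both commands from those values, assuming the `<region-key>.ocir.io/<tenancy-namespace>/<repo-name-prefix>` and `<tenancy-namespace>/<user-name>` formats shown above. The helper `ocir_commands` is hypothetical, the user name `jdoe` is a placeholder, and the helper only prints the commands, it does not run them.

```python
import shlex


def ocir_commands(region_key: str, tenancy_namespace: str, repo_prefix: str,
                  user_name: str) -> list:
    """Hypothetical helper: build the fn registry and docker login commands."""
    host = f"{region_key}.ocir.io"
    registry = f"{host}/{tenancy_namespace}/{repo_prefix}"
    docker_user = f"{tenancy_namespace}/{user_name}"
    return [
        f"fn update context registry {shlex.quote(registry)}",
        # shlex.quote only adds quotes when a value contains shell-special characters
        f"docker login -u {shlex.quote(docker_user)} {shlex.quote(host)}",
    ]


for command in ocir_commands("ams", "frxfz3gchXXX", "app-monitoring",
                             "oracleidentitycloudservice/jdoe"):
    print(command)
```

Deriving both strings from one set of inputs avoids a mismatch between the Fn context registry and the registry docker is logged in to.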

### 7.3 Deploy the function

1. Create the function boilerplate:
   ````
   fn init --runtime python servicelimits
   ````
   * A directory called *servicelimits* is created.
   * The directory contains the following files: func.py, func.yaml and requirements.txt.

2. Replace the generated code with the code provided:
   1. Click **Cloud Shell settings → Upload** → drag and drop the **func.py** and **requirements.txt** files onto the upload window → **Upload**.
      * The files are available here: [func.py](./files/Function/func.py) and [requirements.txt](./files/Function/requirements.txt)
   2. Move the files from your Cloud Shell home directory into the servicelimits directory, replacing the generated **func.py** and **requirements.txt** files.

3. Deploy the function:
   1. In the Cloud Shell terminal, change to the servicelimits directory and run:
      ````
      fn -v deploy --app app-monitoring
      ````
   2. List the functions in the application:
      ````
      fn list functions app-monitoring
      ````

4. Configure the function:
   1. Increase the default timeout: **Developer Services → Functions → Applications** → click *app-monitoring* → **Functions** → click *servicelimits* → **Edit** → change Timeout from 30s to 120s → **Save Changes**.
   2. In the **Resources** section → **Configuration**:
      1. Set **key: compartment_ocid → value**: <your root tenancy OCID> → **+**
      2. Set **key: region → value**: <region whose Service Limits you want to extract> → **+**

### 7.4 Invoke the function

#### 1. Cloud Shell

````
fn invoke <app-name> <function-name>
e.g.:
fn invoke app-monitoring servicelimits
{'Result': 'OK'}
````

#### 2. OCI CLI
````
oci fn function invoke --function-id <function-ocid> --body "" --file -
e.g.:
oci fn function invoke --function-id ocid1.fnfunc.oc1.eu-amsterdam-1.aaaaa... --body "" --file -
````

#### 3. Periodic execution using an "always true" alarm
1. **Create the Notification topic with an OCI Function subscription: Menu → Developer Services → Notifications → Create Topic**
   1. **Name**: *tp-mon-functions*
   2. **Create**
   3. Select *tp-mon-functions* → **Create Subscription**
      1. **Protocol**: Function
      2. **Function compartment**: <your application compartment (usually the shared Security compartment)>
      3. **Oracle Functions application**: *app-monitoring*
      4. **Function**: *servicelimits*
2. **Create the alarm definition: Menu → Observability & Management → Alarm Definitions → Create Alarm**:
   1. **Alarm name**: *al-runServicesLimitsFn*
   2. **Alarm severity**: Info
   3. **Alarm Body**: *Running serviceslimits function*
   4. **Compartment**: select the compartment where the VCN used by the function was created
   5. **Metric namespace**: oci_vcn
   6. **Metric name**: VnicConntrackUtilPercent
   7. **Interval**: 5 minutes
   8. **Statistic**: Count
   9. **Trigger rule**:
      1. **Operator**: greater than or equal to
      2. **Value**: 1
      3. **Trigger delay minutes**: 5
   10. **Destination service**: Notifications
   11. **Compartment**: <compartment where you've created the notification topic with the function subscription>
   12. **Topic**: *tp-mon-functions*
   13. **Repeat notification?** → Enabled
   14. **Notification frequency**: 5 minutes
   15. **Save alarm**
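With the Count statistic, this alarm only checks that the chosen metric produced at least one datapoint in the interval, which holds as long as the VCN keeps reporting it; combined with **Repeat notification** every 5 minutes, the topic, and therefore the function, is invoked roughly every 5 minutes. The sketch below renders the equivalent MQL query and the resulting approximate daily invocation count. Both helpers are hypothetical, and the metric name is passed in (the example uses a placeholder) rather than fixed.

```python
def always_true_query(metric_name: str, interval_minutes: int = 5) -> str:
    """Hypothetical helper: MQL for 'at least one datapoint in the interval'."""
    return f"{metric_name}[{interval_minutes}m].count() >= 1"


def invocations_per_day(repeat_minutes: int) -> int:
    """Approximate function runs per day when the alarm repeats every N minutes."""
    minutes_per_day = 24 * 60
    return minutes_per_day // repeat_minutes


metric_name = "ExampleMetric"  # placeholder: use the metric selected in step 2
print(always_true_query(metric_name))
print(invocations_per_day(5))  # 1440 // 5 = 288
```

A longer notification frequency lowers the number of function runs (and of posted datapoints) proportionally.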

## 8. KNOWN PROBLEMS

None at this point.

## 9. RELEASE NOTES

2023-08-03 (version 0.1). Initial public release.

# LICENSE

Copyright (c) 2023 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/folder-structure/LICENSE) for more details.
