Commit b04c315

Merge pull request #196974 from adwilso/patch-4
Scheduled Events Windows Improvements
2 parents 1d49bfb + 6950c4a commit b04c315

File tree

1 file changed

+185
-43
lines changed

articles/virtual-machines/windows/scheduled-events.md

Lines changed: 185 additions & 43 deletions
Original file line number | Diff line number | Diff line change
@@ -40,10 +40,10 @@ With Scheduled Events, your application can discover when maintenance will occur
4040

4141
Scheduled Events provides events in the following use cases:
4242

43-
- [Platform initiated maintenance](../maintenance-and-updates.md?bc=/azure/virtual-machines/windows/breadcrumb/toc.json&toc=/azure/virtual-machines/windows/toc.json) (for example, VM reboot, live migration or memory preserving updates for host)
44-
- Virtual machine is running on [degraded host hardware](https://azure.microsoft.com/blog/find-out-when-your-virtual-machine-hardware-is-degraded-with-scheduled-events) that is predicted to fail soon
45-
- Virtual machine was running on a host that suffered a hardware failure
46-
- User-initiated maintenance (for example, a user restarts or redeploys a VM)
43+
- [Platform initiated maintenance](../maintenance-and-updates.md?bc=/azure/virtual-machines/windows/breadcrumb/toc.json&toc=/azure/virtual-machines/windows/toc.json) (for example, VM reboot, live migration or memory preserving updates for host).
44+
- Virtual machine is running on [degraded host hardware](https://azure.microsoft.com/blog/find-out-when-your-virtual-machine-hardware-is-degraded-with-scheduled-events) that is predicted to fail soon.
45+
- Virtual machine was running on a host that suffered a hardware failure.
46+
- User-initiated maintenance (for example, a user restarts or redeploys a VM).
4747
- [Spot VM](../spot-vms.md) and [Spot scale set](../../virtual-machine-scale-sets/use-spot.md) instance evictions.
4848

4949
## The Basics
@@ -97,7 +97,7 @@ Scheduled Events is disabled for your service if it does not make a request for
9797
### User-initiated maintenance
9898
User-initiated VM maintenance via the Azure portal, API, CLI, or PowerShell results in a scheduled event. You then can test the maintenance preparation logic in your application, and your application can prepare for user-initiated maintenance.
9999

100-
If you restart a VM, an event with the type `Reboot` is scheduled. If you redeploy a VM, an event with the type `Redeploy` is scheduled.
100+
If you restart a VM, an event with the type `Reboot` is scheduled. If you redeploy a VM, an event with the type `Redeploy` is scheduled. Typically, events with a user event source can be approved immediately to avoid a delay on user-initiated actions.
101101

102102
## Use the API
103103

@@ -115,6 +115,22 @@ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-versio
115115
```
116116
Invoke-RestMethod -Headers @{"Metadata"="true"} -Method GET -Uri "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01" | ConvertTo-Json -Depth 64
117117
```
118+
#### Python sample
119+
````
120+
import json
121+
import requests
122+
123+
metadata_url ="http://169.254.169.254/metadata/scheduledevents"
124+
header = {'Metadata' : 'true'}
125+
query_params = {'api-version':'2020-07-01'}
126+
127+
def get_scheduled_events():
128+
    resp = requests.get(metadata_url, headers = header, params = query_params)
129+
    data = resp.json()
130+
    return data
131+
132+
````
133+
118134

119135
A response contains an array of scheduled events. An empty array means that currently no events are scheduled.
120136
In the case where there are scheduled events, the response contains an array of events.
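
As a minimal sketch of reading such a response, assuming the document carries the `DocumentIncarnation` and `Events` fields shown in this article, a helper can separate the "nothing scheduled" case from the case with pending events. The function name `pending_events` and the two sample documents are illustrative and not part of the service API.

```python
def pending_events(document):
    """Return the list of events in a Scheduled Events response document.

    An empty list means that no events are currently scheduled.
    """
    # Assumption: a missing "Events" key is treated like an empty array.
    return list(document.get("Events", []))


# Hypothetical documents shaped like the responses described above.
empty_doc = {"DocumentIncarnation": 1, "Events": []}
busy_doc = {
    "DocumentIncarnation": 2,
    "Events": [{"EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
                "EventType": "Freeze"}],
}

for doc in (empty_doc, busy_doc):
    events = pending_events(doc)
    if not events:
        print("No events scheduled")
    else:
        print("Scheduled:", [e["EventType"] for e in events])
```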
@@ -140,13 +156,13 @@ In the case where there are scheduled events, the response contains an array of
140156
### Event properties
141157
|Property | Description |
142158
| - | - |
143-
| Document Incarnation | Integer that increases when the events contained in the array of scheduled events changes. Documents with the same incarnation contain the same event information, and the incarnation will be incremented when an event changes. |
159+
| Document Incarnation | Integer that increases when the events array changes. Documents with the same incarnation contain the same event information, and the incarnation will be incremented when an event changes. |
144160
| EventId | Globally unique identifier for this event. <br><br> Example: <br><ul><li>602d9444-d2cd-49c7-8624-8643e7171297 |
145161
| EventType | Impact this event causes. <br><br> Values: <br><ul><li> `Freeze`: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there is no impact on memory or open files.<li>`Reboot`: The Virtual Machine is scheduled for reboot (non-persistent memory is lost). <li>`Redeploy`: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost). <li>`Preempt`: The Spot Virtual Machine is being deleted (ephemeral disks are lost). This event is made available on a best-effort basis. <li> `Terminate`: The virtual machine is scheduled to be deleted. |
146162
| ResourceType | Type of resource this event affects. <br><br> Values: <ul><li>`VirtualMachine`|
147163
| Resources| List of resources this event affects. The list is guaranteed to contain machines from at most one [update domain](../availability.md), but it might not contain all machines in the UD. <br><br> Example: <br><ul><li> ["FrontEnd_IN_0", "BackEnd_IN_0"] |
148164
| EventStatus | Status of this event. <br><br> Values: <ul><li>`Scheduled`: This event is scheduled to start after the time specified in the `NotBefore` property.<li>`Started`: This event has started.</ul> No `Completed` or similar status is ever provided. The event is no longer returned when the event is finished.
149-
| NotBefore| Time after which this event can start. The event is guaranteed to not start before this time. <br><br> Example: <br><ul><li> Mon, 19 Sep 2016 18:29:47 GMT |
165+
| NotBefore| Time after which this event can start. The event is guaranteed to not start before this time. Will be blank if the event has already started. <br><br> Example: <br><ul><li> Mon, 19 Sep 2016 18:29:47 GMT |
150166
| Description | Description of this event. <br><br> Example: <br><ul><li> Host server is undergoing maintenance. |
151167
| EventSource | Initiator of the event. <br><br> Example: <br><ul><li> `Platform`: This event is initiated by platform. <li>`User`: This event is initiated by user. |
152168
| DurationInSeconds | The expected duration of the interruption caused by the event. <br><br> Example: <br><ul><li> `9`: The interruption caused by the event will last for 9 seconds. <li>`-1`: The default value used if the impact duration is either unknown or not applicable. |
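
The `NotBefore` value in the example above is an RFC 1123 style timestamp, and it is blank once the event has started. The following sketch, which assumes that format holds for all events, converts it into the number of seconds remaining. The helper name `seconds_until_not_before` and the `now` parameter are hypothetical additions for illustration; the parsing uses the standard library's `email.utils.parsedate_to_datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until_not_before(event, now=None):
    """Return seconds until the event may start, or None if NotBefore is blank."""
    not_before = event.get("NotBefore", "")
    if not not_before:
        # A blank NotBefore means the event has already started.
        return None
    start = parsedate_to_datetime(not_before)
    if start.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        start = start.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (start - now).total_seconds()


# Illustrative event and a fixed clock so the result is reproducible.
event = {"NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"}
fixed_now = datetime(2016, 9, 19, 18, 19, 47, tzinfo=timezone.utc)
print(seconds_until_not_before(event, now=fixed_now))
print(seconds_until_not_before({"NotBefore": ""}))
```

A negative result would mean the `NotBefore` time has passed while the event is still reported as `Scheduled`; how to react to that is left to the application.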
@@ -164,18 +180,20 @@ Each event is scheduled a minimum amount of time in the future based on the even
164180

165181
> [!NOTE]
166182
> In some cases, Azure is able to predict host failure due to degraded hardware and will attempt to mitigate disruption to your service by scheduling a migration. Affected virtual machines will receive a scheduled event with a `NotBefore` that is typically a few days in the future. The actual time varies depending on the predicted failure risk assessment. Azure tries to give 7 days' advance notice when possible, but the actual time varies and might be smaller if the prediction is that there is a high chance of the hardware failing imminently. To minimize risk to your service in case the hardware fails before the system-initiated migration, we recommend that you self-redeploy your virtual machine as soon as possible.
183+
167184
>[!NOTE]
168-
> In the case the host node experiences a hardware failure Azure will bypass the minimum notice period an immediately begin the recovery process for affected virtual machines. This reduces recovery time in the case that the affected VMs are unable to respond. During the recovery process an event will be created for all impacted VMs with EventType = Reboot and EventStatus = Started
185+
> In the case the host node experiences a hardware failure, Azure will bypass the minimum notice period and immediately begin the recovery process for affected virtual machines. This reduces recovery time in the case that the affected VMs are unable to respond. During the recovery process, an event will be created for all impacted VMs with `EventType = Reboot` and `EventStatus = Started`.
169186
170187
### Polling frequency
171188

172189
You can poll the endpoint for updates as frequently or infrequently as you like. However, the longer the time between requests, the more time you potentially lose to react to an upcoming event. Most events have 5 to 15 minutes of advance notice, although in some cases advance notice might be as little as 30 seconds. To ensure that you have as much time as possible to take mitigating actions, we recommend that you poll the service once per second.
173190

174191
### Start an event
175192

176-
After you learn of an upcoming event and finish your logic for graceful shutdown, you can approve the outstanding event by making a `POST` call to Metadata Service with `EventId`. This call indicates to Azure that it can shorten the minimum notification time (when possible).
193+
After you learn of an upcoming event and finish your logic for graceful shutdown, you can approve the outstanding event by making a `POST` call to Metadata Service with `EventId`. This call indicates to Azure that it can shorten the minimum notification time (when possible). The event may not start immediately upon approval; in some cases, Azure will require the approval of all the VMs hosted on the node before proceeding with the event.
177194

178195
The following JSON sample is expected in the `POST` request body. The request should contain a list of `StartRequests`. Each `StartRequest` contains `EventId` for the event you want to expedite:
196+
179197
```
180198
{
181199
"StartRequests" : [
@@ -186,6 +204,9 @@ The following JSON sample is expected in the `POST` request body. The request sh
186204
}
187205
```
188206

207+
The service will always return a 200 success code in the case of a valid event ID, even if it was already approved by a different VM. A 400 error code indicates that the request header or payload was malformed.
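
A small sketch of how a caller might interpret these two status codes follows. The function name `describe_approval_status` and the returned strings are hypothetical. Treating any other code as "unexpected" is an assumption, since only 200 and 400 are described here.

```python
def describe_approval_status(status_code):
    """Map the HTTP status of an approval POST to a short description."""
    if status_code == 200:
        # Returned for a valid event ID, even if another VM already approved it,
        # so a 200 does not tell you whether this call was the first approval.
        return "accepted"
    if status_code == 400:
        # The request header or payload was malformed.
        return "malformed request"
    # Assumption: any other code is not covered by this article.
    return "unexpected"


for code in (200, 400, 500):
    print(code, describe_approval_status(code))
```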
208+
209+
189210
#### Bash sample
190211
```
191212
curl -H Metadata:true -X POST -d '{"StartRequests": [{"EventId": "f020ba2e-3bc0-4c40-a10b-86575a9eabd5"}]}' http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01
@@ -194,63 +215,184 @@ curl -H Metadata:true -X POST -d '{"StartRequests": [{"EventId": "f020ba2e-3bc0-
194215
```
195216
Invoke-RestMethod -Headers @{"Metadata" = "true"} -Method POST -body '{"StartRequests": [{"EventId": "5DD55B64-45AD-49D3-BBC9-F57D4EA97BD7"}]}' -Uri http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | ConvertTo-Json -Depth 64
196217
```
218+
#### Python sample
219+
````
220+
import json
221+
import requests
222+
223+
def confirm_scheduled_event(event_id):
224+
    # This payload confirms a single event with id event_id
225+
    payload = json.dumps({"StartRequests": [{"EventId": event_id }]})
226+
    response = requests.post("http://169.254.169.254/metadata/scheduledevents",
227+
                             headers = {'Metadata' : 'true'},
228+
                             params = {'api-version':'2020-07-01'},
229+
                             data = payload)
230+
    return response.status_code
231+
````
197232

198233
> [!NOTE]
199234
> Acknowledging an event allows the event to proceed for all `Resources` in the event, not just the VM that acknowledges the event. Therefore, you can choose to elect a leader to coordinate the acknowledgement, which might be as simple as the first machine in the `Resources` field.
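
The leader rule mentioned in the note can be sketched as follows, assuming every VM sees the `Resources` list in the same order and knows its own name as it appears in that list. The function name `should_acknowledge` and the sample event are illustrative only.

```python
def should_acknowledge(event, vm_name):
    """Return True if vm_name is the first machine listed in the event's Resources."""
    resources = event.get("Resources", [])
    return bool(resources) and resources[0] == vm_name


# Hypothetical event shaped like the responses in this article.
event = {
    "EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
    "Resources": ["WestNO_0", "WestNO_1"],
}

for name in ("WestNO_0", "WestNO_1"):
    if should_acknowledge(event, name):
        print(name, "acknowledges the event for all resources")
    else:
        print(name, "waits for the leader")
```

This is the simplest possible election. An application that needs every VM to finish its own preparation first would still need some way for the other machines to signal readiness to the leader.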
200235
201-
## Python Sample
236+
## Example responses
237+
The following is an example of a series of events that were seen by two VMs that were live migrated to another node.
202238

203-
The following sample queries Metadata Service for scheduled events and approves each outstanding event:
239+
The `DocumentIncarnation` changes every time there is new information in `Events`. Approving the event would allow the freeze to proceed for both `WestNO_0` and `WestNO_1`. A `DurationInSeconds` of -1 indicates that the platform does not know how long the operation will take.
204240

205-
```python
206-
#!/usr/bin/python
241+
```JSON
242+
{
243+
"DocumentIncarnation": 1,
244+
"Events": [
245+
]
246+
}
207247

208-
import json
209-
import socket
210-
import urllib2
248+
{
249+
"DocumentIncarnation": 2,
250+
"Events": [
251+
{
252+
"EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
253+
"EventStatus": "Scheduled",
254+
"EventType": "Freeze",
255+
"ResourceType": "VirtualMachine",
256+
"Resources": [
257+
"WestNO_0",
258+
"WestNO_1"
259+
],
260+
"NotBefore": "Mon, 11 Apr 2022 22:26:58 GMT",
261+
"Description": "Virtual machine is being paused because of a memory-preserving Live Migration operation.",
262+
"EventSource": "Platform",
263+
"DurationInSeconds": -1
264+
}
265+
]
266+
}
211267

212-
metadata_url = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
213-
this_host = socket.gethostname()
268+
{
269+
"DocumentIncarnation": 3,
270+
"Events": [
271+
{
272+
"EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
273+
"EventStatus": "Started",
274+
"EventType": "Freeze",
275+
"ResourceType": "VirtualMachine",
276+
"Resources": [
277+
"WestNO_0",
278+
"WestNO_1"
279+
],
280+
"NotBefore": "",
281+
"Description": "Virtual machine is being paused because of a memory-preserving Live Migration operation.",
282+
"EventSource": "Platform",
283+
"DurationInSeconds": -1
284+
}
285+
]
286+
}
214287

288+
{
289+
"DocumentIncarnation": 4,
290+
"Events": [
291+
]
292+
}
215293

216-
def get_scheduled_events():
217-
req = urllib2.Request(metadata_url)
218-
req.add_header('Metadata', 'true')
219-
resp = urllib2.urlopen(req)
220-
data = json.loads(resp.read())
221-
return data
294+
```
222295

296+
## Python Sample
297+
298+
The following sample queries Metadata Service for scheduled events and approves each outstanding event:
223299

224-
def handle_scheduled_events(data):
225-
for evt in data['Events']:
226-
eventid = evt['EventId']
227-
status = evt['EventStatus']
228-
resources = evt['Resources']
229-
eventtype = evt['EventType']
230-
resourcetype = evt['ResourceType']
231-
notbefore = evt['NotBefore'].replace(" ", "_")
232-
description = evt['Description']
233-
eventSource = evt['EventSource']
234-
if this_host in resources:
235-
print("+ Scheduled Event. This host " + this_host +
236-
" is scheduled for " + eventtype +
237-
" by " + eventSource +
238-
" with description " + description +
239-
" not before " + notbefore)
240-
# Add logic for handling events here
300+
```python
301+
#!/usr/bin/python
302+
import json
303+
import requests
304+
from time import sleep
305+
306+
# The URL to access the metadata service
307+
metadata_url ="http://169.254.169.254/metadata/scheduledevents"
308+
# This must be sent otherwise the request will be ignored
309+
header = {'Metadata' : 'true'}
310+
# Current version of the API
311+
query_params = {'api-version':'2020-07-01'}
312+
313+
def get_scheduled_events():
314+
    resp = requests.get(metadata_url, headers = header, params = query_params)
315+
    data = resp.json()
316+
    return data
241317

318+
def confirm_scheduled_event(event_id):
319+
    # This payload confirms a single event with id event_id
320+
    # You can confirm multiple events in a single request if needed
321+
    payload = json.dumps({"StartRequests": [{"EventId": event_id }]})
322+
    response = requests.post(metadata_url,
323+
                             headers= header,
324+
                             params = query_params,
325+
                             data = payload)
326+
    return response.status_code
327+
328+
def log(event):
329+
    # This is an optional placeholder for logging events to your system
330+
    print(event["Description"])
331+
    return
332+
333+
def advanced_sample(last_document_incarnation):
334+
    # Poll every second to see if there are new scheduled events to process
335+
    # Since some events may have necessarily short warning periods, it is
336+
    # recommended to poll frequently
337+
    found_document_incarnation = last_document_incarnation
338+
    while (last_document_incarnation == found_document_incarnation):
339+
        sleep(1)
340+
        payload = get_scheduled_events()
341+
        found_document_incarnation = payload["DocumentIncarnation"]
342+
343+
    # We recommend processing all events in a document together,
344+
    # even if you won't be acting on them right away
345+
    for event in payload["Events"]:
346+
347+
        # Events that have already started, logged for tracking
348+
        if (event["EventStatus"] == "Started"):
349+
            log(event)
350+
351+
        # Approve all user initiated events. These are typically created by an
352+
        # administrator and approving them immediately can help to avoid delays
353+
        # in admin actions
354+
        elif (event["EventSource"] == "User"):
355+
            confirm_scheduled_event(event["EventId"])
356+
357+
        # For this application, freeze events less than 9 seconds are considered
358+
        # no impact. This will immediately approve them
359+
        elif (event["EventType"] == "Freeze" and
360+
            int(event["DurationInSeconds"]) >= 0 and
361+
            int(event["DurationInSeconds"]) < 9):
362+
            confirm_scheduled_event(event["EventId"])
363+
364+
        # Events that may be impactful (e.g., Reboot or Redeploy) may need custom
365+
        # handling for your application
366+
        else:
367+
            #TODO Custom handling for impactful events
368+
            log(event)
369+
    print("Processed events from document: " + str(found_document_incarnation))
370+
    return found_document_incarnation
242371

243372
def main():
244-
data = get_scheduled_events()
245-
handle_scheduled_events(data)
373+
    # This will track the last set of events seen
374+
    last_document_incarnation = "-1"
375+
376+
    input_text = "\
377+
        Press 1 to poll for new events \n\
378+
        Press 2 to exit \n "
379+
    program_exit = False
246380

381+
    while program_exit == False:
382+
        user_input = input(input_text)
383+
        if (user_input == "1"):
384+
            last_document_incarnation = advanced_sample(last_document_incarnation)
385+
        elif (user_input == "2"):
386+
            program_exit = True
247387

248388
if __name__ == '__main__':
249389
    main()
250390
```
251391

252392
## Next steps
253393
- Review the Scheduled Events code samples in the [Azure Instance Metadata Scheduled Events GitHub repository](https://github.com/Azure-Samples/virtual-machines-scheduled-events-discover-endpoint-for-non-vnet-vm).
394+
- Review the Node.js Scheduled Events code samples in the [Azure Samples GitHub repository](https://github.com/Azure/vm-scheduled-events).
254395
- Read more about the APIs that are available in the [Instance Metadata Service](instance-metadata-service.md).
255396
- Learn about [planned maintenance for Windows virtual machines in Azure](../maintenance-and-updates.md?bc=/azure/virtual-machines/windows/breadcrumb/toc.json&toc=/azure/virtual-machines/windows/toc.json).
256397
- Learn how to [monitor scheduled events for your VMs through Log Analytics](./scheduled-event-service.md).
398+
- Learn how to log scheduled events using Azure Event Hub in the [Azure Samples GitHub repository](https://github.com/Azure-Samples/virtual-machines-python-scheduled-events-central-logging).
