Commit a6b8954

move to mdx
1 parent 6135f96 commit a6b8954

5 files changed, 82 additions and 75 deletions

src/content/docs/support/ai.mdx

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+---
+title: Support AI
+tableOfContents: false
+sidebar:
+  order: 8
+---
+
+import SupportAI from "~/components/SupportAI.tsx";
+
+<SupportAI client:load />

src/content/docs/support/cloudflare-status.mdx

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 ---
 pcx_content_type: concept
 title: Cloudflare Status
+sidebar:
+  order: 5
 ---
 
 Cloudflare provides updates on the status of our services and network at https://www.cloudflarestatus.com/, which you should check if you notice unexpected behavior with Cloudflare.

src/content/docs/support/customer-incident-management-policy.mdx

Lines changed: 68 additions & 67 deletions
@@ -2,7 +2,8 @@
 pcx_content_type: troubleshooting
 source: https://support.cloudflare.com/hc/en-us/articles/230054288-Customer-Incident-Management-Policy
 title: Customer Incident Management Policy
-
+sidebar:
+  order: 6
 ---
 
 ## Purpose
@@ -11,30 +12,30 @@ Cloudflare believes that openness and transparency are intrinsic to the delivery
 
 This Standard Operating Procedure (SOP) defines how Cloudflare deals with all incidents and problems impacting its production environment and the ways in which Cloudflare communicates the nature and impact of these incidents to Enterprise customers, both planned and unplanned, regardless of severity.  This procedure specifies how these efforts are uniformly followed in order to
 
-* maximize environment uptime,
-* minimize client impact,
-* reduce the time to repair, and
-* share information with our customers and the Internet community.
+- maximize environment uptime,
+- minimize client impact,
+- reduce the time to repair, and
+- share information with our customers and the Internet community.
 
-***
+---
 
 ## Scope
 
 This SOP applies to Cloudflare customers and customer services as consumed by customers. The SOP is applicable to all customer production environments at Cloudflare including:
 
-* Cloudflare’s public website ([www.cloudflare.com](http://www.cloudflare.com/))
-* Cloudflare’s APIs (Application Programming Interfaces)
-* Outbound third-party interfaces (e.g. credit card authorizations, etc.)
-* Network infrastructure owned or managed by Cloudflare for production services
-* Vendor software, hardware and services that affect any part of Cloudflare production
+- Cloudflare’s public website ([www.cloudflare.com](http://www.cloudflare.com/))
+- Cloudflare’s APIs (Application Programming Interfaces)
+- Outbound third-party interfaces (e.g. credit card authorizations, etc.)
+- Network infrastructure owned or managed by Cloudflare for production services
+- Vendor software, hardware and services that affect any part of Cloudflare production
 
-***
+---
 
 ## Background
 
 Cloudflare wants to build a better Internet. In order to deliver an improved experience to millions of Internet users, Cloudflare’s internal operations must follow excellent service delivery processes and procedures.  Cloudflare’s procedures therefore follow many industry-standard best practices, some of which specifically follow patterns of the Information Library Infrastructure Technology (ITIL).  This SOP follows the best practices of the ITIL Problem Management methodology.
 
-***
+---
 
 ## Definitions
 
@@ -120,7 +121,7 @@ The primary tool which Cloudflare uses to publicly share information about its s
 
 The Status Page is hosted by a Third Party ([Statuspage.io](http://statuspage.io/)) which is not dependent on Cloudflare’s services for operation.
 
-***
+---
 
 ## Roles and responsibilities
 
@@ -150,32 +151,32 @@ The overall Systems Reliability Engineering team who support the efforts of the
 
 Support the Incident Manager during problem resolution. Join bridge calls, if requested. Ensure documentation is captured while diagnosing and correcting issues and proper escalation to other responsible groups is executed. Participate in Post Mortem reviews of some Incident Reports, as requested by Cloudflare Management.
 
-***
+---
 
 ## Standard Operating Procedure
 
 This section details the procedures for incident and problem management.  At a high-level, these processes relate as follows:
 
-* Incident Management:  The overall process for observing and responding to alerts, including: assessing the potential impact and severity of an Incident, classifying the Incident as a Problem, assigning a priority to the Problem, or dismissing the Incident as a non-impacting event if a problem condition cannot be identified.
+- Incident Management:  The overall process for observing and responding to alerts, including: assessing the potential impact and severity of an Incident, classifying the Incident as a Problem, assigning a priority to the Problem, or dismissing the Incident as a non-impacting event if a problem condition cannot be identified.
 
-* Problem Management:  The process of identifying the scope and extent of a Problem, assigning an appropriate severity level (P0, P1, P2, P3),  the actions to resolve the Problem and restore the optimal state for production services, and the communication of the Problem to appropriate parties.
+- Problem Management:  The process of identifying the scope and extent of a Problem, assigning an appropriate severity level (P0, P1, P2, P3),  the actions to resolve the Problem and restore the optimal state for production services, and the communication of the Problem to appropriate parties.
 
-* Resolution Management:  The process of investigating the causes and conditions which lead to a problem condition, reporting on the overall manner by which a problem was managed and resolved, and any subsequent analysis of how the conditions and causes of a problem may be prevented in the future.
+- Resolution Management:  The process of investigating the causes and conditions which lead to a problem condition, reporting on the overall manner by which a problem was managed and resolved, and any subsequent analysis of how the conditions and causes of a problem may be prevented in the future.
 
-***
+---
 
 The primary goal of Incident Management is to identify and react to potential problems as quickly as possible, and thereby minimize impact to production services and provide the best possible levels of service quality and availability.  The best possible levels of service quality and availability would be that all services operated exactly as designed 100% of the time, and were available and accessible 100% of the time.
 
 Because we accept that a combination of forces within our control, and forces beyond our control, will eventually impact service health, we define Service Level Objectives (SLOs), and Service Level Agreements (SLAs), to describe what degradations in service health are acceptable for various services within Cloudflare’s network.   SLAs and SLOs are expressed as percentages of periods of time (monthly and annually.)
 
 The level of information given about an incident may vary, but the following information must be collected before an incident is classified and prioritized:
 
-* Submitter Source (monitoring alert or alternate source)
-* Customer(s) (if applicable)
-* System or application (and hostname, if applicable)
-* Time of alert
-* Scope of impact:  estimated number of systems, users, or regions impacted
-* Type of impact:  general scope of service impairment (e.g., loss of all access, degraded performance, dependent applications impacted, observed customer impact)
+- Submitter Source (monitoring alert or alternate source)
+- Customer(s) (if applicable)
+- System or application (and hostname, if applicable)
+- Time of alert
+- Scope of impact:  estimated number of systems, users, or regions impacted
+- Type of impact:  general scope of service impairment (e.g., loss of all access, degraded performance, dependent applications impacted, observed customer impact)
 
 All Incidents which are classified as Problems, regardless of source, which have a priority of P0 or P1, will be logged within the Cloudflare ticketing system, JIRA.  Some alerts will indicate conditions which may not be immediately impacting to service levels, and as necessary, will be categorized as Problems with a P2 or P3 priority.
 
@@ -191,31 +192,31 @@ All tickets will be categorized according to the following 4 levels of priority.
 
 **P0**
 
-* Complete loss of access to the Cloudflare application or API.
-* Degraded access to the Cloudflare application or API (⪯ 98% as measured worldwide or from any major region).
-* Complete loss of access to, or major performance degradation to, a Tier-1 Data Center.
-* Degraded performance of any Tier-1 global transit provider (⪰ 20% packet loss worldwide or 30% packet loss from any major region).
-* Degraded access to or performance of any critical system.
+- Complete loss of access to the Cloudflare application or API.
+- Degraded access to the Cloudflare application or API (⪯ 98% as measured worldwide or from any major region).
+- Complete loss of access to, or major performance degradation to, a Tier-1 Data Center.
+- Degraded performance of any Tier-1 global transit provider (⪰ 20% packet loss worldwide or 30% packet loss from any major region).
+- Degraded access to or performance of any critical system.
 
 **P1**
 
-* Intermittent or degraded Site-wide performance degradation.
-* Loss of an important function such as reporting.
-* Loss of access to the Cloudflare application from one of the social media or external CloudFlare websites
-* Outage to important outbound third-party interface.
-* Inoperability of the site for one of the enterprise clients or distribution partners.
-* Corruption or loss of customer data.
+- Intermittent or degraded Site-wide performance degradation.
+- Loss of an important function such as reporting.
+- Loss of access to the Cloudflare application from one of the social media or external CloudFlare websites
+- Outage to important outbound third-party interface.
+- Inoperability of the site for one of the enterprise clients or distribution partners.
+- Corruption or loss of customer data.
 
 **P2**
 
-* Sporadic or localized performance issue.
-* System issues with no noticeable client impact yet (e.g. high CPU).
-* Single client outage/degradation.
+- Sporadic or localized performance issue.
+- System issues with no noticeable client impact yet (e.g. high CPU).
+- Single client outage/degradation.
 
 **P3**
 
-* Operational issues, procedural problems or service requests that have little or no effect on end-users and can be handled on an as-available basis.
-* The default severity assigned to all tickets that have not yet been reviewed or assigned a severity level.
+- Operational issues, procedural problems or service requests that have little or no effect on end-users and can be handled on an as-available basis.
+- The default severity assigned to all tickets that have not yet been reviewed or assigned a severity level.
 
 ### Category
 
@@ -235,36 +236,36 @@ P0 and P1 incidents obviously have more impact to the business and therefore, ha
 
 For all P0 and P1 issues, the on-duty Incident Manager should be contacted immediately.  A schedule of incident managers will be posted to ensure that SRE knows who to contact at any given time.  The incident manager is a critical resource responsible for the following:
 
-* Validation of the severity of an issue
-* Tracking of the issue from submission to resolution
-* Representation of clients’ best interest
-* Logging of all actions and times
-* Direction of personnel toward the fastest possible resolution
-* Ensuring that clients and internal management are notified of status according to pre-determined time periods (or upon change in status)
-* Performing client, internal or third-party escalations when time limits are being exceeded or appropriate progress is not being made
-* Ensuring that a meaningful explanation is applied to the ticket upon resolution
-* Making certain that the initial submitter agrees that the issue is resolved before the ticket is closed
+- Validation of the severity of an issue
+- Tracking of the issue from submission to resolution
+- Representation of clients’ best interest
+- Logging of all actions and times
+- Direction of personnel toward the fastest possible resolution
+- Ensuring that clients and internal management are notified of status according to pre-determined time periods (or upon change in status)
+- Performing client, internal or third-party escalations when time limits are being exceeded or appropriate progress is not being made
+- Ensuring that a meaningful explanation is applied to the ticket upon resolution
+- Making certain that the initial submitter agrees that the issue is resolved before the ticket is closed
 
-***
+---
 
 ## Incident Communications
 
 External communications during an incident are critical for:
 
-* Notifying the stakeholders that Cloudflare is aware of the issue and is pursuing resolution
-* Reassuring clients that the matter is under review and that Cloudflare is looking out for their best interests
-* Issues do not drag on unnecessarily and appropriate escalations are being made
-* Informing key internal stakeholders of important incidents
+- Notifying the stakeholders that Cloudflare is aware of the issue and is pursuing resolution
+- Reassuring clients that the matter is under review and that Cloudflare is looking out for their best interests
+- Issues do not drag on unnecessarily and appropriate escalations are being made
+- Informing key internal stakeholders of important incidents
 
 Major types of communications during an incident include:
 
-* [StatusPage](https://www.cloudflarestatus.com/)
-* [Support tickets](/support/contacting-cloudflare-support/)
-* Incident Reports
+- [StatusPage](https://www.cloudflarestatus.com/)
+- [Support tickets](/support/contacting-cloudflare-support/)
+- Incident Reports
 
 Status Page will be created using templates by CSUP team member on-call as soon as an incident is identified.
 
-***
+---
 
 ## Post-Mortem reviews
 
@@ -288,7 +289,7 @@ The Incident Report (“IR”) is the primary method of communication to the cli
 
 The person writing the report will vary depending on the severity of the issue and the responsible area.  Upon completion of the draft report, it is critical to ensure that the report is reviewed by Cloudflare management for content, commitments and professional presentation.  Once the report is approved it may be published to the client.
 
-***
+---
 
 ## Problem review
 
@@ -298,10 +299,10 @@ The above sections have detailed the handling of the incident and the root cause
 
 The ticket criteria that need to be reported for both open and closed tickets include the following:
 
-* Severity
-* Category/Sub-category
-* Responsible Group
-* Age/Days Open
+- Severity
+- Category/Sub-category
+- Responsible Group
+- Age/Days Open
 
 Wherever possible, this data should be reported graphically to show visible trends.  These reports should be published to internal Cloudflare managers and area owners.
 
@@ -313,6 +314,6 @@ Each area owner for tickets will be responsible for not only ensuring that their
 
 As part of all departmental staff meetings, group managers should be reviewing the ticket open and trending reports with the following objectives:
 
-* Discussion of areas of success or concern
-* Review of opportunities for improvement by the area owners
-* Agreement on areas that warrant a new Problem ticket to be opened for remediation tracking
+- Discussion of areas of success or concern
+- Review of opportunities for improvement by the area owners
+- Agreement on areas that warrant a new Problem ticket to be opened for remediation tracking

src/content/docs/support/disruptive-maintenance.mdx

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 pcx_content_type: troubleshooting
 source: https://support.cloudflare.com/hc/en-us/articles/360060050511-Disruptive-Maintenance-Windows
 title: Disruptive Maintenance
+sidebar:
+  order: 7
 ---
 
 import { AvailableNotifications, Render } from "~/components";

src/pages/support/ai.astro

Lines changed: 0 additions & 8 deletions
This file was deleted.
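
Note on the `sidebar.order` values: this commit adds orders 5, 6, 7 and 8 to four support pages. The sketch below shows the page sequence those values would produce, assuming the docs framework lists pages in ascending `order` (an assumption, not something the diff states). The `Page` type and `sortByOrder` helper are hypothetical and not part of this repository; only the titles and order values come from the frontmatter in this commit.

```typescript
// Hypothetical model of a docs page; only title and order come from the diff.
interface Page {
  title: string;
  order: number;
}

// Titles and sidebar.order values copied from the frontmatter in this commit.
const pages: Page[] = [
  { title: "Support AI", order: 8 },
  { title: "Disruptive Maintenance", order: 7 },
  { title: "Cloudflare Status", order: 5 },
  { title: "Customer Incident Management Policy", order: 6 },
];

// Assumption: lower order values are listed first in the sidebar.
// Copies the array before sorting so the input is left unchanged.
function sortByOrder(input: Page[]): Page[] {
  return [...input].sort((a, b) => a.order - b.order);
}

const sortedTitles: string[] = sortByOrder(pages).map((p) => p.title);
console.log(sortedTitles.join("\n"));
```

Under that assumption, Cloudflare Status comes first, followed by Customer Incident Management Policy, Disruptive Maintenance, and Support AI.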
