Commit 2dfe500

Update sre-function.md
typo
1 parent a7b278f commit 2dfe500

1 file changed: 14 additions, 14 deletions
manageability-and-operations/operations-advisory/operating-model/sre-function.md

@@ -8,24 +8,24 @@
 
 # SRE Function in Cloud Operating Model
 When defining the Cloud Operating Model, the Site Reliability Engineering (SRE) function embodies the core of Cloud Operations.
-SRE team will have a size of a minimun of 8 engineers for operations and on-call duties. There are several theories around the ideal ratio of SREs vs Developers. The truth is the magic number will change as the organization, and workloads, evolve and mature.
-The more automation and AiOps are leveraged the less repetitive tasks and manual intervetion will be needed, allowing SRE team members to focus on the real egineering part.
+The SRE team will have a size of a minimum of 8 engineers for operations and on-call duties. There are several theories around the ideal ratio of SREs vs Developers. The truth is the magic number will change as the organization, and workloads, evolve and mature.
+The more automation and AIOps are leveraged the less repetitive tasks and manual intervention will be needed, allowing SRE team members to focus on the real engineering part.
 
 
 # SRE Role in Day-2 Operations
 
-**SRE** funtion encompasses reliability concepts into DevOps, focusing on designing and implementing highly scalable and resilient systems, addressing automatically potential and in-progress issues. In other words, each service can run an repair itself, extending the concept of 'autonomous' to virtually any service.
+**SRE** function encompasses reliability concepts in DevOps, focusing on designing and implementing highly scalable and resilient systems, addressing automatically potential and in-progress issues. In other words, each service can run and repair itself, extending the concept of 'autonomous' to virtually any service.
 [SRE on Git](https://github.com/dastergon/awesome-sre?tab=readme-ov-file#sre-tools).
 
 **DevOps** is more a philosophy, than a function, focusing on streamlining development and deployment processes, increasing the speed at which new features are delivered. Tasks in development - Dev- and operations - Ops - are part of a continuous loop that includes building, deploying, testing, and monitoring applications and services.
-To achieve this, [DevOps](https://docs.oracle.com/en-us/iaas/Content/GSG/Reference/getting-started-as-devops.htm) relies on methodologies, such as CI/CD, Agile Development and automation.
+To achieve this, [DevOps](https://docs.oracle.com/en-us/iaas/Content/GSG/Reference/getting-started-as-devops.htm) relies on methodologies, such as CI/CD, Agile Development, and automation.
 
 [OCI DevOps Service](https://docs.public.oneportal.content.oci.oraclecloud.com/en-us/iaas/Content/devops/using/devops_overview.htm)
-provides a powerful end-to-end platform for your DevOps practice, including private Git repositories as well as connection capability to GitHub, GitLab and other external repos.
+provides a powerful end-to-end platform for your DevOps practice, including private Git repositories as well as connection capability to GitHub, GitLab, and other external repositories.
 
 1. Adopt a version control system in the form of a single repository.
 
-2. Automate building, testing and deployment.
+2. Automate building, testing, and deployment.
 
 3. Exploit IaC.
 
@@ -35,29 +35,29 @@ provides a powerful end-to-end platform for your DevOps practice, including priv
 
 # SRE Best Practises and OCI Support for them
 
-1. Define Service Level Objectives: these should be identified based on relevance to the business. Each organization will need to think it through, and most likely will define SLOs based on 'internal' SLOs, like resource utilization and response time, and 'end-users' SLOs like availabilty and end-user experience. They could also be device-dependent (like a Mobile Apps adoption or availability).
+1. Define Service Level Objectives: these should be identified based on relevance to the business. Each organization will need to think it through, and most likely will define SLOs based on 'internal' SLOs, like resource utilization and response time, and 'end-user' SLOs like availability and end-user experience. They could also be device-dependent (like a Mobile Apps adoption or availability).
 
-2. Unify the Observability platform, possibly with native SLOs features. OCI provides the capability to define thresholds and [custom metrics](https://docs.oracle.com/en-us/iaas/Content/Monitoring/Tasks/publishingcustommetrics.htm) to achieve this. Besides, available plug-ins and APIs, can expose the same metrics available to external tools such as [Grafana](https://grafana.com/grafana/plugins/oci-metrics-datasource/).
+2. Unify the Observability platform, possibly with native SLO features. OCI provides the capability to define thresholds and [custom metrics](https://docs.oracle.com/en-us/iaas/Content/Monitoring/Tasks/publishingcustommetrics.htm) to achieve this. Besides, available plug-ins and APIs can expose the same metrics available to external tools such as [Grafana](https://grafana.com/grafana/plugins/oci-metrics-datasource/).
 
-3. Define granularity and frequency (resolution) of metrics collection based on architecture, usefulness and related effort/cost per metric. Review these parameters as your architecture evolves over time.
+3. Define granularity and frequency (resolution) of metrics collection based on architecture, usefulness, and related effort/cost per metric. Review these parameters as your architecture evolves over time.
 
 4. Implement Alerting tools for quick detection of potential issues. With [OCI Notifications](https://docs.oracle.com/en-us/iaas/Content/Notification/Concepts/notificationoverview.htm), you can easily detect and be notified in human-readable format, when something happens in OCI. Keep Alerts definition and triggering as simple as possible.
 
-5. Leverage Automation. EaC -Everything as a Code- and Ansible support SRE work throught the entire lifecycle management, from provisioning to configuration changes. Ansible playbooks promote consistency and idempotency, for repetitive tasks as well as rollback when needed.
+5. Leverage Automation. EaC -Everything as a Code- and Ansible support SRE work through the entire lifecycle management, from provisioning to configuration changes. Ansible playbooks promote consistency and idempotency, for repetitive tasks as well as rollback when needed.
 
-6. Use 'canary deployments' approach to minimize effects on a limited number of users and for early detection of defects. Select metrics, canary population and duration depending on the
+6. Use the 'canary deployments' approach to minimize effects on a limited number of users and for early detection of defects. Select metrics, canary population, and duration depending on the
 
 7. Automate remediation mechanisms. Once Notifications are implemented, automation can be easily achieved via [Functions](https://docs.oracle.com/en-us/iaas/Content/Notification/Concepts/notificationoverview.htm#automation). Examples may be from filing Jira Tickets to resizing VMs and many more.
 
 8. Unify the ticketing platform: OCI gives the chance to integrate MyOracleSupport with your ticketing system via [Support Management APIs](https://docs.oracle.com/en-us/iaas/api/#/en/incidentmanagement/20181231/).
 
-9. Define After Action Review Process (AAR) and post-mortem analysis.
+9. Define the After Action Review Process (AAR) and post-mortem analysis.
 
 10. Plan for Capacity. OCI offers a powerful tool to help you with forecasting your capacity needs via [Operations Insight](https://docs.oracle.com/en-us/iaas/operations-insights/doc/capacity-planning.html#GUID-B2A3E104-494B-46A5-9F3E-8E3977C9328F).
 
-11. Avoid the proliferation of tools and maximize integrations among those used.
-11. Avoid proliferation of tools and maximize integrations among those used.
-12. Document standards, processes and tools.
+12. Document standards, processes, and tools.
 
 13. Evolve your SRE ecosystem along your environment lifecycle.
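The first practice in the list above (define Service Level Objectives) can be made concrete with a small error-budget calculation. The following is a minimal sketch, not part of the commit and not an OCI API: the names `Slo`, `error_budget_remaining`, and `budget_exhausted` are hypothetical, and the request counts are assumed to come from whatever observability platform is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical availability SLO, e.g. target=0.99 means 99% of requests succeed."""
    name: str
    target: float


def error_budget_remaining(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent in the window.

    1.0 means no budget used, 0.0 means fully spent, negative means overspent.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo.target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Number of failures the SLO tolerates for this volume of traffic.
    allowed_failures = (1.0 - slo.target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def budget_exhausted(slo: Slo, total_requests: int, failed_requests: int) -> bool:
    """True when the window has used up (or exceeded) its error budget."""
    return error_budget_remaining(slo, total_requests, failed_requests) <= 0.0
```

A check like `budget_exhausted` is one possible trigger for the simple alert definitions recommended in practice 4; the threshold and window length remain decisions for each organization.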
