---
title: Introduction to business uptime with the New Relic platform
tags:
- Observability maturity
- Intelligent observability
- Instrumentation
- Implementation guide
metaDescription: Introduction to business uptime
redirects:
- /docs/new-relic-solutions/observability-maturity
- /docs/full-stack-observability
- /docs/new-relic-solutions/best-practices-guides/full-stack-observability
freshnessValidatedDate: never
---

# Overview

Business uptime is a critical metric for any organization, reflecting the reliability and availability of services that directly impact customer satisfaction and business results. The New Relic Observability platform provides a comprehensive suite of tools and practices to enhance business uptime through improved service delivery. This document outlines a maturity progression model that leverages observability practices to drive business-focused results, specifically targeting business uptime.

# Maturity progression model

The maturity progression model guides organizations through a structured journey from a reactive approach, to a proactive approach, and ultimately to mastery of observability. Each level is characterized by specific practices and metrics, captured in the related scorecard, that help you measure and improve business uptime.

## Level 1: Reactive approach

At the reactive level, organizations respond to incidents as they occur, often without prior warning. The focus is on establishing basic alert mechanisms to ensure that issues are detected promptly. The following rules are used to evaluate the effectiveness of a reactive approach:

- ***[Infrastructure Alert Coverage](/docs/new-relic-solutions/observability-maturity/business-uptime/l1-infrastructure-alert-coverage):*** Ensures alert definitions are present for INFRA-HOST or INFRA-KUBERNETES-POD entities. A lack of alerts results in a failure score.
- ***[Service Delivery Alert Coverage](/docs/new-relic-solutions/observability-maturity/business-uptime/l1-service-delivery-alert-coverage):*** Checks for alert definitions on APM-APPLICATION, BROWSER-APPLICATION, MOBILE-APPLICATION, or SYNTH-MONITOR entities. Missing alerts lead to a failure score.
- ***[Critical Alert Coverage](/docs/new-relic-solutions/observability-maturity/business-uptime/l1-critical-alert-coverage):*** Evaluates a 7-day sample of alert incidents per target entity to determine the percentage due to critical versus warning violations.
- ***[Alert Noise](/docs/new-relic-solutions/observability-maturity/business-uptime/l1-alert-noise):*** Assesses incidents over a 7-day period to determine if a specific policy is responsible for more than 14 incidents during that time.

## Level 2: Proactive approach

The proactive level involves anticipating potential issues before they impact business operations. Organizations at this stage use observability practices to continuously improve service delivery. The following rules and metrics are evaluated:

- ***[Service Level Coverage](/docs/new-relic-solutions/observability-maturity/business-uptime/l2-service-level-coverage):*** Assesses whether entities have defined Service Level Indicators (SLIs) during the latest entity harvest. Defined SLIs indicate proactive monitoring.
- ***[Alerts Mean Time To Close](/docs/new-relic-solutions/observability-maturity/business-uptime/l2-alerts-mean-time-to-close):*** Measures the time taken to close incidents, with resolutions under 30 minutes considered successful. This metric reflects the efficiency of incident management processes.
- ***[APM Criticality Tag Coverage](/docs/new-relic-solutions/observability-maturity/business-uptime/l2-apm-criticality-tag-coverage):*** Evaluates the assignment of criticality ratings (low, medium, high) to entities, highlighting their importance for business operations.

## Level 3: Mastery

At the mastery level, organizations achieve direct business benefits from their observability practices, transcending mere incident remediation. The focus is on service level attainment:

- ***[Service Level Attainment](/docs/new-relic-solutions/observability-maturity/business-uptime/l3-service-level-attainment):*** Evaluates the latest service level compliance score for each defined SLI. A success rate above 95% is considered successful, indicating high reliability and uptime.

# Observability practices

Observability practices are the actionable components of the maturity model, enabling organizations to realize the potential value of the New Relic platform. These practices include:

- ***[Alert quality management (AQM)](/docs/new-relic-solutions/observability-maturity/uptime-performance-reliability/aqm-implementation-guide/)***: Reduces alert fatigue by focusing on alerts with true business impact. AQM improves response times and increases awareness of critical events, leading to higher uptime and availability.
- ***[Service level management (SLM)](/docs/new-relic-solutions/observability-maturity/uptime-performance-reliability/slm-implementation-guide/)***: Standardizes data into a universal language, improving communication between IT and business stakeholders. SLM enhances reliability by reducing business-impacting incidents and their duration.

The New Relic Observability platform provides a structured approach to improving business uptime through a maturity progression model. By advancing from reactive to proactive and mastery levels, organizations can achieve significant improvements in service delivery and business results. Observability practices such as AQM and SLM play a crucial role in this journey, ensuring that organizations focus on the right metrics and actions to enhance reliability and uptime.

# Next steps

Organizations are encouraged to explore New Relic's resources and guides to tailor their observability journey according to their specific needs. By leveraging the maturity progression model and observability practices, businesses can unlock the full potential of the New Relic platform and achieve their uptime goals.
---
title: Level 1 - Alert noise scorecard rule
tags:
- Observability maturity
- Intelligent observability
- Instrumentation
- Implementation guide
metaDescription: Observability maturity business uptime alert noise
redirects:
- /docs/new-relic-solutions/observability-maturity
- /docs/full-stack-observability
- /docs/new-relic-solutions/best-practices-guides/full-stack-observability
freshnessValidatedDate: never
---

# Overview

The Alert Noise rule has produced a score based on the number of incidents attributed to specific policies over a 7-day window. This document explains the interpretation of your score and offers guidance on actions you can take to optimize your incident management strategy.

# Description

The score evaluates incidents over a 7-day period to determine if a specific policy is responsible for more than 14 incidents during that time.
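
If you want to see how your accounts trend against this threshold, a query along the lines of the sketch below approximates the evaluation. It assumes incidents are recorded in the `NrAiIncident` event type (the default for New Relic alert incidents); it's an illustration, not the exact query the scorecard rule runs.

```sql
// Alert incidents opened per policy over the last 7 days.
// Policies returning more than 14 incidents would exceed this rule's default threshold.
SELECT count(*) AS 'Incidents opened'
FROM NrAiIncident
WHERE event = 'open'
FACET policyName
SINCE 7 days ago
LIMIT 20
```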

# Interpretation

The goal is to maintain a manageable number of incidents that can be effectively addressed. If incidents recur from specific policies, identify and remediate the underlying source of instability. If an incident isn't suitable for remediation, consider whether it should be classified as a critical violation at all.

# Actions to Consider

- ***Evaluate Target Cohort:*** Ensure the rule is targeting the correct account and cohort of entity types. Modify the rule to address alert incidents for target entities as needed.

- ***Adjust Incident Occurrence Threshold:*** Review and adjust the incident occurrence threshold to align with your expectations and systems management standards. You may find that a higher or lower incident frequency is appropriate for your alerting strategy.

- ***Develop Long-Term Assessment:*** Assess policy violations over longer evaluation periods to identify systems with persistent reliability challenges (see the example query after this list). Create a prioritized list with a risk assessment for each system to determine whether it could benefit from architectural or implementation improvements.
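
As a starting point for that longer-term assessment, the sketch below widens the window to 30 days and plots weekly incident counts per policy. It assumes the `NrAiIncident` event type; adjust the window, facet, and threshold to fit your own standards.

```sql
// Weekly incident counts per policy over the last 30 days,
// useful for spotting policies that generate noise persistently rather than in bursts.
SELECT count(*) AS 'Incidents opened'
FROM NrAiIncident
WHERE event = 'open'
FACET policyName
SINCE 30 days ago
TIMESERIES 1 week
```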

# Important Considerations

* Custom Evaluation: Remember, these rules and scores are not an exact science. It's crucial to evaluate them based on your specific needs and conditions. Tailor your measurements to align with your systems management standards and best practices.
* Continuous Improvement: Incident management strategies should evolve. Regularly review and adjust your approach to ensure it meets your current requirements.

By understanding your score and taking the recommended actions, you can enhance your policy incident management and ensure it aligns with your broader systems management strategy.
---
title: Level 1 - Critical alert coverage scorecard rule
tags:
- Observability maturity
- Intelligent observability
- Instrumentation
- Implementation guide
metaDescription: Observability maturity business uptime critical alert coverage
redirects:
- /docs/new-relic-solutions/observability-maturity
- /docs/full-stack-observability
- /docs/new-relic-solutions/best-practices-guides/full-stack-observability
freshnessValidatedDate: never
---

# Overview

The Critical Alert Coverage rule has produced a score based on the critical alert coverage of your systems. This document explains the interpretation of your score and offers guidance on actions you can take to enhance your alerting strategy.

# Description

The score evaluates a 7-day sample of alert incidents per target entity to determine what percentage are due to critical versus warning violations.
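
To see roughly how this breaks down in your account, a query similar to the sketch below shows the share of incidents opened at critical priority per entity. It assumes the `NrAiIncident` event type and its `priority` attribute; it's an approximation, not the scorecard rule's own query.

```sql
// Share of incidents opened at critical (vs. warning) priority per entity over 7 days.
SELECT percentage(count(*), WHERE priority = 'critical') AS '% critical incidents'
FROM NrAiIncident
WHERE event = 'open'
FACET entity.name
SINCE 7 days ago
LIMIT 20
```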

# Interpretation

An overreliance on critical alert conditions may indicate a lack of progressive alerting and incident response processes. This could lead to alert fatigue and hinder continual improvement in system and service reliability or quality.

It's important to balance your alerts by defining an alerting strategy that includes:

* Immediately Actionable Alerts: These are critical alerts that indicate negative business-impacting events requiring immediate attention.
* Anticipatory Alerts: These alerts signal unexpected conditions that are not immediately business-impacting but may require future adjustments.
* Retrospective Alerts: These alerts are not meant for immediate action but should be evaluated through thoughtful periodic analysis of system behavior.

# Actions to Consider

- ***Evaluate Target Cohort:*** Ensure the rule targets the correct cohort of incidents, focusing on production systems.
- ***Adjust Success Threshold:*** Review and adjust the defined success threshold. By default, the rule treats 25% or fewer critical alerts as a success. If you use New Relic primarily for critical alert conditions, or if there are other reasons for a higher proportion of critical alerts, adjust the threshold accordingly.
- ***Review Alerting Strategy:*** Conduct a broad review of your alerting strategy. Ensure there are well-established expectations for system operations and a progression of alert design that includes anticipatory and retrospective alerting conditions, in addition to those needing immediate attention.

# Important Considerations

* Custom Evaluation: Remember, these rules and scores are not an exact science. It's crucial to evaluate them based on your specific needs and conditions. Tailor your measurements to align with your observability goals.
* Continuous Improvement: Observability strategies should evolve. Regularly review and adjust your approach to ensure it meets your current requirements.

By understanding your score and taking the recommended actions, you can enhance your critical alert coverage and ensure it aligns with your broader monitoring strategy.
---
title: Level 1 - Infrastructure alert coverage scorecard rule
tags:
- Observability maturity
- Intelligent observability
- Instrumentation
- Implementation guide
metaDescription: Observability maturity business uptime infrastructure alert coverage
redirects:
- /docs/new-relic-solutions/observability-maturity
- /docs/full-stack-observability
- /docs/new-relic-solutions/best-practices-guides/full-stack-observability
freshnessValidatedDate: never
---

# Overview

The Infrastructure Alert Coverage rule has produced a score based on the alert coverage of your infrastructure entities. This document explains the interpretation of your score and offers guidance on actions you can take to improve your observability program.

# Description

The score is generated from a rule that checks for alert definitions on your INFRA-HOST or INFRA-KUBERNETES-POD entities. If any of these entities lack a defined alert, the rule will register as a failure.

# Interpretation

A low score in alert coverage may suggest that infrastructure management is not prioritized within your observability strategy, or it may indicate the absence of a standardized approach to infrastructure alerting.

# Actions to Consider

- ***Review Infrastructure Entity Coverage:*** Evaluate your infrastructure entities in relation to your observability goals. Consider entities from cloud integrations, agent extensions, or Prometheus if they play a significant role in your telemetry data. Update the scorecard rule to reflect the unique aspects of your infrastructure.
- ***Adjust Rule Applicability:*** If New Relic is not central to your infrastructure observability, consider disabling or removing the rule.
- ***Refine Rule Query:*** A low score might result from capturing an inappropriate cohort of entities. Modify the NRQL query to focus on production infrastructure entities more accurately.
- ***Develop an Alerting Strategy:*** Optimize the rule for your needs, then review or develop an alerting strategy that includes infrastructure alerting.

# Important Considerations

* Custom Evaluation: Remember, these rules and scores are not an exact science. It's crucial to evaluate them based on your specific needs and conditions. Tailor your measurements to align with your observability goals.
* Continuous Improvement: Observability strategies should evolve. Regularly review and adjust your approach to ensure it meets your current requirements.

By understanding your score and taking the recommended actions, you can enhance your infrastructure observability and ensure it aligns with your broader monitoring strategy.
---
title: Level 1 - Service delivery alert coverage scorecard rule
tags:
- Observability maturity
- Intelligent observability
- Instrumentation
- Implementation guide
metaDescription: Observability maturity business uptime service delivery alert coverage
redirects:
- /docs/new-relic-solutions/observability-maturity
- /docs/full-stack-observability
- /docs/new-relic-solutions/best-practices-guides/full-stack-observability
freshnessValidatedDate: never
---

# Overview

The Service Delivery Alert Coverage rule has produced a score based on the alert coverage of your service delivery entities. This document explains the interpretation of your score and offers guidance on actions you can take to enhance your observability program.

# Description

The score is derived from a rule that checks for alert definitions on your APM-APPLICATION, BROWSER-APPLICATION, MOBILE-APPLICATION, or SYNTH-MONITOR entities. If any of these entities lack a defined alert, the rule will register as a failure.

# Interpretation

A low score in alert coverage may suggest that the management of service entities, such as APM and Browser applications, is not prioritized within your observability strategy.

# Actions to Consider

- ***Review Service Delivery Entity Coverage:*** Assess your service delivery entities in relation to your observability goals. These entities are typically associated with supporting or executing customer business processes. Modify the rule to include entities that align with your service delivery architecture, such as Lambda or Databricks.
- ***Adjust Rule Applicability:*** If New Relic is not central to your service delivery observability, consider disabling or removing the rule.
- ***Refine Rule Query:*** A low score might result from capturing an inappropriate cohort of entities. Modify the NRQL query to focus more accurately on production service delivery entities.
- ***Develop an Alerting Strategy:*** Optimize the rule for your needs, then review or develop an alerting strategy that includes service delivery alerting.

# Important Considerations

* Custom Evaluation: Remember, these rules and scores are not an exact science. It's crucial to evaluate them based on your specific needs and conditions. Tailor your measurements to align with your observability goals.
* Continuous Improvement: Observability strategies should evolve. Regularly review and adjust your approach to ensure it meets your current requirements.

By understanding your score and taking the recommended actions, you can enhance your service delivery observability and ensure it aligns with your broader monitoring strategy.
---
title: Level 2 - Alerts, mean time to close scorecard rule
tags:
- Observability maturity
- Intelligent observability
- Instrumentation
- Implementation guide
metaDescription: Observability maturity business uptime alerts mean time to close
redirects:
- /docs/new-relic-solutions/observability-maturity
- /docs/full-stack-observability
- /docs/new-relic-solutions/best-practices-guides/full-stack-observability
freshnessValidatedDate: never
---

# Overview

The Alerts Mean Time to Close rule has produced a score based on the time taken to close incidents. This document explains the interpretation of your score and offers guidance on actions you can take to optimize your incident management strategy.

# Description

The score evaluates the time taken to close each incident, with those resolved in under 30 minutes considered successful incident resolutions.
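
To compare your own incident close times against the 30-minute target, a query along the lines of the sketch below approximates the evaluation. It assumes the `NrAiIncident` event type, where `durationSeconds` is populated on `close` events; it's illustrative rather than the rule's exact query.

```sql
// Share of incidents closed within 30 minutes, and mean time to close, per policy.
SELECT percentage(count(*), WHERE durationSeconds <= 1800) AS '% closed under 30 min',
  average(durationSeconds) / 60 AS 'Mean minutes to close'
FROM NrAiIncident
WHERE event = 'close'
FACET policyName
SINCE 7 days ago
LIMIT 20
```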

# Interpretation

Incidents that remain open for long periods, especially those tied to specific alert policies and conditions, may indicate sub-optimal detection and resolution processes or volatility in the targeted entity. Consider the following:

* Entity Behavior and Alert Thresholds: Evaluate the behavior of the entity and the alert thresholds intended for it. Aim to improve the alert-to-action incident management procedure.
* Entity Importance: Some entities may not warrant rapid remediation. Consider alternative methods for being informed of unexpected telemetry values from such entities.

# Actions to Consider

- ***Evaluate Target Cohort:*** Determine whether the cohort of incidents and entities needs modification to exclude entities that are prone to long-running incidents. The example query after this list can help identify such entities.
- ***Review Incident Management Practices:*** Assess whether New Relic is capturing the close event accurately. If incident management occurs outside New Relic AIOps/Alerts, the rule logic may need revision; in some cases, disabling or deleting the rule may be the more practical option.
- ***Develop Alerting and Incident Management Strategy:*** Ensure you have a well-defined alerting and incident management strategy. If not, engage in an [Alert Quality Management (AQM)](/docs/new-relic-solutions/observability-maturity/uptime-performance-reliability/aqm-implementation-guide/) workshop to introduce the need for a comprehensive, well-documented approach to alerting maintenance, incident management, and regular program review.
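
As a minimal sketch for finding those entities, the query below surfaces the longest and average incident durations per entity, assuming the `NrAiIncident` event type; adjust the window and cohort to suit your environment.

```sql
// Longest and average incident durations per entity over the last 7 days,
// to surface entities that routinely exceed the 30-minute target.
SELECT max(durationSeconds) / 60 AS 'Longest incident (min)',
  average(durationSeconds) / 60 AS 'Average incident (min)'
FROM NrAiIncident
WHERE event = 'close'
FACET entity.name
SINCE 7 days ago
LIMIT 20
```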

# Important Considerations

* Custom Evaluation: Remember, these rules and scores are not an exact science. It's crucial to evaluate them based on your specific needs and conditions. Tailor your incident management strategy to align with your business objectives and operational requirements.
* Continuous Improvement: Incident management strategies should evolve. Regularly review and adjust your approach to ensure it meets your current requirements.

By understanding your score and taking the recommended actions, you can enhance your incident resolution times and ensure they align with your broader business objectives and observability strategy.