Python for Chaos Engineering: Observability During Failure Scenarios #374

@saurabhmi2212

Description

Talk title

Python for Chaos Engineering: Observability During Failure Scenarios

Short talk description

Chaos Engineering is a discipline that tests system resilience by injecting controlled failures into distributed systems. Python, with its rich ecosystem of libraries, is well suited to automating chaos experiments and to improving observability during failure scenarios. This presentation explores how Python can be used to simulate failures in cloud-native applications, monitor system behavior, and collect critical observability data such as logs, metrics, and traces. By integrating Python with tools like OpenTelemetry, Prometheus, and Grafana, developers can pinpoint system weaknesses and improve reliability. The session walks through practical Python scripts for running chaos experiments and observing their effects in modern infrastructure.
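A minimal, standard-library-only sketch of what "simulating failures" can look like in Python: the decorator below adds artificial latency and raises an exception on a configurable fraction of calls, logging each injected fault so the experiment itself leaves an observable trail. `inject_faults`, `InjectedFault`, and `fetch_price` are hypothetical names used for illustration; dedicated tools such as Chaos Toolkit typically inject faults at the infrastructure level instead.

```python
import functools
import logging
import random
import time

logger = logging.getLogger("chaos")


class InjectedFault(RuntimeError):
    """Raised when a call is deliberately failed by the chaos decorator."""


def inject_faults(latency_s=0.0, error_rate=0.0, rng=None):
    """Return a decorator that delays and/or fails calls on purpose.

    latency_s:  artificial delay (seconds) added before every call.
    error_rate: probability in [0, 1] that a call raises InjectedFault.
    rng:        optional random.Random, so an experiment can be replayed.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                logger.info("chaos: adding %.3fs latency to %s", latency_s, func.__name__)
                time.sleep(latency_s)
            if rng.random() < error_rate:
                logger.warning("chaos: failing call to %s", func.__name__)
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@inject_faults(latency_s=0.01, error_rate=0.3, rng=random.Random(42))
def fetch_price(item_id):
    # Hypothetical stand-in for a real downstream call (HTTP request, DB query).
    return {"item": item_id, "price": 9.99}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    failures = 0
    for item_id in range(10):
        try:
            fetch_price(item_id)
        except InjectedFault:
            failures += 1
    print(f"{failures}/10 calls failed under injected faults")
```

Seeding the `random.Random` instance makes a run reproducible, which helps when comparing logs and metrics across repeated experiments.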

What format do you have in mind?

Talk (20-25 minutes + Q&A)

Talk outline / Agenda

Introduction (5 min)

  • What is Chaos Engineering?
  • Importance of observability during failures.
  • Role of Python in automating chaos experiments.

Chaos Engineering Basics (10 min)

  • Principles of Chaos Engineering.
  • Common failure scenarios in distributed systems.
  • Overview of Chaos Engineering tools and Python integration.

Observability Overview (10 min)

  • Three pillars: logs, metrics, and traces.
  • Tools for observability: OpenTelemetry, Prometheus, Grafana.
  • Why observability is critical during chaos experiments.

Python for Chaos Engineering (15 min)

  • Simulating failures with Python (chaoslib, Kubernetes integration).
  • Automating chaos experiments with Python scripts.

Python for Observability (15 min)

  • Collecting logs, metrics, and traces with Python.
  • Visualizing failure impact using Prometheus and Grafana APIs.
  • Tracing microservices with OpenTelemetry.

Case Study (10 min)

  • Real-world example: Injecting failures in Kubernetes.
  • Monitoring system behavior with Python.
  • Analyzing results to identify weaknesses.

Best Practices & Q&A (10 min)

  • Best practices for combining Chaos Engineering and observability.
  • Lessons learned and common pitfalls.
  • Audience Q&A and closing remarks.
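The experiment loop behind this outline (check a steady-state hypothesis, inject a failure, observe, analyze, and always roll back) can be sketched as a small, tool-agnostic runner. This is not the `chaoslib` API: `Experiment`, `run_experiment`, and the toy in-memory cluster are hypothetical, and in a real setup the probe would likely query a metrics backend such as Prometheus while `inject` and `rollback` would call the Kubernetes API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experiment:
    """A chaos experiment described as plain callables (hypothetical structure)."""

    name: str
    probe: Callable[[], float]             # measures the system, e.g. success ratio
    steady_state: Callable[[float], bool]  # hypothesis: is this measurement healthy?
    inject: Callable[[], None]             # starts the failure, e.g. kill a replica
    rollback: Callable[[], None]           # undoes the failure; always executed
    results: dict = field(default_factory=dict)


def run_experiment(exp: Experiment) -> bool:
    """Return True if the steady-state hypothesis held while the fault was active."""
    baseline = exp.probe()
    exp.results["baseline"] = baseline
    if not exp.steady_state(baseline):
        # Never inject chaos into a system that is already unhealthy.
        exp.results["aborted"] = True
        return False

    try:
        exp.inject()
        during = exp.probe()
        exp.results["during_fault"] = during
        held = exp.steady_state(during)
    finally:
        exp.rollback()  # restore the system even if inject() or probe() raised

    exp.results["after_rollback"] = exp.probe()
    exp.results["hypothesis_held"] = held
    return held


if __name__ == "__main__":
    # Toy in-memory "cluster" that needs all 3 replicas to serve full load.
    cluster = {"replicas": 3, "needed": 3}

    def success_ratio() -> float:
        return min(1.0, cluster["replicas"] / cluster["needed"])

    exp = Experiment(
        name="kill-one-replica",
        probe=success_ratio,
        steady_state=lambda ratio: ratio >= 0.99,
        inject=lambda: cluster.update(replicas=cluster["replicas"] - 1),
        rollback=lambda: cluster.update(replicas=3),
    )
    print(run_experiment(exp), exp.results)
```

In the toy run, losing one of three replicas drops the success ratio to about 0.67, so the hypothesis fails: that is the kind of weakness a case study aims to surface. Rolling back inside `finally` keeps the system restored even when a probe raises.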

Key takeaways

Python enables automated, observable chaos experiments: teams can inject failures, collect metrics, logs, and traces in real time, and quickly verify system resilience, turning failures into measurable, actionable insights rather than surprises.
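To make "measurable, actionable insights" concrete, the sketch below reduces raw request samples from a baseline window and a fault window to an error ratio and a nearest-rank p95 latency, then renders them in the Prometheus text exposition format so they could be served or pushed alongside other metrics. The `Sample` type, the metric names (`chaos_error_ratio`, `chaos_p95_latency_seconds`), and the two-window layout are assumptions for illustration; in practice the samples would come from a load generator or from Prometheus itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request: its latency and whether it succeeded."""

    latency_s: float
    ok: bool


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Reduce raw samples to the numbers SLOs are usually written in."""
    errors = sum(1 for s in samples if not s.ok)
    return {
        "error_ratio": errors / len(samples),
        "p95_latency_seconds": percentile([s.latency_s for s in samples], 95),
    }


# Illustrative metric names; Prometheus convention is base units (seconds).
METRICS = (
    ("error_ratio", "Fraction of failed requests in the window."),
    ("p95_latency_seconds", "Nearest-rank 95th percentile request latency."),
)


def to_prometheus_text(experiment, windows):
    """Render per-window summaries in the Prometheus text exposition format.

    Label values are assumed not to contain quotes or backslashes.
    """
    lines = []
    for metric, help_text in METRICS:
        name = f"chaos_{metric}"
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for window, summary in windows.items():
            labels = f'experiment="{experiment}",window="{window}"'
            lines.append(f"{name}{{{labels}}} {summary[metric]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Synthetic samples: the fault window is slower and fails 1 in 5 requests.
    baseline = [Sample(0.020 + i * 0.001, True) for i in range(20)]
    during = [Sample(0.020 + i * 0.010, i % 5 != 0) for i in range(20)]
    windows = {"baseline": summarize(baseline), "fault": summarize(during)}
    print(to_prometheus_text("kill-one-replica", windows), end="")
```

Comparing the `baseline` and `fault` series side by side (for example in a Grafana panel) shows how far the failure moved the system from its steady state.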

What domain would you say your talk falls under?

Core Python

Duration (including Q&A)

40 minutes

Prerequisites and preparation

No response

Resources and references

No response

Link to slides/demos (if available)

No response

Twitter/X handle (optional)

No response

LinkedIn profile (optional)

https://www.linkedin.com/in/connectsaurabhmishra/

Profile picture URL (optional)

No response

Speaker bio

Saurabh Mishra is a Cloud Evangelist with a deep passion for cloud architecture, DevOps, and automation. He actively engages with the global tech community, sharing insights on cloud-native technologies, security best practices, and multi-cloud strategies. As an experienced speaker and mentor, Saurabh has delivered sessions at conferences, meetups, and workshops, helping teams accelerate their cloud adoption, modernization, and optimization journeys. His work bridges innovation and practical implementation, empowering organizations to build resilient, scalable, and secure cloud solutions.

Availability

January 2026

Accessibility & special requirements

No response

Speaker checklist

  • I have read and understood the PyDelhi guidelines for submitting proposals and giving talks
  • I will make my talk accessible to all attendees and will proactively ask for any accommodations or special requirements I might need
  • I agree to share slides, code snippets, and other materials used during the talk with the community
  • I will follow PyDelhi's Code of Conduct and maintain a welcoming, inclusive environment throughout my participation
  • I understand that PyDelhi meetups are community-centric events focused on learning, knowledge sharing, and networking, and I will respect this ethos by not using this platform for self-promotion or hiring pitches during my presentation, unless explicitly invited to do so by means of a sponsorship or similar arrangement
  • If the talk is recorded by the PyDelhi team, I grant permission to release the video on PyDelhi's YouTube channel under the CC-BY-4.0 license, or a different license of my choosing if I am specifying it in my proposal or with the materials I share

Additional comments

No response

Labels

  • needs more information: This proposal needs more information in order for a decision to be made on its acceptance
  • on hold: This proposal is on hold for organisational reasons, or as requested by the author, or other reasons
  • proposal: Wish to present at PyDelhi? This label gets added when the "Talk Proposal" option is chosen.
  • review in progress: This proposal is currently under review
