Description
Talk title
Python for Chaos Engineering: Observability During Failure Scenarios
Short talk description
Chaos Engineering tests system resilience by injecting controlled failures into distributed systems. This talk shows how Python can automate chaos experiments and collect logs, metrics, and traces during failure scenarios, using tools such as OpenTelemetry, Prometheus, and Grafana to expose system weaknesses and improve reliability.
Long talk description
Chaos Engineering is a discipline that tests system resilience by injecting controlled failures into distributed systems. Python's rich ecosystem of libraries makes it well suited both to automating chaos experiments and to improving observability while those failures are in progress. This presentation explores how Python can be used to simulate failures in cloud-native applications, monitor system behavior, and collect critical observability data such as logs, metrics, and traces. By integrating Python with tools like OpenTelemetry, Prometheus, and Grafana, developers can uncover system weaknesses and improve reliability. The session walks through practical Python scripts for running chaos experiments and observing their impact on modern infrastructure.
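As an illustration of the kind of script described above, here is a minimal, standard-library-only sketch: it wraps a call with injected latency and errors, records the outcome and latency of each call, and checks a simple steady-state hypothesis. `call_service`, `inject_faults`, `run_experiment`, and the 5% error-rate threshold are hypothetical names and values chosen for illustration; they are not part of chaoslib or any other tool, and a real experiment would target actual infrastructure (for example, Kubernetes pods) rather than an in-process function.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Observations:
    """Raw observations collected while calling a (possibly faulty) dependency."""
    successes: int = 0
    failures: int = 0
    latencies_s: List[float] = field(default_factory=list)

    @property
    def error_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


def inject_faults(func: Callable, failure_rate: float, added_latency_s: float,
                  rng: random.Random) -> Callable:
    """Hypothetical helper: delay every call and make a fraction of calls raise."""
    def wrapper(*args, **kwargs):
        time.sleep(added_latency_s)          # injected latency
        if rng.random() < failure_rate:      # injected error
            raise ConnectionError("injected fault")
        return func(*args, **kwargs)
    return wrapper


def call_service(x: int) -> int:
    """Hypothetical stand-in for a real downstream call."""
    return x * 2


def run_experiment(func: Callable, calls: int) -> Observations:
    """Call func repeatedly, recording the outcome and latency of each call."""
    obs = Observations()
    for i in range(calls):
        start = time.perf_counter()
        try:
            func(i)
            obs.successes += 1
        except ConnectionError:
            obs.failures += 1
        finally:
            obs.latencies_s.append(time.perf_counter() - start)
    return obs


if __name__ == "__main__":
    baseline = run_experiment(call_service, calls=200)
    faulty = inject_faults(call_service, failure_rate=0.2, added_latency_s=0.001,
                           rng=random.Random(42))
    chaos = run_experiment(faulty, calls=200)
    # Steady-state hypothesis (assumed threshold): error rate stays at or below 5%.
    for name, obs in (("baseline", baseline), ("chaos", chaos)):
        verdict = "holds" if obs.error_rate <= 0.05 else "violated"
        print(f"{name}: error_rate={obs.error_rate:.1%}, hypothesis {verdict}")
```

The seeded `random.Random` keeps a run reproducible, which makes it easier to compare observability data across repeated experiments.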
What format do you have in mind?
Talk (20-25 minutes + Q&A)
Talk outline / Agenda
Introduction (5 min)
- What is Chaos Engineering?
- Importance of observability during failures.
- Role of Python in automating chaos experiments.

Chaos Engineering Basics (10 min)
- Principles of Chaos Engineering.
- Common failure scenarios in distributed systems.
- Overview of Chaos Engineering tools and their Python integrations.

Observability Overview (10 min)
- The three pillars: logs, metrics, and traces.
- Observability tools: OpenTelemetry, Prometheus, Grafana.
- Why observability is critical during chaos experiments.

Python for Chaos Engineering (15 min)
- Simulating failures with Python (Chaos Toolkit's chaoslib, Kubernetes integration).
- Automating chaos experiments with Python scripts.

Python for Observability (15 min)
- Collecting logs, metrics, and traces with Python.
- Visualizing failure impact using the Prometheus and Grafana APIs.
- Tracing microservices with OpenTelemetry.

Case Study (10 min)
- Real-world example: injecting failures in Kubernetes.
- Monitoring system behavior with Python.
- Analyzing results to identify weaknesses.

Best Practices & Q&A (10 min)
- Best practices for combining Chaos Engineering and observability.
- Lessons learned and common pitfalls.
- Audience Q&A and closing remarks.
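For the observability and case-study segments of the agenda, here is a hedged sketch of pulling a metric from Prometheus with only the standard library, via an instant query against its HTTP API (`GET /api/v1/query`). The `http_requests_total` metric and its `status` and `service` labels are assumptions about what the instrumented services export, and `ERROR_RATIO_QUERY`, `build_query_url`, `parse_vector`, and `query_prometheus` are hypothetical helper names.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical PromQL: per-service ratio of 5xx responses over the last minute.
ERROR_RATIO_QUERY = (
    'sum by (service) (rate(http_requests_total{status=~"5.."}[1m]))'
    ' / sum by (service) (rate(http_requests_total[1m]))'
)

Labels = Tuple[Tuple[str, str], ...]


def build_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> Dict[Labels, float]:
    """Map each series' label set to its sample value in an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    values: Dict[Labels, float] = {}
    for series in data["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # a [unix_time, "value-as-string"] pair
        values[labels] = float(value)
    return values


def query_prometheus(base_url: str, promql: str,
                     timeout: float = 5.0) -> Dict[Labels, float]:
    """Run an instant query and return the parsed vector (performs a network call)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

Against a running server, `query_prometheus("http://localhost:9090", ERROR_RATIO_QUERY)` would return one error ratio per `service` label set; 9090 is Prometheus's default port, but the address in a given deployment may differ. Polling this before, during, and after a fault injection gives a numeric view of the failure's impact that can also be charted in Grafana.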
Key takeaways
Python enables automated, observable chaos experiments. Teams can inject failures, collect real-time metrics, logs, and traces, and quickly verify system resilience, turning failures into measurable, actionable insights rather than surprises.
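One way to make "measurable, actionable insights" concrete is to reduce a baseline run and a chaos run to a couple of numbers and list which thresholds the chaos run crossed. The sketch below assumes each request contributes one latency sample, uses a nearest-rank p95, and picks illustrative thresholds (5% errors, 2x baseline p95 latency); `Summary`, `summarize`, and `findings` are hypothetical names, not an established API.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Summary:
    error_rate: float
    p95_latency_ms: float


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: Sequence[float], errors: int) -> Summary:
    """Reduce raw samples (one latency per request) to the numbers compared across runs."""
    if not latencies_ms:
        raise ValueError("summarize() needs at least one request")
    return Summary(error_rate=errors / len(latencies_ms),
                   p95_latency_ms=percentile(latencies_ms, 95))


def findings(baseline: Summary, chaos: Summary,
             max_error_rate: float = 0.05, max_p95_ratio: float = 2.0) -> List[str]:
    """List every (assumed) resilience threshold that the chaos run violated."""
    issues = []
    if chaos.error_rate > max_error_rate:
        issues.append(f"error rate {chaos.error_rate:.1%} exceeds {max_error_rate:.1%}")
    if chaos.p95_latency_ms > max_p95_ratio * baseline.p95_latency_ms:
        issues.append(f"p95 latency {chaos.p95_latency_ms:.0f} ms is more than "
                      f"{max_p95_ratio:g}x the baseline {baseline.p95_latency_ms:.0f} ms")
    return issues


if __name__ == "__main__":
    # Synthetic samples for illustration only, not measured data.
    baseline = summarize([float(x) for x in range(1, 21)], errors=0)
    chaos = summarize([3.0 * x for x in range(1, 21)], errors=4)
    for issue in findings(baseline, chaos):
        print(issue)
```

An empty list from `findings` means the system stayed within the chosen bounds under the injected failure; each non-empty entry points at a specific weakness to investigate.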
What domain would you say your talk falls under?
Core Python
Duration (including Q&A)
40
Prerequisites and preparation
No response
Resources and references
No response
Link to slides/demos (if available)
No response
Twitter/X handle (optional)
No response
LinkedIn profile (optional)
https://www.linkedin.com/in/connectsaurabhmishra/
Profile picture URL (optional)
No response
Speaker bio
Saurabh Mishra is a Cloud Evangelist with a deep passion for cloud architecture, DevOps, and automation. He actively engages with the global tech community, sharing insights on cloud-native technologies, security best practices, and multi-cloud strategies. As an experienced speaker and mentor, Saurabh has delivered sessions at conferences, meetups, and workshops, helping teams accelerate their cloud adoption, modernization, and optimization journeys. His work bridges innovation and practical implementation, empowering organizations to build resilient, scalable, and secure cloud solutions.
Availability
Jan/2026
Accessibility & special requirements
No response
Speaker checklist
- I have read and understood the PyDelhi guidelines for submitting proposals and giving talks
- I will make my talk accessible to all attendees and will proactively ask for any accommodations or special requirements I might need
- I agree to share slides, code snippets, and other materials used during the talk with the community
- I will follow PyDelhi's Code of Conduct and maintain a welcoming, inclusive environment throughout my participation
- I understand that PyDelhi meetups are community-centric events focused on learning, knowledge sharing, and networking, and I will respect this ethos by not using this platform for self-promotion or hiring pitches during my presentation, unless explicitly invited to do so by means of a sponsorship or similar arrangement
- If the talk is recorded by the PyDelhi team, I grant permission to release the video on PyDelhi's YouTube channel under the CC-BY-4.0 license, or a different license of my choosing if I am specifying it in my proposal or with the materials I share
Additional comments
No response