AbhishekDatta/README.md

Hi, I'm Abhishek 👋

20+ years building resilient systems that don't break when things go wrong. I help engineering teams move faster while keeping production stable, secure, and actually reliable.

Currently leading Chaos Engineering for a major financial services firm, teaching systems to fail gracefully before customers notice.

Want to know more about my career journey? Chat with my AI Alter Ego. It is trained on my complete professional background, responds the way I would, and is available 24/7.

What I Do

Chaos Engineering
Breaking things on purpose so they don't break by accident. Reduced high-severity incidents by 30% and medium- and low-severity incidents by 70% through Game Days and controlled failure injection.

Site Reliability Engineering
Keeping production running when everyone else is asleep. Built monitoring, automation, and incident response systems that cut mean time to detect (MTTD) by 37% and mean time to recover (MTTR) by 28%.

Cloud & Infrastructure
15 years deep in AWS, managing everything from a few servers to massive distributed systems. Saved companies $9M+ through smart automation and resource optimization.

DevOps & Platform Engineering
Built CI/CD pipelines that took deployment time from days to minutes. Made developers 3x more productive by removing friction and automating the boring stuff.

AI in Operations
Applying Generative AI, LLMs, and Agentic AI to make incident response smarter and faster. Building chatbots that actually help instead of just looking cool.

Leadership
Mentored teams of 4 to 50+ engineers across the US, India, China, and Australia. I believe in teaching people to solve problems, not just to follow runbooks.

Real Results

  • Uptime: Maintained five-nines availability (99.999%) for financial services applications handling billions in transactions
  • Speed: Cut release cycle time by 50-80% through Kubernetes, Terraform, and smart automation
  • Cost: Delivered $10.83M in savings across Expedia and Arcesium through intelligent tooling and cloud optimization
  • Resilience: Achieved 100% compliance with RTO/RPO targets in disaster recovery drills quarter after quarter
  • Security: Zero security incidents across multiple organizations by baking security into developer workflows (SAST, DAST, SCA)
  • Incidents: Reduced production downtime by 20-50% through proactive monitoring and chaos engineering
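
For a sense of scale on the five-nines figure above, here is a minimal Python sketch of the arithmetic behind an availability target. The function name and the 365-day year are illustrative assumptions, not part of any tooling mentioned in this README.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    Hypothetical helper for illustration; assumes a 365-day year by default.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the period the service may be unavailable.
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    yearly = allowed_downtime_minutes(99.999)
    print(f"99.999% allows about {yearly:.2f} minutes of downtime per year")
```

At 99.999% the budget works out to a little over five minutes per year, which is why detection and recovery have to be automated to stay inside it.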

Tech I Work With

Chaos Engineering: Gremlin (certified), Litmus Chaos, Chaos Monkey, custom Python tools
Cloud: AWS (Solutions Architect certified), 15 years of production experience
Containers: Kubernetes, Docker, EKS, ECS, Fargate, Helm, Istio
Infrastructure as Code: Terraform, AWS CloudFormation
CI/CD: Jenkins, ArgoCD, GitOps workflows, MLOps
Monitoring: Datadog, Prometheus, Grafana, CloudWatch, New Relic, Splunk, ELK
Programming: Python (automation, ML, chatbots), Bash
AI/ML: AWS Bedrock, SageMaker, building LLM-powered tools for SRE and AIOps

What I Write About

I share what I learn on Medium, focusing on Chaos Engineering, SRE practices, Generative AI applications in reliability engineering, and making systems more resilient.

Where I've Worked

Morgan Stanley (via Capgemini) - Senior Manager, Chaos Engineering
Leading global team driving resilience across wealth management platforms

RingCentral - Director, DevOps & SRE
Built the foundation for their India market launch, won Best Debut and Best Team awards

Arcesium - Associate Director, Platform SRE
Implemented chaos engineering, disaster recovery, and error budget frameworks for hedge fund systems

Expedia Group - Senior Manager
Led global SRE teams, built automation that won hackathons and saved millions

Plus earlier roles at Guavus, HCL, and startups where I learned how to keep things running with duct tape and determination.

What I'm Looking For

I love working with teams that care about both speed and reliability. If you're building systems that need to stay up when it matters, let's talk.

Open to discussing:

  • Chaos Engineering leadership roles
  • SRE/Platform Engineering positions (Staff+ level)
  • DevOps leadership with focus on reliability
  • Advisory or consulting on resilience engineering

Let's Connect

If you're fighting fires in production or want to make your systems more resilient, I'm happy to chat. I've probably broken and fixed the same thing you're dealing with.

Pinned

  1. automated-architecture-discovery (Public)

    AI-powered automated architecture discovery system with automatic documentation and drift detection using Claude AI.

    Python 2

  2. ai-career-chatbot (Public)

    AI-powered career chatbot using OpenAI GPT-4o-mini and Gradio. Interactive assistant for answering questions about professional background, skills, and experience.

    Python 1

  3. environment-realism-toolkit (Public)

    Practical toolkit for auditing environment gaps before chaos testing.

    Python 1