AbhishekDatta/AbhishekDatta


Hi, I'm Abhishek 👋

20+ years building resilient systems that don't break when things go wrong. I help engineering teams move faster while keeping production stable, secure, and actually reliable.

Currently leading Chaos Engineering for a major financial services firm, teaching systems to fail gracefully before customers notice.

Want to know more about my career journey? Chat with my AI Alter Ego. It is trained on my complete professional background, responds the way I would, and is available 24/7.

What I Do

Chaos Engineering
Breaking things on purpose so they don't break by accident. Reduced high-severity incidents by 30% and medium/low-severity incidents by 70% through Game Days and controlled failure injection.
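Tools like Gremlin and Litmus handle failure injection for you. The minimal Python sketch below only shows the underlying idea of a controlled experiment: inject errors at a known rate, keep a kill switch, and check that callers degrade gracefully. `FaultConfig`, `inject_faults`, and `fetch_balance` are hypothetical names for illustration. They are not part of any real tool, and this is not the Game Day setup described above.

```python
import random
import time
from dataclasses import dataclass
from functools import wraps
from typing import Optional


class InjectedFault(Exception):
    """Raised when the experiment deliberately fails a call."""


@dataclass
class FaultConfig:  # hypothetical experiment settings
    enabled: bool = False       # kill switch: nothing is injected unless explicitly enabled
    error_rate: float = 0.0     # probability (0..1) that a wrapped call raises InjectedFault
    latency_s: float = 0.0      # extra delay added before every wrapped call while enabled
    seed: Optional[int] = None  # a fixed seed makes a Game Day run reproducible


def inject_faults(config: FaultConfig):
    """Hypothetical decorator that injects latency and errors into a function."""
    rng = random.Random(config.seed)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # The config is read on every call, so flipping `enabled` stops the experiment.
            if config.enabled:
                if config.latency_s > 0:
                    time.sleep(config.latency_s)
                if rng.random() < config.error_rate:
                    raise InjectedFault(f"chaos: injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothesis under test: callers degrade gracefully when the dependency fails.
config = FaultConfig(enabled=True, error_rate=0.3, seed=42)


@inject_faults(config)
def fetch_balance(account_id: str) -> int:
    return 100  # stand-in for a real downstream call


def balance_or_none(account_id: str) -> Optional[int]:
    try:
        return fetch_balance(account_id)
    except InjectedFault:
        return None  # fallback path: show "balance unavailable" instead of crashing


results = [balance_or_none("acct-1") for _ in range(100)]
print(f"{results.count(None)} of 100 calls hit an injected fault")
```

The seeded generator makes an experiment repeatable. The config is checked on every call rather than once at decoration time, so setting `config.enabled = False` halts injection immediately. That kind of abort switch is what keeps failure injection "controlled".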

Site Reliability Engineering
Keeping production running when everyone else is asleep. Built monitoring, automation, and incident response systems that cut MTTD by 37% and MTTR by 28%.
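MTTD and MTTR are averages over incident timestamps. The Python sketch below shows one way to compute them and the percentage change between two periods. The `Incident` fields and the sample timestamps are hypothetical. MTTR is measured here from detection to resolution. Teams define it differently (some measure from when the failure began), so treat that choice as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Sequence


@dataclass
class Incident:  # hypothetical incident record
    started: datetime   # when the failure actually began
    detected: datetime  # when an alert (or a person) first noticed it
    resolved: datetime  # when service was restored


def mttd_minutes(incidents: Sequence[Incident]) -> float:
    """Mean time to detect: average gap between failure start and detection."""
    return mean((i.detected - i.started).total_seconds() / 60 for i in incidents)


def mttr_minutes(incidents: Sequence[Incident]) -> float:
    """Mean time to recover, measured here from detection to resolution."""
    return mean((i.resolved - i.detected).total_seconds() / 60 for i in incidents)


def percent_reduction(before: float, after: float) -> float:
    """How much a metric improved, as a percentage of its earlier value."""
    return (before - after) / before * 100


# Two made-up incidents, both starting at t0.
t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=70)),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=50)),
]
print(mttd_minutes(incidents))  # 15.0
print(mttr_minutes(incidents))  # 45.0
```

The same `percent_reduction` helper turns before/after averages into figures like the ones above. For example, an MTTD falling from 40 to 25.2 minutes is a 37% reduction.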

Cloud & Infrastructure
15 years deep in AWS, managing everything from a few servers to massive distributed systems. Saved companies $9M+ through smart automation and resource optimization.

DevOps & Platform Engineering
Built CI/CD pipelines that took deployment time from days to minutes. Made developers 3x more productive by removing friction and automating the boring stuff.

AI in Operations
Applying Generative AI, LLMs, and Agentic AI to make incident response smarter and faster. Building chatbots that actually help instead of just looking cool.

Leadership
Mentored teams from 4 to 50+ engineers across the US, India, China, and Australia. I believe in teaching people to solve problems, not just to follow runbooks.

Real Results

  • Uptime: Maintained five 9's availability (99.999%) for financial services applications handling billions in transactions
  • Speed: Cut release cycle time by 50-80% through Kubernetes, Terraform, and smart automation
  • Cost: Delivered $10.83M in savings across Expedia and Arcesium through intelligent tooling and cloud optimization
  • Resilience: Achieved 100% compliance with RTO/RPO targets in disaster recovery drills, quarter after quarter
  • Security: Zero security incidents across multiple organizations by baking security into developer workflows (SAST, DAST, SCA)
  • Incidents: Reduced production downtime by 20-50% through proactive monitoring and chaos engineering
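The Python sketch below shows what the availability and RTO/RPO figures imply. The availability arithmetic is standard: a target leaves (1 - target) of the period as allowed downtime. At 99.999% over a 365.25-day year, that is about 5.26 minutes. The `DrillResult` type, the `meets_objectives` helper, and the 1-hour RTO / 5-minute RPO values are hypothetical. They are not the objectives of any system mentioned here.

```python
from dataclasses import dataclass
from datetime import timedelta

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime a period can absorb while still meeting the availability target."""
    return (1 - availability_pct / 100) * period_minutes


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/year")


@dataclass
class DrillResult:  # hypothetical record of one disaster recovery drill
    recovery_time: timedelta     # declared disaster -> service restored
    data_loss_window: timedelta  # last recoverable write -> moment of failure


def meets_objectives(result: DrillResult, rto: timedelta, rpo: timedelta) -> bool:
    """A drill passes only if recovery fits the RTO and data loss fits the RPO."""
    return result.recovery_time <= rto and result.data_loss_window <= rpo


# Hypothetical objectives: restore within 1 hour, lose at most 5 minutes of data.
drill = DrillResult(recovery_time=timedelta(minutes=42), data_loss_window=timedelta(minutes=3))
print(meets_objectives(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=5)))  # True
```

Each extra nine cuts the yearly downtime budget by a factor of ten. That is why five-nines targets depend on automated detection and recovery rather than manual response.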

Tech I Work With

Chaos Engineering: Gremlin (certified), LitmusChaos, Chaos Monkey, custom Python tools
Cloud: AWS (Solutions Architect certified), 15 years of production experience
Containers: Kubernetes, Docker, EKS, ECS, Fargate, Helm, Istio
Infrastructure as Code: Terraform, AWS CloudFormation
CI/CD: Jenkins, ArgoCD, GitOps workflows, MLOps
Monitoring: Datadog, Prometheus, Grafana, CloudWatch, New Relic, Splunk, ELK
Programming: Python (automation, ML, chatbots), Bash
AI/ML: AWS Bedrock, SageMaker, building LLM-powered tools for SRE, AIOps

What I Write About

I share what I learn on Medium, focusing on Chaos Engineering, SRE practices, Generative AI applications in reliability engineering, and making systems more resilient.

Where I've Worked

Morgan Stanley (via Capgemini) - Senior Manager, Chaos Engineering
Leading global team driving resilience across wealth management platforms

RingCentral - Director, DevOps & SRE
Built the foundation for their India market launch, won Best Debut and Best Team awards

Arcesium - Associate Director, Platform SRE
Implemented chaos engineering, disaster recovery, and error budget frameworks for hedge fund systems

Expedia Group - Senior Manager
Led global SRE teams, built automation that won hackathons and saved millions

Plus earlier roles at Guavus, HCL, and startups where I learned how to keep things running with duct tape and determination.

What I'm Looking For

I love working with teams that care about both speed and reliability. If you're building systems that need to stay up when it matters, let's talk.

Open to discussing:

  • Chaos Engineering leadership roles
  • SRE/Platform Engineering positions (Staff+ level)
  • DevOps leadership with focus on reliability
  • Advisory or consulting on resilience engineering

Let's Connect

If you're fighting fires in production or want to make your systems more resilient, I'm happy to chat. I've probably broken and fixed the same thing you're dealing with.
