Commit 9469384

improve about me
1 parent ae013af commit 9469384

1 file changed: +79 / -85 lines changed


docs/notes/about.md

Lines changed: 79 additions & 85 deletions
@@ -5,28 +5,32 @@ permalink: /about/
 cover: /images/specht-labs-rounded.png
 ---
 
+## Hi, I'm Cedric
 
-Hi, I'm Cedric — but most people know me as *cedi*.
+but most people know me as *cedi*
+
+<CardGrid>
+<ImageCard image="https://avatars.githubusercontent.com/u/1952599?v=4" />
+<Card>
 
 I'm a **Senior Site Reliability Engineer and Tech Lead** at [Microsoft Azure](https://github.com/microsoft), working on **distributed systems, chaos engineering, and platform resilience** at scale.
 If it's complex, distributed, and needs to stay up — I'm into it.
 
 I specialize in building and maintaining large-scale distributed systems, driving reliability, and leading technical initiatives to improve platform resilience.
 
-<LinkCard title="Download CV" icon="pepicons-pencil:cv" href="/assets/CV_Cedric-Kienzler_2025.pdf"/>
+</Card>
+</CardGrid>
 
----
+<LinkCard title="Download my CV" icon="pepicons-pencil:cv" href="/assets/CV_Cedric-Kienzler_2025.pdf"/>
 
-### 🔧 What I Do
+## What I Do
 
 - Building reliable, large-scale systems with a focus on **resilience, SLOs, and automation**
 - Leading teams and setting technical direction in high-stakes, high-scale environments
 - Designing chaos experiments, improving release workflows, and modernizing infrastructure
 - Evangelizing good SRE practices through talks, docs, and community work
 
----
-
-### 🛠️ Community & Chaos
+## Community & Chaos
 
 I regularly help with infrastructure, planning, and logistics for events like the Chaos Communication Congress.
 During the pandemic, I helped build:
@@ -35,25 +39,19 @@ During the pandemic, I helped build:
 - *Open Infrastructure* - A collective of people building desperately needed infrastructure for educational institutions to keep classes going
 - The Kubernetes stack powering the [rC3 - NOW HERE](https://rc3.world/2021/) virtual world
 
----
-
-### 🧪 What I Tinker With
+## What I Tinker With
 
 - Home lab with **Raspberry Pi K3s cluster**, **CEPH storage**, and a **Stratum 1 NTP/PTP time server**
 - **Cluster API** managed cloud Kubernetes cluster running a full **Grafana LGTM Stack**
 - Kernel recompilation just for fun (and for weird hardware drivers)
 - Low-level **distributed systems** algorithms to explore gossip and consensus protocols
 
----
-
-### 📸 When Not Writing YAML
+## When Not Writing YAML
 
 I’m also a hobbyist **analog photographer** with a small collection of 35mm and medium-format cameras (Leica M6, Hasselblad 500 C/M, Canon A-1).
 I develop film at home and have a tiny darkroom with a 35mm enlarger.
 
----
-
-### 💡 Things I Believe in
+## Things I Believe in
 
 - Be excellent to each other 🤝
 - Focus on fundamentals > chasing hype
@@ -63,9 +61,7 @@ I develop film at home and have a tiny darkroom with a 35mm enlarger.
 - [How Complex Systems Fail](https://how.complexsystems.fail) is required reading
 - Your beloved system architecture exists mostly **in your head** and ~~behaves~~ fails differently than you'd expect. (See the [Above the line/below the line](https://snafucatchers.github.io/#2_3_The_above-the-line/below-the-line_framework) framework)
 
----
-
-### ☕ Let’s Connect
+## Let’s Connect
 
 [![GitHub Specht Labs](https://img.shields.io/badge/SpechtLabs-008080?style=for-the-badge&logo=github&logoColor=white)](https://github.com/specht-labs)
 [![Mastodon](https://img.shields.io/badge/Mastodon-6364FF?style=for-the-badge&logo=mastodon&logoColor=white)](http://hachyderm.io/@cedi)
@@ -74,6 +70,70 @@ I develop film at home and have a tiny darkroom with a 35mm enlarger.
 
 ## Experiences
 
+::: timeline
+
+- Senior Site Reliability Engineer / Technical Lead - Microsoft Azure
+time=02/2022 to 05/2025
+
+Technical Lead in Azure’s Safe Change Infrastructure SRE team, responsible for chaos engineering, resiliency validation, and release infrastructure harmonisation. Led the modernisation of Azure's release infrastructure, migrating 60+ repositories and 600+ pipelines and increasing deployment reliability and speed across multiple critical customer-facing services, including Azure Cosmos DB, Log Analytics, Web Apps, and Function Apps.
+
+- **Platform Engineering & DevOps Expertise:** Developed platform tooling improvements to streamline engineering workflows and improve developer experience, and led shift-left initiatives that integrate early validation mechanisms to catch issues earlier in the development lifecycle.
+- **Chaos Engineering & Resilience Validation:** Designed and implemented chaos engineering experiments to validate system failure hypotheses covering 80% of high-impact critical customer scenarios and to improve resilience strategies; built synthetic monitoring and business validation testing to proactively identify and mitigate reliability risks.
+- **Organised multiple internal learning sessions**, developing a 9-part self-guided onboarding tutorial as part of the SRE Academy, enabling new engineers to onboard 75% faster to the new release system.
+- **Leadership and Team Management:** Technical lead & Scrum Master for my immediate team of 5 engineers, responsible for setting technical direction, mentoring, and defining strategies and goals for the team as well as the broader department. Served as the technical lead for a newly formed team within the Safe Change Infrastructure SRE organisation and supported multiple program managers and teams from three other SRE organisations in bootstrapping new SRE engagements.
+- **Cross-Org Collaboration and Stakeholder Engagement:** Partnered with 10+ service teams across Azure to help them migrate to the new release system, contributing high-quality pull requests to their repositories as best-practice examples and driving down change-related outages by 20%.
+- **SRE Best Practices and Knowledge Sharing:** Core Contributor & Committee Member for the Azure SRE Playbook, authoring a new SRE pattern with 3 sub-patterns and overseeing the review and integration of 3 additional major patterns.
+- **Technical Evangelism & Internal Training:** Speaker at Azure SRE Tech Talks, delivering sessions on reliability,
+deployment strategies, and platform engineering.
+- **Maintained and expanded the Azure SRE Wiki**, working across all SRE organisations to standardise and
+document operational excellence.
+- **Recognition & Awards:** Azure Reliability Quality Star – Leadership Excellence Award for sustained
+high-quality contributions to Azure’s engineering culture and reliability improvements.
+
+- Tech Lead Manager, Kubernetes SRE - German Edge Cloud
+time=07/2020 to 01/2022
+
+Built and led the Kubernetes SRE team: established and scaled a remote team from two to six highly skilled SREs, taking full ownership of the company’s Managed Kubernetes Platform, spread across 3 availability zones and hosting 50+ customer clusters. Ensured only high-quality changes made it into production by reviewing code, design documents, and architecture changes daily and by implementing state-of-the-art GitOps tooling and observability, resulting in a 75% reduction in change-related outages over 12 months.
+
+- **Incident & Change Management:** Developed and implemented new incident, change, and problem management processes and standard operating procedures, improving reliability and operational efficiency, achieving an average 10-minute time-to-engage, and reducing time-to-mitigation by several hours on average.
+- **Cross-Functional Collaboration:** Worked closely with the Service Management team to improve incident response, change reviews, and operational excellence, and with the Infrastructure, OpenStack, and CEPH Storage teams to ensure seamless integration and optimised performance across compute, storage, and networking layers, resulting in 10% higher storage throughput and lower etcd commit latencies, which improved customer satisfaction.
+- **Platform & Product Leadership:** Took on the Product and Platform Owner role, shielding the team from unnecessary business complexity while aligning priorities with company strategy and CTO directives.
+- **Financial Oversight & Cost Optimisation:** Managed the budget for the Managed Kubernetes Service, including forecasting infrastructure costs and collaborating with accounting on financial planning.
+- **Sales & Pricing Strategy:** Worked with Sales and Finance leadership to define a competitive pricing structure for the Kubernetes offering.
+- **Cloud-Native & Open Source Advocacy:** Fostered a culture of open-source collaboration, contributing improvements back to the cloud-native community and positioning the company’s offerings within CNCF certification programs.
+
+- Site Reliability Engineer 2, SharePoint Online - Microsoft
+time=01/2019 to 03/2020
+
+Ran live-site operations for one of the largest M365 services (over 200 million monthly active users and over 1 exabyte of data), including incident response and management, rapidly diagnosing and resolving critical issues to maintain SharePoint Online’s 99.99% SLA.
+
+- **Disaster Recovery & Infrastructure Modernisation:** Led an initiative to improve disaster recovery playbooks using a more resilient storage solution, ensuring recovery procedures remained accessible even during blackout scenarios.
+- **Onboarding & Global Expansion:** Played a key role in onboarding and training a new SRE team in China, enabling 24/7 follow-the-sun operations.
+- **Community & Knowledge Sharing:** Organised meet-ups for Microsoft Ireland’s Open Source Club.
+
+- Software Engineer, Network Security - Sophos
+time=01/2017 to 12/2018
+
+- **Network Security & Threat Detection:** Worked on the Synchronised Security Engine, significantly improving network threat detection rates compared to competing vendors.
+- **IPsec & Network Protocol Implementation:** Worked on the implementation of IPsec IKEv2 in the Linux kernel for the firewall appliance.
+- **Scalability & Load Testing:** Implemented extensive firewall load testing using the Ixia BreakingPoint platform, ensuring performance under high traffic loads. Developed custom load-testing frameworks with Python and Mininet SDN, simulating concurrent user traffic.
+- **Testing & Release Acceleration:** Expanded the integration test suite for firewall products, leading to faster and more reliable release cycles.
+
+- Software Engineer - MARKANT Handels and Service GmbH
+time=08/2015 to 12/2016
+
+- **Infrastructure Modernisation:** Led a department-wide initiative migrating from CVS to Git, upgrading IDE versions, and implementing a CI/CD pipeline for improved development workflows, increasing deployment velocity from once a week to multiple times a day.
+- **Operational Support Tooling:** Built custom tools to assist operations teams, improving incident response times in highly time-sensitive trading systems.
+- **Mentorship & Training:** Trained apprentices and junior engineers in software architecture, clean code principles, and design patterns.
+
+- Junior Software Engineer - Streit Datentechnik GmbH
+time=09/2012 to 07/2015
+
+- **Software Development:** Learned MS Visual C++, C# .NET, MS T-SQL, MFC, and the Win32 API, broadening problem-solving capabilities across multiple technologies.
+- **Reverse Engineering & Analysis:** Developed a disassembler to read dependencies from Windows PE and C# executables for debugging and system analysis.
+
+:::
+
 ::: tabs
 
 @tab Experience
@@ -160,69 +220,3 @@ I develop film at home and have a tiny darkroom with a 35mm enlarger.
 - Tailscale
 - mininet
 :::
-
----
-
-::: timeline
-
-- Senior Site Reliability Engineer / Technical Lead - Microsoft Azure
-time=02/2022 to 05/2025
-
-Technical Lead in Azure’s Safe Change Infrastructure SRE team, responsible for chaos engineering, resiliency validation, and release infrastructure harmonisation. Led the modernisation of Azure's release infrastructure, migrating 60+ repositories and 600+ pipelines and increasing deployment reliability and speed across multiple critical customer-facing services, including Azure Cosmos DB, Log Analytics, Web Apps, and Function Apps.
-
-- **Platform Engineering & DevOps Expertise:** Developed platform tooling improvements to streamline engineering workflows and improve developer experience, and led shift-left initiatives that integrate early validation mechanisms to catch issues earlier in the development lifecycle.
-- **Chaos Engineering & Resilience Validation:** Designed and implemented chaos engineering experiments to validate system failure hypotheses covering 80% of high-impact critical customer scenarios and to improve resilience strategies; built synthetic monitoring and business validation testing to proactively identify and mitigate reliability risks.
-- **Organised multiple internal learning sessions**, developing a 9-part self-guided onboarding tutorial as part of the SRE Academy, enabling new engineers to onboard 75% faster to the new release system.
-- **Leadership and Team Management:** Technical lead & Scrum Master for my immediate team of 5 engineers, responsible for setting technical direction, mentoring, and defining strategies and goals for the team as well as the broader department. Served as the technical lead for a newly formed team within the Safe Change Infrastructure SRE organisation and supported multiple program managers and teams from three other SRE organisations in bootstrapping new SRE engagements.
-- **Cross-Org Collaboration and Stakeholder Engagement:** Partnered with 10+ service teams across Azure to help them migrate to the new release system, contributing high-quality pull requests to their repositories as best-practice examples and driving down change-related outages by 20%.
-- **SRE Best Practices and Knowledge Sharing:** Core Contributor & Committee Member for the Azure SRE Playbook, authoring a new SRE pattern with 3 sub-patterns and overseeing the review and integration of 3 additional major patterns.
-- **Technical Evangelism & Internal Training:** Speaker at Azure SRE Tech Talks, delivering sessions on reliability,
-deployment strategies, and platform engineering.
-- **Maintained and expanded the Azure SRE Wiki**, working across all SRE organisations to standardise and
-document operational excellence.
-- **Recognition & Awards:** Azure Reliability Quality Star – Leadership Excellence Award for sustained
-high-quality contributions to Azure’s engineering culture and reliability improvements.
-
-- Tech Lead Manager, Kubernetes SRE - German Edge Cloud
-time=07/2020 to 01/2022
-
-Built and led the Kubernetes SRE team: established and scaled a remote team from two to six highly skilled SREs, taking full ownership of the company’s Managed Kubernetes Platform, spread across 3 availability zones and hosting 50+ customer clusters. Ensured only high-quality changes made it into production by reviewing code, design documents, and architecture changes daily and by implementing state-of-the-art GitOps tooling and observability, resulting in a 75% reduction in change-related outages over 12 months.
-
-- **Incident & Change Management:** Developed and implemented new incident, change, and problem management processes and standard operating procedures, improving reliability and operational efficiency, achieving an average 10-minute time-to-engage, and reducing time-to-mitigation by several hours on average.
-- **Cross-Functional Collaboration:** Worked closely with the Service Management team to improve incident response, change reviews, and operational excellence, and with the Infrastructure, OpenStack, and CEPH Storage teams to ensure seamless integration and optimised performance across compute, storage, and networking layers, resulting in 10% higher storage throughput and lower etcd commit latencies, which improved customer satisfaction.
-- **Platform & Product Leadership:** Took on the Product and Platform Owner role, shielding the team from unnecessary business complexity while aligning priorities with company strategy and CTO directives.
-- **Financial Oversight & Cost Optimisation:** Managed the budget for the Managed Kubernetes Service, including forecasting infrastructure costs and collaborating with accounting on financial planning.
-- **Sales & Pricing Strategy:** Worked with Sales and Finance leadership to define a competitive pricing structure for the Kubernetes offering.
-- **Cloud-Native & Open Source Advocacy:** Fostered a culture of open-source collaboration, contributing improvements back to the cloud-native community and positioning the company’s offerings within CNCF certification programs.
-
-- Site Reliability Engineer 2, SharePoint Online - Microsoft
-time=01/2019 to 03/2020
-
-Ran live-site operations for one of the largest M365 services (over 200 million monthly active users and over 1 exabyte of data), including incident response and management, rapidly diagnosing and resolving critical issues to maintain SharePoint Online’s 99.99% SLA.
-
-- **Disaster Recovery & Infrastructure Modernisation:** Led an initiative to improve disaster recovery playbooks using a more resilient storage solution, ensuring recovery procedures remained accessible even during blackout scenarios.
-- **Onboarding & Global Expansion:** Played a key role in onboarding and training a new SRE team in China, enabling 24/7 follow-the-sun operations.
-- **Community & Knowledge Sharing:** Organised meet-ups for Microsoft Ireland’s Open Source Club.
-
-- Software Engineer, Network Security - Sophos
-time=01/2017 to 12/2018
-
-- **Network Security & Threat Detection:** Worked on the Synchronised Security Engine, significantly improving network threat detection rates compared to competing vendors.
-- **IPsec & Network Protocol Implementation:** Worked on the implementation of IPsec IKEv2 in the Linux kernel for the firewall appliance.
-- **Scalability & Load Testing:** Implemented extensive firewall load testing using the Ixia BreakingPoint platform, ensuring performance under high traffic loads. Developed custom load-testing frameworks with Python and Mininet SDN, simulating concurrent user traffic.
-- **Testing & Release Acceleration:** Expanded the integration test suite for firewall products, leading to faster and more reliable release cycles.
-
-- Software Engineer - MARKANT Handels and Service GmbH
-time=08/2015 to 12/2016
-
-- **Infrastructure Modernisation:** Led a department-wide initiative migrating from CVS to Git, upgrading IDE versions, and implementing a CI/CD pipeline for improved development workflows, increasing deployment velocity from once a week to multiple times a day.
-- **Operational Support Tooling:** Built custom tools to assist operations teams, improving incident response times in highly time-sensitive trading systems.
-- **Mentorship & Training:** Trained apprentices and junior engineers in software architecture, clean code principles, and design patterns.
-
-- Junior Software Engineer - Streit Datentechnik GmbH
-time=09/2012 to 07/2015
-
-- **Software Development:** Learned MS Visual C++, C# .NET, MS T-SQL, MFC, and the Win32 API, broadening problem-solving capabilities across multiple technologies.
-- **Reverse Engineering & Analysis:** Developed a disassembler to read dependencies from Windows PE and C# executables for debugging and system analysis.
-
-:::
