A collection of my technical talks at conferences and meetups, with slide decks and recordings.
| # | Date | Event | Title | Slide Deck | Recording | Description |
|---|---|---|---|---|---|---|
| 1 | 2024-07-06 | Bengaluru Streams Meetup | Batch to Near-Realtime: Inspired by a Real Production Incident | View Slides | Watch Recording | Insights into transitioning from batch to real-time processing. |
| 2 | 2024-02-03 | MyDBOps Open Source Database Meetup | Navigating Transactions: ACID Complexity in Modern Databases | View Slides | Watch Recording | Understanding ACID properties in contemporary databases. |
| 3 | 2023-12-06 | Druid Summit 2023 | Changing Druid Ingestion from 3 Hours to 5 Minutes | View Slides | Watch Recording | Optimizing Druid ingestion processes. |
| 4 | | Pulsar Summit Asia 2022 | Streaming Wars: How Apache Pulsar is Acing the Battle | View Slides | Watch Recording | Exploring Apache Pulsar's role in the streaming ecosystem. |
| 5 | | Pulsar Summit Asia 2021 | Designing Pulsar for Isolation | View Slides | Watch Recording | Strategies for isolating workloads in Apache Pulsar. |
| 6 | | ApacheCon 2021 | Structured Data Streaming with Apache Pulsar | View Slides | Watch Recording | Leveraging Apache Pulsar for structured data streaming. |
| 7 | | ApacheCon 2021 | Apache BookKeeper Key-Value Store and Use Cases | View Slides | Watch Recording | Insights into Apache BookKeeper's key-value store capabilities. |
| 8 | | Pulsar NA Summit 2021 | How Pulsar Stores Data | View Slides | Watch Recording | Understanding Apache Pulsar's data storage mechanisms. |
| 9 | | Pulsar Summit Asia | Running a Secure Pulsar Cluster | View Slides | Watch Recording | Best practices for securing Apache Pulsar deployments. |
| 10 | | Pulsar Summit Asia | Lessons from Managing a Pulsar Cluster | View Slides | Watch Recording | Experiences and lessons learned from managing Apache Pulsar clusters. |
| 11 | | FOSSASIA 2015 | MySQL Group Replication | View Slides | N/A | Deep dive into MySQL's group replication features. |
| 12 | | Open Source India 2014 | MySQL High Availability with Replication New Features | View Slides | N/A | Exploring new features in MySQL replication for high availability. |
| 13 | | MySQL Developer Day Conference | MySQL Replication and Scalability | View Slides | N/A | Strategies for scaling MySQL using replication techniques. |
| 14 | | MySQL User Camp | GTIDs in MySQL | View Slides | N/A | Understanding Global Transaction Identifiers in MySQL replication. |
| 15 | 2023-10-20 | Open Source India, 2023 | Building Blocks of Open Source Databases | View Slides | Watch Recording | Building Blocks of Open Source Databases |
| 16 | | 1st Apache Druid Meetup, Bangalore | Apache Druid on Kubernetes | View Slides | Watch Recording | Druid on Kubernetes |
| 17 | | Bengaluru Streams Meetup | Apache Pulsar - The anatomy of Pub & Sub | View Slides | Watch Recording | Apache Pulsar - The anatomy of Pub & Sub |
| 18 | | Pulsar Summit Asia 2022 | Keeping on top of hybrid cloud usage with Pulsar | View Slides | Watch Recording | Keeping on top of hybrid cloud usage with Pulsar |
| 19 | 2024-11-21 | Open Source Analytics Conference | Unified Data Management with ClickHouse® and Postgres | View Slides | Watch Recording | Unified Data Management with ClickHouse® and Postgres |
| 20 | 2021-10-08 | EventSourcing Live 2021 | Streaming app changes to Event Store | View Slides | Watch Recording | Streaming event changes to the app via events or CDC: trade-offs and challenges |
| 21 | 2023-06-14 | 1st Apache Pulsar India User Meet | Apache Pulsar Design Choices & use-cases | View Slides | Watch Recording | Design choices to love in Pulsar's architecture, and the trade-offs |
| 22 | 2024-05-24 | SNIA Webinar | Navigating Transactions: ACID Complexity in Modern Databases | View Slides | Watch Recording | Understanding ACID properties in contemporary databases. |
| 23 | 2024-03-23 | ClickHouse India Meetup | ClickHouse Bangalore meetup: Doors Open & Community syncup | N/A | Watch Recording | ClickHouse community usage stories and questionnaire |
| 24 | 2024-04-15 | ClickHouse India Webinar | Live Q&A Forum with Database & Cloud Experts | N/A | Watch Recording | Panelist in a ClickHouse live webinar hosted by ClickHouse Inc., answering questions left over from #23 |
| 25 | 2025-03-06 | PGConf India 2025 | Pushing PostgreSQL to the Limits: Tackling Analytics Workloads with Extensions | View Slides | Watch Recording | Running OLAP benchmarks on Postgres, finding issues, and ideating on how to fix them. Read Abstract for more details |
| 26 | 2025-05-10 | Lakehouse Days Bengaluru | Hacking Iceberg on Your Existing Databases | View Slides | Watch Recording | Hacking the ClickHouse and Postgres open-source code to support the Iceberg table format |
| 27 | 2025-06-27 | ClickHouse Bangalore Meetup | Squeezing Performance: ClickHouse@4GB on K8s | View Slides | N/A | Benchmarking ClickHouse on low-memory Kubernetes environments |
| 28 | 2025-07-12 | ClickHouse Mumbai Meetup | Rebalancing shards in ClickHouse open source | View Slides | N/A | ClickHouse doesn't rebalance shards when a new shard is added. Presented the options, the open proposals, and how we solved it |
| 29 | 2025-08-07 | KubeCon + CloudNativeCon India 2025 | Bridging Big Data and Machine Learning Ecosystems: A Cloud-Native Approach Using Kubeflow | View Slides | Watch Recording | In today's data-driven landscape, bridging the gap between scalable big data systems (e.g., Apache Spark, Iceberg) and machine learning frameworks (e.g., PyTorch) while minimizing data movement and serialization overhead is a critical challenge. Traditional workflows require costly data serialization between storage (e.g., Parquet/Iceberg) and training frameworks, creating bottlenecks that lead to inefficient resource utilization in distributed training. This talk explores a cloud-native solution using Kubeflow for end-to-end ML orchestration and Apache Arrow for high-performance data interchange, enabling seamless integration of analytics and ML workflows. |
| 30 | 2025-11-05 | Open Source Analytics Conference 2025 | ClickHouse® Chronicles: Real-World War Rooms with Human and AI Agents | View Slides | Watch Recording | We walk you through some of the toughest incidents we’ve faced: what broke, what we thought was wrong, what actually was wrong, and how we got to the root cause. Along the way, we’ll introduce a practical framework for tackling such issues—combining human intuition, AI assistance, and the messy negotiations that often define real-world problem-solving |
- AI Agents: The Future of SaaS Applications?
- Unified Data Platforms (ft. Postgres & ClickHouse)
- Streaming War and How Apache Pulsar is Acing the Battle
- Why Nutanix Beam Selected Apache Pulsar Over Apache Kafka
- MySQL 5.7.6: Introducing Multi-Source Replication
- MySQL 5.7.4: Change Master Without Stopping Slave
- MySQL 5.7.6: It Is Easier to Switch Master Now
- MySQL 5.7: Monitoring Replication with Performance Schema
I’ve had 70+ changes merged upstream across major open-source systems including MySQL, Apache Pulsar, and ClickHouse. What follows is a curated set of examples that reflect the kinds of problems I’ve worked on and the impact of that work.
From 2012 to 2013, I worked on MySQL Server at Oracle, primarily in the Replication and Binlog subsystems. I had 50+ changes merged upstream during this period; the items below are a small, representative subset.
- Replication observability via Performance Schema (`SHOW SLAVE STATUS`)
  This work moved replication state from ad-hoc text output into structured Performance Schema tables, making it possible to monitor and reason about replication programmatically.
  WorkLog: WL#3656
  Key commits:
- Improved replication control and operational workflows
  Focused on making replication management less disruptive, particularly around master changes and GTID-based setups.
  WorkLog: WL#6120
  Key commits:
- Fixed a worker ID mismatch across replication metadata tables
  Addressed inconsistencies between `SLAVE_WORKER_INFO` and `REPLICATION_EXECUTE_STATUS_BY_WORKER`.
  Commit
- Stabilized replication tests during InnoDB crash recovery
  Fixed sporadic MTR failures that appeared under crash-recovery scenarios.
  Commit
- Corrected `RESET SLAVE ALL` behavior
  Ensured all connection parameters in `MASTER_INFO` are properly reset.
  Commit
- Fixed incorrect thread ID reporting in replication Performance Schema tables
  Aligned replication P_S tables with internal PFS thread identifiers.
  Commit
- Prevented crashes and invalid data in replication worker status tables
  Fixed crashes and garbage values in `REPLICATION_EXECUTE_STATUS_BY_WORKER`.
  Commit
- Hardened replication Performance Schema queries
  Prevented server crashes when querying replication P_S tables without replication configured.
  Commit
- Improved `mysqlbinlog` correctness and usability
- Reduced flakiness in replication test suite
  Fixed intermittent failures in `rpl_row_until` and related tests on PB2.
  Commit · Commit
- GTID correctness improvements
I’ve contributed to Apache Pulsar across the Java and Python clients, schema handling, and the Pulsar–Flink connector. Most of my work focused on smoothing real-world operational edges: configuration correctness, authentication, schema behavior, and making failures easier to reason about.
- Made schema version information available in the Python client
  Exposed the writer schema version on messages so applications can reason about schema evolution at runtime instead of relying on external metadata.
  PR #8173
- Improved schema handling around incompatible schemas
  Enabled incompatible schemas to safely co-exist on a topic, unblocking certain migration and multi-producer use cases.
  PR #3840
- Hardened Avro encoding failure paths
  Fixed an edge case where Avro encoding failures could leave cursors in an inconsistent state, leading to hard-to-debug downstream issues.
  PR #6695
- Simplified TLS configuration in the Java client
  Made TLS usage derive automatically from the service URL protocol instead of requiring explicit flags, reducing configuration foot-guns.
  PR #4451
- Improved authentication handling in the Java client
  Cleaned up how authentication is constructed from class names and parameters, making client configuration more consistent and less error-prone.
  PR #4381
- Fixed and extended authentication support in the Pulsar–Flink connector
  Ensured authentication works correctly when building Flink sources and allowed client auth to be configured directly from the connector layer.
  PR #4284 · PR #3949
- Moved Pulsar–Flink configuration to typed POJOs
  Replaced loosely-typed configuration maps with explicit config objects, improving validation, readability, and long-term maintainability.
  PR #4232
- Improved error reporting for invalid client configuration
  Added explicit errors for out-of-range or invalid configuration values, reducing silent misconfigurations.
  PR #3950
- Clarified and documented schema auto-update behavior
  Documented the default schema auto-update strategy (FULL), reducing surprises when teams first adopt schemas in Pulsar.
  PR #3842
- Filled gaps in CLI and user documentation
  Added missing documentation for publish-rate related CLI commands and improved general project documentation.
  PR #6890 · PR #3841
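
To illustrate the TLS item in the list above, here is a minimal Python sketch of deriving TLS usage from the service URL scheme. It is not the Java client code from PR #4451; the function name `tls_enabled_for` and the example hostnames are hypothetical, and the only assumption about Pulsar itself is that `pulsar+ssl://` and `https://` denote TLS endpoints while `pulsar://` and `http://` denote plaintext ones.

```python
from urllib.parse import urlparse

# Service URL schemes assumed to imply TLS or plaintext transport.
TLS_SCHEMES = {"pulsar+ssl", "https"}
PLAINTEXT_SCHEMES = {"pulsar", "http"}


def tls_enabled_for(service_url: str) -> bool:
    """Return True when the URL scheme implies TLS (hypothetical helper)."""
    scheme = urlparse(service_url).scheme.lower()
    if scheme in TLS_SCHEMES:
        return True
    if scheme in PLAINTEXT_SCHEMES:
        return False
    # Fail loudly instead of silently falling back to plaintext.
    raise ValueError(f"unsupported service URL scheme: {scheme!r}")


if __name__ == "__main__":
    for url in ("pulsar://broker.example.com:6650",
                "pulsar+ssl://broker.example.com:6651"):
        print(url, "->", tls_enabled_for(url))
```

Deriving the setting from the URL leaves a single source of truth, so a `pulsar+ssl://` URL cannot be paired with a forgotten "enable TLS" flag.
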
- Added detailed mark cache eviction metrics (evicted bytes, marks, and files)
For long-running analytical workloads, understanding cache behavior is essential.
This contribution enhanced ClickHouse internals to expose evicted mark cache statistics, making it easier to reason about cache pressure and performance regressions.
Merged Pull Request: ClickHouse/ClickHouse#80799
(Fixes ClickHouse/ClickHouse#60989)