shiv4289/shiv-tech-talks

talks

A collection of my technical talks across conferences and meetups, slide decks and recordings.

| # | Date | Event | Title | Slide Deck | Recording | Description |
|---|---|---|---|---|---|---|
| 1 | 2024-07-06 | Bengaluru Streams Meetup | Batch to Near-Realtime: Inspired by a Real Production Incident | View Slides | Watch Recording | Insights into transitioning from batch to real-time processing. |
| 2 | 2024-02-03 | MyDBOps Open Source Database Meetup | Navigating Transactions: ACID Complexity in Modern Databases | View Slides | Watch Recording | Understanding ACID properties in contemporary databases. |
| 3 | 2023-12-06 | Druid Summit 2023 | Changing Druid Ingestion from 3 Hours to 5 Minutes | View Slides | Watch Recording | Optimizing Druid ingestion processes. |
| 4 | | Pulsar Summit Asia 2022 | Streaming Wars: How Apache Pulsar is Acing the Battle | View Slides | Watch Recording | Exploring Apache Pulsar's role in the streaming ecosystem. |
| 5 | | Pulsar Summit Asia 2021 | Designing Pulsar for Isolation | View Slides | Watch Recording | Strategies for isolating workloads in Apache Pulsar. |
| 6 | | ApacheCon 2021 | Structured Data Streaming with Apache Pulsar | View Slides | Watch Recording | Leveraging Apache Pulsar for structured data streaming. |
| 7 | | ApacheCon 2021 | Apache BookKeeper Key-Value Store and Use Cases | View Slides | Watch Recording | Insights into Apache BookKeeper's key-value store capabilities. |
| 8 | | Pulsar NA Summit 2021 | How Pulsar Stores Data | View Slides | Watch Recording | Understanding Apache Pulsar's data storage mechanisms. |
| 9 | | Pulsar Summit Asia | Running a Secure Pulsar Cluster | View Slides | Watch Recording | Best practices for securing Apache Pulsar deployments. |
| 10 | | Pulsar Summit Asia | Lessons from Managing a Pulsar Cluster | View Slides | Watch Recording | Experiences and lessons learned from managing Apache Pulsar clusters. |
| 11 | | FOSSASIA 2015 | MySQL Group Replication | View Slides | N/A | Deep dive into MySQL's group replication features. |
| 12 | | Open Source India 2014 | MySQL High Availability with Replication New Features | View Slides | N/A | Exploring new features in MySQL replication for high availability. |
| 13 | | MySQL Developer Day Conference | MySQL Replication and Scalability | View Slides | N/A | Strategies for scaling MySQL using replication techniques. |
| 14 | | MySQL User Camp | GTIDs in MySQL | View Slides | N/A | Understanding Global Transaction Identifiers in MySQL replication. |
| 15 | 2023-10-20 | Open Source India, 2023 | Building Blocks of Open Source Databases | View Slides | Watch Recording | Building Blocks of Open Source Databases |
| 16 | | 1st Apache Druid Meetup, Bangalore | Apache Druid on Kubernetes | View Slides | Watch Recording | Druid on Kubernetes |
| 17 | | Bengaluru Streams Meetup | Apache Pulsar - The anatomy of Pub & Sub | View Slides | Watch Recording | Apache Pulsar - The anatomy of Pub & Sub |
| 18 | | Pulsar Summit Asia 2022 | Keeping on top of hybrid cloud usage with Pulsar | View Slides | Watch Recording | Keeping on top of hybrid cloud usage with Pulsar |
| 19 | 2024-11-21 | Open Source Analytics Conference | Unified Data Management with ClickHouse® and Postgres | View Slides | Watch Recording | Unified Data Management with ClickHouse® and Postgres |
| 20 | 2021-10-08 | EventSourcing Live 2021 | Streaming app changes to Event Store | View Slides | Watch Recording | Streaming Event Changes to App via Events or CDC: Trade-offs and Challenges |
| 21 | 2023-06-14 | 1st Apache Pulsar India User Meet | Apache Pulsar Design Choices & use-cases | View Slides | Watch Recording | Design Choices to Love in Pulsar Architecture and the Trade-Offs |
| 22 | 2024-05-24 | SNIA Webinar | Navigating Transactions: ACID Complexity in Modern Databases | View Slides | Watch Recording | Understanding ACID properties in contemporary databases. |
| 23 | 2024-03-23 | ClickHouse India Meetup | ClickHouse Bangalore meetup: Doors Open & Community syncup | N/A | Watch Recording | ClickHouse Community Usage Stories and Questionnaire |
| 24 | 2024-04-15 | ClickHouse India Webinar | Live Q&A Forum with Database & Cloud Experts | N/A | Watch Recording | Panelist in ClickHouse Live Webinar hosted by ClickHouse Inc for questions left from #24 |
| 25 | 2025-03-06 | PGConf India 2025 | Pushing PostgreSQL to the Limits: Tackling Analytics Workloads with Extensions | View Slides | Watch Recording | Run OLAP benchmarks on Postgres, find issues, and ideate on how to fix them. Read Abstract for more details. |
| 26 | 2025-05-10 | Lakehouse Days Bengaluru | Hacking Iceberg on Your Existing Databases | View Slides | Watch Recording | Hacking ClickHouse & Postgres open source code to support the Iceberg table format |
| 27 | 2025-06-27 | ClickHouse Bangalore Meetup | Squeezing Performance: ClickHouse@4GB on K8s | View Slides | N/A | Benchmarking ClickHouse on Low-Memory Kubernetes Environments |
| 28 | 2025-07-12 | ClickHouse Mumbai Meetup | Rebalancing shards in ClickHouse open source | View Slides | N/A | ClickHouse doesn't rebalance shards when a new shard is added. Presented options, open proposals, and how we solved it. |
| 29 | 2025-08-07 | KubeCon + CloudNativeCon India 2025 | Bridging Big Data and Machine Learning Ecosystems: A Cloud-Native Approach Using Kubeflow | View Slides | Watch Recording | In today's data-driven landscape, bridging the gap between scalable big data systems (e.g., Apache Spark, Iceberg) and machine learning frameworks (e.g., PyTorch) while minimizing data movement and serialization overhead is a critical challenge. Traditional workflows require costly data serialization between storage (e.g., Parquet/Iceberg) and training frameworks, creating bottlenecks that lead to inefficient resource utilization in distributed training. This talk explores a cloud-native solution using Kubeflow for end-to-end ML orchestration and Apache Arrow for high-performance data interchange, enabling seamless integration of analytics and ML workflows. |
| 30 | 2025-11-05 | Open Source Analytics Conference 2025 | ClickHouse® Chronicles: Real-World War Rooms with Human and AI Agents | View Slides | Watch Recording | We walk you through some of the toughest incidents we’ve faced: what broke, what we thought was wrong, what actually was wrong, and how we got to the root cause. Along the way, we’ll introduce a practical framework for tackling such issues, combining human intuition, AI assistance, and the messy negotiations that often define real-world problem-solving. |

Blog Posts

  1. AI Agents: The Future of SaaS Applications?
  2. Unified Data Platforms (ft. Postgres & Clickhouse)
  3. Streaming War and How Apache Pulsar is Acing the Battle
  4. Why Nutanix Beam Selected Apache Pulsar Over Apache Kafka
  5. MySQL 5.7.6: Introducing Multi-Source Replication
  6. MySQL 5.7.4: Change Master Without Stopping Slave
  7. MySQL 5.7.6: It Is Easier to Switch Master Now
  8. MySQL 5.7: Monitoring Replication with Performance Schema

Open Source Work — Databases, Messaging, and Observability

I’ve had 70+ changes merged upstream across major open-source systems including MySQL, Apache Pulsar, and ClickHouse. What follows is a curated set of examples that reflect the kinds of problems I’ve worked on and the impact of that work.

MySQL (Oracle) — Replication & Binlog work

From 2012 to 2013, I worked on MySQL Server at Oracle, primarily in the Replication and Binlog subsystems. I had 50+ changes merged upstream during this period; the items below are a small, representative subset.

Features / Enhancements

  1. Replication observability via Performance Schema (SHOW SLAVE STATUS)
    This work moved replication state from ad-hoc text output into structured Performance Schema tables, making it possible to monitor and reason about replication programmatically.
    WorkLog: WL#3656
    Key commits:

  2. Improved replication control and operational workflows
    Focused on making replication management less disruptive, particularly around master changes and GTID-based setups.
    WorkLog: WL#6120
    Key commits:
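
To illustrate why structured replication tables help, here is a minimal sketch of what a monitoring script has to do when it only has the text form of SHOW SLAVE STATUS. The sample output and its values are made up for illustration, and the helper names (`parse_vertical_output`, `replication_healthy`) are hypothetical; only the field names come from SHOW SLAVE STATUS itself.

```python
from typing import Dict, Optional

# Hypothetical sample of `SHOW SLAVE STATUS\G` text output (values are made up).
SAMPLE_OUTPUT = """\
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
        Seconds_Behind_Master: NULL
                   Last_Error: Duplicate entry '1' for key 'PRIMARY'
"""


def parse_vertical_output(text: str) -> Dict[str, Optional[str]]:
    """Parse vertical "Name: value" lines into a dict; 'NULL' becomes None."""
    status: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        if line.startswith("*") or ":" not in line:
            continue  # skip the row separator and blank lines
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        value = value.strip()
        status[name.strip()] = None if value == "NULL" else value
    return status


def replication_healthy(status: Dict[str, Optional[str]]) -> bool:
    """Treat replication as healthy only if both threads report 'Yes'."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes")


status = parse_vertical_output(SAMPLE_OUTPUT)
print(replication_healthy(status))
```

With replication state exposed as Performance Schema tables, the same information arrives as ordinary columns from a SELECT, so a hand-written parsing step like this one is no longer needed.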

Bug fixes and correctness work

  1. Fixed a worker ID mismatch across replication metadata tables
    Addressed inconsistencies between SLAVE_WORKER_INFO and REPLICATION_EXECUTE_STATUS_BY_WORKER.
    Commit

  2. Stabilized replication tests during InnoDB crash recovery
    Fixed sporadic MTR failures that appeared under crash-recovery scenarios.
    Commit

  3. Corrected RESET SLAVE ALL behavior
    Ensured all connection parameters in MASTER_INFO are properly reset.
    Commit

  4. Fixed incorrect thread ID reporting in replication Performance Schema tables
    Aligned replication P_S tables with internal PFS thread identifiers.
    Commit

  5. Prevented crashes and invalid data in replication worker status tables
    Fixed crashes and garbage values in REPLICATION_EXECUTE_STATUS_BY_WORKER.
    Commit

  6. Hardened replication Performance Schema queries
    Prevented server crashes when querying replication P_S tables without replication configured.
    Commit

  7. Improved mysqlbinlog correctness and usability

    • Reset byte position counters correctly when switching binlog files
      Commit
    • Made PURGE BINARY LOGS behavior more reliable and informative
      Commit
    • Fixed incorrect data-type decoding and DECIMAL handling in verbose output
      Commit · Commit

  8. Reduced flakiness in replication test suite
    Fixed intermittent failures in rpl_row_until and related tests on PB2.
    Commit · Commit

  9. GTID correctness improvements

    • Normalized GTID UUID values for consistency
      Commit
    • Fixed SQL_SLAVE_SKIP_COUNTER behavior with GTID_MODE=ON
      Commit
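
As a rough illustration of what UUID normalization for GTIDs can look like, the sketch below rewrites a `uuid:transaction_id` pair so the UUID is always in canonical lowercase, hyphenated form. This is an assumption about what "normalized" means here, not the actual server change; the function name `normalize_gtid` is hypothetical, and GTID sets with ranges are deliberately out of scope.

```python
import uuid


def normalize_gtid(gtid: str) -> str:
    """Return `uuid:txn_id` with the UUID in canonical lowercase form.

    Assumes a single GTID of the form '<uuid>:<transaction_id>'.
    """
    raw_uuid, sep, txn_id = gtid.strip().partition(":")
    if not sep or not txn_id.isdigit():
        raise ValueError(f"expected '<uuid>:<transaction_id>', got {gtid!r}")
    # uuid.UUID accepts mixed case and renders as lowercase with hyphens;
    # it raises ValueError if the UUID part is malformed.
    return f"{uuid.UUID(raw_uuid)}:{int(txn_id)}"


print(normalize_gtid("3E11FA47-71CA-11E1-9E33-C80AA9429562:23"))
```

Comparing normalized strings avoids treating two spellings of the same server UUID as different replication sources.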

Apache Pulsar — Client, Schema, and Connector work

I’ve contributed to Apache Pulsar across the Java and Python clients, schema handling, and the Pulsar–Flink connector. Most of my work focused on smoothing real-world operational edges: configuration correctness, authentication, schema behavior, and making failures easier to reason about.

Selected contributions

  1. Made schema version information available in the Python client
    Exposed the writer schema version on messages so applications can reason about schema evolution at runtime instead of relying on external metadata.
    PR #8173

  2. Improved schema handling around incompatible schemas
    Enabled incompatible schemas to safely co-exist on a topic, unblocking certain migration and multi-producer use cases.
    PR #3840

  3. Hardened Avro encoding failure paths
    Fixed an edge case where Avro encoding failures could leave cursors in an inconsistent state, leading to hard-to-debug downstream issues.
    PR #6695

  4. Simplified TLS configuration in the Java client
    Made TLS usage derive automatically from the service URL protocol instead of requiring explicit flags, reducing configuration foot-guns.
    PR #4451

  5. Improved authentication handling in the Java client
    Cleaned up how authentication is constructed from class names and parameters, making client configuration more consistent and less error-prone.
    PR #4381

  6. Fixed and extended authentication support in the Pulsar–Flink connector
    Ensured authentication works correctly when building Flink sources and allowed client auth to be configured directly from the connector layer.
    PR #4284 · PR #3949

  7. Moved Pulsar–Flink configuration to typed POJOs
    Replaced loosely-typed configuration maps with explicit config objects, improving validation, readability, and long-term maintainability.
    PR #4232

  8. Improved error reporting for invalid client configuration
    Added explicit errors for out-of-range or invalid configuration values, reducing silent misconfigurations.
    PR #3950

  9. Clarified and documented schema auto-update behavior
    Documented the default schema auto-update strategy (FULL), reducing surprises when teams first adopt schemas in Pulsar.
    PR #3842

  10. Filled gaps in CLI and user documentation
    Added missing documentation for publish-rate related CLI commands and improved general project documentation.
    PR #6890 · PR #3841
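
Several of the items above share one idea: configuration should be typed, validated early, and free of redundant flags. The sketch below shows that idea in Python under stated assumptions. `ClientConfig` and its fields are hypothetical and do not mirror the real Pulsar client or connector classes; the only Pulsar-specific fact used is that service URLs with the `pulsar+ssl` or `https` scheme indicate TLS.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# URL schemes used by Pulsar service URLs for TLS and plaintext connections.
TLS_SCHEMES = {"pulsar+ssl", "https"}
PLAIN_SCHEMES = {"pulsar", "http"}


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical typed client config; field names are illustrative only."""

    service_url: str
    operation_timeout_seconds: int = 30
    io_threads: int = 1

    def __post_init__(self) -> None:
        # Fail loudly at construction time instead of misbehaving later.
        scheme = urlparse(self.service_url).scheme
        if scheme not in TLS_SCHEMES | PLAIN_SCHEMES:
            raise ValueError(f"unsupported service URL scheme: {scheme!r}")
        if self.operation_timeout_seconds <= 0:
            raise ValueError("operation_timeout_seconds must be positive")
        if self.io_threads < 1:
            raise ValueError("io_threads must be at least 1")

    @property
    def use_tls(self) -> bool:
        """TLS is implied by the URL scheme, so no separate flag is needed."""
        return urlparse(self.service_url).scheme in TLS_SCHEMES


secure = ClientConfig("pulsar+ssl://broker.example.com:6651")
plain = ClientConfig("pulsar://localhost:6650")
print(secure.use_tls, plain.use_tls)
```

Deriving `use_tls` from the URL removes the chance of a TLS URL paired with a forgotten flag, and the checks in `__post_init__` turn a silent misconfiguration into an immediate error.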

ClickHouse — Observability and performance metrics

  1. Added detailed mark cache eviction metrics (evicted bytes, marks, and files)
    For long-running analytical workloads, understanding cache behavior is essential.
    This contribution enhanced ClickHouse internals to expose evicted mark cache statistics, making it easier to reason about cache pressure and performance regressions.
    Merged Pull Request: ClickHouse/ClickHouse#80799
    (Fixes ClickHouse/ClickHouse#60989)
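
To show what eviction metrics make visible, here is a toy byte-bounded LRU cache that counts evicted entries and evicted bytes. It is a simplified model for illustration only, not ClickHouse's implementation; the class `MeteredLRUCache` and its counter names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class MeteredLRUCache:
    """Toy byte-bounded LRU cache that counts what it evicts."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.evicted_entries = 0
        self.evicted_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        # Replacing an existing key is an update, not an eviction.
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = value
        self.used_bytes += len(value)
        # Evict least recently used entries until the cache fits again.
        while self.used_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
            self.evicted_entries += 1
            self.evicted_bytes += len(evicted)


cache = MeteredLRUCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # "a" becomes most recently used
cache.put("c", b"1234")   # exceeds the limit, so "b" is evicted
print(cache.evicted_entries, cache.evicted_bytes)
```

A steadily rising evicted-bytes counter under a stable workload is the kind of signal that points to cache pressure rather than, say, slow disks.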
