GSoC Program List #5594

beryl678 · 2025-02-25T12:59:31Z

Introduction

GreptimeDB is an open-source, cloud-native, unified time-series database designed to handle metrics, logs, and events at any scale. It offers real-time insights from edge to cloud, providing a unified storage solution for diverse time-series data. GreptimeDB supports SQL and multiple protocols, enabling seamless integration with various data sources and platforms. Its architecture is optimized for scalability, efficiency, and powerful analytics, making it a robust choice for time-series data management.

GreptimeDB has participated in the OSPP (Open Source Promotion Plan) program and successfully completed five projects. This year we hope to join Google Summer of Code as a mentoring organization and continue contributing to open source. Our proposed projects are:

Implementing Query-Level Resource Tracking and Quota Enforcement in GreptimeDB

Mentor: Yingwen
Mentor Intro: A Rust open-source software developer currently focusing on the development of the GreptimeDB storage engine. He has also contributed code to high-quality projects such as arrow-rs, OpenDAL, and DataFusion. Outside of work, he is learning photography and driving.
Description: GreptimeDB currently offers logical tenant isolation but lacks mechanisms to monitor and limit per-tenant resource consumption, particularly during query execution. This project aims to build a query-level resource tracking system that monitors the CPU and memory usage of each query and terminates any query that exceeds its allocated quota. Because each query is optimized into an execution plan before it runs, these metrics can be attached to the plan's operators so that GreptimeDB is aware of each query's resource consumption.
Project Difficulty: Hard
Project Repository: https://github.com/GreptimeTeam/greptimedb
Project Technical Requirements:

Programming Languages: Proficiency in Rust, the primary language used in GreptimeDB.
Concurrency and Asynchronous Programming: Experience with Rust's async programming model and concurrency patterns.
Database Systems: Understanding of database query execution and resource management.
Performance Monitoring: Familiarity with tools and techniques for monitoring CPU and memory usage in Rust applications.
Testing Frameworks: Experience with testing methodologies and frameworks for Rust applications.
Documentation: Ability to produce clear and comprehensive technical documentation.
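As a rough illustration of the tracking-and-termination idea, the sketch below keeps per-query memory and CPU counters that every operator in a plan would share. All names (`QueryResourceTracker`, `try_reserve`, `record_cpu`, `QuotaError`) are hypothetical and not part of GreptimeDB; the sketch assumes operators reserve memory before allocating and report CPU time they measure themselves. Apache DataFusion, which GreptimeDB's query engine builds on, has its own memory pool mechanism that a real implementation would more likely hook into.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::time::Duration;

/// Reasons a query may be terminated (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum QuotaError {
    MemoryExceeded { requested: usize, used: usize, limit: usize },
    CpuExceeded { used: Duration, limit: Duration },
}

/// Per-query accounting shared by all operators of one execution plan.
pub struct QueryResourceTracker {
    memory_limit: usize,
    memory_used: AtomicUsize,
    cpu_limit: Duration,
    cpu_used_nanos: AtomicU64,
}

impl QueryResourceTracker {
    pub fn new(memory_limit: usize, cpu_limit: Duration) -> Self {
        Self {
            memory_limit,
            memory_used: AtomicUsize::new(0),
            cpu_limit,
            cpu_used_nanos: AtomicU64::new(0),
        }
    }

    /// Reserves `bytes` before an operator allocates a buffer. On failure the
    /// counter is left unchanged and the caller should cancel the query.
    pub fn try_reserve(&self, bytes: usize) -> Result<(), QuotaError> {
        let mut used = self.memory_used.load(Ordering::Relaxed);
        loop {
            let new_used = match used.checked_add(bytes) {
                Some(n) if n <= self.memory_limit => n,
                _ => {
                    return Err(QuotaError::MemoryExceeded {
                        requested: bytes,
                        used,
                        limit: self.memory_limit,
                    })
                }
            };
            // Compare-and-swap so concurrent operators never overshoot the limit.
            match self.memory_used.compare_exchange_weak(
                used,
                new_used,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(actual) => used = actual,
            }
        }
    }

    /// Returns memory previously reserved with `try_reserve`.
    pub fn release(&self, bytes: usize) {
        self.memory_used.fetch_sub(bytes, Ordering::AcqRel);
    }

    /// Adds CPU time measured by an operator and reports whether the query
    /// is now over its CPU budget.
    pub fn record_cpu(&self, elapsed: Duration) -> Result<(), QuotaError> {
        let add = elapsed.as_nanos().min(u64::MAX as u128) as u64;
        let total = self
            .cpu_used_nanos
            .fetch_add(add, Ordering::AcqRel)
            .saturating_add(add);
        let used = Duration::from_nanos(total);
        if used > self.cpu_limit {
            Err(QuotaError::CpuExceeded { used, limit: self.cpu_limit })
        } else {
            Ok(())
        }
    }

    pub fn memory_used(&self) -> usize {
        self.memory_used.load(Ordering::Relaxed)
    }
}
```

A failed `try_reserve` or `record_cpu` would be the signal to stop the query's output stream and return an error to the client. The compare-and-swap loop is one way to ensure that parallel partitions of the same query cannot jointly exceed the memory quota.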

Enhancing GreptimeDB Backup with Apache Iceberg Integration

Mentor: Wenkang
Mentor Intro: He is a passionate open-source database developer and an Apache OpenDAL committer, with a focus on distributed systems, performance optimization, and scalable architectures. He also has a collection of Lotso plush toys.
Description: GreptimeDB, a time-series database, currently employs a proprietary data format for backups, limiting its compatibility with existing enterprise data lake platforms. This project aims to develop a new backup method that exports GreptimeDB data into the Apache Iceberg format, an open table format designed for large analytic datasets. This integration will facilitate seamless data management and interoperability with various big data platforms such as Apache Spark and ClickHouse.
Project Difficulty: Medium
Project Repository: https://github.com/GreptimeTeam/greptimedb
Project Technical Requirements:

Programming Languages: Proficiency in Rust, the primary language used in GreptimeDB.
Data Formats: Experience with Apache Iceberg and its data structures.
Database Systems: Understanding of GreptimeDB and its backup mechanism.
Testing Frameworks: Familiarity with testing methodologies and frameworks for database systems.
Documentation: Ability to produce clear and comprehensive technical documentation.
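One early step in such an exporter is translating a table's schema. The sketch below maps a simplified, hypothetical subset of GreptimeDB column types (`GreptimeType`) to Iceberg primitive type names and assigns Iceberg field ids; `IcebergField`, `iceberg_type`, and `to_iceberg_schema` are illustrative names only. Mapping timestamps to `timestamptz`, treating the time index column as never null, and rejecting nanosecond timestamps are assumptions: Iceberg format v1/v2 timestamps have microsecond precision, so millisecond values can be widened, but nanosecond values would need truncation or a newer format version.

```rust
/// A simplified, hypothetical subset of GreptimeDB column types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GreptimeType {
    Boolean,
    Int32,
    Int64,
    Float32,
    Float64,
    String,
    Binary,
    TimestampMillisecond,
    TimestampMicrosecond,
    TimestampNanosecond,
}

/// One field of an Iceberg schema as it would appear in table metadata.
#[derive(Debug, PartialEq)]
pub struct IcebergField {
    pub id: i32,
    pub name: String,
    pub required: bool,
    pub type_name: &'static str,
}

/// Maps a column type to an Iceberg primitive type name, or `None` when no
/// lossless Iceberg v1/v2 type exists.
pub fn iceberg_type(t: GreptimeType) -> Option<&'static str> {
    match t {
        GreptimeType::Boolean => Some("boolean"),
        GreptimeType::Int32 => Some("int"),
        GreptimeType::Int64 => Some("long"),
        GreptimeType::Float32 => Some("float"),
        GreptimeType::Float64 => Some("double"),
        GreptimeType::String => Some("string"),
        GreptimeType::Binary => Some("binary"),
        // Iceberg v1/v2 timestamps are microseconds since the epoch; millisecond
        // values would be widened (multiplied by 1000) when rows are written.
        GreptimeType::TimestampMillisecond | GreptimeType::TimestampMicrosecond => {
            Some("timestamptz")
        }
        // Would lose precision in v1/v2: the exporter must truncate or fail.
        GreptimeType::TimestampNanosecond => None,
    }
}

/// Builds an Iceberg schema from `(name, type, is_time_index)` columns,
/// assigning field ids in column order starting at 1. The time index column
/// is marked required on the assumption that it is never null.
pub fn to_iceberg_schema(
    columns: &[(&str, GreptimeType, bool)],
) -> Result<Vec<IcebergField>, String> {
    columns
        .iter()
        .enumerate()
        .map(|(i, &(name, ty, is_time_index))| -> Result<IcebergField, String> {
            let type_name = iceberg_type(ty)
                .ok_or_else(|| format!("column `{}` has no lossless Iceberg type", name))?;
            Ok(IcebergField {
                id: i as i32 + 1,
                name: name.to_string(),
                required: is_time_index,
                type_name,
            })
        })
        .collect()
}
```

Writing the Parquet data files, manifests, and table metadata would follow; the Apache iceberg-rust project is one candidate library for that, though whether it fits is for the contributor to evaluate. For time-series data, partitioning the exported table with Iceberg's `day` transform on the time index column is a natural choice.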

Implement Asynchronous Index Building Mechanism for GreptimeDB

Mentor: Zhenchi
Mentor Intro: He is a developer of GreptimeDB, focusing on architecture design and performance optimization for distributed time-series databases. He appreciates architectures and solutions that prioritize simplicity and elegance.
Description: As GreptimeDB gains more index types, including min-max, inverted, full-text, and bloom filter indexes, its current synchronous, SST-file-level index building has become a bottleneck for write performance. Because index construction is coupled with flush and compaction, it blocks the write pipeline. At the same time, flexible index management (DDL operations that modify region metadata) and the query optimizer's index selection strategies make asynchronous index building feasible. This project will refactor the index building process into an asynchronous mode decoupled from write operations. By introducing double-read operations that separate index construction from data persistence, it paves the way for advanced features such as a remote indexer and adaptive index selection based on data distribution.
Project Difficulty: Hard
Project Repository: https://github.com/GreptimeTeam/greptimedb
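The decoupling described above can be sketched with a background worker: flush or compaction only enqueues the new SST's id and returns, the worker re-reads the persisted file (one possible reading of the "double read" idea) to build an index, and queries prune a file only once its index is ready. `AsyncIndexer`, the `read_sst` callback, and the min-max index over a single string column are hypothetical simplifications using only the Rust standard library; GreptimeDB's actual flush path and index formats differ.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Identifier of an SST file (hypothetical simplification).
pub type FileId = u64;

/// Index status per SST; `Ready` holds a min-max index over one string
/// column (`None` if the file had no rows).
#[derive(Debug, Clone, PartialEq)]
pub enum IndexState {
    Pending,
    Ready(Option<(String, String)>),
}

pub struct AsyncIndexer {
    tx: Option<Sender<FileId>>,
    states: Arc<Mutex<HashMap<FileId, IndexState>>>,
    worker: Option<JoinHandle<()>>,
}

fn build_min_max(values: &[String]) -> Option<(String, String)> {
    let min = values.iter().min()?.clone();
    let max = values.iter().max()?.clone();
    Some((min, max))
}

impl AsyncIndexer {
    /// Spawns the background builder; `read_sst` re-reads a persisted file.
    pub fn start<F>(read_sst: F) -> Self
    where
        F: Fn(FileId) -> Vec<String> + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<FileId>();
        let states = Arc::new(Mutex::new(HashMap::new()));
        let worker_states = Arc::clone(&states);
        let worker = thread::spawn(move || {
            // The loop ends once every sender has been dropped.
            for file_id in rx {
                let index = build_min_max(&read_sst(file_id));
                worker_states
                    .lock()
                    .unwrap()
                    .insert(file_id, IndexState::Ready(index));
            }
        });
        AsyncIndexer { tx: Some(tx), states, worker: Some(worker) }
    }

    /// Called once a flushed or compacted SST is durable; never waits for indexing.
    pub fn on_sst_flushed(&self, file_id: FileId) {
        self.states.lock().unwrap().insert(file_id, IndexState::Pending);
        if let Some(tx) = &self.tx {
            // A send error means the worker stopped; the file stays unindexed.
            let _ = tx.send(file_id);
        }
    }

    pub fn state(&self, file_id: FileId) -> Option<IndexState> {
        self.states.lock().unwrap().get(&file_id).cloned()
    }

    /// True only if a ready index proves no row in the file can equal `value`.
    pub fn can_skip(&self, file_id: FileId, value: &str) -> bool {
        match self.states.lock().unwrap().get(&file_id) {
            Some(IndexState::Ready(Some((min, max)))) => {
                value < min.as_str() || value > max.as_str()
            }
            Some(IndexState::Ready(None)) => true,
            // Pending or unknown: the file must be scanned.
            _ => false,
        }
    }

    /// Stops accepting work and waits until queued builds are finished.
    pub fn finish(&mut self) {
        drop(self.tx.take()); // dropping the only sender ends the worker loop
        if let Some(worker) = self.worker.take() {
            worker.join().expect("index worker panicked");
        }
    }
}
```

Because a pending index never causes a file to be skipped, query results stay correct while index building lags behind writes; only the pruning benefit is delayed.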

Arbitrary UDF Execution Framework

Mentor: Ruihang
Mentor Intro: GreptimeDB maintainer, Apache DataFusion PMC member, Apache Arrow committer, HoraeDB PPMC member.
Description: User-defined functions (UDFs) are an important mechanism that lets database systems offer user-friendly, highly extensible functionality. This task proposes implementing a UDF framework in GreptimeDB that covers defining, managing, and executing UDFs. For wider adoption, the UDF backend can be based on Python or WASM. This task requires knowledge of Rust, database execution engines, SQL, and related systems.
Project Difficulty: Hard
Project Repository: https://github.com/GreptimeTeam/greptimedb
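A minimal sketch of the define/manage/execute split, assuming a backend trait that a Python or WASM runtime would implement. Only an in-process Rust closure backend is shown, and `UdfRegistry`, `UdfBackend`, `NativeBackend`, `UdfDefinition`, and the `Value`/`DataType` enums are hypothetical. A real implementation would more likely evaluate UDFs over Arrow arrays in batches and register them through DataFusion's scalar UDF mechanism rather than calling them row by row.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int64(i64),
    Float64(f64),
    Null,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
}

/// Execution backend; a Python or WASM runtime would implement this trait.
pub trait UdfBackend: Send + Sync {
    fn invoke(&self, args: &[Value]) -> Result<Value, String>;
}

/// In-process backend wrapping a Rust closure.
pub struct NativeBackend<F>(pub F);

impl<F> UdfBackend for NativeBackend<F>
where
    F: Fn(&[Value]) -> Result<Value, String> + Send + Sync,
{
    fn invoke(&self, args: &[Value]) -> Result<Value, String> {
        (self.0)(args)
    }
}

pub struct UdfDefinition {
    pub name: String,
    pub arg_types: Vec<DataType>,
    pub return_type: DataType,
    pub backend: Arc<dyn UdfBackend>,
}

/// NULL is accepted for any declared type.
fn value_matches(v: &Value, t: DataType) -> bool {
    matches!(
        (v, t),
        (Value::Null, _) | (Value::Int64(_), DataType::Int64) | (Value::Float64(_), DataType::Float64)
    )
}

#[derive(Default)]
pub struct UdfRegistry {
    udfs: HashMap<String, UdfDefinition>,
}

impl UdfRegistry {
    /// Defines a function; names are case-insensitive, like SQL identifiers.
    pub fn create(&mut self, def: UdfDefinition) -> Result<(), String> {
        let key = def.name.to_lowercase();
        if self.udfs.contains_key(&key) {
            return Err(format!("function `{}` already exists", def.name));
        }
        self.udfs.insert(key, def);
        Ok(())
    }

    pub fn drop_function(&mut self, name: &str) -> bool {
        self.udfs.remove(&name.to_lowercase()).is_some()
    }

    /// Validates argument and result types, evaluating the UDF row by row.
    pub fn call(&self, name: &str, rows: &[Vec<Value>]) -> Result<Vec<Value>, String> {
        let def = self
            .udfs
            .get(&name.to_lowercase())
            .ok_or_else(|| format!("unknown function `{}`", name))?;
        rows.iter()
            .map(|args| -> Result<Value, String> {
                let arity_ok = args.len() == def.arg_types.len();
                if !arity_ok || !args.iter().zip(&def.arg_types).all(|(v, t)| value_matches(v, *t)) {
                    return Err(format!("invalid arguments for `{}`", def.name));
                }
                let out = def.backend.invoke(args)?;
                if value_matches(&out, def.return_type) {
                    Ok(out)
                } else {
                    Err(format!("`{}` returned a value of the wrong type", def.name))
                }
            })
            .collect()
    }
}
```

Keeping type checks in the registry gives every backend the same argument and result validation, while sandboxing and resource limits would remain backend-specific concerns.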

Conclusion

By participating in Google Summer of Code 2025, contributors will have the opportunity to work on features that strengthen GreptimeDB's resource management, data interoperability, indexing, and extensibility. These projects address real needs within the GreptimeDB community and give contributors valuable experience in Rust programming, database systems, and performance optimization. We are committed to supporting contributors throughout the development process and look forward to collaborating on these projects.
