GSoC Program List #5594
beryl678 started this conversation in Show and tell
Introduction
GreptimeDB is an open-source, cloud-native, unified time-series database designed to handle metrics, logs, and events at any scale. It offers real-time insights from edge to cloud, providing a unified storage solution for diverse time-series data. GreptimeDB supports SQL and multiple protocols, enabling seamless integration with various data sources and platforms. Its architecture is optimized for scalability, efficiency, and powerful analytics, making it a robust choice for time-series data management.
GreptimeDB has taken part in OSPP (Open Source Promotion Plan) and successfully completed 5 projects there. We hope to join the list of GSoC organizations this year and continue contributing to open source. Our proposed projects are:
Implementing Query-Level Resource Tracking and Quota Enforcement in GreptimeDB
Mentor: Yingwen
Mentor Intro: A Rust open-source software developer currently focusing on the development of the GreptimeDB storage engine. He has also contributed code to high-quality projects such as arrow-rs, OpenDAL, and DataFusion. Outside of work, he is learning photography and driving.
Description: GreptimeDB currently offers logical tenant isolation but lacks mechanisms to monitor and restrict resource consumption per tenant, particularly for query execution. This project aims to develop a query-level resource tracking system that monitors the CPU and memory usage of each query and terminates any query that exceeds its allocated quota. Each query is optimized into an execution plan before it runs, so these metrics can be attached to the plan, making GreptimeDB aware of the resources each query consumes.
Project Difficulty: Hard
Project Repository: https://github.com/GreptimeTeam/greptimedb
Project Technical Requirements:
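As a rough illustration of the tracking and quota enforcement described for this project, here is a minimal sketch of a per-query tracker that the operators of one plan could share. Every name in it (`QueryQuota`, `QueryTracker`, `try_reserve`, `add_cpu_time`) is hypothetical and not part of GreptimeDB or DataFusion, and CPU time is assumed to be reported by the caller rather than measured here.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};
use std::time::Duration;

/// Hypothetical per-query limits; the field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct QueryQuota {
    pub max_memory_bytes: usize,
    pub max_cpu_time: Duration,
}

/// Reason a query was stopped.
#[derive(Debug, PartialEq, Eq)]
pub enum QuotaExceeded {
    Memory { used: usize, limit: usize },
    CpuTime { used: Duration, limit: Duration },
}

/// One tracker per query, shared by every operator in that query's plan.
pub struct QueryTracker {
    quota: QueryQuota,
    memory_bytes: AtomicUsize,
    cpu_nanos: AtomicU64,
    cancelled: AtomicBool,
}

impl QueryTracker {
    pub fn new(quota: QueryQuota) -> Self {
        Self {
            quota,
            memory_bytes: AtomicUsize::new(0),
            cpu_nanos: AtomicU64::new(0),
            cancelled: AtomicBool::new(false),
        }
    }

    /// Record a memory reservation. If it would exceed the quota, the
    /// reservation is rolled back and the query is marked as cancelled.
    pub fn try_reserve(&self, bytes: usize) -> Result<(), QuotaExceeded> {
        let used = self.memory_bytes.fetch_add(bytes, Ordering::Relaxed) + bytes;
        if used > self.quota.max_memory_bytes {
            self.memory_bytes.fetch_sub(bytes, Ordering::Relaxed);
            self.cancelled.store(true, Ordering::Relaxed);
            return Err(QuotaExceeded::Memory { used, limit: self.quota.max_memory_bytes });
        }
        Ok(())
    }

    /// Give back memory that an operator no longer needs.
    pub fn release(&self, bytes: usize) {
        self.memory_bytes.fetch_sub(bytes, Ordering::Relaxed);
    }

    /// Add CPU time reported by an operator (for example, once per batch).
    pub fn add_cpu_time(&self, elapsed: Duration) -> Result<(), QuotaExceeded> {
        let nanos = elapsed.as_nanos() as u64;
        let total = self.cpu_nanos.fetch_add(nanos, Ordering::Relaxed) + nanos;
        let used = Duration::from_nanos(total);
        if used > self.quota.max_cpu_time {
            self.cancelled.store(true, Ordering::Relaxed);
            return Err(QuotaExceeded::CpuTime { used, limit: self.quota.max_cpu_time });
        }
        Ok(())
    }

    /// Operators poll this flag between batches and stop when it is set.
    pub fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::Relaxed)
    }

    pub fn memory_used(&self) -> usize {
        self.memory_bytes.load(Ordering::Relaxed)
    }
}
```

In this sketch termination is cooperative: the tracker only sets a flag, and each operator is expected to check `is_cancelled()` between batches. How a real implementation would hook into the execution plan is left open by the description.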
Enhancing GreptimeDB Backup with Apache Iceberg Integration
Mentor: Wenkang
Mentor Intro: He is a passionate open-source database developer and an Apache OpenDAL committer, with a focus on distributed systems, performance optimization, and scalable architectures. He also has a collection of Lotso plush toys.
Description: GreptimeDB, a time-series database, currently employs a proprietary data format for backups, limiting its compatibility with existing enterprise data lake platforms. This project aims to develop a new backup method that exports GreptimeDB data into the Apache Iceberg format, an open table format designed for large analytic datasets. This integration will facilitate seamless data management and interoperability with various big data platforms such as Apache Spark and ClickHouse.
Project Difficulty: Medium
Project Repository: https://github.com/GreptimeTeam/greptimedb
Project Technical Requirements:
Implement Asynchronous Index Building Mechanism for GreptimeDB
Mentor: Zhenchi
Mentor Intro: He is a developer of GreptimeDB, focusing on architecture design and performance optimization for distributed time-series databases. He appreciates architectures and solutions that prioritize simplicity and elegance.
Description: As GreptimeDB evolves with diverse index types including minmax, inverted index, full-text search, and bloom filter, the current synchronous index building mechanism at SST file level has become a bottleneck for write optimization. The existing approach that couples index construction with flush/compact operations blocks the write pipeline, while the flexible index management capability (via DDL operations modifying region metadata) and query optimizer's intelligent index selection strategies make asynchronous index building feasible. This project will refactor the current index building process to implement an asynchronous mode decoupled from write operations. By introducing double-read operations to separate index construction from data persistence dependencies, it paves the way for advanced features like remote indexer and adaptive index selection based on data distribution.
Project Difficulty: Hard
Project Repository: https://github.com/GreptimeTeam/greptimedb
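The decoupling described for this project can be pictured with a small sketch: flush persists the data and only enqueues an index job, a background worker reads the file a second time to build the index, and queries fall back to a scan until the index exists. This is a toy model using only the Rust standard library. `Region`, `spawn_index_worker`, the in-memory "SST", and the set-of-values "index" are all hypothetical stand-ins, not GreptimeDB types.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type FileId = u64;
/// Stand-in for an SST file: the values of a single tag column.
type SstRows = Vec<String>;
/// Stand-in for an index: the set of distinct values in one file.
type Index = BTreeSet<String>;

#[derive(Default)]
pub struct Region {
    files: Mutex<HashMap<FileId, Arc<SstRows>>>,
    indexes: Mutex<HashMap<FileId, Index>>,
}

impl Region {
    /// Flush persists the data and enqueues an index job.
    /// It never waits for the index to be built.
    pub fn flush(&self, id: FileId, rows: SstRows, jobs: &Sender<FileId>) {
        self.files.lock().unwrap().insert(id, Arc::new(rows));
        jobs.send(id).expect("index worker has stopped");
    }

    /// Use the index when it is ready; otherwise fall back to scanning the file.
    pub fn may_contain(&self, id: FileId, value: &str) -> bool {
        if let Some(index) = self.indexes.lock().unwrap().get(&id) {
            return index.contains(value);
        }
        let file = self.files.lock().unwrap().get(&id).cloned();
        file.map_or(false, |rows| rows.iter().any(|row| row == value))
    }

    pub fn has_index(&self, id: FileId) -> bool {
        self.indexes.lock().unwrap().contains_key(&id)
    }
}

/// Start a background worker that builds indexes for flushed files.
/// The worker exits once every `Sender` has been dropped.
pub fn spawn_index_worker(region: Arc<Region>) -> (Sender<FileId>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<FileId>();
    let handle = thread::spawn(move || {
        for id in rx {
            // Second read of the already-persisted file (the "double read").
            let file = region.files.lock().unwrap().get(&id).cloned();
            if let Some(rows) = file {
                let index: Index = rows.iter().cloned().collect();
                region.indexes.lock().unwrap().insert(id, index);
            }
        }
    });
    (tx, handle)
}
```

A caller would create the region with `Arc::new(Region::default())`, pass a clone to `spawn_index_worker`, and hand the returned sender to `flush`. The cost of this design is the extra read of each file; the benefit is that the write path no longer waits on index construction.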
Arbitrary UDF Execution Framework
Mentor: Ruihang
Mentor Intro: GreptimeDB maintainer, Apache DataFusion PMC member, Arrow Committer, HoraeDB PPMC Member.
Description: User-defined functions (UDFs) are an important mechanism for making a database system user-friendly and highly extensible. This project proposes a UDF framework that covers defining, managing, and executing UDFs in GreptimeDB. The UDF backend can be based on Python and WASM for wider adoption. The project requires skills in Rust, database execution engines, SQL, and related systems knowledge.
Project Difficulty: Hard
Project Repository: https://github.com/GreptimeTeam/greptimedb
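One way to picture the "define, manage, execute" split in this project is a backend-agnostic trait plus a registry. The sketch below is only an assumption about what such a framework could look like: `ScalarUdf`, `NativeUdf`, `UdfRegistry`, and `Value` are hypothetical names, and a plain Rust closure stands in for the Python or WASM backends the description mentions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified scalar value; a real engine would operate on columnar batches.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

#[derive(Debug, PartialEq)]
pub enum UdfError {
    NotFound(String),
    AlreadyExists(String),
    BadArguments(String),
}

/// Backend-agnostic interface; a Python or WASM runtime could each implement it.
pub trait ScalarUdf: Send + Sync {
    fn name(&self) -> &str;
    fn call(&self, args: &[Value]) -> Result<Value, UdfError>;
}

/// Stand-in backend that wraps a native Rust closure.
pub struct NativeUdf<F> {
    name: String,
    func: F,
}

impl<F> NativeUdf<F>
where
    F: Fn(&[Value]) -> Result<Value, UdfError> + Send + Sync,
{
    pub fn new(name: &str, func: F) -> Self {
        Self { name: name.to_string(), func }
    }
}

impl<F> ScalarUdf for NativeUdf<F>
where
    F: Fn(&[Value]) -> Result<Value, UdfError> + Send + Sync,
{
    fn name(&self) -> &str {
        &self.name
    }

    fn call(&self, args: &[Value]) -> Result<Value, UdfError> {
        (self.func)(args)
    }
}

/// Manages UDF definitions by name.
#[derive(Default)]
pub struct UdfRegistry {
    udfs: HashMap<String, Arc<dyn ScalarUdf>>,
}

impl UdfRegistry {
    /// Define a UDF; fails if the name is already taken.
    pub fn register(&mut self, udf: Arc<dyn ScalarUdf>) -> Result<(), UdfError> {
        let name = udf.name().to_string();
        if self.udfs.contains_key(&name) {
            return Err(UdfError::AlreadyExists(name));
        }
        self.udfs.insert(name, udf);
        Ok(())
    }

    /// Remove a UDF; returns whether it existed.
    pub fn drop_udf(&mut self, name: &str) -> bool {
        self.udfs.remove(name).is_some()
    }

    /// Look up a UDF by name and execute it.
    pub fn call(&self, name: &str, args: &[Value]) -> Result<Value, UdfError> {
        let udf = self
            .udfs
            .get(name)
            .ok_or_else(|| UdfError::NotFound(name.to_string()))?;
        udf.call(args)
    }
}
```

Registering `NativeUdf::new("add_one", ...)` wrapped in an `Arc` and then calling `registry.call("add_one", &[Value::Int(41)])` would dispatch through the trait object, so the registry never needs to know which backend runs the function.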
Conclusion
By participating in the Google Summer of Code 2025 program, contributors will have the opportunity to work on features that strengthen GreptimeDB's resource management, backup interoperability, index building, and extensibility. These projects address real needs within the GreptimeDB community and give contributors valuable experience in Rust programming, database systems, and performance optimization. We are committed to supporting contributors throughout the development process and look forward to collaborating on these projects.