Enhance README with improved Kafka description #21124
base: trunk
Updated the description of Apache Kafka for clarity and added an architecture image.
A label of 'needs-attention' was automatically added to this PR in order to raise the
AndrewJSchofield
left a comment
Thanks for the PR. I have left some comments.
README.md
Outdated
| [**Apache Kafka**](https://kafka.apache.org) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. | ||
| [**Apache Kafka**](https://kafka.apache.org) is an open source, highly scalable, fault-tolerant, distributed event-streaming platform designed for real-time data ingestion, processing, and distribution. It enables applications to publish, store, and consume continuous streams of records with high throughput and durability, making it a core infrastructure component for building data pipelines, streaming analytics systems, and mission-critical, event-driven architectures. | ||
|
|
||
| Acchitecture of Apache Kafka: |
nit: "architecture" not "accitecture".
README.md
Outdated
| [**Apache Kafka**](https://kafka.apache.org) is an open source, highly scalable, fault-tolerant, distributed event-streaming platform designed for real-time data ingestion, processing, and distribution. It enables applications to publish, store, and consume continuous streams of records with high throughput and durability, making it a core infrastructure component for building data pipelines, streaming analytics systems, and mission-critical, event-driven architectures. | ||
|
|
||
| Acchitecture of Apache Kafka: | ||
| <img width="772" height="562" alt="kafka drawio" src="https://github.com/user-attachments/assets/f1dda78b-0826-4408-92ba-6aef364f1b3a" /> |
This image is not helpful. It includes ZooKeeper, which was removed from Kafka in the 4.0 release. Personally, I would prefer not to include an image in the repo README file.
Made the requested changes: corrected the typo and removed the image, per your preference.
This commit updates the existing Kafka definition to a more precise, technically grounded, and documentation-ready description. The new version provides clearer context on Kafka’s purpose, core capabilities, and role in modern data systems, improving onboarding for new contributors and enhancing the overall readability of our documentation.
Motivation
The previous definition, while correct, lacked depth and did not fully convey Kafka’s strengths as a distributed event-streaming platform. Clear and accurate documentation is essential for both internal developers and external users evaluating or onboarding to the project. This improvement ensures the definition better reflects Kafka’s architectural guarantees (scalability, durability, and fault tolerance) and aligns with industry-standard terminology.
What’s Changed
Benefits
Helps future contributors by providing clearer context up front.