-
Notifications
You must be signed in to change notification settings - Fork 2k
feat(new sink): add Apache Doris sink support #23117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Created Jira card for Docs Team review. |
maycmlee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small suggestions
maycmlee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for docs
|
Hi @bingquanzhao, thank you for this PR. Please rebase on master and fix merge conflicts. There are 12k affected lines right now. |
- Update socket2 from 0.5.8 to 0.5.10 - Update sqlx from 0.8.3 to 0.8.6 - Keep mysql support in sqlx features for Doris sink
|
Is this PR process still ongoing? |
|
@bingquanzhao Hi, Thanks for your work! However, I got some warnings when consuming messages from kafka and loading into doris with your version (bingquanzhao@be278bb). Although messages are successfully written(unknown integrity).
And here is my config: data_dir: /tmp/vector/data
sources:
kafka_source:
type: kafka
bootstrap_servers: 100.88.1.4:9092
group_id: vector_consumer111111111
topics:
- test_topic_1
auto_offset_reset: earliest
# 解析JSON消息
decoding:
codec: bytes
sinks:
doris_sink:
type: doris
inputs:
- kafka_source
endpoints:
- http://100.88.1.4:8030
database: test
table: "{{ message_key }}"
auth:
strategy: basic
user: root
password: "root@1234"
encoding:
codec: "text"
only_fields: ["message"]
# 启用批处理以提高性能
batch:
max_events: 1000
timeout_secs: 1
headers:
format: "json"
strip_outer_array: "false"
read_json_by_line: "true"
# 配置请求
request:
concurrency: 1
rate_limit_duration_secs: 1
rate_limit_num: 100
# 配置重试
acknowledgements:
enabled: false
log_request: truePlease ask me for any other information you need! |
This warning is benign and does not affect data integrity. |
thomasqueirozb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay! It was a lot of code 😁
Thank you so much for your review. I will adjust the code according to your review suggestions as quickly as possible. |
|
Hi @thomasqueirozb ,thanks for the review! I've addressed your comments. Let me know if there's anything else I should update. |
thomasqueirozb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Should be good to go realistically. Would only like to talk about the string uri and I'll commit the other required changes myself
|
It actually looks like integration tests are failing when I run |
Hi @thomasqueirozb ,I’ve addressed the type issue with base_url and fixed the integration test issues. |
|
@bingquanzhao Hi, I added header group_commit: async_mode to conf, and got an error from doris "label and group_commit can't be set at the same time". Can you provide a switch to disable label generation? (more efficiency and less integrity) |
I will add a check. When group_commit is set, do not set the label. |
Summary
This PR introduces a new Apache Doris sink for Vector, enabling users to send log data directly to Apache Doris databases using the Stream Load API. The implementation includes:
Apache Doris is a modern MPP analytical database that provides sub-second query response times on large datasets, making it ideal for real-time data warehouses and log analysis scenarios.
Change Type
Is this a breaking change?
How did you test this PR?
Local Testing
cargo testvector validatemake generate-component-docs./scripts/check_changelog_fragments.shTest Configuration Used
Environment Setup
Does this PR include user facing changes?
Notes
Implementation Details
format: Data format specification (json, csv, etc.)read_json_by_line: JSON line-by-line reading modestrip_outer_array: Array handling configurationcolumns: Column mapping specificationDocumentation
CI=true make check-docs)Dependencies
Code Quality
cargo fmtTesting Strategy
References