@benjamin-awd
Contributor

Summary

This PR adds support for the ArrowStream format to the ClickHouse sink.

Vector configuration

clickhouse:
  type: clickhouse
  inputs: [normalize]
  database: mydatabase
  table: logs
  endpoint: ${CLICKHOUSE_HOST}
  format: arrow_stream
  batch_encoding:
    codec: arrow_stream
    allow_nullable_fields: true

How did you test this PR?

Ran against a local ClickHouse instance and ran the integration tests.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@benjamin-awd benjamin-awd requested a review from a team as a code owner December 12, 2025 14:55
@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Dec 12, 2025
@benjamin-awd benjamin-awd requested a review from a team as a code owner December 12, 2025 15:16
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Dec 12, 2025
@drichards-87 drichards-87 self-assigned this Dec 12, 2025
@drichards-87 drichards-87 removed their assignment Dec 12, 2025
Contributor

@thomasqueirozb thomasqueirozb left a comment

All my comments here are regarding improvements. This is very nice work and should work as is, thanks a lot!

@benjamin-awd benjamin-awd force-pushed the ch-sink-arrow branch 2 times, most recently from e678a00 to db3297d Compare December 13, 2025 08:10
@akutta
Contributor

akutta commented Dec 15, 2025

2025-12-15T13:37:08.641679Z ERROR vector::topology::builder: Configuration error. error=Sink "clickhouse": Failed to fetch schema for logs.logs: Failed to fetch schema from provider: Failed to convert column 'other_attributes': Unknown ClickHouse type 'JSON'. This type cannot be automatically converted.. internal_log_rate_limit=false

Per the docs, this is a limitation of the Arrow format:

Unsupported Arrow data types:

FIXED_SIZE_BINARY
JSON
UUID
ENUM.

So it's not fully backwards compatible with the JSON encoder. Would make sense to call this out in the documentation.
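
For anyone who wants to check a table before switching encoders, here is a minimal sketch of a pre-flight check. It assumes the names in the list above can be compared directly against the type strings in the table schema, which the error message suggests for JSON but which is not confirmed for the other entries. `unsupported_columns` is a hypothetical helper for illustration only; it is not part of Vector or this PR.

```rust
/// Type names quoted above as unsupported by the Arrow format.
/// Assumption: they are matched against the bare type name in the schema.
const UNSUPPORTED: [&str; 4] = ["FIXED_SIZE_BINARY", "JSON", "UUID", "ENUM"];

/// Hypothetical helper: returns the names of columns whose declared type
/// (ignoring any parenthesised parameters) appears in `UNSUPPORTED`.
/// Wrapped types such as `Nullable(JSON)` are NOT unwrapped by this sketch.
fn unsupported_columns<'a>(schema: &[(&'a str, &str)]) -> Vec<&'a str> {
    schema
        .iter()
        .filter(|(_, ty)| {
            // Keep only the text before the first '(' so `Foo(16)` becomes `Foo`.
            let base = ty.split('(').next().unwrap_or("").trim();
            UNSUPPORTED.iter().any(|u| u.eq_ignore_ascii_case(base))
        })
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // Illustrative schema; only `other_attributes` comes from the error above.
    let schema = [
        ("message", "String"),
        ("other_attributes", "JSON"),
        ("trace_id", "UUID"),
    ];
    for column in unsupported_columns(&schema) {
        println!("column '{column}' may need a different type or the JSON encoder");
    }
}
```

The check is a plain name comparison, so it only catches the simple case shown in the error message. Treat it as a starting point, not as a description of how the sink validates schemas.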

Contributor

@thomasqueirozb thomasqueirozb left a comment

Thanks! Only a couple of small nits and small improvements but this should be pretty much good to go!

auto-merge was automatically disabled December 19, 2025 16:29

Head branch was pushed to by a user without write access

@thomasqueirozb thomasqueirozb added this pull request to the merge queue Dec 19, 2025
Merged via the queue into vectordotdev:master with commit 60fa980 Dec 19, 2025
50 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 19, 2025
Labels

domain: external docs Anything related to Vector's external, public documentation
domain: sinks Anything related to the Vector's sinks

Development

Successfully merging this pull request may close these issues.

Add ArrowStream format to Clickhouse sink

5 participants