

@benjamin-awd

Summary

This PR adds a generic Arrow codec that serializes events using the Apache Arrow IPC format. This lets sinks (e.g. ClickHouse) serialize and transmit structured events more efficiently than row-oriented, text-based formats allow. With a unified Arrow serialization layer, Vector can interoperate more easily with Arrow-native systems and improve performance for columnar workflows.
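At the heart of any columnar codec is transposing a batch of row-oriented events into one array per field. The following is a minimal, stdlib-only sketch of that step under stated assumptions. It is not the codec in this PR and does not use the `arrow` crate's API. The `Value` enum, the `Column` alias and the `to_columns` function are hypothetical. Real Arrow arrays also carry a declared data type per column, and they track nulls with a validity bitmap rather than `Option`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified event value; Vector's real event model is richer.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

/// One column: a slot per event, `None` where the event lacked the field.
/// (Arrow records the same information in a validity bitmap.)
type Column = Vec<Option<Value>>;

/// Hypothetical helper: transpose row-oriented events into named columns.
/// The schema is the sorted union of all field names seen in the batch.
fn to_columns(events: &[BTreeMap<String, Value>]) -> BTreeMap<String, Column> {
    let names: BTreeSet<&String> = events.iter().flat_map(|e| e.keys()).collect();
    names
        .into_iter()
        .map(|name| {
            // Every column gets exactly one slot per event, so all columns
            // end up with the same length as the batch.
            let column: Column = events.iter().map(|e| e.get(name).cloned()).collect();
            (name.clone(), column)
        })
        .collect()
}

fn main() {
    let mut a = BTreeMap::new();
    a.insert("host".to_string(), Value::Text("web-1".to_string()));
    a.insert("status".to_string(), Value::Int(200));

    // The second event has no `host` field.
    let mut b = BTreeMap::new();
    b.insert("status".to_string(), Value::Int(503));

    let columns = to_columns(&[a, b]);
    for (name, column) in &columns {
        // Prints: host: [Some(Text("web-1")), None]
        //         status: [Some(Int(200)), Some(Int(503))]
        println!("{name}: {column:?}");
    }
}
```

A missing field becomes a null slot, so every column keeps the same length as the batch. Arrow record batches rely on that property: all columns in a batch have the same number of rows.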

Vector configuration

An example of how this configuration would look with a sink:

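sinks:
  my_clickhouse_sink:        # hypothetical sink ID
    type: clickhouse
    inputs: ["my_source"]    # hypothetical upstream component ID
    endpoint: http://localhost:8123
    table: my_table
    batch_encoding:
      codec: arrow_stream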

How did you test this PR?

Tested with a ClickHouse sink implementation, which is not included in this PR to keep the scope limited.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Split from: vectordotdev#24075 (comment)
Related: vectordotdev#24074 (requires this to be implemented)
Related: vectordotdev#1374 (should hopefully allow this to move forward)

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook; please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin/master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.
