Commit 6698eed

Clean up README.md in advance of the 5.0 release (apache#536)
* Clean up parquet readme
* Fixup arrow readme content
* cleanups
* update main readmen
* update contributing
* Prettier
* RAT
1 parent 8823b9b commit 6698eed

7 files changed

+363
-394
lines changed

CONTRIBUTING.md

Lines changed: 103 additions & 42 deletions
@@ -17,61 +17,122 @@
1717
under the License.
1818
-->
1919

20-
# How to contribute to Apache Arrow
20+
## Developer's guide to Arrow Rust
2121

22-
## Did you find a bug?
22+
### How to compile
2323

24-
The Arrow project uses JIRA as a bug tracker. To report a bug, you'll have
25-
to first create an account on the
26-
[Apache Foundation JIRA](https://issues.apache.org/jira/). The JIRA server
27-
hosts bugs and issues for multiple Apache projects. The JIRA project name
28-
for Arrow is "ARROW".
24+
This is a standard cargo project with workspaces. To build it, you need to have `rust` and `cargo`:
2925

30-
To be assigned to an issue, ask an Arrow JIRA admin to go to
31-
[Arrow Roles](https://issues.apache.org/jira/plugins/servlet/project-config/ARROW/roles),
32-
click "Add users to a role," and add you to the "Contributor" role. Most
33-
committers are authorized to do this; if you're a committer and aren't
34-
able to load that project admin page, have someone else add you to the
35-
necessary role.
26+
```bash
27+
cargo build
28+
```
3629

37-
Before you create a new bug entry, we recommend you first
38-
[search](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-5140?filter=allopenissues)
39-
among existing Arrow issues.
30+
You can also use Rust's official Docker image:
4031

41-
When you create a new JIRA entry, please don't forget to fill the "Component"
42-
field. Arrow has many subcomponents and this helps triaging and filtering
43-
tremendously. Also, we conventionally prefix the issue title with the component
44-
name in brackets, such as "[C++] Crash in Array::Frobnicate()", so as to make
45-
lists more easy to navigate, and we'd be grateful if you did the same.
32+
```bash
33+
docker run --rm -v $(pwd):/arrow-rs -it rust /bin/bash -c "cd /arrow-rs && rustup component add rustfmt && cargo build"
34+
```
4635

47-
## Did you write a patch that fixes a bug or brings an improvement?
36+
The command above assumes that you are in the root directory of the project, not in the same
37+
directory as this README.md.
4838

49-
First create a JIRA entry as described above. Then, submit your changes
50-
as a GitHub Pull Request. We'll ask you to prefix the pull request title
51-
with the JIRA issue number and the component name in brackets.
52-
(for example: "ARROW-2345: [C++] Fix crash in Array::Frobnicate()").
53-
Respecting this convention makes it easier for us to process the backlog
54-
of submitted Pull Requests.
39+
You can also compile a specific crate in the workspace:
5540

56-
### Minor Fixes
41+
```bash
42+
cd arrow && cargo build
43+
```
5744

58-
Any functionality change should have a JIRA opened. For minor changes that
59-
affect documentation, you do not need to open up a JIRA. Instead you can
60-
prefix the title of your PR with "MINOR: " if meets the following guidelines:
45+
### Git Submodules
6146

62-
- Grammar, usage and spelling fixes that affect no more than 2 files
63-
- Documentation updates affecting no more than 2 files and not more
64-
than 500 words.
47+
Before running tests and examples, it is necessary to set up the local development environment.
6548

66-
## Do you want to propose a significant new feature or an important refactoring?
49+
The tests rely on test data that is contained in git submodules.
6750

68-
We ask that all discussions about major changes in the codebase happen
69-
publicly on the [arrow-dev mailing-list](https://mail-archives.apache.org/mod_mbox/arrow-dev/).
51+
To pull down this data run the following:
7052

71-
## Do you have questions about the source code, the build procedure or the development process?
53+
```bash
54+
git submodule update --init
55+
```
7256

73-
You can also ask on the mailing-list, see above.
57+
This populates data in two git submodules:
7458

75-
## Further information
59+
- `../parquet-testing/data` (sourced from https://github.com/apache/parquet-testing.git)
60+
- `../testing` (sourced from https://github.com/apache/arrow-testing)
7661

77-
Please read our [development documentation](https://arrow.apache.org/docs/developers/contributing.html).
62+
By default, `cargo test` will look for these directories at their
63+
standard location. The following environment variables can be used to override the location:
64+
65+
```bash
66+
# Optionally specify a different location for test data
67+
export PARQUET_TEST_DATA=$(cd ../parquet-testing/data; pwd)
68+
export ARROW_TEST_DATA=$(cd ../testing/data; pwd)
69+
```
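As a rough sketch (not part of the project's tooling), a small shell helper can confirm that a test data directory is populated before you run `cargo test`. The function name `check_test_data` is hypothetical; the directories and variable names are the ones listed above. An uninitialized submodule usually leaves an empty directory behind, so the helper checks for content as well as existence.

```bash
# Hypothetical helper: report whether a test data directory exists and is non-empty.
check_test_data() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "ok: $dir"
    else
        echo "missing or empty: $dir (try: git submodule update --init)" >&2
        return 1
    fi
}

# Example usage, falling back to the default locations when the
# environment variables are not set:
# check_test_data "${PARQUET_TEST_DATA:-../parquet-testing/data}"
# check_test_data "${ARROW_TEST_DATA:-../testing/data}"
```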
70+
71+
From here on, this is a pure Rust project and `cargo` can be used to run tests, benchmarks, docs and examples as usual.
72+
73+
### Running the tests
74+
75+
Run tests using the Rust standard `cargo test` command:
76+
77+
```bash
78+
# run all tests.
79+
cargo test
80+
81+
82+
# run only tests for the arrow crate
83+
cargo test -p arrow
84+
```
85+
86+
## Code Formatting
87+
88+
Our CI uses `rustfmt` to check code formatting. Before submitting a
89+
PR, be sure to run the following and check for formatting issues:
90+
91+
```bash
92+
cargo +stable fmt --all -- --check
93+
```
94+
95+
## Clippy Lints
96+
97+
We recommend using `clippy` for checking lints during development. While we do not yet enforce `clippy` checks, we recommend not introducing new `clippy` errors or warnings.
98+
99+
Run the following to check for clippy lints.
100+
101+
```bash
102+
cargo clippy
103+
```
104+
105+
If you use Visual Studio Code with the `rust-analyzer` plugin, you can enable `clippy` to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.
106+
107+
One of the concerns with `clippy` is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy on which lints are ignored, but if you disagree with a `clippy` lint, you may disable the lint and briefly justify it.
108+
109+
Search for `allow(clippy::` in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints at the narrowest scope possible.
110+
111+
- If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
112+
- If you have several lints on a function or module, you may disable the lint on the function or module.
113+
- If a lint is pervasive across multiple modules, you may disable it at the crate level.
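The search for `allow(clippy::` mentioned above can be done with standard tools. The following is a minimal sketch, assuming a POSIX shell with `find`, `grep`, `sed`, `sort` and `uniq`; the function name `count_allowed_lints` is hypothetical. It prints how often each lint is allowed, which gives a quick view of how widely a lint is already disabled.

```bash
# Hypothetical helper: count `allow(clippy::...)` occurrences per lint in the
# Rust sources under a directory (defaults to the current one), skipping
# build output in any `target` directory.
# Only the first lint inside each `allow(...)` attribute is counted.
count_allowed_lints() {
    find "${1:-.}" -name '*.rs' -type f ! -path '*/target/*' \
        -exec grep -hoE 'allow\(clippy::[a-z0-9_]+' {} + \
        | sed 's/^allow(clippy:://' \
        | sort | uniq -c | sort -rn
}
```

Running `count_allowed_lints arrow` from the project root would restrict the search to the `arrow` crate.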
114+
115+
## Git Pre-Commit Hook
116+
117+
You can use a [git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) to automate various pre-commit checks and formatting.
118+
119+
Suppose you are in the root directory of the project.
120+
121+
First check if the file already exists:
122+
123+
```bash
124+
ls -l .git/hooks/pre-commit
125+
```
126+
127+
If the file already exists, check the link source or the file content first, to avoid **overwriting** it by mistake.
128+
If it does not exist, you can safely symlink [pre-commit.sh](pre-commit.sh) as `.git/hooks/pre-commit`:
129+
130+
```bash
131+
ln -s ../../pre-commit.sh .git/hooks/pre-commit
132+
```
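The check-then-link steps can also be combined into one guarded operation. This is a sketch only, not a script shipped with the project: `install_hook` is a hypothetical name, and it takes the link target and the hook path as arguments so that it makes no assumption about where `pre-commit.sh` lives.

```bash
# Hypothetical helper: create the hook symlink only when nothing is there yet.
# $1 = link target (path of the hook script, relative to .git/hooks)
# $2 = hook path (for example .git/hooks/pre-commit)
install_hook() {
    target="$1"
    hook="$2"
    # -e is false for a dangling symlink, so test -L as well.
    if [ -e "$hook" ] || [ -L "$hook" ]; then
        echo "refusing to overwrite existing $hook" >&2
        return 1
    fi
    ln -s "$target" "$hook"
}
```

Called from the project root with the same two paths as the `ln -s` command, it either creates the link or leaves an existing hook untouched and returns a non-zero status.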
133+
134+
If you occasionally want to commit without running the checks, run `git commit` with `--no-verify`:
135+
136+
```bash
137+
git commit --no-verify -m "... commit message ..."
138+
```

README.md

Lines changed: 21 additions & 146 deletions
@@ -17,167 +17,42 @@
1717
under the License.
1818
-->
1919

20-
# Native Rust implementation of Apache Arrow
20+
# Native Rust implementation of Apache Arrow and Parquet
2121

2222
[![Coverage Status](https://codecov.io/gh/apache/arrow/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow?branch=master)
2323

2424
Welcome to the implementation of Arrow, the popular in-memory columnar format, in [Rust](https://www.rust-lang.org/).
2525

26-
This part of the Arrow project is divided in 4 main components:
26+
This repo contains the following main components:
2727

28-
| Crate | Description | Documentation |
29-
| ------------ | ------------------------------------------------------------------ | ------------------------------------------------------------------------------------- |
30-
| Arrow | Core functionality (memory layout, arrays, low level computations) | [(README)](arrow/README.md) |
31-
| Parquet | Parquet support | [(README)](parquet/README.md) |
32-
| Arrow-flight | Arrow data between processes | [(README)](arrow-flight/README.md) |
33-
| DataFusion | In-memory query engine with SQL support | [(README)](https://github.com/apache/arrow-datafusion/blob/master/README.md) |
34-
| Ballista | Distributed query execution | [(README)](https://github.com/apache/arrow-datafusion/blob/master/ballista/README.md) |
35-
36-
Independently, they support a vast array of functionality for in-memory computations.
28+
| Crate | Description | Documentation |
29+
| ------------ | ------------------------------------------------------------------ | ---------------------------------- |
30+
| arrow | Core functionality (memory layout, arrays, low level computations) | [(README)](arrow/README.md) |
31+
| parquet | Support for Parquet columnar file format | [(README)](parquet/README.md) |
32+
| arrow-flight | Support for Arrow-Flight IPC protocol | [(README)](arrow-flight/README.md) |
3733

38-
Together, they allow users to write an SQL query or a `DataFrame` (using the `datafusion` crate), run it against a parquet file (using the `parquet` crate), evaluate it in-memory using Arrow's columnar format (using the `arrow` crate), and send to another process (using the `arrow-flight` crate).
34+
There are two related crates in a different repository:
35+
| Crate | Description | Documentation |
36+
| ------------ | ------------------------------------------------------------------ | ------------------------------------------------------------------------------------- |
37+
| DataFusion | In-memory query engine with SQL support | [(README)](https://github.com/apache/arrow-datafusion/blob/master/README.md) |
38+
| Ballista | Distributed query execution | [(README)](https://github.com/apache/arrow-datafusion/blob/master/ballista/README.md) |
3939

40-
Generally speaking, the `arrow` crate offers functionality to develop code that uses Arrow arrays, and `datafusion` offers most operations typically found in SQL, including `join`s and window functions.
40+
Collectively, these crates support a vast array of functionality for analytic computations in Rust.
4141

42-
There are too many features to enumerate here, but some notable mentions:
42+
For example, you can write an SQL query or a `DataFrame` (using the `datafusion` crate), run it against a parquet file (using the `parquet` crate), evaluate it in-memory using Arrow's columnar format (using the `arrow` crate), and send the results to another process (using the `arrow-flight` crate).
4343

44-
- `Arrow` implements all formats in the specification except certain dictionaries
45-
- `Arrow` supports SIMD operations to some of its vertical operations
46-
- `DataFusion` supports `async` execution
47-
- `DataFusion` supports user-defined functions, aggregates, and whole execution nodes
44+
Generally speaking, the `arrow` crate offers functionality for using Arrow arrays, and `datafusion` offers most operations typically found in SQL, including `join`s and window functions.
4845

4946
You can find more details about each crate in their respective READMEs.
5047

5148
## Arrow Rust Community
5249

53-
We use the official [ASF Slack](https://s.apache.org/slack-invite) for informal discussions and coordination. This is
54-
a great place to meet other contributors and get guidance on where to contribute. Join us in the `arrow-rust` channel.
55-
56-
We use [ASF JIRA](https://issues.apache.org/jira/secure/Dashboard.jspa) as the system of record for new features
57-
and bug fixes and this plays a critical role in the release process.
58-
59-
For design discussions we generally collaborate on Google documents and file a JIRA linking to the document.
60-
61-
There is also a bi-weekly Rust-specific sync call for the Arrow Rust community. This is hosted on Google Meet
62-
at https://meet.google.com/ctp-yujs-aee on alternate Wednesday's at 09:00 US/Pacific, 12:00 US/Eastern. During
63-
US daylight savings time this corresponds to 16:00 UTC and at other times this is 17:00 UTC.
64-
65-
## Developer's guide to Arrow Rust
66-
67-
### How to compile
68-
69-
This is a standard cargo project with workspaces. To build it, you need to have `rust` and `cargo`:
70-
71-
```bash
72-
cargo build
73-
```
74-
75-
You can also use rust's official docker image:
76-
77-
```bash
78-
docker run --rm -v $(pwd):/arrow-rs -it rust /bin/bash -c "cd /arrow-rs && rustup component add rustfmt && cargo build"
79-
```
80-
81-
The command above assumes that are in the root directory of the project, not in the same
82-
directory as this README.md.
83-
84-
You can also compile specific workspaces:
85-
86-
```bash
87-
cd arrow && cargo build
88-
```
89-
90-
### Git Submodules
91-
92-
Before running tests and examples, it is necessary to set up the local development environment.
93-
94-
The tests rely on test data that is contained in git submodules.
95-
96-
To pull down this data run the following:
97-
98-
```bash
99-
git submodule update --init
100-
```
101-
102-
This populates data in two git submodules:
103-
104-
- `../parquet_testing/data` (sourced from https://github.com/apache/parquet-testing.git)
105-
- `../testing` (sourced from https://github.com/apache/arrow-testing)
106-
107-
By default, `cargo test` will look for these directories at their
108-
standard location. The following environment variables can be used to override the location:
109-
110-
```bash
111-
# Optionally specify a different location for test data
112-
export PARQUET_TEST_DATA=$(cd ../parquet-testing/data; pwd)
113-
export ARROW_TEST_DATA=$(cd ../testing/data; pwd)
114-
```
50+
The `dev@arrow.apache.org` mailing list serves as the core communication channel for the Arrow community. Instructions for signing up and links to the archives can be found at the [Arrow Community](https://arrow.apache.org/community/) page. All major announcements and communications happen there.
11551

116-
From here on, this is a pure Rust project and `cargo` can be used to run tests, benchmarks, docs and examples as usual.
52+
The Rust Arrow community also uses the official [ASF Slack](https://s.apache.org/slack-invite) for informal discussions and coordination. This is
53+
a great place to meet other contributors and get guidance on where to contribute. Join us in the `#arrow-rust` channel.
11754

118-
### Running the tests
119-
120-
Run tests using the Rust standard `cargo test` command:
121-
122-
```bash
123-
# run all tests.
124-
cargo test
125-
126-
127-
# run only tests for the arrow crate
128-
cargo test -p arrow
129-
```
130-
131-
## Code Formatting
132-
133-
Our CI uses `rustfmt` to check code formatting. Before submitting a
134-
PR be sure to run the following and check for lint issues:
135-
136-
```bash
137-
cargo +stable fmt --all -- --check
138-
```
139-
140-
## Clippy Lints
141-
142-
We recommend using `clippy` for checking lints during development. While we do not yet enforce `clippy` checks, we recommend not introducing new `clippy` errors or warnings.
143-
144-
Run the following to check for clippy lints.
145-
146-
```bash
147-
cargo clippy
148-
```
149-
150-
If you use Visual Studio Code with the `rust-analyzer` plugin, you can enable `clippy` to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.
151-
152-
One of the concerns with `clippy` is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a `clippy` lint, you may disable the lint and briefly justify it.
153-
154-
Search for `allow(clippy::` in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.
155-
156-
- If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
157-
- If you have several lints on a function or module, you may disable the lint on the function or module.
158-
- If a lint is pervasive across multiple modules, you may disable it at the crate level.
159-
160-
## Git Pre-Commit Hook
161-
162-
We can use [git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) to automate various kinds of git pre-commit checking/formatting.
163-
164-
Suppose you are in the root directory of the project.
165-
166-
First check if the file already exists:
167-
168-
```bash
169-
ls -l .git/hooks/pre-commit
170-
```
171-
172-
If the file already exists, to avoid mistakenly **overriding**, you MAY have to check
173-
the link source or file content. Else if not exist, let's safely soft link [pre-commit.sh](pre-commit.sh) as file `.git/hooks/pre-commit`:
174-
175-
```bash
176-
ln -s ../../rust/pre-commit.sh .git/hooks/pre-commit
177-
```
178-
179-
If sometimes you want to commit without checking, just run `git commit` with `--no-verify`:
55+
Unlike other parts of the Arrow ecosystem, the Rust implementation uses [GitHub issues](https://github.com/apache/arrow-rs/issues) as the system of record for new features
56+
and bug fixes and this plays a critical role in the release process.
18057

181-
```bash
182-
git commit --no-verify -m "... commit message ..."
183-
```
58+
For design discussions we generally collaborate on Google documents and file a GitHub issue linking to the document.

arrow-flight/README.md

Lines changed: 10 additions & 3 deletions
@@ -21,8 +21,15 @@
2121

2222
[![Crates.io](https://img.shields.io/crates/v/arrow-flight.svg)](https://crates.io/crates/arrow-flight)
2323

24-
Apache Arrow Flight is a gRPC based protocol for exchanging Arrow data between processes. See the blog post [Introducing Apache Arrow Flight: A Framework for Fast Data Transport](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for more information.
24+
## Usage
25+
26+
Add this to your Cargo.toml:
2527

26-
This crate simply provides the Rust implementation of the [Flight.proto](../../format/Flight.proto) gRPC protocol and provides an example that demonstrates how to build a Flight server implemented with Tonic.
28+
```toml
29+
[dependencies]
30+
arrow-flight = "5.0"
31+
```
32+
33+
Apache Arrow Flight is a gRPC based protocol for exchanging Arrow data between processes. See the blog post [Introducing Apache Arrow Flight: A Framework for Fast Data Transport](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for more information.
2734

28-
Note that building a Flight server also requires an implementation of Arrow IPC which is based on the Flatbuffers serialization framework. The Rust implementation of Arrow IPC is not yet complete although the generated Flatbuffers code is available as part of the core Arrow crate.
35+
This crate provides a Rust implementation of the [Flight.proto](../../format/Flight.proto) gRPC protocol and includes an example that demonstrates how to build a Flight server with Tonic.
