Skip to content

Commit d31308b

Browse files
committed
Add development guidelines
1 parent 3484a57 commit d31308b

File tree

1 file changed

+75
-2
lines changed

1 file changed

+75
-2
lines changed

README.md

Lines changed: 75 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -149,7 +149,80 @@ This means that:
149149
1. The head stage is executed normally as if the query was not distributed.
150150
2. Upon calling `.execute()` on the `ArrowFlightReadExec`, instead of recursively calling `.execute()` on its children,
151151
they will be serialized and sent over the wire to another node.
152-
3. The next node, which is hosting an Arrow Flight Endpoint listening for gRPC requests over an HTTP server, will pick up
153-
the request containing the serialized chunk of the overall plan, and execute it.
152+
3. The next node, which is hosting an Arrow Flight Endpoint listening for gRPC requests over an HTTP server, will pick
153+
up the request containing the serialized chunk of the overall plan, and execute it.
154154
4. This is repeated for each stage, and data will start flowing from bottom to top until it reaches the head stage.
155155

156+
## Development
157+
158+
### Prerequisites
159+
160+
- Rust 1.85.1 or later (specified in `rust-toolchain.toml`)
161+
- Git LFS for test data
162+
163+
### Setup
164+
165+
1. **Clone the repository:**
166+
```bash
167+
git clone [email protected]:datafusion-contrib/datafusion-distributed
168+
cd datafusion-distributed
169+
```
170+
171+
2. **Install Git LFS and fetch test data:**
172+
```bash
173+
git lfs install
174+
git lfs checkout
175+
```
176+
177+
### Running Tests
178+
179+
**Unit and integration tests:**
180+
181+
```bash
182+
cargo test --features integration
183+
```
184+
185+
### Running Examples
186+
187+
**Start localhost workers:**
188+
189+
```bash
190+
# Terminal 1
191+
cargo run --example localhost_worker -- 8080 --cluster-ports 8080,8081
192+
193+
# Terminal 2
194+
cargo run --example localhost_worker -- 8081 --cluster-ports 8080,8081
195+
```
196+
197+
**Execute distributed queries:**
198+
199+
```bash
200+
cargo run --example localhost_run -- 'SELECT count(*) FROM weather' --cluster-ports 8080,8081
201+
```
202+
203+
### Benchmarks
204+
205+
**Generate TPC-H benchmark data:**
206+
207+
```bash
208+
cd benchmarks
209+
./gen-tpch.sh
210+
```
211+
212+
**Run TPC-H benchmarks:**
213+
214+
```bash
215+
cargo run -p datafusion-distributed-benchmarks --release -- tpch --path benchmarks/data/tpch_sf1
216+
```
217+
218+
### Project Structure
219+
220+
- `src/` - Core library code
221+
- `flight_service/` - Arrow Flight service implementation
222+
- `plan/` - Physical plan extensions and operators
223+
- `stage/` - Execution stage management
224+
- `common/` - Shared utilities
225+
- `examples/` - Usage examples
226+
- `tests/` - Integration tests
227+
- `benchmarks/` - Performance benchmarks
228+
- `testdata/` - Test datasets

0 commit comments

Comments
 (0)