@@ -149,7 +149,80 @@ This means that:
1491491 . The head stage is executed normally as if the query was not distributed.
1501502 . Upon calling ` .execute() ` on the ` ArrowFlightReadExec ` , instead of recursively calling ` .execute() ` on its children,
151151 they will be serialized and sent over the wire to another node.
152- 3 . The next node, which is hosting an Arrow Flight Endpoint listening for gRPC requests over an HTTP server, will pick up
153- the request containing the serialized chunk of the overall plan, and execute it.
152+ 3 . The next node, which is hosting an Arrow Flight Endpoint listening for gRPC requests over an HTTP server, will pick
153+ up the request containing the serialized chunk of the overall plan, and execute it.
1541544 . This is repeated for each stage, and data will start flowing from bottom to top until it reaches the head stage.
155155
156+ ## Development
157+
158+ ### Prerequisites
159+
160+ - Rust 1.85.1 or later (specified in ` rust-toolchain.toml ` )
161+ - Git LFS for test data
162+
163+ ### Setup
164+
165+ 1 . ** Clone the repository:**
166+ ``` bash
167+ git clone
[email protected] :datafusion-contrib/datafusion-distributed
168+ cd datafusion-distributed
169+ ```
170+
171+ 2 . ** Install Git LFS and fetch test data:**
172+ ``` bash
173+ git lfs install
174+ git lfs checkout
175+ ```
176+
177+ ### Running Tests
178+
179+ ** Unit and integration tests:**
180+
181+ ``` bash
182+ cargo test --features integration
183+ ```
184+
185+ ### Running Examples
186+
187+ ** Start localhost workers:**
188+
189+ ``` bash
190+ # Terminal 1
191+ cargo run --example localhost_worker -- 8080 --cluster-ports 8080,8081
192+
193+ # Terminal 2
194+ cargo run --example localhost_worker -- 8081 --cluster-ports 8080,8081
195+ ```
196+
197+ ** Execute distributed queries:**
198+
199+ ``` bash
200+ cargo run --example localhost_run -- ' SELECT count(*) FROM weather' --cluster-ports 8080,8081
201+ ```
202+
203+ ### Benchmarks
204+
205+ ** Generate TPC-H benchmark data:**
206+
207+ ``` bash
208+ cd benchmarks
209+ ./gen-tpch.sh
210+ ```
211+
212+ ** Run TPC-H benchmarks:**
213+
214+ ``` bash
215+ cargo run -p datafusion-distributed-benchmarks --release -- tpch --path benchmarks/data/tpch_sf1
216+ ```
217+
218+ ### Project Structure
219+
220+ - ` src/ ` - Core library code
221+ - ` flight_service/ ` - Arrow Flight service implementation
222+ - ` plan/ ` - Physical plan extensions and operators
223+ - ` stage/ ` - Execution stage management
224+ - ` common/ ` - Shared utilities
225+ - ` examples/ ` - Usage examples
226+ - ` tests/ ` - Integration tests
227+ - ` benchmarks/ ` - Performance benchmarks
228+ - ` testdata/ ` - Test datasets
0 commit comments