README.md: 35 additions & 13 deletions
@@ -1,4 +1,4 @@
-# DataFusion Distributed
+# Distributed DataFusion
 
 [![Apache licensed][license-badge]][license-url]
@@ -7,7 +7,7 @@
 
 ## Overview
 
-DataFusion Distributed is a distributed execution framework that enables DataFusion DataFrame and SQL queries to run in a distributed fashion. This project provides the infrastructure to scale DataFusion workloads across multiple nodes in a cluster.
+Distributed DataFusion is a distributed execution framework that enables DataFusion DataFrame and SQL queries to run in a distributed fashion. This project provides the infrastructure to scale DataFusion workloads across multiple nodes in a cluster.
 
 This is an open source version of the distributed DataFusion prototype, extracted from DataDog's internal implementation and made available to the community.
@@ -100,20 +100,42 @@ protoc --version
 ./build.sh --release
 ```
 
+**Clean Rebuild**: If you need to completely clean and rebuild (removes all build artifacts):
+
+```bash
+# Clean rebuild in debug mode
+./clean_and_build.sh
+
+# Clean rebuild in release mode (optimized)
+./clean_and_build.sh --release
+```
+
 #### Using Cargo Directly
 
-To build the project in debug mode:
+You can also build the project directly with Cargo (the build.rs script will automatically handle Protocol Buffer compilation):
 
 ```bash
+# Build in debug mode
 cargo build
 ```
 
-To build the project in release mode (optimized):
+```bash
+# Build in release mode (optimized)
+cargo build --release
+```
+
+**Clean Build Artifacts**: To clean previous build artifacts before rebuilding:
 
 ```bash
+# Clean all build artifacts (removes target/ directory contents)
+cargo clean
+
+# Then rebuild
 cargo build --release
 ```
 
+**Note**: Both the `build.sh` script and `cargo` automatically invoke `build.rs`, which handles Protocol Buffer compilation before building the main crate. The main advantage of using `./build.sh` is the user-friendly output and usage examples it provides.
+
 ### Running Tests
 
 Run all tests:
@@ -203,10 +225,10 @@ In separate terminal windows, start two workers:
 To make your cluster aware of specific table schemas, you’ll need to define a new environment variable, DFRAY_TABLES, when starting each worker and proxy. This variable should specify tables whose data is stored in Parquet files. For example, the following setup registers two tables, customer and nation, along with their corresponding data sources.
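The example setup referenced above is not included in this excerpt. A minimal sketch of what such an environment setup might look like, assuming a comma-separated `table:path` format for DFRAY_TABLES (the exact syntax and the file paths here are assumptions, not taken from the project):

```shell
# Hypothetical DFRAY_TABLES format: comma-separated "table:path" pairs.
# The actual syntax expected by the workers and proxy may differ.
export DFRAY_TABLES="customer:/data/tpch/customer.parquet,nation:/data/tpch/nation.parquet"

# The same variable must be set for every worker and for the proxy,
# so each process registers the same customer and nation tables.
echo "$DFRAY_TABLES"
```

Setting the variable once in a shared startup script keeps the worker and proxy table registrations consistent.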