Commit 2977a80

Merge branch 'main' into 957-sparse-data-transfer
2 parents c4a0461 + c6f7563 commit 2977a80

237 files changed, +4257 -2453 lines changed

CONTRIBUTING.md

Lines changed: 43 additions & 31 deletions
@@ -14,9 +14,9 @@ See the License for the specific language governing permissions and
1414
limitations under the License.
1515
-->
1616

17-
# Contributing to the DAPHNE System
17+
# Contributing to DAPHNE
1818

19-
*Thank you* for your interest in contributing to the DAPHNE system.
19+
*Thank you* for your interest in contributing to DAPHNE.
2020
Our goal is to build an **open and inclusive community of developers** around the system.
2121
Thus, **contributions are highly welcome**, both from *within the DAPHNE project consortium* and from *external researchers/developers*.
2222

@@ -25,6 +25,7 @@ In the following, you find some rough **guidelines on contributing**, which will
2525
## Ways of Contributing
2626

2727
There are **various ways of contributing** including (but not limited to):
28+
2829
- actual implementation
2930
- writing test cases
3031
- writing documentation
@@ -36,7 +37,7 @@ That way, discussions are made *accessible and transparent* to everyone interest
3637
This is important to *involve people* and to *avoid repetition* in case multiple people have the same question/comment or encounter the same problem.
3738
So feel free to create an issue to start a discussion on a particular topic (including these contribution guidelines) or to report a bug or other problem.
3839

39-
## Issue tracking
40+
## Issue Tracking
4041

4142
All open/ongoing/completed **work is tracked as issues** on GitHub.
4243
These could be anything from precisely defined small tasks to requests for complex components.
@@ -65,31 +66,41 @@ That is, please try your best to make a good-quality contribution and we will he
6566
**The procedure is roughly as follows:**
6667

6768
1. **Get assigned to the issue** to let others know you are going to work on it and to avoid duplicate work. Please leave a comment on the issue stating that you are going to work on it. After that, a collaborator will formally assign you.
69+
6870
2. **Fork the repository** on GitHub and **clone your fork** (see [GitHub docs](https://docs.github.com/en/get-started/quickstart/fork-a-repo)).
69-
We recommend cloning by `git clone --recursive https://github.com/<USERNAME>/daphne.git` (note the `--recursive`), as specified in [Getting Started](/doc/GettingStarted.md).
70-
71-
*You may skip this step and reuse your existing fork if you have contributed before. Simply update your fork with the recent changes from the original DAPHNE repository (see [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork)).*
71+
<!-- TODO with containers, we don't recommend that anymore -->
72+
We recommend cloning by `git clone --recursive https://github.com/<USERNAME>/daphne.git` (note the `--recursive`), as specified in [Getting Started](/doc/GettingStarted.md).
73+
74+
*You may skip this step and reuse your existing fork if you have contributed before. Simply update your fork with the recent changes from the original DAPHNE repository (see [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork)).*
75+
7276
3. **Create your own local branch**: `git checkout -b BRANCH_NAME`.
73-
`BRANCH_NAME` should clearly indicate what the branch is about; the recommended pattern is `123-some-short-title` (where `123` is the issue number).
77+
`BRANCH_NAME` should clearly indicate what the branch is about; the recommended pattern is `123-some-short-title` (where `123` is the issue number).
78+
7479
4. **Add as many commits as you like** to your branch, and `git push` them to your fork.
75-
Use `git push --set-upstream origin BRANCH_NAME` when you push the first time.
76-
5. If you work longer on your contribution, make sure to **get the most recent changes from the upstream** (original DAPHNE system repository) from time to time (see [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork)).
80+
Use `git push --set-upstream origin BRANCH_NAME` when you push the first time.
81+
82+
5. If you work longer on your contribution, make sure to **get the most recent changes from the upstream** (original DAPHNE repository) from time to time (see [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork)).
83+
7784
6. Once you feel ready (for integration or for discussion/feedback), **create a pull request** on GitHub (see [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request)).
78-
Normally, you'll want to ask for integration into `base:main`, the repo's default branch.
79-
Please choose an expressive title and provide a short description of your changes.
80-
Feel free to mark your pull request "WIP: " or "Draft: " in the title.
81-
Note that you can add more commits to your pull request after you created it.
82-
Ideally, the changes in the PR contain only the changes you made for that PR,
83-
e.g, by rebasing your branch on top of the target branch. This makes it easier for others to
84-
review your PR.
85+
Normally, you'll want to ask for integration into `base:main`, the repo's default branch.
86+
Please choose an expressive title and provide a short description of your changes.
87+
Feel free to mark your pull request "WIP: " or "Draft: " in the title.
88+
Note that you can add more commits to your pull request after you created it.
89+
Ideally, the changes in the PR contain only the changes you made for that PR,
90+
e.g, by rebasing your branch on top of the target branch. This makes it easier for others to
91+
review your PR.
92+
<!-- TODO link to reviewing guidelines -->
93+
8594
7. [Resolve any open conflicts](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/about-merge-conflicts) to the target branch of the PR.
95+
8696
8. You **receive feedback** on your proposed contribution.
87-
You may be asked to apply certain changes, or we might apply straightforward adjustments ourselves before the integration.
97+
You may be asked to apply certain changes, or we might apply straightforward adjustments ourselves before the integration.
98+
8899
9. If it looks good (potentially after some help), **your contribution becomes a part of DAPHNE**.
89100

90101
### Experienced DAPHNE Contributors (Collaborators)
91102

92-
We appreciate *continued commitment* to the DAPHNE system.
103+
We appreciate *continued commitment* to DAPHNE.
93104
Thus, **frequent contributors can become collaborators** on GitHub.
94105
Currently, this requires **at least three non-trivial contributions** to the system.
95106
Collaborators have *direct write access* to all branches of the repository, including the main branch.
@@ -99,25 +110,26 @@ Collaborators do not need to create a fork, and do not need to go through pull r
99110
At the same time, this freedom comes with certain responsibilities, which are roughly sketched here:
100111

101112
1. Please **follow some simple guidelines when changing the code**:
102-
- Feel free to directly push to the main branch, but *be mindful of what you commit*, since it will affect everyone.
103-
As a guideline, commits fundamentally changing how certain things work should be announced and discussed first, whereas small changes or changes local to "your" component are not critical.
104-
- But **never force push to the main branch**, since it can lead to severe inconsistencies in the Git history.
105-
- Even *collaborators may still use pull requests* (just like new contributors) to suggest larger changes.
106-
This is also suitable whenever you feel unsure about a change or want to get feedback first.
113+
- Feel free to directly push to the main branch, but *be mindful of what you commit*, since it will affect everyone.
114+
As a guideline, commits fundamentally changing how certain things work should be announced and discussed first, whereas small changes or changes local to "your" component are not critical.
115+
- But **never force push to the main branch**, since it can lead to severe inconsistencies in the Git history.
116+
- Even *collaborators may still use pull requests* (just like new contributors) to suggest larger changes.
117+
This is also suitable whenever you feel unsure about a change or want to get feedback first.
107118
2. Please **engage in the [handling of pull requests](/doc/development/HandlingPRs.md)**; especially those affecting the components you are working on.
108-
This includes:
109-
- reading the code others suggest for integration
110-
- trying if it works
111-
- providing constructive and *actionable* feedback on improving the contribution prior to the integration
112-
- actually merging a pull request in
113-
114-
Balancing the handling of pull requests is important to *keep the development process scalable*.
119+
This includes:
120+
121+
- reading the code others suggest for integration
122+
- trying if it works
123+
- providing constructive and *actionable* feedback on improving the contribution prior to the integration
124+
- actually merging a pull request in
125+
126+
Balancing the handling of pull requests is important to *keep the development process scalable*.
115127

116128

117129
### Code Style
118130

119131
Before contributing, please make sure to run `clang-format` on your C++ (.h and
120-
.cpp) files. The codebase is currently formatted with `clang-format` version
132+
.cpp) files. The code base is currently formatted with `clang-format` version
121133
`18.1.3`. This is the default `clang-format` version when installing via `apt`
122134
on Ubuntu 24.04, and can easily be installed via `python -mpip install clang-format==18.1.3`
123135
on other systems.

UserConfig.json

Lines changed: 4 additions & 0 deletions
@@ -22,13 +22,17 @@
2222
"explain_property_inference": false,
2323
"explain_sql": false,
2424
"explain_select_matrix_repr": false,
25+
"explain_transfer_data_props": false,
2526
"explain_type_adaptation": false,
2627
"explain_vectorized": false,
2728
"explain_obj_ref_mgnt": false,
2829
"explain_mlir_codegen": false,
2930
"explain_mlir_codegen_sparsity_exploiting_op_fusion": false,
3031
"explain_mlir_codegen_daphneir_to_mlir": false,
3132
"explain_mlir_codegen_mlir_specific": false,
33+
"enable_property_recording": false,
34+
"enable_property_insert": false,
35+
"properties_file_path": "properties.json",
3236
"taskPartitioningScheme": "STATIC",
3337
"numberOfThreads": -1,
3438
"minimumTaskSize": 1,
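
The hunk above adds four keys to `UserConfig.json`: `explain_transfer_data_props`, `enable_property_recording`, `enable_property_insert`, and `properties_file_path`. As a minimal sketch (not DAPHNE's own configuration loader), a user config written before this commit could be brought up to date by filling in the defaults shown in the diff. The helper name `with_new_defaults` and the sample `old_config` are hypothetical and serve only as illustration.

```python
import json

# Keys added by this commit, with the default values shown in the diff.
NEW_KEY_DEFAULTS = {
    "explain_transfer_data_props": False,
    "enable_property_recording": False,
    "enable_property_insert": False,
    "properties_file_path": "properties.json",
}


def with_new_defaults(config: dict) -> dict:
    """Return a copy of `config` where missing new keys get the diff's defaults."""
    merged = dict(NEW_KEY_DEFAULTS)
    merged.update(config)  # values already present in `config` take precedence
    return merged


# Hypothetical pre-commit config that lacks the new keys.
old_config = json.loads('{"explain_sql": false, "numberOfThreads": -1}')
new_config = with_new_defaults(old_config)
print(json.dumps(new_config, indent=4))
```

Existing values are never overwritten, so a config that already sets one of the new keys keeps its own value.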

doc/BinaryFormat.md

Lines changed: 22 additions & 22 deletions
@@ -17,7 +17,7 @@ limitations under the License.
1717
# Binary Data Format
1818

1919
DAPHNE defines its own binary representation for the serialization of in-memory data objects (matrices/frames).
20-
This representation is intended to be used by default whenever we need to transfer or persistently store these in-memory objects, e.g., for
20+
This representation is intended to be used by default whenever we need to transfer or persistently store these in-memory objects, e.g., for:
2121

2222
- the data transfer in the distributed runtime
2323
- a custom binary file format
@@ -32,7 +32,7 @@ At the moment, we focus on the case of a single block per data object.
3232

3333
## Binary Representation of a Whole Data Object
3434

35-
The binary representation of a data object (matrix/frame) starts with a header containing general and data type-specific information.
35+
The binary representation of a data object (matrix/frame) starts with a header containing general and data-type-specific information.
3636
The data object is partitioned into rectangular blocks (in the extreme case, this can mean a single block).
3737
All blocks are represented individually (see binary representation of a single block below) and stored along with their position in the data object.
3838

@@ -76,7 +76,7 @@ We currently support the following **value types**:
7676
| 9 | `float` |
7777
| 10 | `double` |
7878

79-
Depending on the data type, there are more information in the header:
79+
Depending on the data type, there is more information in the header:
8080

8181
*For `DenseMatrix` and `CSRMatrix`*:
8282

@@ -108,8 +108,8 @@ size[B] 1 1 8 8 1 1 2 len[0]
108108
The body consists of a sequence of:
109109

110110
- a pair of
111-
- row index `rx` (uint64)
112-
- column index `cx` (uint64)
111+
- row index `rx` (uint64)
112+
- column index `cx` (uint64)
113113
- a binary block representation
114114

115115
For the special case of a single block, this looks as follows:
@@ -128,7 +128,7 @@ size[B]
128128
A single data block is a rectangular partition of a data object.
129129
In the extreme case, a single block can span the entire data object in both dimensions (one block per data object).
130130

131-
General block header
131+
General block header:
132132

133133
- number of rows `#r` (uint32)
134134
- number of columns `#c` (uint32)
@@ -143,7 +143,7 @@ addr[B] 0 3 4 7 8 8 9 *
143143
size[B] 4 4 1 *
144144
```
145145

146-
## Block types
146+
## Block Types
147147

148148
We define different block types to allow for a space-efficient representation depending on the data.
149149
When serializing a data object, the block types are not required to match the in-memory representation (e.g., the blocks of a `DenseMatrix` could use the *sparse* binary representation).
@@ -161,7 +161,7 @@ Most block types store their value type as part of the block type-specific infor
161161
Note that the value type used for the binary representation is not required to match the value type of the in-memory object (e.g., `DenseMatrix<uint64_t>` may be represented as a *dense* block with value type `uint8_t`, if the value range permits).
162162
Furthermore, each block may be represented using its individual value type.
163163

164-
### Empty block
164+
### Empty Block
165165

166166
This block type is used to represent blocks that contain only zeros of the respective value type very space-efficiently.
167167

@@ -175,7 +175,7 @@ addr[B] 0 3 4 7 8 8
175175
size[B] 4 4 1
176176
```
177177

178-
### Dense block
178+
### Dense Block
179179

180180
Block type-specific information:
181181

@@ -192,17 +192,17 @@ addr[B] 0 3 4 7 8 8 9 9 10 10+#r*#c*S
192192
size[B] 4 4 1 1 S S S
193193
```
194194

195-
### Sparse block (compressed sparse row, CSR)
195+
### Sparse Block (Compressed Sparse Row, CSR)
196196

197197
Block type-specific information:
198198

199199
- value type `vt` (uint8)
200200
- number of non-zeros in the block `#nzb` (uint64)
201201
- for each row
202-
- number of non-zeros in the row `#nzr` (uint32)
203-
- for each non-zero in the row
204-
- column index `cx` (uint32)
205-
- value `v` (value type `vt`)
202+
- number of non-zeros in the row `#nzr` (uint32)
203+
- for each non-zero in the row
204+
- column index `cx` (uint32)
205+
- value `v` (value type `vt`)
206206

207207
Note that both a row and the entire block might contain no non-zeros.
208208

@@ -233,20 +233,20 @@ size[B] 4 4 1 1 8 4+#nzr[i]*(4+S)
233233
4 S
234234
```
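
The sparse (CSR) block layout described above can be sketched as follows. This is an illustration only, not DAPHNE's serializer: it assumes little-endian byte order, fixes the value type to `double` (code 10 in the value-type table), and uses a placeholder block type code, since neither the byte order nor the block type codes appear in this excerpt. The names `serialize_csr_block` and `BT_SPARSE_ASSUMED` are hypothetical.

```python
import struct

VT_DOUBLE = 10         # value type code for `double`, per the value-type table
BT_SPARSE_ASSUMED = 2  # ASSUMPTION: placeholder block type code, not taken from the spec


def serialize_csr_block(rows, num_cols, block_type=BT_SPARSE_ASSUMED):
    """Serialize one sparse block.

    `rows` holds one list per row, each containing (column index, value) pairs.
    Little-endian byte order is an assumption of this sketch.
    """
    nzb = sum(len(row) for row in rows)
    # General block header: #r (uint32), #c (uint32), bt (uint8).
    out = bytearray(struct.pack("<IIB", len(rows), num_cols, block_type))
    # Block type-specific information: vt (uint8), #nzb (uint64).
    out += struct.pack("<BQ", VT_DOUBLE, nzb)
    for row in rows:
        # Per row: #nzr (uint32), then (cx: uint32, v: double) per non-zero.
        out += struct.pack("<I", len(row))
        for cx, v in row:
            out += struct.pack("<Id", cx, v)
    return bytes(out)


# A 3x3 block with three non-zeros; the middle row is empty.
blob = serialize_csr_block([[(0, 1.5)], [], [(1, 2.0), (2, 3.0)]], num_cols=3)
print(len(blob))
```

For this example the result is 9 + 9 + (4 + 12) + 4 + (4 + 24) = 66 bytes, which matches the per-row size `4+#nzr[i]*(4+S)` with `S = 8`.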
235235

236-
### Ultra-sparse block (coordinate, COO)
236+
### Ultra-Sparse Block (Coordinate, COO)
237237

238238
Ultra-sparse blocks contain almost no non-zeros, so we want to keep the overhead of the meta data low.
239239
Thus, we distinguish blocks with a single column (where we don't need to store the column index) and blocks with more than one column.
240240

241-
### Blocks with a single column
241+
### Blocks with a Single Column
242242

243243
Block type-specific information:
244244

245245
- value type `vt` (uint8)
246246
- number of non-zeros in the block `#nzb` (uint32)
247247
- for each non-zero
248-
- row index `rx` (uint32)
249-
- value `v` (value type `vt`)
248+
- row index `rx` (uint32)
249+
- value `v` (value type `vt`)
250250

251251
Below, `S` denotes the size (in bytes) of a single value of type `vt`.
252252

@@ -266,16 +266,16 @@ size[B] 4 4 1 1 4 4+S 4+S 4+S
266266
4 S
267267
```
268268

269-
### Blocks with more than one column
269+
### Blocks with More than One Column
270270

271271
Block type-specific information:
272272

273273
- value type `vt` (uint8)
274274
- number of non-zeros in the block `#nzb` (uint32)
275275
- for each non-zero
276-
- row index `rx` (uint32)
277-
- column index `cx` (uint32)
278-
- value `v` (value type `vt`)
276+
- row index `rx` (uint32)
277+
- column index `cx` (uint32)
278+
- value `v` (value type `vt`)
279279

280280
Below, `S` denotes the size (in bytes) of a single value of type `vt`.
281281

doc/BuildEnvironment.md

Lines changed: 8 additions & 9 deletions
@@ -14,22 +14,21 @@ See the License for the specific language governing permissions and
1414
limitations under the License.
1515
-->
1616

17-
# Building in unsupported environments
17+
# Building in Unsupported Environments
1818

1919
DAPHNE can also be built in environments other than the one endorsed by the DAPHNE project team. To help with that,
20-
there are scripts to build your own toolchain in the [``buildenv``](/buildenv) directory.
20+
there are scripts to build your own toolchain in the [`buildenv`](/buildenv) directory.
2121
These scripts can be used to build the compiler and necessary libraries to build DAPHNE in unsupported environments
2222
(RHEL UBI, CentOS, ...) Besides the build scripts, there is a docker file to create a UBI image to build for Redhat 8.
2323

24-
2524
Usage:
26-
* Create Docker image with build-ubi8.sh
27-
* Run the Docker image with run-ubi8.sh
28-
* Run build-all.sh
29-
* Run source env.sh inside or outside the Docker image to set PATH and linker variables to use the created environment
30-
(you need to cd into the directory containing env.sh as this uses $PWD)
3125

26+
* Create the Docker image with `build-ubi8.sh`
27+
* Run the Docker image with `run-ubi8.sh`
28+
* Run `build-all.sh`
29+
* Run `source env.sh` inside or outside the Docker image to set `PATH` and linker variables to use the created environment
30+
(you need to cd into the directory containing `env.sh` as this uses `$PWD`)
3231

3332
Beware that this procedure needs ~50GB of free disk space. Also, the provided CUDA SDK expects a recent driver (550+)
3433
version. That will most likely be an issue on large installations - exchange the relevant version and file names for
35-
another CUDA version in env.sh to build another version.
34+
another CUDA version in `env.sh` to build another version.
