Commit 3273663

Add source release scripts (#40)
* Add release scripts copied from datafusion-python
* remove GH_TOKEN check:
* replace datafusion-python with datafusion-ray
* update changelog generator based on latest version in datafusion
* update release instructions
* update scripts
1 parent 8ee46ab commit 3273663

8 files changed

+928
-0
lines changed

dev/release/README.md

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied. See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# DataFusion Ray Release Process

Development happens on the `main` branch, and most of the time, we depend on DataFusion using GitHub dependencies
rather than using an official release from crates.io. This allows us to pick up new features and bug fixes frequently
by creating PRs to move to a later revision of the code. It also means we can incrementally make updates that are
required due to changes in DataFusion rather than having a large amount of work to do when the next official release
is available.

When there is a new official release of DataFusion, we update the `main` branch to point to that, update the version
number, and create a new release branch, such as `branch-0.2`. Once this branch is created, we switch the `main` branch
back to using GitHub dependencies. The release activity (such as generating the changelog) can then happen on the
release branch without blocking ongoing development in the `main` branch.

We can cherry-pick commits from the `main` branch into `branch-0.2` as needed and then create new patch releases
from that branch.

## Detailed Guide

### Pre-requisites

Releases can currently only be created by PMC members due to the permissions needed.

You will need a GitHub Personal Access Token. Follow
[these instructions](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)
to generate one if you do not already have one.

You will need a PyPI API token. Create one at https://test.pypi.org/manage/account/#api-tokens, setting the “Scope” to
“Entire account”.

You will also need access to the [datafusion-ray](https://test.pypi.org/project/datafusion-ray/) project on testpypi.

### Preparing the `main` Branch

Before creating a new release:

- Ensure that the `main` branch does not have any GitHub dependencies
- Create and merge a PR to update the major version number of the project
- Create a new release branch, such as `branch-0.2`

### Change Log

We maintain a `CHANGELOG.md` so our users know what has been changed between releases.

The changelog is generated using a Python script:

```bash
$ GITHUB_TOKEN=<TOKEN> ./dev/release/generate-changelog.py 0.1.0 HEAD 0.2.0 > dev/changelog/0.2.0.md
```

This script creates a changelog from GitHub PRs based on the labels associated with them, and it also looks for
titles starting with `feat:`, `fix:`, or `docs:`. The script will produce output similar to:

```
Fetching list of commits between 0.1.0 and HEAD
Fetching pull requests
Categorizing pull requests
Generating changelog content
```

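The prefix-based part of that categorization can be pictured with a minimal sketch. This is not the code in `generate-changelog.py`; the function name `categorize`, the section headings, and the sample titles are all assumptions made for illustration, and label handling is left out.

```python
def categorize(title: str) -> str:
    """Map a pull-request title to a changelog section by its prefix."""
    # Hypothetical section names; the real script may use different headings.
    prefixes = {
        "feat:": "Implemented enhancements",
        "fix:": "Fixed bugs",
        "docs:": "Documentation updates",
    }
    for prefix, section in prefixes.items():
        if title.lower().startswith(prefix):
            return section
    # Titles without a recognized prefix fall into a catch-all section.
    return "Other"


# Made-up titles, used only to exercise the function.
titles = [
    "feat: add shuffle writer",
    "fix: handle empty partitions",
    "docs: update README",
    "Bump dependency versions",
]
for title in titles:
    print(f"{categorize(title)}: {title}")
```

PRs whose titles follow the `feat:` / `fix:` / `docs:` convention end up in a predictable section, which is one reason to keep titles consistent before cutting a release.
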
### Preparing a Release Candidate

### Tag the Repository

```bash
git tag 0.2.0-rc1
git push apache 0.2.0-rc1
```

### Create a source release

```bash
./dev/release/create-tarball.sh 0.2.0 1
```

This will also create the email template to send to the mailing list.

Create a draft email using this content, but do not send until after completing the next step.

### Publish Python Artifacts to testpypi

This section assumes some familiarity with publishing Python packages to PyPI. For more information, refer to
[this tutorial](https://packaging.python.org/en/latest/tutorials/packaging-projects/#uploading-the-distribution-archives).

#### Publish Python Wheels to testpypi

Pushing an `rc` tag to the release branch will cause a GitHub Workflow to run that will build the Python wheels.

Go to https://github.com/apache/datafusion-ray/actions and look for an action named "Python Release Build"
that has run against the pushed tag.

Click on the action and scroll down to the "Artifacts" section at the bottom of the page. Download `dist.zip`. It should
contain files such as:

```text
datafusion-ray-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl
datafusion-ray-0.2.0-cp37-abi3-macosx_11_0_arm64.whl
datafusion-ray-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
datafusion-ray-0.2.0-cp37-abi3-win_amd64.whl
```

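Before uploading, it can help to confirm that the archive holds a wheel for every platform. The following is a rough sketch, not part of the release scripts: the helper names `platform_tag` and `missing_platforms` and the list of expected platform prefixes are assumptions. It relies only on the platform tag being the last dash-separated field of a wheel filename, and needs Python 3.9+ for `str.removesuffix`.

```python
def platform_tag(wheel_name: str) -> str:
    """Return the platform tag, the last dash-separated field of a wheel filename."""
    stem = wheel_name.removesuffix(".whl")
    # Split from the right so the dash in "datafusion-ray" does not matter.
    _name_and_version, _python_tag, _abi_tag, platform = stem.rsplit("-", 3)
    return platform


def missing_platforms(wheel_names, expected_prefixes):
    """List the expected platform prefixes that no wheel in wheel_names covers."""
    tags = [platform_tag(name) for name in wheel_names]
    return [
        prefix
        for prefix in expected_prefixes
        if not any(tag.startswith(prefix) for tag in tags)
    ]


# File names taken from the listing above.
wheels = [
    "datafusion-ray-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl",
    "datafusion-ray-0.2.0-cp37-abi3-macosx_11_0_arm64.whl",
    "datafusion-ray-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "datafusion-ray-0.2.0-cp37-abi3-win_amd64.whl",
]
# Assumed set of platforms to require; adjust to what the workflow builds.
expected = ["macosx", "manylinux", "win"]
print(missing_platforms(wheels, expected))
```

An empty list means every expected platform has at least one wheel.
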
Upload the wheels to testpypi.

```bash
unzip dist.zip
python3 -m pip install --upgrade setuptools twine build
python3 -m twine upload --repository testpypi datafusion-ray-0.2.0-cp37-abi3-*.whl
```

When prompted for a username, enter `__token__`. When prompted for a password, enter the PyPI API token described in
the pre-requisites (not the GitHub Personal Access Token).

#### Publish Python Source Distribution to testpypi

Download the source tarball created in the previous step, untar it, and run:

```bash
maturin sdist
```

This will create a file named `dist/datafusion-ray-0.2.0.tar.gz`. Upload this to testpypi:

```bash
python3 -m twine upload --repository testpypi dist/datafusion-ray-0.2.0.tar.gz
```

### Send the Email

Send the email to start the vote.

## Verifying a Release

Running the unit tests against a testpypi release candidate:

```bash
# clone a fresh repo
git clone https://github.com/apache/datafusion-ray.git
cd datafusion-ray

# checkout the release commit
git fetch --tags
git checkout 0.2.0-rc1

# create the env
python3 -m venv venv
source venv/bin/activate

# install release candidate
pip install --extra-index-url https://test.pypi.org/simple/ datafusion-ray==0.2.0

# only dep needed to run tests is pytest
pip install pytest

# run the tests
pytest --import-mode=importlib python/tests
```

Try running one of the examples from the top-level README, or write some custom Python code to query some available
data files.

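A quick sanity check before running anything else is to confirm that the virtual environment really contains the release candidate version. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version_matches` is an assumption and is not part of the project.

```python
from importlib import metadata


def installed_version_matches(package: str, expected: str) -> bool:
    """Return True if `package` is installed and its version equals `expected`."""
    try:
        return metadata.version(package) == expected
    except metadata.PackageNotFoundError:
        # The package is not installed in the active environment.
        return False


if __name__ == "__main__":
    # Package name and version follow the example used in this guide.
    print(installed_version_matches("datafusion-ray", "0.2.0"))
```

If this prints `False` inside the activated `venv`, the install step probably picked up a different version or failed.
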
## Publishing a Release

### Publishing Apache Source Release

Once the vote passes, we can publish the release.

Create the source release tarball:

```bash
./dev/release/release-tarball.sh 0.2.0 1
```

### Publishing Python Artifacts to PyPI

Go to the Test PyPI page of DataFusion Ray and download
[all published artifacts](https://test.pypi.org/project/datafusion-ray/#files) into a `dist-release/` directory. Then
upload them using `twine`:

```bash
twine upload --repository pypi dist-release/*
```

### Push the Release Tag

```bash
git checkout 0.2.0-rc1
git tag 0.2.0
git push apache 0.2.0
```

### Add the release to Apache Reporter

Add the release to https://reporter.apache.org/addrelease.html?datafusion with a version name prefixed with `DATAFUSION-RAY`,
for example `DATAFUSION-RAY-0.2.0`.

The release information is used to generate a template for a board report (see example from Apache Arrow
[here](https://github.com/apache/arrow/pull/14357)).

### Delete old RCs and Releases

See the ASF documentation on [when to archive](https://www.apache.org/legal/release-policy.html#when-to-archive)
for more information.

#### Deleting old release candidates from `dev` svn

Release candidates should be deleted once the release is published.

Get a list of DataFusion release candidates:

```bash
svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep datafusion-ray
```

Delete a release candidate:

```bash
svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-ray-0.1.0-rc1/
```

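When several release candidates have accumulated, the `svn ls` output can be filtered to find the ones that are safe to remove. The following is a hedged sketch, not part of the release scripts: the helper names are hypothetical, the directory pattern is inferred from the single example path above, and the sample entries are made up.

```python
import re

# Pattern inferred from the example directory name in this guide (an assumption).
RC_PATTERN = re.compile(r"^apache-datafusion-ray-(\d+\.\d+\.\d+)-rc(\d+)/?$")


def version_key(version: str) -> tuple:
    """Turn '0.2.0' into (0, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def stale_release_candidates(entries, released_version):
    """Return RC directories whose version is at or below the published release."""
    released = version_key(released_version)
    stale = []
    for entry in entries:
        name = entry.strip()
        match = RC_PATTERN.match(name)
        if match and version_key(match.group(1)) <= released:
            stale.append(name)
    return stale


# Made-up `svn ls` output lines, for illustration only.
listing = [
    "apache-datafusion-ray-0.1.0-rc1/",
    "apache-datafusion-ray-0.2.0-rc1/",
    "apache-datafusion-ray-0.3.0-rc1/",
    "apache-datafusion-python-40.0.0-rc1/",
]
for name in stale_release_candidates(listing, "0.2.0"):
    print(name)
```

The sketch only selects candidates; each directory it prints would still be removed with an explicit `svn delete`, so nothing is deleted without a human reviewing the list.
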
#### Deleting old releases from `release` svn

Only the latest release should be available. Delete old releases after publishing the new release.

Get a list of DataFusion releases:

```bash
svn ls https://dist.apache.org/repos/dist/release/datafusion | grep datafusion-ray
```

Delete a release:

```bash
svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-ray-0.1.0
```

dev/release/check-rat-report.py

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
#!/usr/bin/python
##############################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################################
import fnmatch
import re
import sys
import xml.etree.ElementTree as ET

if len(sys.argv) != 3:
    sys.stderr.write("Usage: %s exclude_globs.lst rat_report.xml\n" % sys.argv[0])
    sys.exit(1)

exclude_globs_filename = sys.argv[1]
xml_filename = sys.argv[2]

# One glob pattern per line; matching files are exempt from the license check.
with open(exclude_globs_filename, "r") as f:
    globs = [line.strip() for line in f]

tree = ET.parse(xml_filename)
root = tree.getroot()
resources = root.findall("resource")

all_ok = True
for r in resources:
    approvals = r.findall("license-approval")
    # Skip resources with no approval record or with an approved license.
    if not approvals or approvals[0].attrib["name"] == "true":
        continue
    # Drop the leading directory component before matching against the globs.
    clean_name = re.sub("^[^/]+/", "", r.attrib["name"])
    excluded = False
    for g in globs:
        if fnmatch.fnmatch(clean_name, g):
            excluded = True
            break
    if not excluded:
        sys.stdout.write(
            "NOT APPROVED: %s (%s): %s\n"
            % (clean_name, r.attrib["name"], approvals[0].attrib["name"])
        )
        all_ok = False

if not all_ok:
    sys.exit(1)

print("OK")
sys.exit(0)
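The matching rule used by the script above can be tried in isolation. This is a small sketch that reuses the same two calls (`re.sub` to drop the leading directory, then `fnmatch.fnmatch` against each glob); the function name `is_excluded`, the glob list, and the resource paths are made up for illustration.

```python
import fnmatch
import re


def is_excluded(resource_name, globs):
    """Apply the script's rule: strip the first path component, then glob-match."""
    clean_name = re.sub("^[^/]+/", "", resource_name)
    return any(fnmatch.fnmatch(clean_name, g) for g in globs)


# Hypothetical exclusion globs and resource names.
globs = ["*.json", "dev/release/rat_exclude_files.txt"]
print(is_excluded("apache-datafusion-ray-0.2.0/testdata/sample.json", globs))  # True
print(is_excluded("apache-datafusion-ray-0.2.0/src/lib.rs", globs))  # False
```

Note that `fnmatch` lets `*` match across `/`, so a pattern such as `*.json` covers files in any subdirectory.
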
