Skip to content
Open
Changes from 1 commit
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 98 additions & 0 deletions scripts/prepare_release.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
It must run from a clean git worktree with initialized, clean git submodules.
When making an Mbed TLS >=4 release, the tf-psa-crypto submodule should
already contain a TF-PSA-Crypto release candidate.
This script will update the checked out git branch, if any.
On normal exit, the worktree contains the release candidate commit.

This script requires the following external tools:
- GNU tar (can be called ``gnutar`` or ``gtar``);
Expand Down Expand Up @@ -94,13 +96,16 @@ class Options(typing.NamedTuple):
"""
# Directory where the release artifacts will be placed.
artifact_directory: pathlib.Path
# Release date (YYYY-mm-dd).
release_date: str
# GNU tar command.
tar_command: str
# Version to release (None to read it from ChangeLog).
version: Optional[str]

DEFAULT_OPTIONS = Options(
artifact_directory=pathlib.Path(os.pardir),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
artifact_directory=pathlib.Path(os.pardir),
artifact_directory=os.path.join(pathlib.Path(os.pardir), self.artifact_base_name() + "_artifacts"),

Parent director can contain a lot of files, that we may not want to override. We should create an explicit directory to store all the release related artifcats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this script should create a new directory. Especially by default: users are likely to miss it, or it could clash with something unrelated. The parent directory seems like a sensible default to me: you'll create mbedtls-1.2.3.tar.bz2 next to your mbedtls directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are in the mbedtls directory, and you cannot invoke the script from the parent directory, the chances are most users to look for the files and not find them, not knowing its on the parent directory.

And given that mbedtls rarely sits on its own, it will be an already cluttered directory. And the script is generating more than archives, it generates several files, which terminal users have to clean up by hand. Putting it on a directory makes it easier to clean up and archive from a ci perspective.

how about we use /tmp to create the artefacts, and then:

  • Move them on the root of mbedtls/tf-psa-crypto where the script will be invoked. They will be visible because of git, and we can clean them up using git clean -dfx
  • Move them in a release_artifacts subdir inside mbedtls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken that different people have different habits.

Also .. is not a good default when there is an explicit --directory argument, it should be the parent of the source directory to be somewhat sensible.

At first I didn't want to touch any files inside the source tree, in case we wanted to do something that can't cope with uncommitted files. But I think we can reasonably assume that either the script will robustly ignore uncommitted file, or there will be a simple way to exclude files.

So I propose to change this to a directory in the source tree. Of course that's the default and the command line argument will stay.

release_date=datetime.date.today().isoformat(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky. This is a sensible default from a code perpective,but it will impossible to produce a release and test/sign it off on the same day.

I would either remove the default and make arg.parser explode, or try to pass it from the 'os.env' so the release maker can set it when agreed and not bother with long command lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that the defaults would be suitable for running the script on the CI to make a mock release, and that we'd be more explicit when we run the script manually to make an actual release.

Or maybe we should change how we handle ChangeLog version sections: currently assemble_changelog.py checks the release date of the first section, and it creates a new section if it's a proper date but adds to the existing section if the date contains xx. I didn't want to change the semantics of assemble_changelog.py at the same time as introducing the release script, but there are other sensible policy. For example, we could decide that assemble_changelog.py always adds to the top section, and we create a new section as soon as a release is done. Then setting the release date would be a manual edit of ChangeLog.

tar_command=find_gnu_tar(),
version=None)

Expand Down Expand Up @@ -280,6 +285,39 @@ def assert_git_status(self) -> None:
"""
self.call_git(['diff', '--quiet'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use porcelain for this check git status --porcelain=v1 . Git diff will only care about tracked files, but you can have old autogenerated files or other user files in the directorly and would make it into the archive.

Also I would enclose it in a try/except block and print out a helper message to the user. Otherwis subprocess.CalledProcessError if the first message any new user will encounter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untracked files don't matter. They won't be archived.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions on making the message prettier here? The obvious solution is to print to stderr and exit(), but I'd rather not exit() in case this code is invoked from other Python code, for example we may want to add unit tests. A custom exception with add_note looks nice, but that requires a very recent Python.

        if not self.files_are_clean():
            raise Exception('There are uncommitted changes (maybe in submodules) in ' + str(self.info.top_dir))

is better than what's there now, do you think that's good enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script will exit anyway in the current form, with the full traceback, but wont be very informative.

We should at least print that it failed because the tree is not clean so at least they know that. We can also exit with an error code, so any script invoking it will also stop the execution flow.

On a tangent note it is quite easy in Python to detect when the script is called by a human operator as opposed to a script by using the built-in sys library sys.stdin.isatty() and sys.stdout.isatty()

So in theory you could have a custom print/log command that will display more human friendly messages when on an active tty and less when invoked by a CI script. ( not neccessarily needed here but since we mention it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distinction isn't human vs CI, but imported vs standalone process.

Anyway I think the custom exception with an explanatory message is a reasonable compromise.


def files_are_clean(self, *files: str,
where: Optional[PathOrString] = None) -> bool:
"""Check whether the specified files are identical to their git version.

With no files, check the whole work tree (including submodules).

Pass `where` to specify a submodule (default: top level).
The file names are relative to the repository or submodule root,
and may not be in a submodule of `where`.
"""
try:
self.call_git(['diff', '--quiet', '--'] + list(files),
where=where)
return True
except subprocess.CalledProcessError as exn:
if exn.returncode != 1:
raise
return False

def git_commit_maybe(self,
files: List[str],
message: str) -> None:
"""Commit changes into Git.

Do nothing if there are no changed files.

The file names are relative to the toplevel directory,
and may not be in a submodule.
"""
if not self.files_are_clean(*files):
self.call_git(['add', '--'] + files)
self.call_git(['commit', '--signoff',
'-m', message])

def artifact_base_name(self) -> str:
"""The base name for a release artifact (file created for publishing).
Expand Down Expand Up @@ -326,6 +364,62 @@ def run(self) -> None:
raise NotImplementedError


class AssembleChangelogStep(Step):
"""Assemble the changelog and commit the result.

Create a new changelog section if needed.
Do nothing if the changelog is already fine.

Note: this step does not check or affect submodules.
"""

@classmethod
def name(cls) -> str:
return 'changelog'

def create_section(self, old_content: str) -> str:
"""Create a new changelog section for the version that we're releasing."""
product = self.info.product_human_name
version = self.info.version
new_section = f'= {product} {version} branch released xxxx-xx-xx\n\n'
return re.sub(r'(?=^=)',
new_section,
old_content, flags=re.MULTILINE, count=1)

def release_date_needs_updating(self) -> bool:
"""Whether the release date needs updating in the existing ChangeLog section."""
m = re.match(r'([0-9]+)-([0-9]+)-([0-9]+)', self.info.old_release_date)
if not m: # presumably xxxx-xx-xx
return True
# The date format is a lexicographic, fixed-width format,
# so we can just compare the strings.
return self.info.old_release_date < self.options.release_date

def finalize_release(self, old_content: str) -> str:
"""Update the version and release date in the changelog content."""
version = self.info.version
date = self.options.release_date
return re.sub(r'^(=.* )\S+( branch released )\S+$',
rf'\g<1>{version}\g<2>{date}',
old_content, flags=re.MULTILINE, count=1)

def run(self) -> None:
"""Assemble the changelog if needed."""
subprocess.check_call(['framework/scripts/assemble_changelog.py'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels wrong to invoke another interpreter to use functionality from another python script. Our can be made to interact with other with minimal changes:

For example in assemble_changelog.py can be achieved by

  • Move the argparser outside of the main and into the module.
  • call 'assemble_changelog.merge_entries(options)'

Alternatively

  • Use a sane default namedtuple for each python script that uses argparser ( we arleady partially do that)
Options = namedtuple("Options", ["dir", "input", "keep_entries", "output"])
DEFAULTS = Options(
    dir="Changelog.d",
    input="Changelog",
    keep_entries=False,
    output="Changelog",
)
  • Then call 'assemble_changelog.merge_entries(DEFAULTS)'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? assemble_changelog has a job and it does it. The fact that it's written in Python is an implementation detail.

cwd=self.info.top_dir)
if self.files_are_clean('ChangeLog') and \
self.info.old_version != self.info.version:
# Edge case: no change since the previous release.
# This could happen, for example, in an emergency release
# of Mbed TLS to ship a crypto bug fix, or when testing
# the release script.
self.edit_file('ChangeLog', self.create_section)
else:
self.edit_file('ChangeLog', self.finalize_release)
self.git_commit_maybe(['ChangeLog', 'ChangeLog.d'],
'Assemble changelog and set release date')


class ArchiveStep(Step):
"""Prepare release archives and the associated checksum file."""

Expand Down Expand Up @@ -487,6 +581,7 @@ def run(self) -> None:


ALL_STEPS = [
AssembleChangelogStep,
ArchiveStep,
] #type: Sequence[typing.Type[Step]]

Expand Down Expand Up @@ -526,6 +621,8 @@ def main() -> None:
parser.add_argument('--list-steps',
action='store_true',
help='List release steps and exit')
parser.add_argument('--release-date',
help='Release date (YYYY-mm-dd) (default: today)')
parser.add_argument('--tar-command',
help='GNU tar command')
parser.add_argument('--to', metavar='STEP',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that is needed to be honest. I can see where the --from step can be usefull to do re-do things like assemble changelog.

I could also see a use for a --single-step function that would be considerably simpler to implement and very usefull.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used --{from,to}=step extensively during testing, I don't think it's too much harder to type than --single-step=step. I've also used --to on its own occasionally. It might even be more useful than --from since the steps are supposed to be idempotent. (Of course, if we need to do weird things e.g. for an emergency point release then “supposed to” might not actually be the case.)

Expand All @@ -540,6 +637,7 @@ def main() -> None:
return
options = Options(
artifact_directory=pathlib.Path(args.artifact_directory).absolute(),
release_date=args.release_date,
tar_command=args.tar_command,
version=args.version)
run(options, args.directory,
Expand Down