Skip to content

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Sep 16, 2025

This change adds a new archive module to the Coder registry. It can be used to archive user-data from pre-defined locations and restore it as well.

Here we also explore:

  • A new method of passing arrays from Terraform to Bash
  • A new method of writing Bash scripts that minimizes the interaction with terraform interpolation
  • Extensive test-suite that not only tests that Terraform options can be selected, but also the resulting script behaviors

@mafredri mafredri force-pushed the mafredri/module-archive branch 14 times, most recently from c9efaa3 to 90f7dbd Compare September 18, 2025 13:15
@mafredri mafredri force-pushed the mafredri/module-archive branch from 90f7dbd to 34eca10 Compare September 18, 2025 13:21
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been testing this out locally.
Nice work! Left some comments down below (none blocking).
LMK when this is ready for a more in-depth review.

Comment on lines 13 to 31
- Depends on: `tar` (and `gzip` or `zstd` if you select those compression modes)
- Installed scripts:
- `$CODER_SCRIPT_BIN_DIR/coder-archive-create`
- `$CODER_SCRIPT_BIN_DIR/coder-archive-extract`
- Library installed to:
- `$CODER_SCRIPT_DATA_DIR/archive-lib.sh`
- On start (always): installer decodes and writes the library, then generates the wrappers in `$CODER_SCRIPT_BIN_DIR` with Terraform-provided defaults embedded.
- Optional on stop: when enabled, a separate `run_on_stop` script invokes the create command.

## Features

- Create a single `.tar`, `.tar.gz`, or `.tar.zst` containing selected paths.
- Compression algorithms: `gzip`, `zstd`, `none`.
- Defaults for directory, archive path, compression, include/exclude lists come from Terraform and can be overridden at runtime with CLI flags.
- Logs and status messages go to stderr; the create command prints only the final archive path to stdout.
- Strict bash mode and safe invocation of `tar`.
- Optional:
- `run_on_stop` to create an archive automatically when the workspace stops.
- `extract_on_start` to wait for an archive to appear and extract it on start (with timeout).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: you can probably combine these two

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine which two? I'm worried about the user experience if we assume that extract on start means you also want to run on stop (if that's what you meant).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry -- I meant simply to combine both sets of bullet points, as there seems to be some slight duplication between them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a major cleanup of the README in 822cf15.

Comment on lines 81 to 84
paths = [for v in var.paths : replace(v, "/^~/", "$$HOME")]
exclude_patterns = [for v in var.exclude_patterns : replace(v, "/^~/", "$$HOME")]
directory = replace(var.directory, "/^~/", "$$HOME")
output_dir = replace(var.output_dir, "/^~/", "$$HOME")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, it only handles the ~ at the start of the string? I can't see a reason to have it anywhere else tbh.

Should we also match a path separator to avoid expanding e.g. ~username into /home/usernameusername?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I'll need to look into if tf replace regex handles replacement like "/^~(\/?)/", "$1". Or we can move this into bash to avoid the repetition. Because we'll need to handle both "~", and "~/..."`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b48daf9

@mafredri mafredri marked this pull request as ready for review September 19, 2025 14:31
@mafredri mafredri changed the title [WIP] feat(registry/coder/modules): add archive module feat(registry/coder/modules): add archive module Sep 19, 2025
@mafredri mafredri requested a review from johnstcn September 22, 2025 09:45
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just a couple of nits below.

};

const fileExists = async (id: string, path: string) => {
const res = await sh(id, `test -f ${path} && echo yes || echo no`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could this just be simplified to

await sh(id, `test -f ${path}; echo $?`);
return res.stdout.trim() === "1";

const extractDir = "/tmp/extract";
const extract = await bashRun(
id,
`${BIN_DIR}/coder-archive-extract -f ${out} -C ${extractDir}`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be validating that these end up in $PATH?

e.g. without ${BIN_DIR}

awit bashRun(id, `coder-archive-extract ...`)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can put it in path manually, e.g. set CODER_SCRIPT_BIN_DIR=/usr/local/bin or something. But to make it a truly real test-case, we'd have to be running our commands inside a coder agent spawned environment.

Copy link
Member

@matifali matifali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change adds a new archive module to the Coder registry. It can be used to archive user-data from pre-defined locations and restore it as well.

Where do these locations live? I see in examples that they are archived in /tmp which is not persistent in our example templates. Should we add examples on how we can archive and upload to an external storage like a s2 bucket and then restore from that?

The 2nd suggestion is to move this to coder-labs namespace.

@mafredri
Copy link
Member Author

Where do these locations live? I see in examples that they are archived in /tmp which is not persistent in our example templates. Should we add examples on how we can archive and upload to an external storage like a s2 bucket and then restore from that?

That's up to the author currently. They can store the archive in a place that's persisted or upload it. Ideally this module can be kept focused on archiving, and uploading/downloading could be handled by another module or scripted by template author.

I can add an example how upload/download could be implemented.

The 2nd suggestion is to move this to coder-labs namespace.

Didn't realize we have this, that works 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants