Open
844 commits
916b36f
cleaner location for mounted in cocalc code
williamstein Sep 12, 2025
902230e
project-runner: implement overlay so that rootfs changes persist
williamstein Sep 12, 2025
ad488b5
disable feature detection
williamstein Sep 12, 2025
008d2ea
implementation of overlayfs root for podman
williamstein Sep 12, 2025
6c7c0c0
place in database for root image for project
williamstein Sep 12, 2025
387f386
embrace overlayfs
williamstein Sep 12, 2025
cc37d82
project runner -- improve logging
williamstein Sep 12, 2025
2963074
clear workdir and more aggressively unmount overlayfs
williamstein Sep 12, 2025
1ed5e3e
podman start -- use the image
williamstein Sep 13, 2025
94a1928
delete all use of nsjail for running projects -- podman is obviously …
williamstein Sep 13, 2025
a34b789
implement UI for selecting project's root filesystem image
williamstein Sep 13, 2025
ce88870
clean up env and also use the env specified by the container
williamstein Sep 13, 2025
6d74ca9
podman: implement limits
williamstein Sep 13, 2025
eb363da
lite: enable compression for http server
williamstein Sep 13, 2025
7dfd8af
new lite version
williamstein Sep 13, 2025
2f870ca
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Sep 13, 2025
3eaddec
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Sep 13, 2025
5a82110
deleting code from project related to sync-fs / websocketfs
williamstein Sep 13, 2025
c6a56a9
project: delete brower http-server
williamstein Sep 13, 2025
1378a3e
delete more project code we no longer need
williamstein Sep 13, 2025
0ed8b01
delete old compute server code and sync-fs. Good riddance!
williamstein Sep 13, 2025
4174dad
Merge branch 'master' into fs2
williamstein Sep 13, 2025
08a4e9c
I did the merge incorrectly (previous commit)
williamstein Sep 13, 2025
a0dbc76
rewrite "open" script in vanilla self contained javascript
williamstein Sep 13, 2025
3b3c72e
improve how build-lite works so no need to commit first and no copy o…
williamstein Sep 13, 2025
a7d1b6d
get cocalc-lite to build without any reliance on git
williamstein Sep 13, 2025
a333846
more build cleanup; making the binaries named in a clean way.
williamstein Sep 14, 2025
5a2841b
making it so our SEA can also act as node
williamstein Sep 14, 2025
11ee020
refactoring SEA control code between lite and project-runner
williamstein Sep 14, 2025
220268f
with this the "open" command is moved to javascript and automatically…
williamstein Sep 14, 2025
2951992
make 'open' command also work robustly for cocalc-lite
williamstein Sep 14, 2025
3afb807
podman: solve zombie problem; qualify docker images
williamstein Sep 14, 2025
7e0a5b9
working on running projects; in particular WIP on supporting ssh
williamstein Sep 14, 2025
41ca7fa
correct approach to installing dropbear ssh server
williamstein Sep 14, 2025
e6d2333
add optional sshpiper install support
williamstein Sep 15, 2025
ba00a6a
fix some tests/relax warnings
williamstein Sep 15, 2025
e89f2b8
remove nsjail completely -- since for *our purposes* podman does ever…
williamstein Sep 15, 2025
76d0c8a
way to install all optional tools -- will be used by file-server
williamstein Sep 15, 2025
c2d8e09
add ssh key functionality to project
williamstein Sep 16, 2025
1f6f2b8
starting to work on the ssh server part of the file server
williamstein Sep 16, 2025
9a7baba
working on ssh server...
williamstein Sep 16, 2025
41d66d1
Merge branch 'master' into fs2
williamstein Sep 16, 2025
58aef30
Merge branch 'master' into fs2
williamstein Sep 16, 2025
d31a6a8
work in progress on file-server ssh gateway
williamstein Sep 16, 2025
3883638
work in progress on file server ssh gateway thingy
williamstein Sep 16, 2025
d8343b8
mutagen install automation
williamstein Sep 17, 2025
ca64e84
working on file-server ssh container
williamstein Sep 17, 2025
56d435f
file-server ssh: add config of .ssh path
williamstein Sep 17, 2025
01abd31
getting components of file-server ssh to work together...
williamstein Sep 17, 2025
e0f0266
automate starting sshpiperd server
williamstein Sep 17, 2025
2671494
ssh server: mount home directory properly
williamstein Sep 17, 2025
22e567a
don't try to stage mutagen on tmpfs
williamstein Sep 17, 2025
2f4bd74
include more of dropbear
williamstein Sep 17, 2025
e12986d
add dbclient ssh wrapper; add sync-times script
williamstein Sep 17, 2025
dcc545e
work in progress to use dropbear with mutagen seemlessly
williamstein Sep 17, 2025
97e0cb5
when project key changes for some reason, restart ssh container
williamstein Sep 17, 2025
fa6db2c
install dropbear
williamstein Sep 17, 2025
05b3b1d
rewriting to use sidecar for project runner and mutagen etc -- it's j…
williamstein Sep 17, 2025
d5cedb5
fully embrace sidecar for project runner
williamstein Sep 17, 2025
582c996
explain the sidecar container that currently provides mutagen's daemon
williamstein Sep 17, 2025
8ce861e
work in progress on file server
williamstein Sep 18, 2025
264bfdd
work in progress adding more state to project runner
williamstein Sep 18, 2025
8d7ae41
project runner with integrated sync/forwarding/ssh config
williamstein Sep 18, 2025
cbcf18b
project runner: refactor to be flexible and extensible now that we kn…
williamstein Sep 18, 2025
c8d8465
project-runner: quick proof of concept of rsync first
williamstein Sep 19, 2025
e07ea46
project runner: start/config of mutagen in sidecar
williamstein Sep 19, 2025
919fcad
add an rsync step in
williamstein Sep 19, 2025
36c3358
fix a number of issues with project-runner and file-server ssh server
williamstein Sep 19, 2025
ee95392
add readme about the project runner
williamstein Sep 19, 2025
bbf93bd
make not having commands like "file" non-fatal to open uknown files
williamstein Sep 19, 2025
8d3ca92
add way for project runner to log progress so user can see it.
williamstein Sep 19, 2025
f1fdc7e
add rsync progress for copying home directory
williamstein Sep 19, 2025
1c1199a
refactor rsync progress code
williamstein Sep 19, 2025
5e93fda
similar progress for rootfs
williamstein Sep 19, 2025
d1756ef
improve tracking of project transition states better, especially beca…
williamstein Sep 20, 2025
2f379c6
very rough first draft of frontend UI for seeing the project boot log
williamstein Sep 20, 2025
3461dfc
switch from dropbear to openssh in file-server container
williamstein Sep 20, 2025
5840c50
switch project sidecar to use ubuntu instead of alpine for maybe bett…
williamstein Sep 20, 2025
580a643
use lz4 compression for initial rsync's; do them in parallel
williamstein Sep 20, 2025
f7c13c4
make file-server container secure, hopefully
williamstein Sep 20, 2025
22419dc
make projects independent of the project-runner (detach), so that we …
williamstein Sep 21, 2025
1570feb
backend: fixes to match with stop all
williamstein Sep 21, 2025
7e409e0
store mutagen state on separate scratch volume
williamstein Sep 21, 2025
1f42efa
completely delete dropbear -- no use for us after all
williamstein Sep 21, 2025
98a3a57
on project start make sure that target upperdir exists on the fileser…
williamstein Sep 21, 2025
6651aa1
fix frontend issue (maybe) with resume from standby
williamstein Sep 21, 2025
9dae6bf
clean up code around naming of pods and containers
williamstein Sep 21, 2025
2719f50
give file-system container access to the filesystem tools
williamstein Sep 21, 2025
5aa88eb
add btm to our filesystem sandbox
williamstein Sep 21, 2025
0df95d3
idle timeout for file-server containers
williamstein Sep 21, 2025
4f65242
properly flush mutagen on project stop so no data is (potentially) lost
williamstein Sep 22, 2025
99cdebf
rely on mutagen instead of rsync once things are working and keep ses…
williamstein Sep 22, 2025
f35dcfc
put mutagen init in the main startup path so very clear when it fails
williamstein Sep 22, 2025
701a6a2
Merge branch 'master' into fs2
williamstein Sep 22, 2025
7d2aec1
merge conflicts -- resolve typescript issues
williamstein Sep 22, 2025
3081244
fix bug in install script
williamstein Sep 22, 2025
7370645
upgrade rustic; ripgrep --> rg rename; add version number checking to…
williamstein Sep 23, 2025
fc05a09
make it possible to use non-root user without having to save a manife…
williamstein Sep 23, 2025
91932d6
switch cpu to "fair sharing" instead of an arbitrary cap in the backe…
williamstein Sep 23, 2025
67bb445
equal sharing of cpu instead for projects
williamstein Sep 23, 2025
1bbecd4
workaround an import issue with ESM only modules and nextjs
williamstein Sep 23, 2025
91d3823
make it so when the FAIR_CPU_MODE const is set to true, then all ment…
williamstein Sep 23, 2025
c982f19
Merge branch 'master' into fs2
williamstein Sep 23, 2025
ceca092
Merge branch 'master' into fs2
williamstein Sep 23, 2025
e3fc428
Merge branch 'master' into fs2
williamstein Sep 23, 2025
febbfc7
make lowerdir visible in sidecar
williamstein Sep 23, 2025
8232652
podman project runner: use cleaner mount approach
williamstein Sep 24, 2025
c8d14c6
add project-runner as dep to file-server; use mountArg
williamstein Sep 24, 2025
de29e40
adding the lowerdir and rootfs image of project to file-server
williamstein Sep 24, 2025
84037f8
add enough permissions so we can rsync rootfs to the file server
williamstein Sep 24, 2025
696ae2e
implement scripts that do backup/restore for rootfs AND preserve uid/…
williamstein Sep 24, 2025
10d1afb
getting project startup to work with rootfs
williamstein Sep 24, 2025
fa9035d
project runner: automate saving rootfs
williamstein Sep 24, 2025
415ee5a
overlayfs sync revamp yet again: disable xattr and now straight rootl…
williamstein Sep 24, 2025
5c806c9
define location of overlay files in ONE PLACE to avoid bugs/confusion…
williamstein Sep 24, 2025
ddcfa2c
compute server awareness in root filesystem selector
williamstein Sep 24, 2025
275964c
make it so if you delete the overlayfs files in your project explicit…
williamstein Sep 24, 2025
e88fef2
make file-server container less secure so that it's possible to use i…
williamstein Sep 24, 2025
b39ba93
work on conat's client mutagen.forward and mutagen.sync
williamstein Sep 24, 2025
76f13d4
carefully revamp the rsync/mutagen exclude/ignores. It can be VERY c…
williamstein Sep 24, 2025
53aae3b
dropbear is back in
williamstein Sep 24, 2025
9e8e7cd
way less automatic snapshots by default
williamstein Sep 24, 2025
9028574
file-server --> core for ssh; starting work on ssh-to-project support
williamstein Sep 25, 2025
fb8ceac
working on core container
williamstein Sep 25, 2025
6746b7e
fix worker maybe
williamstein Sep 25, 2025
ac2d0d4
update http-proxy-3
williamstein Sep 25, 2025
3b4740e
broken work in progress in having the sidecar use a different HOME to…
williamstein Sep 25, 2025
a2fcb59
use explicit constants more to make code easier to use; fix some subt…
williamstein Sep 25, 2025
356c1e5
work in progress refactoring code to put things in constants related …
williamstein Sep 25, 2025
da88fae
mainly rewrite file-server ssh to not have containers as child processes
williamstein Sep 26, 2025
17b13c2
only consider initial load a success when a certain file is created t…
williamstein Sep 26, 2025
1042a4e
add ssh server to projects
williamstein Sep 26, 2025
1ba64a0
make ssh gateway understand standard ways to represent project_id for…
williamstein Sep 27, 2025
bc27f98
update the frontend ui with new username
williamstein Sep 27, 2025
8e1414e
add an endpoint to get ssh keys for a specific project
williamstein Sep 27, 2025
d7f6252
refactor code for calling hub from project
williamstein Sep 27, 2025
da7de58
add some info about where conat hub api is used
williamstein Sep 27, 2025
0bffaf4
project ssh: make it have all the env vars; also make logs visible vi…
williamstein Sep 27, 2025
c46d260
write code for managing authorized keys files
williamstein Sep 27, 2025
a1770b9
automatically set authorized keys
williamstein Sep 27, 2025
798615b
project api: endpoint to update authorized_keys file
williamstein Sep 27, 2025
33e3088
make it so ssh keys are automatically updated for projects as soon as…
williamstein Sep 27, 2025
b2ab927
account for stale state in updating ssh keys
williamstein Sep 27, 2025
32fc778
Merge branch 'master' into fs2
williamstein Sep 27, 2025
2917aab
Merge branch 'master' into fs2
williamstein Sep 27, 2025
5c32b6f
Merge branch 'master' into fs2
williamstein Sep 27, 2025
62d747d
Merge branch 'master' into fs2
williamstein Sep 27, 2025
1fb33f1
so a stack trace for this close -- hopefully this is better (?)
williamstein Sep 27, 2025
aaa4e1d
improve project start/stop/restart -- not done yet
williamstein Sep 27, 2025
9444e6d
make project start more robust
williamstein Sep 27, 2025
a21a4b0
project frontend: way to surface any start/stop error and a "force st…
williamstein Sep 27, 2025
96fbc22
mainly wiring up proxy server for project (doesn't work properly yet)
williamstein Sep 27, 2025
d83e2eb
work in progress on project proxy server
williamstein Sep 27, 2025
bb6ccd7
project proxy is fully working
williamstein Sep 27, 2025
a91e026
implement http proxy for projects
williamstein Sep 28, 2025
065e91c
refactor file-server http proxy code to be more flexible
williamstein Sep 28, 2025
fda4d38
integrate project proxy servers with hub (deleting the old ones)
williamstein Sep 28, 2025
1a87271
make the UI for managing ssh keys more like what you get with a file …
williamstein Sep 28, 2025
62e915a
allow "proxy" as alias for "server" for compat with vscode
williamstein Sep 28, 2025
274bedb
clean up the port naming; ensure core mutagen forwards are defined on…
williamstein Sep 28, 2025
c811daa
project startup -- moving scripts around, etc.
williamstein Sep 28, 2025
8f82dda
project --> sshd rename, which makes more sense; always kill file-ser…
williamstein Sep 28, 2025
1294d59
fix some timeouts related to starting projects
williamstein Sep 28, 2025
4828f56
fix some backup conflict issues (mostly in the UI)
williamstein Sep 28, 2025
a26b099
project runner: unmounting when point doesn't exist shouldn't throw e…
williamstein Sep 28, 2025
6fc1790
render bootlog info more nicely
williamstein Sep 29, 2025
6e43c34
support multiple project runners with stickiness and explicit ability…
williamstein Sep 29, 2025
0c16f5c
fix some errors in testing
williamstein Sep 29, 2025
6d6dd20
update project runner
williamstein Sep 29, 2025
cdd6807
rewrite named server launchers
williamstein Sep 29, 2025
e38be79
revamp the runner recipes to use cmd/args and also properly manage ch…
williamstein Sep 29, 2025
f4e38d3
get basic launching of servers to work again...
williamstein Sep 30, 2025
7b843de
add xpra as a server app
williamstein Sep 30, 2025
a26b133
factor out getting the rootfs base from an OCI image
williamstein Sep 30, 2025
4235db0
project runner: working on improving startup -- WIP on better error …
williamstein Sep 30, 2025
27aefd0
fix issues with jupyter launcher
williamstein Sep 30, 2025
6549230
reorg how project startup works to be more log friendly and better tr…
williamstein Sep 30, 2025
58f0567
make project control frontend slightly more usable
williamstein Sep 30, 2025
f663de7
add progress reporting for pulling podman image
williamstein Sep 30, 2025
770e183
Merge branch 'master' into fs2
williamstein Oct 1, 2025
88f4d0f
project-runner: add image extract progress bar updates
williamstein Oct 1, 2025
909bd9e
show time estimate in bootlog
williamstein Oct 1, 2025
808c7be
Merge branch 'master' into fs2
williamstein Oct 1, 2025
3adcbc1
fix bug in an error message and a confusing kernel switch message
williamstein Oct 1, 2025
f9e4831
make install of mutagen more cross-platform
williamstein Oct 1, 2025
5225d89
fixing install of sandbox binaries for macos
williamstein Oct 1, 2025
b03757c
small fix so automated app signing for mac works again
williamstein Oct 1, 2025
df42e08
share server -- get it to work using conat fs; not efficient yet
williamstein Oct 1, 2025
4b6856d
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Oct 1, 2025
1dfdee1
implement share server directory listing efficiently using the getLis…
williamstein Oct 1, 2025
f0ea98d
rewriting share server to work with conat/fs2
williamstein Oct 1, 2025
8e7db5c
Merge branch 'master' into fs2
williamstein Oct 1, 2025
0e4a543
write an architecture overview
williamstein Oct 2, 2025
9d99e48
Merge branch 'revert-sticky-routing' into fs2
williamstein Oct 2, 2025
775a611
remove sync-mtime -- we will use rsync for this when we need somethin…
williamstein Oct 2, 2025
d27f613
add openssh to the sandbox tools
williamstein Oct 2, 2025
e8c8ebc
add way for user to explicitly save a project
williamstein Oct 2, 2025
5969b1c
fixing issues with cloning projects (not done)
williamstein Oct 2, 2025
37170dd
working on cloning/saving project stuff
williamstein Oct 3, 2025
cf64ed5
fix frontend issue with opening a file after searching
williamstein Oct 3, 2025
e270a4e
rewriting ssh keys approach
williamstein Oct 3, 2025
d4dc558
properly configure ssh config when cloning project
williamstein Oct 3, 2025
0c4689c
change base quota to 1GB instead of 3GB
williamstein Oct 3, 2025
61bfe3e
refactor so project-runner now depends on file-server instead of the …
williamstein Oct 3, 2025
e7d0722
pasta --> slirp networking, to workaround a bug (and also maybe it is…
williamstein Oct 3, 2025
13c5366
implement support for main file-server for using existing btrfs files…
williamstein Oct 3, 2025
b15520b
supporting using btrfs on the project-runner
williamstein Oct 3, 2025
8602812
script to easily build podman from source
williamstein Oct 3, 2025
88a6130
implement swap properly and also disk quota on runners
williamstein Oct 3, 2025
9bfc8bc
add /scratch for projects
williamstein Oct 3, 2025
ef8527d
Merge branch 'master' into fs2
williamstein Oct 3, 2025
b6bf952
ts: tests
williamstein Oct 3, 2025
3b5fd3d
upgrade chokidar (which I think we don't even use); rewrite our actua…
williamstein Oct 3, 2025
c1e225f
do not use simple quotas with btrfs for now -- it's way too confusing…
williamstein Oct 3, 2025
415fc7e
sandbox: fix two security bugs checking parameters
williamstein Oct 3, 2025
0591ea0
implement quota for snapshots
williamstein Oct 3, 2025
19f249f
implement api so users can get the exact snapshot quotas
williamstein Oct 3, 2025
61816fb
fix bug in idle timeout monitor (uncaught exception)
williamstein Oct 4, 2025
5d77b18
add mutagen sync to file-server
williamstein Oct 4, 2025
00cae8b
save some code related to running bees
williamstein Oct 5, 2025
acfb6bd
switch to running bees using a simple nodejs function instead of some…
williamstein Oct 5, 2025
2b16adc
install and use bees
williamstein Oct 5, 2025
c3ba3e2
work in progress on wiring together filesystem sync api's
williamstein Oct 5, 2025
d5efcda
internal file-sync: first working version
williamstein Oct 5, 2025
d3ae3c8
file-sync: add ignores and ban certain paths
williamstein Oct 5, 2025
a6a88fd
sync: normalize output path when getting sync
williamstein Oct 5, 2025
aef97d5
sync: tweaking file watching code
williamstein Oct 5, 2025
c426f91
re-implementing how file watching and loading from disk for sync work
williamstein Oct 6, 2025
66dfc48
switch to lru cache for sandbox file state
williamstein Oct 6, 2025
7ecccdc
remove traces of older ignore on save approach
williamstein Oct 6, 2025
b293392
file watching: improve typescript
williamstein Oct 6, 2025
92b4906
upgrade rspack
williamstein Oct 6, 2025
2cad2f5
handling of directories/files being deleted now that we switched to c…
williamstein Oct 6, 2025
17b1a75
add back signal option
williamstein Oct 6, 2025
e4e065d
do not close on any file unlink
williamstein Oct 6, 2025
f4bf516
Merge branch 'master' into fs2
williamstein Oct 6, 2025
3923467
further comment out file stars and make a note that the pr was too ha…
williamstein Oct 6, 2025
21bda0f
refactoring backend file watcher to make it easier to understand and …
williamstein Oct 6, 2025
12c3c75
improve watching of file on disk by editor
williamstein Oct 6, 2025
22d7192
...
williamstein Oct 6, 2025
1ef285f
improve directory listing watcher (using stats from chokhidar, fixing…
williamstein Oct 6, 2025
1f7f308
add another missing "deadline" to dmp
williamstein Oct 7, 2025
a75d21e
change stability thresh
williamstein Oct 7, 2025
71cfaf7
Merge branch 'master' into fs2
williamstein Oct 7, 2025
77e785f
add back patch
williamstein Oct 7, 2025
72f3c7d
sync -- it doesn't matter when the file was read, just that it has be…
williamstein Oct 7, 2025
5e9edbc
backend watch: adjust some params and disable extremely verbose logging
williamstein Oct 7, 2025
7db8b39
implement patch approach to loading ipynb file from disk
williamstein Oct 8, 2025
8775c90
for now at least, disable the clean webpack plugin by default, since …
williamstein Oct 8, 2025
782400e
a disk usage button in explorer
williamstein Oct 8, 2025
54 changes: 25 additions & 29 deletions .github/workflows/make-and-test.yml
@@ -39,7 +39,7 @@ jobs:
 detached: true
 - uses: actions/checkout@v4
 - name: Install python3 requests
-run: sudo apt-get install python3-requests
+run: sudo apt-get install python3-requests python3-yapf
 - name: Check doc links
 run: cd src/scripts && python3 check_doc_urls.py || sleep 5 || python3 check_doc_urls.py

@@ -91,19 +91,15 @@ jobs:
 # cache: "pnpm"
 # cache-dependency-path: "src/packages/pnpm-lock.yaml"

-- name: Download and install Valkey
-run: |
-VALKEY_VERSION=8.1.2
-curl -LOq https://download.valkey.io/releases/valkey-${VALKEY_VERSION}-jammy-x86_64.tar.gz
-tar -xzf valkey-${VALKEY_VERSION}-jammy-x86_64.tar.gz
-sudo cp valkey-${VALKEY_VERSION}-jammy-x86_64/bin/valkey-server /usr/local/bin/
+- name: Install btrfs-progs and bup for @cocalc/file-server
+run: sudo apt-get update && sudo apt-get install -y btrfs-progs bup

 - name: Set up Python venv and Jupyter kernel
 run: |
 python3 -m pip install --upgrade pip virtualenv
 python3 -m virtualenv venv
 source venv/bin/activate
-pip install ipykernel
+pip install ipykernel yapf
 python -m ipykernel install --prefix=./jupyter-local --name python3-local --display-name "Python 3 (Local)"


@@ -128,30 +124,30 @@ jobs:
 name: "test-results-node-${{ matrix.node-version }}-pg-${{ matrix.pg-version }}"
 path: 'src/packages/*/junit.xml'

-report:
-runs-on: ubuntu-latest
+# report:
+# runs-on: ubuntu-latest

-needs: [test]
+# needs: [test]

-if: ${{ !cancelled() }}
+# if: ${{ !cancelled() }}

-steps:
-- name: Checkout code
-uses: actions/checkout@v4
+# steps:
+# - name: Checkout code
+# uses: actions/checkout@v4

-- name: Download all test artifacts
-uses: actions/download-artifact@v4
-with:
-pattern: "test-results-*"
-merge-multiple: true
-path: test-results/
+# - name: Download all test artifacts
+# uses: actions/download-artifact@v4
+# with:
+# pattern: "test-results-*"
+# merge-multiple: true
+# path: test-results/

-- name: Test Report
-uses: dorny/test-reporter@v2
-with:
-name: CoCalc Jest Tests
-path: 'test-results/**/junit.xml'
-reporter: jest-junit
-use-actions-summary: 'true'
-fail-on-error: false
+# - name: Test Report
+# uses: dorny/test-reporter@v2
+# with:
+# name: CoCalc Jest Tests
+# path: 'test-results/**/junit.xml'
+# reporter: jest-junit
+# use-actions-summary: 'true'
+# fail-on-error: false

16 changes: 16 additions & 0 deletions .gitignore
@@ -164,8 +164,24 @@ src/.claude/settings.local.json

# test reports by jest-junit
junit.xml


sea-prep.blob
cocalc
cocalc-lite.tar.*
*.egg-info
.python-version
src/packages/lite/sea/cocalc*gz
src/packages/lite/sea/cocalc*xz
src/packages/lite/sea/cocalc*zip
src/packages/lite/sea/cocalc*gnu

g

# autogenerated docs
**/cocalc-api/site/**
*.pkg
*.zip

src/packages/lite/build
src/packages/project-runner/build
236 changes: 236 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,236 @@
# CoCalc2 Architecture Overview (Draft)

> This is a working draft meant to capture the current design in one place.

---

## Goals & Non‑Goals

**Goals**

- Fast, durable, multi‑tenant project storage with clear quotas.
- Predictable save‑back from project runner VMs to the central file server (no “I did work but it can’t be saved”).
- Efficient storage via transparent compression; simple mental model for users.
- Rolling snapshots for user self‑service restore, with a separate snapshot quota that users rarely need to think about.

**Non‑Goals**

- Per‑user UID separation on runners (we rely on containerization and subvolume quotas instead).
- Snapshots on runner VMs (server owns snapshot history; runners are ephemeral).

---

## High‑Level Components

1. **Central File Server** (single large Btrfs filesystem)
   - One Btrfs **subvolume per project** (live working set).
   - Compression enabled (e.g., `zstd`).
   - **Qgroups/Quotas** enabled for hard limits.
   - **Rolling snapshots** per project for user restore.
   - Named **user‑created snapshots**.

2. **Project Runner VMs** (many; fast local SSD)
   - Also Btrfs with compression and **per‑project subvolumes**.
   - Hard quotas sized slightly below the server quota to maintain save‑back headroom.
   - No persistent snapshots (might use short‑lived read‑only snapshots for atomic rsync of rootfs).

3. **Sync Layer**
   - **Mutagen**: near real‑time sync for user home files.
   - **rsync**: periodic sync for the container rootfs upper overlay.

4. **Web UI & Services**
   - Surface usage and limits (live and snapshots), provide a snapshot browser/restore, and show warnings.

---

## Storage Model & Quotas

### Per‑Project Subvolume (File Server)

- Each project lives at `/mnt/project-<project-id>` as its **own subvolume**.
- **Compression** is enabled at the filesystem level; **quotas are enforced after compression**.
- Two distinct quota buckets:
  - **Live quota**: applies to the live subvolume.
  - **Snapshots quota**: applies to the aggregate of _all_ snapshots for that project.
- The snapshots quota will be a simple function (probably 2x) of the live quota.

### Qgroups Structure

- Btrfs assigns each subvolume an implicit qgroup `0/<live-id>`.
- We create an **aggregate qgroup** `1/<live-id>` for that project’s snapshots.
- We apply limits:
  - **Live**: limit `0/<live-id>` (or the path directly) to, say, **10 GiB**.
  - **Snapshots**: limit `1/<live-id>` to, say, **20 GiB** total across all snapshots.
- On snapshot creation, we assign the snapshot’s `0/<snap-id>` **into** `1/<live-id>`.
- Using the **live subvolume ID as the aggregate id** avoids external ID bookkeeping.
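
The qgroup layout above can be sketched as a small command builder. This is an illustration only, not the code in `packages/file-server`: the function names, the default limits, and the `/mnt` mount point are assumptions, and the only `btrfs` subcommands used are the ones that already appear in this document.

```python
def snapshot_qgroup(live_id: int) -> str:
    """Level-1 aggregate qgroup that collects all snapshots of one project."""
    return f"1/{live_id}"


def qgroup_setup_commands(project_path, live_id, live_limit="10G",
                          snap_limit="20G", mount="/mnt"):
    """Commands to run once the live subvolume exists and its ID is known.

    Hypothetical helper: it returns argv lists and executes nothing.
    """
    group = snapshot_qgroup(live_id)
    return [
        ["btrfs", "qgroup", "limit", live_limit, project_path],
        ["btrfs", "qgroup", "create", group, mount],
        ["btrfs", "qgroup", "limit", snap_limit, group, mount],
    ]


def assign_snapshot_command(snap_id, live_id, mount="/mnt"):
    """Put a new snapshot's implicit qgroup 0/<snap-id> into 1/<live-id>."""
    return ["btrfs", "qgroup", "assign", f"0/{snap_id}",
            snapshot_qgroup(live_id), mount]
```

Because the aggregate qgroup name is derived from the live subvolume ID, nothing besides that ID has to be stored to find a project's snapshot budget later.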

### Runner VM Quotas

- Each runner has a **per‑project subvolume** with **quota set to ~85–90%** of the server’s live quota.
- Rationale: keeps **headroom** so save‑back to the server succeeds even if compression ratios differ.
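
As a worked example of the derived limits, the sketch below computes a runner quota and a snapshots quota from the server's live quota. The helper names and the defaults (90% and 2x) are assumptions taken from the ranges mentioned in this draft, not settled values; integer arithmetic is used so byte counts stay exact.

```python
GIB = 1024 ** 3


def runner_quota_bytes(server_live_quota: int, percent: int = 90) -> int:
    """Runner quota as a percentage of the server's live quota (headroom)."""
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    return server_live_quota * percent // 100


def snapshots_quota_bytes(server_live_quota: int, factor: int = 2) -> int:
    """Snapshots budget as a simple multiple of the live quota."""
    return server_live_quota * factor


live = 10 * GIB
print(runner_quota_bytes(live) // GIB, snapshots_quota_bytes(live) // GIB)
```

With a 10 GiB live quota this gives a 9 GiB runner quota and a 20 GiB snapshots budget, matching the example numbers used elsewhere in this document.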

### User‑Facing Explanation (docs‑ready blurb)

> **Storage quota is measured after compression.** Your project has a quota that measures the actual space consumed on disk. If your data compresses well, the sum of file sizes you see in the editor may exceed your quota and still fit. Snapshots have a separate quota (twice the project quota) that limits how much historical data is retained.

---

## Snapshots

- **Where**: server only, per project (no long‑term snapshots on runners).
- **How**: periodic read‑only snapshots (e.g., 15‑minute/hourly/daily/weekly retention).
- **Budget**: all snapshots share the project’s **snapshot quota** (the `1/<live-id>` limit). When the budget is exceeded, the retention policy prunes the oldest automatic snapshots until usage is back under budget. Named snapshots that users create explicitly are never deleted automatically.
- **Self‑service**: the UI lets users browse and restore from snapshots; command‑line restore via rsync is also supported.

> **Note**: Runner nodes may take a **short‑lived RO snapshot** strictly for consistent `rsync` (copy‑on‑write point‑in‑time view), then delete it immediately after sync completes. This does not change policy: history lives on the server.
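
The budget rule can be sketched as a selection function. This is a simplified model under stated assumptions, not the actual pruner: the `Snapshot` type and its fields are hypothetical, automatic snapshot names are assumed to be UTC timestamps that sort chronologically, and the space freed by a deletion is approximated by the snapshot's exclusive bytes. A real pruner would likely re-read qgroup usage after each deletion, since shared extents can become exclusive to the remaining snapshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str             # e.g. "20251002T120000Z" for automatic snapshots
    exclusive_bytes: int  # estimate of space freed if this snapshot is deleted
    automatic: bool       # False for named, user-created snapshots


def snapshots_to_prune(snapshots, used_bytes, budget_bytes):
    """Pick automatic snapshots to delete, oldest first, until under budget.

    Named (non-automatic) snapshots are never selected.
    """
    selected = []
    automatic = sorted((s for s in snapshots if s.automatic),
                       key=lambda s: s.name)
    for snap in automatic:
        if used_bytes <= budget_bytes:
            break
        selected.append(snap)
        used_bytes -= snap.exclusive_bytes
    return selected
```

If only named snapshots remain and the budget is still exceeded, the function returns what it could select and leaves the decision about the rest to the caller.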

---

## Data Flow

1. **Active work on runner**
   - User edits files in their per‑project subvolume on a runner.
   - **Mutagen** streams home‑dir changes to the server in near real time. If file changes conflict, the central file server's version always wins.
   - **rsync** pushes the rootfs overlay periodically (e.g., every minute) from a transient snapshot for consistency.

2. **File Server receives changes**
   - Writes land in the project’s live subvolume, bounded by the live quota.
   - Periodic snapshots capture history and consume from the snapshots quota.

3. **Restore**
   - Users restore individual files or directories from snapshots via UI or CLI.
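
One way to get "the server always wins" behavior from Mutagen is its `two-way-resolved` synchronization mode, in which the alpha endpoint wins conflicts. Whether CoCalc configures Mutagen this way is an assumption; the sketch below only builds an argument list, and the session name, endpoint string, and helper name are hypothetical.

```python
def mutagen_sync_create_argv(name, server_endpoint, runner_path):
    """Build a `mutagen sync create` command with the server as alpha.

    In two-way-resolved mode alpha wins conflicts, so the central file
    server is passed first. Nothing is executed here.
    """
    return [
        "mutagen", "sync", "create",
        f"--name={name}",
        "--sync-mode=two-way-resolved",
        server_endpoint,  # alpha: central file server (hypothetical endpoint)
        runner_path,      # beta: home directory on the runner
    ]


argv = mutagen_sync_create_argv(
    "project-home", "user@file-server:/home/user", "/home/user")
print(" ".join(argv))
```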

---

## Operational Procedures

The following is roughly what the actual JavaScript code in `packages/file-server` does.

### One‑Time Setup (per filesystem)

```bash
# Enable quotas once
sudo btrfs quota enable /mnt
# Optional after bulk ops or enabling late
sudo btrfs quota rescan -w /mnt
```

### Create a New Project (Server)

```bash
# Live subvolume
sudo btrfs subvolume create /mnt/project-$PROJECT_ID

# Set live quota (example: 10 GiB)
sudo btrfs qgroup limit 10G /mnt/project-$PROJECT_ID

# Snapshot aggregate group uses the live subvolume ID
LIVEID=$(sudo btrfs subvolume show /mnt/project-$PROJECT_ID | awk '/Subvolume ID:/ {print $3}')

# Create and limit the snapshots group
sudo btrfs qgroup create 1/$LIVEID /mnt/
sudo btrfs qgroup limit 20G 1/$LIVEID /mnt/ # example snapshots budget
```

### Snapshot Creation (Server)

```bash
# Create RO snapshot
TS=$(date -u +%Y%m%dT%H%M%SZ)
SNAP=/mnt/project-$PROJECT_ID/.snapshots/$TS
sudo btrfs subvolume snapshot -r /mnt/project-$PROJECT_ID "$SNAP"

# Assign snapshot to the project’s snapshot group
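# Match the full "Subvolume ID:" label; a bare /ID:/ also matches the UUID and Parent ID lines
SNAPID=$(sudo btrfs subvolume show "$SNAP" | awk '/Subvolume ID:/ {print $3}')
LIVEID=$(sudo btrfs subvolume show /mnt/project-$PROJECT_ID | awk '/Subvolume ID:/ {print $3}')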
sudo btrfs qgroup assign 0/$SNAPID 1/$LIVEID /mnt
```
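When the subvolume ID is extracted in code rather than with `awk`, it helps to anchor on the full `Subvolume ID:` label, because `btrfs subvolume show` also prints `UUID:`, `Parent ID:`, and `Top level ID:` lines. The parser below is a minimal sketch; the function name is hypothetical and the sample text is an abbreviated, illustrative imitation of the command's output, not a captured log.

```python
import re

# Abbreviated, illustrative output of `btrfs subvolume show` (not a real log).
SAMPLE = (
    "project-demo\n"
    "\tName: \t\t\tproject-demo\n"
    "\tUUID: \t\t\t11111111-2222-3333-4444-555555555555\n"
    "\tParent UUID: \t\t-\n"
    "\tSubvolume ID: \t\t256\n"
    "\tParent ID: \t\t5\n"
    "\tTop level ID: \t\t5\n"
)


def parse_subvolume_id(show_output: str) -> int:
    """Return the numeric value of the 'Subvolume ID:' line."""
    for line in show_output.splitlines():
        match = re.match(r"\s*Subvolume ID:\s+(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'Subvolume ID:' line found")


print(parse_subvolume_id(SAMPLE))
```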

### Runner Subvolume & Quota

```bash
# Create per‑project subvolume on runner
sudo btrfs subvolume create /runnerfs/project-$PROJECT_ID

# Set runner quota to ~90% of server limit (example: 9 GiB)
sudo btrfs qgroup limit 9G /runnerfs/project-$PROJECT_ID
```

### Rsync from Runner (optional transient snapshot)

```bash
# (TODO) Incomplete: the transient snapshot step and the rsync source path are not filled in yet.
P=/runnerfs/project-$PROJECT_ID
TS=$(date -u +%Y%m%dT%H%M%SZ)
rsync -aHAX --delete ... file-server:/mnt/project-$PROJECT_ID/.local/overlay/...
```

### Inspecting Usage

```bash
# Qgroup usage with limit columns (-r/-e add max referenced/exclusive; -F lists the qgroups that affect the given path)
sudo btrfs qgroup show -reF /mnt | less

# Filesystem space by block group type (data/metadata/system)
sudo btrfs filesystem df /mnt
```

---

## Policies & Safety

- **Hard quotas**: enforced by the kernel via qgroups (both server and runner). When a project exceeds its quota, writes to that subvolume fail with EDQUOT (“Disk quota exceeded”); other subvolumes are unaffected.
- **Headroom on runners**: prevents the common failure mode where work done on a runner can’t be saved back to the server due to tighter server limits or different compression ratios.
- **User guidance**: expose a `~/scratch` directory (separate subvolume and policy) for large temporary files not intended for sync; this reduces quota pressure on the live budget.
- **Performance knobs**: `compress=zstd[:3]`, `ssd`, `discard=async`. Consider `autodefrag` only for heavy small‑random‑write workloads. Set `chattr +C` sparingly on paths needing no‑CoW (trades off checksumming).
- **Dedup** on runners: optional **bees** on runners to reduce local SSD usage; measure CPU/IO overhead under realistic load. Use reflink copy-on-write when possible (e.g., cloning projects).
- **Dedup** on file server: optional **bees** to reduce disk usage. Also extensively use copy-on-write, e.g., when copying files between projects.

---

## Failure Modes & Mitigations

- **Runner quota exceeded** → user sees the quota error (EDQUOT) early; save‑back fails fast and visibly. UI should warn near 80–90%.
- **Server live quota exceeded** → incoming syncs fail; UI callouts plus guidance to delete files or increase quota.
- **Snapshot budget exceeded** → retention pruner deletes oldest automatic snapshots until under budget.
- **Qgroup counter drift** (rare, after crashes/bulk ops) → `btrfs quota rescan -w` to reconcile.
- **Filesystem nearly full** → monitor `btrfs filesystem df`; alert admins before metadata space runs low.
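
The "warn near 80–90%" guidance could be turned into a small classifier for the UI. This is a sketch under assumptions: the function name, the level names, and the exact thresholds are hypothetical and would need tuning against real usage.

```python
def usage_level(used_bytes: int, quota_bytes: int,
                warn_at: float = 0.8, critical_at: float = 0.9) -> str:
    """Classify quota usage as 'ok', 'warning', 'critical', or 'exceeded'."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    ratio = used_bytes / quota_bytes
    if ratio >= 1:
        return "exceeded"
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"


for used in (50, 85, 95, 120):
    print(used, usage_level(used, 100))
```

Keeping the thresholds as parameters lets the runner and the server use different warning points, for example warning earlier on runners, where the quota is intentionally tighter.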

---

## Observability (What to Monitor)

- Live and snapshots usage per project (qgroup referenced/exclusive).
- Runner vs server usage deltas (to detect pathological compression differences).
- Snapshot creation latency; pruner actions count.
- Error rates from mutagen/rsync; ENOSPC and EDQUOT events; quota rescans.

---

## FAQ (User‑Facing)

**Q: My files add up to more than my quota, but I’m not blocked. Why?**
A: Quotas measure space **after compression**. If your data compresses well, you can store more than the sum of uncompressed file sizes.

**Q: Do snapshots count against my main quota?**
A: No. Snapshots have a **separate budget which is twice your main quota**. When that fills, older snapshots are pruned automatically.

**Q: What happens if I hit the quota while working?**
A: New writes fail with “out of space.” Delete data or request a higher quota, then try again.

**Q: Can I keep big temporary outputs?**
A: Use `~/scratch` (limited retention and a separate quota). Only the project’s live area is synced and counted against your main quota.

---

## Appendix: Rationale for Design Choices

- **Per‑project subvolumes** enable kernel‑level quotas, small blast radius, and fast deletion.
- **Server‑side snapshots only** simplify reasoning about history, save SSD cycles on runners, and reduce operational complexity.
- **Aggregate snapshot qgroup** provides a single dial for “how much history a project can accumulate.”
- **Runner quotas < server quotas** provide a simple, robust guardrail against save‑back failures due to compression variance.

---

_End of draft._

8 changes: 0 additions & 8 deletions src/compute/.npmrc

This file was deleted.

10 changes: 0 additions & 10 deletions src/compute/README.md

This file was deleted.

1 change: 0 additions & 1 deletion src/compute/api-client

This file was deleted.

1 change: 0 additions & 1 deletion src/compute/backend

This file was deleted.

1 change: 0 additions & 1 deletion src/compute/comm

This file was deleted.
