Commit abfac4c

Revert "fix #7342 -- compute server file sync should use ctime instead of mtime"
This reverts commit cd3d502.
1 parent 09780b8 commit abfac4c

4 files changed: 25 additions & 25 deletions

src/packages/sync-fs/README.md

Lines changed: 8 additions & 8 deletions
@@ -10,7 +10,7 @@ The code here helps with periodically syncing a compute server and the project,
 where the compute server uses unionfs\-fuse combined with websocketfs, and the
 project uses a non\-FUSE fast local filesystem \(e.g., ext4 or zfs\). This
 algorithm will result in the file systems being equal if there is no activity for
-a few seconds. It's meant to provide a "last one to change the file wins", BUT
+a few seconds. It's meant to provide a "last on to change the path wins", BUT
 with a tolerance of maybe ~10 seconds, especially for deletes.
 
 This does not use inotify or any other file watching because cocalc runs on
@@ -28,13 +28,13 @@ The actual sync works as follows. For now, we will do this periodically, possibl
 by active usage signals from the user.
 
 **STEP 1:** On the compute server, make a map from all paths in upper \(both directories and files and whiteouts\),
-except ones excluded from sync, to the ctime for the path \(or negative ctime for deleted paths\):
+except ones excluded from sync, to the mtime for the path \(or negative mtime for deleted paths\):
 
 ```javascript {kernel="javascript"}
-computeState = {[path:string]:ctime of last change to file}
+computeState = {[path:string]:mtime of last change to file metadata}
 ```
 
-**IMPORTANT: We use ctimes in integer seconds, rounding down, since that's what tar does.** Also, a 1 second resolution is more than enough for our application.
+**IMPORTANT: We use mtimes in integer seconds, rounding down, since that's what tar does.** Also, a 1second resolution is more than enough for our application.
 
 We store this in memory.
 
@@ -57,12 +57,12 @@ if any of the following apply:
 - copy from project to compute
 - copy from compute to project
 
-The decision about which is based on knowing the `ctime` of that path on compute, in the project,
+The decision about which is based on knowing the `mtime` of that path on compute, in the project,
 and whether or not the file was deleted \(and when\) on both sides. We know all this information
 for each path, so we _can_ make this decision. It is tricky for directories and files in them,
 but the information is all there, so we can make the decision. If there is a conflict, we resolve it
 by "last timestamp wins, with preference to the project in case of a tie". Note also that all
-`ctimes` are defined and this all happens on local filesystems \(not websocketfs\). It's also possible
+mtimes are defined and this all happens on local filesystems \(not websocketfs\). It's also possible
 to just decide not to do anything regarding a given path and wait until later, which is critical
 since we do not have point in time snapshots; if a file is actively being changed, we just wait until
 next time to deal with it.
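The "last timestamp wins, with preference to the project in case of a tie" rule in the hunk above can be sketched as a small pure function. This is a simplified illustration and not the code in `sync-fs`: the name `resolveConflict`, the `Action` type, and the use of `undefined` for "path never seen" are assumptions made for the example, and it ignores the directory handling and the "wait until later" option that the README mentions. It only follows the README's convention that a negative timestamp marks a deletion.

```typescript
// Hypothetical sketch; none of these names come from sync-fs.
// t > 0: the path exists and was last changed at t (integer seconds).
// t < 0: the path was deleted at time -t.
// undefined: this side has never seen the path.
type Action =
  | "copy-to-project"
  | "copy-to-compute"
  | "delete-in-project"
  | "delete-on-compute"
  | "nothing";

// Decide what to do for ONE path that is already known to differ between sides.
function resolveConflict(compute?: number, project?: number): Action {
  if (compute === undefined && project === undefined) return "nothing";
  const tc = compute === undefined ? -Infinity : Math.abs(compute);
  const tp = project === undefined ? -Infinity : Math.abs(project);
  if (tc > tp) {
    // The compute server has the strictly newer event.
    if (compute! < 0) {
      // A deletion only needs propagating if the project still has the path.
      return project !== undefined && project > 0 ? "delete-in-project" : "nothing";
    }
    return "copy-to-project";
  }
  // The project has the newer event, or the timestamps tie (project preferred).
  if (project! < 0) {
    return compute !== undefined && compute > 0 ? "delete-on-compute" : "nothing";
  }
  return "copy-to-compute";
}
```

For example, `resolveConflict(100, 100)` yields `"copy-to-compute"` because the tie goes to the project, while `resolveConflict(-200, 100)` yields `"delete-in-project"` because the deletion on the compute server is the most recent event.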
@@ -120,11 +120,11 @@ This is a sync algorithm that depends on the existence of clocks. Therefore we
 
 We amend the above protocol as follows. The compute server's message to the project also includes $t_c$ which is the number of ms since the epoch as far as the compute server is concerned. When the project receives the message, it computes its own time $t_p$. If $|t_c - t_p|$ is small, e.g., less than maybe 3 seconds, we just assume the clocks are properly sync'd and do nothing different. Otherwise, we assume the clock on $t_c$ is wrong. Instead of trying to fix it, we just shift all timestamps _provided by the compute server_ by adding $\delta = t_p - t_c$ to them. Also, when sending timestamps computed on the project to the compute server, we subtract $\delta$ from them. In this way everything should work and the compute server should be none the wiser.
 
-Except that all the files in the tarballs have the wrong timestamps, so we would have to adjust the times of all these files. And of course all the lower layer filesystem timestamps are just going to be wrong no matter what. This is not something that can reasonably be done.
+Except that all the files in the tarballs have the wrong timestamps, so we would have to adjust the mtimes of all these files. And of course all the lower layer filesystem timestamps are just going to be wrong no matter what. This is not something that can reasonably be done.
 
 OK, so our protocol instead is that if the time is off by at least 10s \(say\), then instead of doing sync, the project responds with an error message. This can then be surfaced to the user.
 
 ## Notes
 
-- [mtime versus ctime](https://github.com/sagemathinc/cocalc/issues/7342): We do not use mtime at all. We do use ctime since whenever metadata or actual data changes, ctime changes, but mtime only changes when actual data changes \(or touch\). The time is used to decide in which direction to sync files when there is a conflict. It is NOT used as a threshold for whether or not to copy files at all. E.g., if you have an old file `a.c` and type `cp -a a.c a2.c` on the compute server, then `a2.c` does still get copied back to the project.
+- mtime versus ctime. We do not use ctime at all. We do use mtime, but it is used to decide in which direction to sync files when there is a conflict. It is NOT used as a threshold for whether or not to copy files at all. E.g., if you have an old file `a.c` and type `cp -a a.c a2.c` on the compute server, then `a2.c` does still get copied back to the project.
 
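The distinction in the "mtime versus ctime" note can be observed directly with Node's `fs` module. The sketch below is illustrative and not part of `sync-fs`; the helper name `backdatedTimestamps` and the temporary directory prefix are made up. It creates a file and then backdates its mtime, which is roughly what tools that preserve timestamps (such as `cp -a`) do to a fresh copy. On a typical Linux filesystem the mtime then reports the old time, while the ctime still reflects when the inode was last changed.

```typescript
import { mkdtempSync, rmSync, statSync, utimesSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical demo helper: returns integer-second mtime and ctime of a
// freshly created file whose mtime has been set back to `oldSeconds`.
function backdatedTimestamps(oldSeconds: number): { mtime: number; ctime: number } {
  const dir = mkdtempSync(join(tmpdir(), "sync-fs-demo-"));
  try {
    const file = join(dir, "a2.c");
    writeFileSync(file, "int main() { return 0; }\n");
    // utimesSync sets atime and mtime (numbers are seconds since the epoch).
    // The ctime cannot be set this way; the kernel updates it to "now".
    utimesSync(file, oldSeconds, oldSeconds);
    const st = statSync(file);
    return {
      mtime: Math.floor(st.mtimeMs / 1000), // round down to whole seconds
      ctime: Math.floor(st.ctimeMs / 1000),
    };
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Calling `backdatedTimestamps(946684800)` (2000-01-01 UTC) gives an `mtime` of `946684800` and a `ctime` close to the current time, so a "last timestamp wins" comparison can come out differently depending on which of the two fields is used.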

src/packages/sync-fs/lib/handle-api-call.ts

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ etc.
 import { fromCompressedJSON } from "./compressed-json";
 import getLogger from "@cocalc/backend/logger";
 import type { FilesystemState } from "./types";
-import { metadataFile, ctimeDirTree, remove, writeFileLz4 } from "./util";
+import { metadataFile, mtimeDirTree, remove, writeFileLz4 } from "./util";
 import { join } from "path";
 import { mkdir, rename, readFile, writeFile } from "fs/promises";
 import type { MesgSyncFSOptions } from "@cocalc/comm/websocket/types";
@@ -138,7 +138,7 @@ async function getProjectState(meta, exclude): Promise<FilesystemState> {
   if (!process.env.HOME) {
     throw Error("HOME must be defined");
   }
-  const projectState = await ctimeDirTree({
+  const projectState = await mtimeDirTree({
     path: process.env.HOME,
     exclude,
     metadataFile: meta,

src/packages/sync-fs/lib/index.ts

Lines changed: 8 additions & 8 deletions
@@ -14,7 +14,7 @@ import {
 } from "fs/promises";
 import { basename, dirname, join } from "path";
 import type { FilesystemState /*FilesystemStatePatch*/ } from "./types";
-import { execa, ctimeDirTree, parseCommonPrefixes, remove } from "./util";
+import { execa, mtimeDirTree, parseCommonPrefixes, remove } from "./util";
 import { toCompressedJSON } from "./compressed-json";
 import SyncClient from "@cocalc/sync-client/lib/index";
 import { encodeIntToUUID } from "@cocalc/util/compute/manager";
@@ -653,28 +653,28 @@ class SyncFS {
     whiteouts: string[];
   }> => {
     // Create the map from all paths in upper (both directories and files and whiteouts),
-    // except ones excluded from sync, to the ctime for the path (or negative ctime
-    // for deleted paths): {[path:string]:ctime of last change to file}
+    // except ones excluded from sync, to the ctime for the path (or negative mtime
+    // for deleted paths): {[path:string]:mtime of last change to file metadata}
     const whiteLen = "_HIDDEN~".length;
-    const computeState = await ctimeDirTree({
+    const computeState = await mtimeDirTree({
       path: this.upper,
       exclude: this.exclude,
     });
     const whiteouts: string[] = [];
     const unionfs = join(this.upper, UNIONFS);
-    const ctimes = await ctimeDirTree({
+    const mtimes = await mtimeDirTree({
       path: unionfs,
       exclude: [],
     });
-    for (const path in ctimes) {
-      const ctime = ctimes[path];
+    for (const path in mtimes) {
+      const mtime = mtimes[path];
       if (path.endsWith("_HIDDEN~")) {
         const p = path.slice(0, -whiteLen);
         whiteouts.push(path);
         if ((await stat(join(unionfs, path))).isDirectory()) {
           whiteouts.push(p);
         }
-        computeState[p] = -ctime;
+        computeState[p] = -mtime;
       }
     }
 

src/packages/sync-fs/lib/util.ts

Lines changed: 7 additions & 7 deletions
@@ -28,7 +28,7 @@ export async function metadataFile({
   path: string;
   exclude: string[];
 }): Promise<string> {
-  log("metadataFile", path, exclude);
+  log("mtimeDirTree", path, exclude);
   if (!(await exists(path))) {
     return "";
   }
@@ -41,7 +41,7 @@ export async function metadataFile({
   // in them!
   // BUT -- we are assuming filenames can be encoded as utf8; if not, sync will
   // obviously not work.
-  // - The find output contains more than just what is needed for ctimeDirTree; it contains
+  // - The find output contains more than just what is needed for mtimeDirTree; it contains
   // everything needed by websocketfs for doing stat, i.e., this output is used
   // for the metadataFile functionality of websocketfs.
   // - Just a little fact -- output from find is NOT sorted in any guaranteed way.
@@ -61,7 +61,7 @@ export async function metadataFile({
       "-o",
       ...findExclude(exclude),
       "-printf",
-      "%p\\0%.10C@ %.10A@ %b %s %M\\0\\0",
+      "%p\\0%.10T@ %.10A@ %b %s %M\\0\\0",
     ]),
     {
       cwd: path,
@@ -70,13 +70,13 @@ export async function metadataFile({
   return stdout;
 }
 
-// Compute the map from paths to their integral ctime for the entire directory tree
+// Compute the map from paths to their integral mtime for the entire directory tree
 // NOTE: this could also be done with the walkdir library, but using find
 // is several times faster in general. This is *the* bottleneck, and the
 // subprocess IO isn't much, so calling find as a subprocess is the right
 // solution! This is not a hack at all.
 // IMPORTANT: top level hidden subdirectories in path are always ignored
-export async function ctimeDirTree({
+export async function mtimeDirTree({
   path,
   exclude,
   metadataFile,
@@ -85,7 +85,7 @@ export async function ctimeDirTree({
   exclude: string[];
   metadataFile?: string;
 }): Promise<{ [path: string]: number }> {
-  log("ctimeDirTree", path, exclude);
+  log("mtimeDirTree", path, exclude);
   if (!(await exists(path))) {
     return {};
   }
@@ -102,7 +102,7 @@ export async function ctimeDirTree({
     "-o",
     ...findExclude(exclude),
     "-printf",
-    "%p\\0%C@\\0\\0",
+    "%p\\0%T@\\0\\0",
   ]);
   const { stdout } = await execa("find", [...args], {
     cwd: path,
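The hunk above switches the `find -printf` directive from `%C@` (status change time) to `%T@` (modification time); both print seconds since the epoch with a fractional part. The diff does not show how the output is parsed, so the following is only a minimal sketch of one way to turn it into the path-to-integer-seconds map the README describes. The function name `parseFindTimes` and the stripping of a leading `./` are assumptions for illustration, not taken from `util.ts`.

```typescript
// Hypothetical parser for stdout produced by: find . ... -printf "%p\0%T@\0\0"
// Each record is "<path>\0<seconds.fraction>", and records end with "\0\0".
function parseFindTimes(stdout: string): { [path: string]: number } {
  const state: { [path: string]: number } = {};
  for (const record of stdout.split("\0\0")) {
    if (!record) continue; // skip the empty piece after the final terminator
    const [rawPath, rawTime] = record.split("\0");
    if (rawPath === undefined || rawTime === undefined) continue;
    // Assumption: paths are reported relative to cwd with a "./" prefix.
    const path = rawPath.startsWith("./") ? rawPath.slice(2) : rawPath;
    // Round down to whole seconds, matching the README's stated convention.
    state[path] = Math.floor(parseFloat(rawTime));
  }
  return state;
}
```

Because the field and record separators are NUL bytes, which cannot occur in a path, this stays unambiguous even for filenames containing spaces or newlines.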
