Revert "fix #7342 -- compute server file sync should use ctime instead of mtime"

williamstein · williamstein · commit abfac4c19905 · 2024-10-21T12:36:16.000Z
This reverts commit cd3d502.
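
For readers comparing the two timestamps this revert switches between: in GNU `find -printf`, `%T@` is the modification time (mtime) and `%C@` is the status change time (ctime). The sketch below is only an illustration of that difference and of the "last timestamp wins, project wins ties" rule described in the README. The names `timesInSeconds` and `newerSide` are hypothetical and do not exist in `sync-fs`.

```typescript
import { stat } from "fs/promises";

// Hypothetical helper, not part of sync-fs: both timestamps of a path,
// floored to integer seconds as the README describes.
export async function timesInSeconds(
  path: string,
): Promise<{ mtime: number; ctime: number }> {
  const s = await stat(path);
  return {
    // mtime changes when file data changes (or on touch)
    mtime: Math.floor(s.mtimeMs / 1000),
    // ctime changes when data OR metadata changes (e.g., chmod, chown)
    ctime: Math.floor(s.ctimeMs / 1000),
  };
}

// Hypothetical conflict rule: the later timestamp wins, and a tie
// goes to the project.
export function newerSide(
  computeTime: number,
  projectTime: number,
): "compute" | "project" {
  return computeTime > projectTime ? "compute" : "project";
}
```

With this commit applied, the value fed into such a comparison would be the mtime rather than the ctime.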
diff --git a/src/packages/sync-fs/README.md b/src/packages/sync-fs/README.md
@@ -10,7 +10,7 @@ The code here helps with periodically syncing a compute server and the project,
 where the compute server uses unionfs\-fuse combined with websocketfs, and the
 project uses a non\-FUSE fast local filesystem \(e.g., ext4 or zfs\). This
 algorithm will result in the file systems being equal if there is no activity for
-a few seconds. It's meant to provide a "last one to change the file wins", BUT
+a few seconds. It's meant to provide a "last one to change the path wins", BUT
 with a tolerance of maybe ~10 seconds, especially for deletes.
 
 This does not use inotify or any other file watching because cocalc runs on
@@ -28,13 +28,13 @@ The actual sync works as follows. For now, we will do this periodically, possibl
 by active usage signals from the user.
 
 **STEP 1:** On the compute server, make a map from all paths in upper \(both directories and files and whiteouts\),
-except ones excluded from sync, to the ctime for the path \(or negative ctime for deleted paths\):
+except ones excluded from sync, to the mtime for the path \(or negative mtime for deleted paths\):
 
 ```javascript {kernel="javascript"}
-computeState = {[path:string]:ctime of last change to file}
+computeState = {[path:string]:mtime of last change to file data}
 ```
 
-**IMPORTANT: We use ctimes in integer seconds, rounding down, since that's what tar does.** Also, a 1 second resolution is more than enough for our application.
+**IMPORTANT: We use mtimes in integer seconds, rounding down, since that's what tar does.** Also, a 1 second resolution is more than enough for our application.
 
 We store this in memory.
 
@@ -57,12 +57,12 @@ if any of the following apply:
 - copy from project to compute
 - copy from compute to project
 
-The decision about which is based on knowing the `ctime` of that path on compute, in the project,
+The decision about which is based on knowing the `mtime` of that path on compute, in the project,
 and whether or not the file was deleted \(and when\) on both sides. We know all this information
 for each path, so we _can_ make this decision. It is tricky for directories and files in them,
 but the information is all there, so we can make the decision. If there is a conflict, we resolve it
 by "last timestamp wins, with preference to the project in case of a tie". Note also that all
-`ctimes` are defined and this all happens on local filesystems \(not websocketfs\). It's also possible
+mtimes are defined and this all happens on local filesystems \(not websocketfs\). It's also possible
 to just decide not to do anything regarding a given path and wait until later, which is critical
 since we do not have point in time snapshots; if a file is actively being changed, we just wait until
 next time to deal with it.
@@ -120,11 +120,11 @@ This is a sync algorithm that depends on the existence of clocks.  Therefore we
 
 We amend the above protocol as follows.  The compute server's message to the project also includes $t_c$ which is the number of ms since the epoch as far as the compute server is concerned.   When the project receives the message, it computes its own time $t_p$.  If  $|t_c - t_p|$ is small, e.g., less than maybe 3 seconds, we just assume the clocks are properly sync'd and do nothing different.  Otherwise, we assume the compute server's clock \(which produced $t_c$\) is wrong.  Instead of trying to fix it, we just shift all timestamps _provided by the compute server_  by adding $\delta = t_p - t_c$ to them.  Also, when sending timestamps computed on the project to the compute server, we subtract $\delta$ from them.  In this way everything should work and the compute server should be none the wiser.
 
-Except that all the files in the tarballs have the wrong timestamps, so we would have to adjust the times of all these files.  And of course all the lower layer filesystem timestamps are just going to be wrong no matter what.  This is not something that can reasonably be done.
+Except that all the files in the tarballs have the wrong timestamps, so we would have to adjust the mtimes of all these files.  And of course all the lower layer filesystem timestamps are just going to be wrong no matter what.  This is not something that can reasonably be done.  
 
 OK, so our protocol instead is that if the time is off by at least 10s \(say\), then instead of doing sync, the project responds with an error message.  This can then be surfaced to the user.
 
 ## Notes
 
-- [mtime versus ctime](https://github.com/sagemathinc/cocalc/issues/7342):  We do not use mtime at all. We do use ctime since whenever metadata or actual data changes, ctime changes, but mtime only changes when actual data changes \(or touch\). The time is used to decide in which direction to sync files when there is a conflict.  It is NOT used as a threshold for whether or not to copy files at all.  E.g., if you have an old file `a.c` and type `cp -a a.c a2.c` on the compute server, then `a2.c` does still get copied back to the project.
+- mtime versus ctime.  We do not use ctime at all. We do use mtime, but it is used to decide in which direction to sync files when there is a conflict.  It is NOT used as a threshold for whether or not to copy files at all.  E.g., if you have an old file `a.c` and type `cp -a a.c a2.c` on the compute server, then `a2.c` does still get copied back to the project.
 
diff --git a/src/packages/sync-fs/lib/handle-api-call.ts b/src/packages/sync-fs/lib/handle-api-call.ts
@@ -8,7 +8,7 @@ etc.
 import { fromCompressedJSON } from "./compressed-json";
 import getLogger from "@cocalc/backend/logger";
 import type { FilesystemState } from "./types";
-import { metadataFile, ctimeDirTree, remove, writeFileLz4 } from "./util";
+import { metadataFile, mtimeDirTree, remove, writeFileLz4 } from "./util";
 import { join } from "path";
 import { mkdir, rename, readFile, writeFile } from "fs/promises";
 import type { MesgSyncFSOptions } from "@cocalc/comm/websocket/types";
@@ -138,7 +138,7 @@ async function getProjectState(meta, exclude): Promise<FilesystemState> {
   if (!process.env.HOME) {
     throw Error("HOME must be defined");
   }
-  const projectState = await ctimeDirTree({
+  const projectState = await mtimeDirTree({
     path: process.env.HOME,
     exclude,
     metadataFile: meta,
diff --git a/src/packages/sync-fs/lib/index.ts b/src/packages/sync-fs/lib/index.ts
@@ -14,7 +14,7 @@ import {
 } from "fs/promises";
 import { basename, dirname, join } from "path";
 import type { FilesystemState /*FilesystemStatePatch*/ } from "./types";
-import { execa, ctimeDirTree, parseCommonPrefixes, remove } from "./util";
+import { execa, mtimeDirTree, parseCommonPrefixes, remove } from "./util";
 import { toCompressedJSON } from "./compressed-json";
 import SyncClient from "@cocalc/sync-client/lib/index";
 import { encodeIntToUUID } from "@cocalc/util/compute/manager";
@@ -653,28 +653,28 @@ class SyncFS {
     whiteouts: string[];
   }> => {
     // Create the map from all paths in upper (both directories and files and whiteouts),
-    // except ones excluded from sync, to the ctime for the path (or negative ctime
-    // for deleted paths):  {[path:string]:ctime of last change to file}
+    // except ones excluded from sync, to the mtime for the path (or negative mtime
+    // for deleted paths):  {[path:string]:mtime of last change to file data}
     const whiteLen = "_HIDDEN~".length;
-    const computeState = await ctimeDirTree({
+    const computeState = await mtimeDirTree({
       path: this.upper,
       exclude: this.exclude,
     });
     const whiteouts: string[] = [];
     const unionfs = join(this.upper, UNIONFS);
-    const ctimes = await ctimeDirTree({
+    const mtimes = await mtimeDirTree({
       path: unionfs,
       exclude: [],
     });
-    for (const path in ctimes) {
-      const ctime = ctimes[path];
+    for (const path in mtimes) {
+      const mtime = mtimes[path];
       if (path.endsWith("_HIDDEN~")) {
         const p = path.slice(0, -whiteLen);
         whiteouts.push(path);
         if ((await stat(join(unionfs, path))).isDirectory()) {
           whiteouts.push(p);
         }
-        computeState[p] = -ctime;
+        computeState[p] = -mtime;
       }
     }
 
diff --git a/src/packages/sync-fs/lib/util.ts b/src/packages/sync-fs/lib/util.ts
@@ -28,7 +28,7 @@ export async function metadataFile({
   path: string;
   exclude: string[];
 }): Promise<string> {
-  log("metadataFile", path, exclude);
+  log("mtimeDirTree", path, exclude);
   if (!(await exists(path))) {
     return "";
   }
@@ -41,7 +41,7 @@ export async function metadataFile({
   //   in them!
   //   BUT -- we are assuming filenames can be encoded as utf8; if not, sync will
   //   obviously not work.
-  // - The find output contains more than just what is needed for ctimeDirTree; it contains
+  // - The find output contains more than just what is needed for mtimeDirTree; it contains
   //   everything needed by websocketfs for doing stat, i.e., this output is used
   //   for the metadataFile functionality of websocketfs.
   // - Just a little fact -- output from find is NOT sorted in any guaranteed way.
@@ -61,7 +61,7 @@ export async function metadataFile({
       "-o",
       ...findExclude(exclude),
       "-printf",
-      "%p\\0%.10C@ %.10A@ %b %s %M\\0\\0",
+      "%p\\0%.10T@ %.10A@ %b %s %M\\0\\0",
     ]),
     {
       cwd: path,
@@ -70,13 +70,13 @@ export async function metadataFile({
   return stdout;
 }
 
-// Compute the map from paths to their integral ctime for the entire directory tree
+// Compute the map from paths to their integral mtime for the entire directory tree
 // NOTE: this could also be done with the walkdir library, but using find
 // is several times faster in general. This is *the* bottleneck, and the
 // subprocess IO isn't much, so calling find as a subprocess is the right
 // solution!  This is not a hack at all.
 // IMPORTANT: top level hidden subdirectories in path are always ignored
-export async function ctimeDirTree({
+export async function mtimeDirTree({
   path,
   exclude,
   metadataFile,
@@ -85,7 +85,7 @@ export async function ctimeDirTree({
   exclude: string[];
   metadataFile?: string;
 }): Promise<{ [path: string]: number }> {
-  log("ctimeDirTree", path, exclude);
+  log("mtimeDirTree", path, exclude);
   if (!(await exists(path))) {
     return {};
   }
@@ -102,7 +102,7 @@ export async function ctimeDirTree({
       "-o",
       ...findExclude(exclude),
       "-printf",
-      "%p\\0%C@\\0\\0",
+      "%p\\0%T@\\0\\0",
     ]);
     const { stdout } = await execa("find", [...args], {
       cwd: path,