
Commit 99c779e

add docs about --bag switch
1 parent f60b9d4 commit 99c779e

2 files changed: 16 additions and 4 deletions

docs/running.md

Lines changed: 9 additions & 0 deletions
@@ -58,6 +58,15 @@ It is safe to kill ({kbd}`control` + {kbd}`c`) and restart the QA process when t
 The python program, not the docker container, the container should clean itself up when python exits.
 :::
 
+#### Control how closely to follow BagIt manifest validation spec `--bag`
+The first release version of this software would only check what is in the manifest-md5.txt file, that was found to not be as robust as we wanted.
+Some breakouts were found to have files, but empty manifests, this software would treat this as an empty breakout and... crash.
+A stricter mode was implemented that can be controlled by the --bag switch value:
+
+* `strict`, any files in the `/data` directory and not in the manifest-md5.txt cause the manifest OK test to report failure.
+* `flex`, a reasonable set of file names are allowed to exist in `/data` and not in the manifest-md5.txt, see [](#r2r_ctd.breakout.FLEX_FILES_OK) for the list of filenames allowed.
+* `manifest` reverts to the original behavior where only paths in the manifest-md5.txt are checked and any extra files in `/data` are ignored.
+
 ## Breakout Structure
 When R2R receives data from a cruise it will be split up into separate collections called "breakouts".
 To be processed, the breakout is expected to be a directory with contents, not an archive such as a zip file.
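
The three `--bag` values documented in this hunk can be read as a decision over the set of payload files that are missing from the manifest. The following is a minimal sketch of that decision only, not the project's implementation: the function name `extras_allowed`, its arguments, and the local `FLEX_OK` set are hypothetical, and checksum verification of the files that are listed in the manifest is left out.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list for this sketch; it mirrors the two names
# that the commit adds to FLEX_FILES_OK.
FLEX_OK = {".DS_Store", "thumbs.db"}


def extras_allowed(payload, manifest, mode):
    """Return True if payload files missing from the manifest are acceptable.

    payload and manifest are iterables of relative path strings;
    mode is one of "strict", "flex", or "manifest".
    """
    extra = set(payload) - set(manifest)
    if mode == "manifest":
        # Original behavior: extra payload files are ignored.
        return True
    if mode == "strict":
        # Any extra payload file is a failure.
        return not extra
    if mode == "flex":
        # Extra files are tolerated only if their file name is allow-listed.
        return all(PurePosixPath(p).name in FLEX_OK for p in extra)
    raise ValueError(f"unknown mode: {mode!r}")


payload = ["data/a.hex", "data/.DS_Store"]
manifest = ["data/a.hex"]
results = {m: extras_allowed(payload, manifest, m) for m in ("strict", "flex", "manifest")}
```

With the sample inputs above, only `strict` rejects the breakout, because the single unlisted file is `.DS_Store`. Adding an unlisted file with any other name would make `flex` reject it as well.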

src/r2r_ctd/breakout.py

Lines changed: 7 additions & 4 deletions
@@ -20,6 +20,11 @@
 
 logger = getLogger(__name__)
 
+FLEX_FILES_OK = """.DS_Store
+thumbs.db"""
+"""Filenames that, when in --bag flex validation mode, will not cause a fail"""
+# kept as a string so it prints in the documentation nicely
+
 
 class BBox(NamedTuple):
     """namedtuple to represent a geo bounding box
@@ -178,9 +183,7 @@ def manifest_ok(self) -> bool:
 
         This is one of the checks that goes into the stoplight report.
         """
-        flex_files = {
-            ".DS_Store",
-        }
+        flex_files_ok = set(FLEX_FILES_OK.split("\n"))
         logger.info(f"Bag validation mode: {self.bag_strictness}")
         err_message = "Files are in payload directory and not in manifest, breakout is likely invalid or corrupted"
         for root, _, files in self.payload_path.walk():
@@ -191,7 +194,7 @@ def manifest_ok(self) -> bool:
                 return False
 
             if self.bag_strictness == "flex" and not all(
-                d.name in flex_files for d in diff
+                d.name in flex_files_ok for d in diff
             ):
                 logger.critical(err_message)
                 return False
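
The change above replaces an inline set with a module-level string that is split on newlines at check time. A minimal sketch of that pattern follows, assuming the string contains exactly the two names shown in the diff; the helper name `parse_ok_names` is hypothetical and does not appear in the commit.

```python
# Same literal as the constant added in the diff.
FLEX_FILES_OK = """.DS_Store
thumbs.db"""


def parse_ok_names(text):
    """Split a newline-separated string of file names into a set."""
    return set(text.split("\n"))


flex_files_ok = parse_ok_names(FLEX_FILES_OK)

# Membership is an exact, case-sensitive string comparison.
has_lower = "thumbs.db" in flex_files_ok
has_capitalized = "Thumbs.db" in flex_files_ok
```

Because set membership is case-sensitive, a file named `Thumbs.db` would not match the `thumbs.db` entry in this sketch. A trailing newline in the string would also add an empty-string entry to the set, so the literal needs to end directly after the last name.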
