Description
Problem Statement
When DE has finished a build, we've often encountered some combination of the following problems:
- It's not clear to GIS (or Amanda) what build of a dataset should be reviewed.
- The status of a dataset in the publication lifecycle (being built, awaiting review, in review, approved) isn't clear.
- Files for datasets that have been reviewed are potentially (maybe even often) overwritten by subsequent builds.
- There's no clear lineage from the published / distributed datasets (e.g. on Socrata / Bytes) back to their raw files.
- There's no review history for subsequent builds and reviews of a dataset.
- e.g. say we're in our third QA cycle of PLUTO 24v1.1, where should we look to know why the second build failed QA?
Proposed Solution
For all of our products, we should add a subfolder under the version to indicate the draft publication version. The current state looks like this:
- current: edm-publishing / db-pluto / publish / 24v1.1 / dataset files
- proposed: edm-publishing / db-pluto / publish / 24v1.1 / draft publication version / dataset files
The draft publication version will be composed of an integer version and a summary describing the build, similar to the summary line of a git commit. A list of build versions could look like this:
- 1-initial-build
- 2-add-new-corrections
- 3-fix-zoning
I suggest an integer version instead of a timestamp because we don't really care when the draft was published, whereas the integer corresponds to something that we do care about. e.g. if we're in round three of PLUTO publishing, and you see that the last draft publication is 6-fix-the-issue then you immediately know something is wrong.
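As a rough illustration of the naming scheme above, here is a minimal Python sketch that builds and parses names like 3-fix-zoning. The names `DraftVersion`, `slugify`, and `parse_draft_version` are hypothetical and not part of `dcpy`. The slug rule (lowercase alphanumeric words joined by hyphens) is an assumption based on the example names.

```python
import re
from typing import NamedTuple


class DraftVersion(NamedTuple):
    """Hypothetical container for a draft publication version."""

    number: int   # integer part, e.g. 3
    summary: str  # hyphenated summary, e.g. "fix-zoning"

    @property
    def folder_name(self) -> str:
        return f"{self.number}-{self.summary}"


def slugify(summary: str) -> str:
    """Lowercase the summary and join its alphanumeric words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", summary.lower()))


def parse_draft_version(folder_name: str) -> DraftVersion:
    """Split '3-fix-zoning' into (3, 'fix-zoning'); raise on anything else."""
    match = re.fullmatch(r"(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)", folder_name)
    if match is None:
        raise ValueError(f"not a draft publication version: {folder_name!r}")
    return DraftVersion(int(match.group(1)), match.group(2))
```

Parsing the integer part, rather than sorting folder names as strings, keeps a tenth draft ordered after a ninth.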
Draft Publication Github Issues
Our Publishing Github Action will create a Github Issue for every published build version. Decisions, discussions, etc should be documented on that issue. They should all be linked back to a parent Issue for a build of a dataset.
The Issue for the draft publication should use Github Labels to indicate the status. A list of statuses might be:
- Ready
- In QA
- Passed
- Failed
Perhaps we can auto-add all of GIS as Assignees.
Implementation Details (Technical)
- in DO: Migrate everything in the `publish` folder to this new scheme.
- in GHA: Modify our `Publish` action to accept a `changes summary` field, which will be used to generate the `draft publication version`. The integer part will be inferred from existing versions on DO.
- in `dcpy`: The `draft publication version` concept needs to be added to the edm publish connector. `Publish` functionality should refuse to overwrite existing data.
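The GHA and `dcpy` items above could be sketched roughly as follows. This is not the existing connector API: the function names are hypothetical, and the list of existing folder names is assumed to have been fetched from DO by some other means.

```python
import re


def next_draft_number(existing_folders: list[str]) -> int:
    """Infer the next integer from folder names like '2-add-new-corrections'.

    Names without a leading integer (e.g. 'latest') are ignored.
    """
    numbers = [
        int(match.group(1))
        for name in existing_folders
        if (match := re.match(r"(\d+)-", name))
    ]
    return max(numbers, default=0) + 1


def new_draft_folder(existing_folders: list[str], changes_summary: str) -> str:
    """Build the draft publication version from a free-text changes summary."""
    slug = "-".join(re.findall(r"[a-z0-9]+", changes_summary.lower()))
    if not slug:
        raise ValueError("changes summary must contain letters or digits")
    return f"{next_draft_number(existing_folders)}-{slug}"


def check_no_overwrite(existing_folders: list[str], target_folder: str) -> None:
    """Refuse to publish into a folder that already exists."""
    if target_folder in existing_folders:
        raise FileExistsError(f"{target_folder!r} already exists; refusing to overwrite")
```

For example, with `1-initial-build` and `2-add-new-corrections` already present, a changes summary of "Fix zoning" would yield `3-fix-zoning`.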
Implementation Details (Nontechnical)
- Within this new framework, everything under a product's `draft` folder should be considered deletable.
- Need to work with GIS to change procedures.
Other Considerations
- I think it's nearly time (if not actually time) to do away with the `latest` folders. It's a convenient hack, but has the liability of being potentially out of sync with actual latest versions. As part of this, we could help GIS migrate off. We could either supply them Python code to infer the last build version, or add a REST endpoint to the QAQC app to redirect to the DO location.
- We could certainly use some utilities around tidying up our edm publications bucket, e.g. CLI tooling to flag or move "misplaced" items such as `edm-publishing/db-pluto/23v2`, which should (presumably) be under the draft folder.
- Regarding what we call the datasets in our publishing folders, is there a better term than `draft publications`? `publication drafts`? They really are just drafts of what we'll eventually publish.
- What repo should Draft Publication Issues live in?
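For the first point, the Python code supplied to GIS might look something like the sketch below. It assumes the proposed layout (bucket / product / publish / version / draft publication version) and that the caller has already listed the folder names under a version. Both function names are hypothetical.

```python
import re
from typing import Optional


def latest_draft_folder(folder_names: list[str]) -> Optional[str]:
    """Return the folder with the highest leading integer, or None if there is none."""
    numbered = []
    for name in folder_names:
        match = re.match(r"(\d+)-", name)
        if match:
            numbered.append((int(match.group(1)), name))
    return max(numbered)[1] if numbered else None


def latest_draft_path(product: str, version: str, folder_names: list[str]) -> str:
    """Build the path of the most recent draft publication for a product version."""
    latest = latest_draft_folder(folder_names)
    if latest is None:
        raise LookupError(f"no draft publications found for {product} {version}")
    return f"edm-publishing/{product}/publish/{version}/{latest}"
```

Because the result is computed from the folders that actually exist, it cannot drift out of sync the way a copied `latest` folder can.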
Example Workflow for GIS (Copied from @sf-dcp's comment)
Suppose we're building PLUTO v24.1.
- DE manually creates a parent issue for the dataset build; let's call it PLUTO v24.1. DE kicks off a build.
- DE reviews internally, and publishes via the Github action (no changes to process so far)
- GHA generates new Issue PLUTO v24.1 1-initial build (child issue), tagged as `Ready`, and the GIS team is added to the issue
- GIS updates the child issue's tag to In review:
  - Their QA review looks good. They put tag Passed for the child issue
  - Their QA review asks for changes --> They put tag Failed for the child issue --> They note required changes in the child issue --> we repeat the process for generating consecutive child issues.
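The label changes in this workflow amount to a small state machine. The sketch below uses the status names from the proposed label list and assumes that the workflow's "In review" tag corresponds to the "In QA" label. The transition table and function names are illustrative, not an agreed design.

```python
from enum import Enum


class Status(Enum):
    READY = "Ready"
    IN_QA = "In QA"
    PASSED = "Passed"
    FAILED = "Failed"


# Assumed transitions: a child issue only moves forward and ends at Passed or Failed.
ALLOWED = {
    Status.READY: {Status.IN_QA},
    Status.IN_QA: {Status.PASSED, Status.FAILED},
    Status.PASSED: set(),
    Status.FAILED: set(),
}


def relabel(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the workflow does not allow the change."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value!r} to {new.value!r}")
    return new


def needs_new_draft(status: Status) -> bool:
    """A Failed child issue means DE publishes another draft and a new child issue."""
    return status is Status.FAILED
```

Under these assumptions a failed draft is never reopened: the fix goes into the next numbered draft, which gets its own child issue and keeps the review history intact.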
Whiteboarding

