After push to GIN, remote retains folders that were deleted from dataset #120

@jsheunis

Origin: Office Hour chatroom message

Description

User reported:

I have an acquisition computer, an analysis computer and a gin repository. The experiment files are in a subset (rawdata) and pushed to gin, then retrieved from the analysis computer.

Now, I have deleted/restructured the data on the acquisition computer (deleted, renamed, moved), saved the changes and pushed, but some of the old folders are still there on Gin. All the files are gone, but the folder structure remains on the Gin repository and no push will remove them.

Besides that, part of this restructuring was changing folder names from "bla folder" to "bla_folder", and I keep getting the old version in my acquisition computer - so I have "bla folder" on the acquisition computer and cannot get the correct one "bla_folder", even if "bla folder" does not exist in the repository anymore.
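One thing that may be worth ruling out, since the nature of the issue is not yet understood: Git records files rather than directories, so a folder that still appears in a commit must contain at least one tracked path. The sketch below reproduces this in a throwaway Git repository. The `.hidden` file is a hypothetical stand-in for any overlooked tracked file, and nothing here is confirmed to be the cause of the reported behavior.

```shell
#!/bin/sh
# Throwaway demo: a directory lingers in a commit only while some path
# inside it is still tracked.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

mkdir "bla folder"
echo data > "bla folder/a.dat"
echo hidden > "bla folder/.hidden"   # hypothetical overlooked file
git add .
git commit -q -m "initial layout"

# Restructure: move the visible file, leave the hidden one behind.
mkdir bla_folder
git mv "bla folder/a.dat" bla_folder/a.dat
git commit -q -m "rename folder"

# Anything listed here is still tracked under the old folder name.
leftover=$(git ls-tree -r --name-only HEAD -- "bla folder")
echo "still tracked: $leftover"
```

Running the same `git ls-tree` command inside the real rawdata dataset, with the old folder name as the path, would show whether anything is still tracked there.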

@adswa asked to confirm that:

  • the actual files were successfully pushed (i.e., they are on Gin and safely backed up)?
  • what remains on the acquisition computer are empty directories with outdated names?

User answer:

The actual files are pushed to Gin. The acquisition computer has the original and ideal version of this dataset.
The old folders with outdated names remain in Gin, and are present in my analysis computer. I cannot get their correct versions.

I am new to datalad and so far only using it to transfer data (and have version control) this way, acquisition -> gin -> analysis.

So when I am done acquiring new data, I use save, then update --merge and finally push --to gin. Only the rawdata subdataset is present in the acquisition computer.

From the analysis computer, I update and get whichever files I need to work on.

As for the structure of the datasets, I have a superset on the analysis computer; this contains the rawdata subdatasets, and other folders containing code, figures, etc. This has its own Gin repo.

More clarifying questions:

  • So inside of the rawdata subdataset on the acquisition computer you run:
    datalad save
    datalad update --merge 
    datalad push 
    
    correct?
  • Can I ask why you run the update --merge?
  • Are you making changes to the raw data subdataset at any other location/clone than the acquisition computer?
  • In case it is public, can you share the Gin repository, or could you hop into a video call with us either today until 2.30pm or during the next office hour?
  • Also, please share the set of commands that you ran, and also the dataset structure (super- and subdataset boundaries)
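The details requested above could be gathered with a small helper along these lines. This is a minimal sketch that uses plain Git commands only, relying on the fact that DataLad subdatasets are registered as Git submodules. The function name `describe_dataset` is made up for illustration.

```shell
#!/bin/sh
# Print the siblings, subdataset boundaries, recent history and pending
# changes of one dataset (defaults to the current directory).
describe_dataset() {
    ds=${1:-.}
    echo "== dataset: $ds"
    git -C "$ds" remote -v                      # configured siblings, e.g. gin
    git -C "$ds" submodule status --recursive   # subdataset boundaries
    git -C "$ds" log --oneline -n 10            # recent saves and merges
    git -C "$ds" status --short                 # unsaved or untracked content
}
```

Calling `describe_dataset` once in the rawdata subdataset on the acquisition computer and once in the superdataset on the analysis computer would cover both sides of the acquisition -> gin -> analysis chain.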

Next steps

  • Wait for user feedback to above questions.

TODO (not necessarily to be performed in this order)

  • Inform OP/Add reference to this issue at origin
  • Clarifying Qs asked or not needed
  • Nature of the issue is understood
  • Inform OP about resolution

Metadata

Labels

support-tracker: Track a support event that occurred elsewhere
