ISSUE-238 Scenarios for the acquisition of data files and file versioning#265

Merged
lewismc merged 13 commits into tagbase:main from lewismc:ISSUE-238
Jun 12, 2023

Conversation

@lewismc
Member

@lewismc lewismc commented May 19, 2023

WIP for #238 and #164
Work to be done

  • Rather than concatenating the dataset_id as a string and including it in the submission table, we should create a new foreign key dataset_id in submission referencing the dataset table.
  • It is OK to open the file twice (as long as we only read metadata). We need to create functions to (1) read only the global attributes (metadata), (2) populate the dataset table, and (3) ensure that dataset_id is populated in submission.
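
A minimal sketch of the steps listed above, in Python. It assumes an eTUFF-style header in which global attributes appear as `:name = "value"` lines before a `// data:` marker, and it uses the standard-library sqlite3 module as a stand-in for the project's PostgreSQL database. The function name `read_global_attributes`, the table layouts, and every column other than dataset_id are hypothetical and only there for illustration.

```python
import io
import sqlite3


def read_global_attributes(lines):
    """Collect header attributes only; stop before the data section."""
    attrs = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("// data:"):
            break  # metadata pass ends here; observations are never read
        if line.startswith(":") and "=" in line:
            name, value = line[1:].split("=", 1)
            attrs[name.strip()] = value.strip().strip('"')
    return attrs


# Hypothetical layout: submission points at dataset through a foreign key
# instead of carrying a concatenated string.
SCHEMA = """
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE submission (
    submission_id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    dataset_id INTEGER NOT NULL REFERENCES dataset (dataset_id)
);
"""

# Made-up header content, used only to exercise the parser.
header = io.StringIO(
    "// global attributes:\n"
    '  :instrument_name = "example_tag"\n'
    '  :platform = "Thunnus thynnus"\n'
    "// data:\n"
    "2020-01-01 00:00:00,1,10.0\n"
)
attrs = read_global_attributes(header)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite3 needs this to enforce FKs
conn.executescript(SCHEMA)

# Populate dataset first, then reference it from submission.
dataset_id = conn.execute(
    "INSERT INTO dataset (name) VALUES (?)", (attrs["instrument_name"],)
).lastrowid
conn.execute(
    "INSERT INTO submission (filename, dataset_id) VALUES (?, ?)",
    ("example.txt", dataset_id),
)
```

With the foreign key in place, a submission row that names a dataset_id missing from dataset is rejected by the database rather than silently stored.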

@lewismc lewismc marked this pull request as draft May 19, 2023 05:29
@lewismc lewismc modified the milestones: 0.12.0, 0.13.0 May 19, 2023
@lewismc lewismc added the enhancement (New feature or request) and storage (Anything tagbase-server storage/persistence related.) labels May 19, 2023
@lewismc
Member Author

lewismc commented May 21, 2023

More work to be done here @renato2099 but we are pretty close.

@renato2099
Collaborator

Hey @lewismc, I pushed a WIP commit that moves from a trigger-based data migration to a stored-procedure-based one. Before we go down that path, I'd like us to do some additional validation and think through whether we want this potentially large behavioural change at this point.

@renato2099
Collaborator

Hey @lewismc, I ran a couple of ingestions with this patch and it seems to be doing something 😅 I see data being ingested, but we should check that data migration still works as expected before we proceed with this.

@lewismc
Member Author

lewismc commented Jun 10, 2023

Hi @renato2099, I updated this patch and have tested it. It looks good. I will note the following things, though:

  1. Upon ingestion of iccat_gbyp0008_ArgosTrans_eTUFF0.txt and successful migration, loads of data is left in proc_observations. This requires investigation. I don't think this is new behavior.
  2. We need more unit tests.
  3. For some reason, a GET on /tags/{tag_id} now returns the same metadata for each submission rather than different metadata. I checked and confirmed that the correct metadata is populated into the database, so this is definitely a bug in tags_controller.py.
  4. We need to augment the stored procedure to accommodate the following scenario: a user initially ingests a tag submission representing a reference track, then ingests a different file which is for the same tag and dataset but is the new reference track. We need to make sure that the original submission and its metadata are no longer the reference track. This is kinda tricky, as essentially the sha256 needs to change as well.
  5. Finally, if I submit a .zip containing the three files iccat_gbyp0008_ArgosTrans_eTUFF0.txt, iccat_gbyp0008_ArgosTrans_eTUFF1.txt and iccat_gbyp0008_ArgosTrans_eTUFF2.txt, all three files may end up being assigned a different dataset_id, depending on whether the transaction has completed and an entry has been written to the dataset table before the next cursor attempts to read from that table. This affects ingestion, as it really depends on an initial entry being present before another file associated with the same dataset is ingested. I can demo this to you quite easily.
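
One possible way to make the reference-track and .zip scenarios above less order-dependent is sketched here in Python. It is not the stored procedure from this PR: it uses the standard-library sqlite3 module in place of PostgreSQL, and the `is_reference` column, the `name` column, the helper names, and the dataset name `iccat_gbyp0008` are all assumptions made for illustration. The idea is to let a UNIQUE constraint decide whether a dataset row exists (instead of a prior read that may run before another transaction commits), and to demote the previous reference track in the same transaction that inserts the new one. It does not address the sha256 concern.

```python
import sqlite3

# Hypothetical layout, for illustration only.
SCHEMA = """
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE submission (
    submission_id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    dataset_id INTEGER NOT NULL REFERENCES dataset (dataset_id),
    is_reference INTEGER NOT NULL DEFAULT 0
);
"""


def get_or_create_dataset(conn, name):
    # The UNIQUE constraint, not an earlier SELECT, decides whether the row
    # exists, so repeated calls for the same name yield the same dataset_id.
    conn.execute("INSERT OR IGNORE INTO dataset (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT dataset_id FROM dataset WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_submission(conn, filename, dataset_name, is_reference=False):
    with conn:  # one transaction: commit on success, roll back on error
        dataset_id = get_or_create_dataset(conn, dataset_name)
        if is_reference:
            # Demote any earlier reference track for this dataset.
            conn.execute(
                "UPDATE submission SET is_reference = 0 "
                "WHERE dataset_id = ? AND is_reference = 1",
                (dataset_id,),
            )
        cur = conn.execute(
            "INSERT INTO submission (filename, dataset_id, is_reference) "
            "VALUES (?, ?, ?)",
            (filename, dataset_id, int(is_reference)),
        )
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for filename in (
    "iccat_gbyp0008_ArgosTrans_eTUFF0.txt",
    "iccat_gbyp0008_ArgosTrans_eTUFF1.txt",
    "iccat_gbyp0008_ArgosTrans_eTUFF2.txt",
):
    add_submission(conn, filename, "iccat_gbyp0008", is_reference=True)
```

In PostgreSQL the corresponding insert would be written with `INSERT ... ON CONFLICT DO NOTHING`; whether that fits the existing stored procedure would need to be checked against the actual schema.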

That being said, none of these really block us releasing 0.13.0. Let's discuss this weekend.

@lewismc lewismc marked this pull request as ready for review June 10, 2023 05:43
@lewismc
Member Author

lewismc commented Jun 12, 2023

@renato2099 I updated this PR to fix item 3 above (the /tags/{tag_id} metadata bug). This was an old bug which we hadn't caught before. It is now fixed.

@lewismc lewismc merged commit 09155dd into tagbase:main Jun 12, 2023
@lewismc lewismc deleted the ISSUE-238 branch June 12, 2023 05:48
lewismc added a commit that referenced this pull request Sep 24, 2023
…ning (#265)

* ISSUE-238 Scenarios for the acquisition of data files and file versioning

---------

Co-authored-by: Renato Marroquin <marenato@inf.ethz.ch>