@NickAkhmetov
Contributor

This PR adds the is_integrated boolean to the transformation process. All externally processed datasets, and all datasets with more than one dataset in their ancestor tree, are treated as integrated, because:

  • If there is more than one ancestor dataset, the redirect is jarring or unclear.
  • Externally processed datasets may have contributor sets that differ from their parent's, even if they have only one parent.
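As a minimal sketch of the rule above, assume that externally processed datasets are identified by a `creation_action` of `"External Process"` and that the number of ancestor datasets is already available as an integer. Both of these are assumptions, as are the function name `is_integrated`, the constant name, and the `"Central Process"` value used as an internal example. This is not necessarily how transform.py implements the check. The doctests illustrate the three cases:

```python
# Assumption: the creation_action value that marks externally processed
# datasets; confirm against the values the entity service actually emits.
EXTERNAL_PROCESS_ACTION = "External Process"


def is_integrated(creation_action: str, ancestor_dataset_count: int) -> bool:
    """Return True if a dataset should be treated as integrated (hypothetical helper).

    >>> is_integrated("External Process", 1)
    True
    >>> is_integrated("Central Process", 2)
    True
    >>> is_integrated("Central Process", 1)
    False
    """
    # Externally processed datasets may have different contributors than
    # their parent, even when there is only one parent.
    if creation_action == EXTERNAL_PROCESS_ACTION:
        return True
    # With more than one ancestor dataset, a redirect to a single parent is unclear.
    return ancestor_dataset_count > 1
```

Running `python -m doctest` on a module containing this function would exercise the examples in the docstring.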

Since this relies on the ancestor dataset counts, I've added this as a separate transformation called in transform.py after the add_counts transformation and adjusted the test_transform tests to include integrated dataset cases. I can also add doctests to the function itself if desired.
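The ordering dependency can be sketched as a list of in-place transformations applied in sequence, where the integration step reads a field that the counting step writes. This is only an illustration under stated assumptions. The document shape (an `ancestors` list whose entries carry `entity_type`), the `ancestor_counts` field, and the simplified `add_counts` and `add_is_integrated` bodies are all hypothetical stand-ins, not the actual transform.py code:

```python
from typing import Any, Callable

Doc = dict[str, Any]


def add_counts(doc: Doc) -> None:
    # Hypothetical stand-in: count ancestors by entity_type.
    counts: dict[str, int] = {}
    for ancestor in doc.get("ancestors", []):
        entity_type = ancestor.get("entity_type", "Unknown")
        counts[entity_type] = counts.get(entity_type, 0) + 1
    doc["ancestor_counts"] = counts


def add_is_integrated(doc: Doc) -> None:
    # Reads the counts written by add_counts, so it must run after it;
    # run first, it would raise KeyError on "ancestor_counts".
    dataset_ancestors = doc["ancestor_counts"].get("Dataset", 0)
    external = doc.get("creation_action") == "External Process"  # assumed value
    doc["is_integrated"] = external or dataset_ancestors > 1


# Order matters: the counting step precedes the step that consumes its output.
TRANSFORMATIONS: list[Callable[[Doc], None]] = [add_counts, add_is_integrated]


def transform(doc: Doc) -> Doc:
    for step in TRANSFORMATIONS:
        step(doc)
    return doc


if __name__ == "__main__":
    doc = transform({
        "creation_action": "Central Process",
        "ancestors": [{"entity_type": "Dataset"}, {"entity_type": "Dataset"},
                      {"entity_type": "Sample"}],
    })
    print(doc["ancestor_counts"])  # {'Dataset': 2, 'Sample': 1}
    print(doc["is_integrated"])  # True
```

Keeping the flag in its own step, rather than folding it into the counting step, keeps each transformation single-purpose at the cost of an ordering constraint that the list makes explicit.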

@yomatters

@NickAkhmetov, would it make sense to add another flag to differentiate the internally-processed integrated datasets from the externally-processed ones? It seems like we'll need to know that for the UI, but maybe there are existing fields we can use to make that determination.

@NickAkhmetov
Contributor Author

@yomatters Since we can determine that from creation_action, which we already request for the dataset page, the current approach should suffice. The flag would be derived from a single existing field, so indexing it gains us little and exposes no additional information.
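A rough sketch of deriving that distinction on the client side from fields already on the document, rather than indexing a second flag. The helper name `integrated_origin`, the returned labels, and the `"External Process"` / `"Central Process"` action values are assumptions for illustration only:

```python
from typing import Any, Optional

# Assumption: the creation_action value for externally processed datasets.
EXTERNAL_PROCESS_ACTION = "External Process"


def integrated_origin(doc: dict[str, Any]) -> Optional[str]:
    """Classify an integrated dataset as externally or internally processed.

    Hypothetical helper; returns None for datasets that are not integrated.
    """
    if not doc.get("is_integrated"):
        return None
    # Derived from one existing field, so no extra indexed flag is needed.
    if doc.get("creation_action") == EXTERNAL_PROCESS_ACTION:
        return "external"
    return "internal"
```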

@yuanzhou
Member

yuanzhou commented Jan 5, 2026

@yomatters As part of the review process, can you approve this PR if all looks good to you?

I'll take care of the merge and deployment. Our established workflow is to test changes on DEV and TEST first, with a full reindex when necessary (as with this change). The portal team then reviews and gives me the green light for the PROD release.

@yomatters yomatters left a comment

This looks good to me.

@yomatters

@yuanzhou, apologies for the delay! I'm new to the project and wasn't familiar with the process. I just submitted my approval.

@yuanzhou
Member

yuanzhou commented Jan 5, 2026

Thanks @yomatters. Normally I would get tagged automatically for all the PRs in this repository, and several others too, mainly to coordinate the code review and deployment. For changes like this one, I tend to wait until all other reviewers also approve the PR.

Are you a member of the HuBMAP Slack Workspace too? It'll be helpful to include you as I normally keep everyone posted once the changes are running on DEV/TEST and PROD.

@yomatters

@yuanzhou, thanks! Yes, I'm in the HuBMAP Slack. Which channel do you post notifications in?

@yuanzhou yuanzhou merged commit b403d83 into dev-integrate Jan 6, 2026
4 checks passed
@yuanzhou
Member

yuanzhou commented Jan 6, 2026

@yomatters I've added you to #hive-developers and #elasticsearch channels on HuBMAP Slack.

@yuanzhou yuanzhou deleted the nickakhmetov/add-is-integrated branch January 9, 2026 00:11