Add federated site search functionality#2534

Closed
Rutvikrj26 wants to merge 9 commits into MIT-LCP:dev from T-CAIREM:Rutvikrj26/federated-site-search-implementation

@Rutvikrj26
Contributor

Implement a fully working federated site search feature, allowing users to include results from registered federated sites alongside local projects. This includes necessary backend changes, new forms, and UI components for managing federated sites.

@Rutvikrj26 Rutvikrj26 marked this pull request as draft November 25, 2025 16:37
@Rutvikrj26 Rutvikrj26 marked this pull request as ready for review November 27, 2025 18:57
@bemoody
Collaborator

bemoody commented Dec 9, 2025

Thanks; I like the general approach.

Please separate the unrelated dependency changes (poetry.lock / pyproject.toml) into another pull request.

I think it'd probably be more manageable to define the models in the search app instead of the project app. It doesn't look like there's anything here that is specifically tied to the project app.

We should think carefully about what stuff we are storing in the FederatedProject model. We don't want this to be too tightly coupled to the physionet-build data model.

  • Integer resource_type and access_policy codes seem dubious.

  • We want some kind of unique stable identifier for a foreign project, but I'm not sure about using slug+version for that purpose. Perhaps doi would be better, or perhaps a UUID (that would need to be added to PublishedProject.)

  • On the other hand, we do want to be aware of versioning; the simplest thing is to search only latest versions by default.

What is your thinking behind retaining "stale" objects? If a foreign project no longer exists, why wouldn't we just want to delete it?
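For illustration, the identifier and versioning points in the comment above can be sketched in plain Python. This is a minimal sketch, not code from the pull request: `ForeignProject`, `version_key`, and `latest_versions` are hypothetical names, and it assumes version strings are dotted integers such as `1.10.0`.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ForeignProject:
    """Hypothetical stand-in for a project record fetched from another site."""
    slug: str
    version: str
    title: str
    # A UUID stays stable even if the slug or the version scheme changes.
    public_uuid: uuid.UUID = field(default_factory=uuid.uuid4)


def version_key(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically.

    Assumes dotted-integer versions; other schemes would need another parser.
    """
    return tuple(int(part) for part in version.split("."))


def latest_versions(projects):
    """Keep only the highest version of each slug, sorted by slug."""
    latest = {}
    for project in projects:
        current = latest.get(project.slug)
        if current is None or version_key(project.version) > version_key(current.version):
            latest[project.slug] = project
    return sorted(latest.values(), key=lambda p: p.slug)
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"1.10.0"` sorts before `"1.2.0"`, which would pick the wrong "latest" version.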

@Rutvikrj26
Contributor Author

> Please separate the unrelated dependency changes (poetry.lock / pyproject.toml) into another pull request.

For sure, I'll revert these changes in the current branch.

> I think it'd probably be more manageable to define the models in the search app instead of the project app. It doesn't look like there's anything here that is specifically tied to the project app.

The reasoning for keeping them in the project app is that we are storing projects: the search app implements search functionality, while the project app holds all the code for project management, whether local or federated.

> We should think carefully about what stuff we are storing in the FederatedProject model. We don't want this to be too tightly coupled to the physionet-build data model.
>
>   • Integer resource_type and access_policy codes seem dubious.

Fully agreed. The models are completely separate; the only thing reused is the type codes, so that we can keep the same search UI without refactoring it for a string-based implementation. Since we are only federating across PhysioNet instances, the codes should not drift apart, so this should not be an issue.

>   • We want some kind of unique stable identifier for a foreign project, but I'm not sure about using slug+version for that purpose. Perhaps doi would be better, or perhaps a UUID (that would need to be added to PublishedProject.)

I mostly agree, and would like to add a UUID to the published project. The only reason I want to avoid the DOI is that not every project may have one. Slug + version is what generates the individual project pages in the first place, so it is always going to be unique.

>   • On the other hand, we do want to be aware of versioning; the simplest thing is to search only latest versions by default.
>
> What is your thinking behind retaining "stale" objects? If a foreign project no longer exists, why wouldn't we just want to delete it?

I was thinking of keeping the system version-aware, hence the stale objects (similar to how local projects are handled). But I agree with your point: we might not want to keep stale objects, and should return only the latest versions in search.

I'll make the necessary implementation changes and update the PR.

@Rutvikrj26
Contributor Author

@tompollard

As discussed, I'll update the implementation to move the models into the search app, add a UUID, and leave the refresh logic as-is.

@bemoody
Collaborator

bemoody commented Dec 16, 2025

> Fully agreed - they are completely separated in terms of models. The only thing that is reused is the types, and the reason is so that we can maintain the same search UI, and don't have to refactor for string based implementation. Since we are only federating across physionet instances, this should not be an issue, and won't drift apart.

As I think about this more, the issue is not so much about how the information is stored in the FederatedProject model; the issue is how the information is represented in the public JSON API. The API is not just for federating between sites running identical software - it's meant to be used by everybody.

These fields aren't currently in the API at all, which means we would need to add them. If we add them to the API, they should be added in a way that is extensible and future-proof.

I think perhaps it would be better to split this into two pull requests: one to add additional stuff to the API, and one to add the ability to sync/search federated sites.
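To make the API concern above concrete, here is a minimal sketch of exposing string labels instead of integer codes. Everything in it is an assumption for illustration: the code-to-label tables, `serialize_project`, and `parse_resource_type` are hypothetical and do not describe the actual physionet-build API.

```python
import json

# Hypothetical internal integer codes; a real site may use different ones.
RESOURCE_TYPE_LABELS = {0: "database", 1: "software", 2: "challenge"}
ACCESS_POLICY_LABELS = {0: "open", 1: "restricted", 2: "credentialed"}


def serialize_project(record: dict) -> dict:
    """Build a public JSON payload using string labels, not integer codes."""
    return {
        "slug": record["slug"],
        "version": record["version"],
        "resource_type": RESOURCE_TYPE_LABELS[record["resource_type"]],
        "access_policy": ACCESS_POLICY_LABELS[record["access_policy"]],
    }


def parse_resource_type(payload: dict,
                        known=frozenset(RESOURCE_TYPE_LABELS.values())) -> str:
    """Consumer side: map labels this client does not know to 'other'."""
    value = payload.get("resource_type")
    return value if value in known else "other"


payload = serialize_project(
    {"slug": "demo", "version": "1.0.0", "resource_type": 1, "access_policy": 2}
)
encoded = json.dumps(payload)  # what a client outside this codebase would see
```

String labels let a third-party client read the payload without a copy of the internal code table, and a tolerant parser lets a site add new labels later without breaking older consumers.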

@tompollard
Member

Superseded by #2546

@tompollard tompollard closed this Jan 20, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 25, 2026
Enhance API responses by including resource type, access policy, and
topics. Implement federated search functionality with UUID support for
project models.

## API Updates
- Expose public_project_uuid in PublishedProjectSerializer and
PublishedProjectDetailSerializer
- Update API documentation in export/views.py to include UUID field

## Federation Models Migration
- Move FederatedSite, FederatedProject, and FederationSyncLog models
from project app to search app
- Update FederatedProject model:
  * Add public_project_uuid field for stable identification
  * Change resource_type and access_policy to string fields (from integer
    codes)
  * Remove is_stale field (using full refresh strategy instead)
  * Update unique_together to use public_project_uuid instead of
    slug+version

## Search Functionality
- Create search/federation.py with federated search logic
- Update resource_type filtering to work with string values
- Remove is_stale filtering (full refresh approach)

## Management Commands
- Create sync_federated_sites command in search app
- Update sync logic to:
  * Use public_project_uuid for identification
  * Validate presence of UUID in API responses
  * Use full refresh (delete + recreate) instead of stale marking
  * Support string values for resource_type and access_policy

## Admin Interface
- Register federation models in search/admin.py with comprehensive admin
classes
- Add list displays, filters, and fieldsets for all federation models

Based on PR #2534 feedback from @bemoody:
- UUID replaces slug+version as stable identifier
- String values for resource_type and access_policy for API
extensibility
- Models moved to search app (proper separation of concerns)
- Removed stale objects logic in favor of batch delete+add
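
The full refresh strategy listed under Management Commands in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code of the `sync_federated_sites` command: it uses an in-memory dict in place of the database, `full_refresh` is a hypothetical name, and only the `public_project_uuid`, `resource_type`, and `access_policy` field names come from the commit message.

```python
def full_refresh(store: dict, site: str, api_items: list) -> dict:
    """Replace every cached project for `site` with the items just fetched.

    `store` maps (site, uuid) -> project dict and is modified in place.
    Returns counts that a sync log could record.
    """
    fresh, skipped = {}, 0
    for item in api_items:
        project_uuid = item.get("public_project_uuid")
        # Items without a UUID cannot be identified reliably, so skip them.
        if not project_uuid:
            skipped += 1
            continue
        fresh[project_uuid] = item

    # Delete everything previously cached for this site...
    old_keys = [key for key in store if key[0] == site]
    for key in old_keys:
        del store[key]

    # ...then recreate the records from the fresh response.
    for project_uuid, item in fresh.items():
        store[(site, project_uuid)] = {
            "title": item.get("title", ""),
            "resource_type": str(item.get("resource_type", "")),
            "access_policy": str(item.get("access_policy", "")),
        }
    return {"deleted": len(old_keys), "created": len(fresh), "skipped": skipped}
```

Because every run rebuilds the site's records from scratch, a project that has disappeared upstream simply is not recreated, which is why no `is_stale` flag is needed. In a database-backed version, the delete and recreate steps would presumably run inside a single transaction so that searches never see a half-empty cache.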