Remove elastic search duplicates #188
Merged
Changes
smart-search-plugin.mjs
Refactored to ensure consistent ID generation for MDX documents using cleaned paths, preventing duplicates in the search index.
Updated deleteExistingDocs to use the deleteMany mutation, targeting content_type: 'mdx_doc' and removing any residual documents.
Added detailed logging during indexing to aid in debugging and ensure proper paths and IDs for indexed documents.
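For reviewers, a minimal sketch of what ID generation from cleaned paths could look like. The helper names (cleanDocPath, docIdFor) and the "mdx:" ID prefix are hypothetical and are not taken from the plugin; the sketch also assumes the plugin cleans paths with the same rules described for the search handler (strip src/pages or pages/, trim .mdx).

```javascript
// Hypothetical helpers for illustration only; the plugin's real code may differ.

// Normalize a file path so the same document always yields the same string,
// regardless of how the path was discovered (relative, absolute-ish, Windows).
function cleanDocPath(filePath) {
  return filePath
    .replace(/\\/g, '/')              // normalize Windows separators
    .replace(/^(\.\/|\/)+/, '')       // drop leading "./" or "/"
    .replace(/^(src\/)?pages\//, '')  // strip "src/pages/" or "pages/"
    .replace(/\.mdx$/, '');           // trim the .mdx extension
}

// Derive the document ID from the cleaned path only, so re-indexing the same
// file overwrites the existing document instead of creating a duplicate.
// The "mdx:" prefix is an assumption made for this sketch.
function docIdFor(filePath) {
  return `mdx:${cleanDocPath(filePath)}`;
}

console.log(docIdFor('src/pages/docs/intro.mdx'));
console.log(docIdFor('./pages/docs/intro.mdx'));
```

Both calls log the same ID, which is the property that keeps a second indexing run from adding a second copy of the document.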
pages/api/search.js
Modified the search API handler to:
Handle both content_type and post_type fields to ensure compatibility with existing WordPress and MDX content in the search index.
Remove duplicates from search results using a unique filter based on document IDs.
Clean up paths for MDX content by stripping unnecessary prefixes like src/pages and pages/ and trimming .mdx extensions.
Properly map and format results for:
MDX documents (content_type: 'mdx_doc')
WordPress posts (content_type: 'wp_post' or post_type: 'post').
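The handler steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in pages/api/search.js: the flat document shape ({ id, content_type, post_type, path, title }), the output shape, and the function names cleanPath and formatResults are all hypothetical.

```javascript
// Hypothetical sketch; field names and result shape are assumptions.

// Strip "src/pages/" or "pages/" and the ".mdx" extension from an MDX path.
function cleanPath(p) {
  return String(p ?? '')
    .replace(/^(\.\/|\/)+/, '')
    .replace(/^(src\/)?pages\//, '')
    .replace(/\.mdx$/, '');
}

function formatResults(docs) {
  const seen = new Set();
  const results = [];
  for (const doc of docs) {
    // Unique filter: keep only the first document seen for each ID.
    if (seen.has(doc.id)) continue;
    seen.add(doc.id);

    if (doc.content_type === 'mdx_doc') {
      results.push({
        id: doc.id,
        type: 'mdx_doc',
        title: doc.title,
        path: `/${cleanPath(doc.path)}`,
      });
    } else if (doc.content_type === 'wp_post' || doc.post_type === 'post') {
      // Accept either field so older WordPress entries still match.
      results.push({ id: doc.id, type: 'wp_post', title: doc.title, path: doc.path });
    }
    // Documents of any other type are skipped in this sketch.
  }
  return results;
}

const sample = [
  { id: 'a', content_type: 'mdx_doc', title: 'Intro', path: 'src/pages/docs/intro.mdx' },
  { id: 'a', content_type: 'mdx_doc', title: 'Intro', path: 'pages/docs/intro.mdx' },
  { id: 'b', post_type: 'post', title: 'Hello', path: '/blog/hello' },
];
console.log(formatResults(sample));
```

A Set keyed on the document ID keeps the filter linear in the number of hits and preserves the ranking order returned by the search backend.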
Bug Fixes
Results: