Copilot AI commented Dec 1, 2025

Sparse-only vector searches were returning 0 results despite matching embeddings being stored. The fix was already implemented; this PR verifies the resolution and removes an accidentally committed auto-generated file.

Changes

  • Verification: Confirmed both affected integration tests now pass:

    • test_store_hybrid_embeddings
    • test_partial_embeddings
  • Cleanup: Removed src/codeweaver/_version.py from git tracking (listed in .gitignore, auto-generated by uv-dynamic-versioning)

Technical Context

The fix corrected Qdrant API usage for sparse-only queries:

# StrategizedQuery.to_query() now returns correct parameters
return {"query": sparse_vector, "using": "sparse", **kwargs}
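As a rough illustration of the pattern above, the following sketch builds the keyword arguments for a sparse-only query. `SparseVector` here is a local stand-in for `qdrant_client.models.SparseVector`, and `build_sparse_query` and its `limit` argument are hypothetical names used for illustration only; this is not the project's actual `StrategizedQuery.to_query()` implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SparseVector:
    """Local stand-in for qdrant_client.models.SparseVector (assumption)."""

    indices: list[int]
    values: list[float]

    def __post_init__(self) -> None:
        # A sparse vector pairs each index with exactly one value.
        if len(self.indices) != len(self.values):
            raise ValueError("indices and values must have the same length")


def build_sparse_query(vector: SparseVector, **kwargs: Any) -> dict[str, Any]:
    """Hypothetical helper: the sparse vector goes in `query`, and `using`
    names the sparse vector field, mirroring the return statement above."""
    return {"query": vector, "using": "sparse", **kwargs}


params = build_sparse_query(SparseVector(indices=[3, 17], values=[0.8, 0.2]), limit=5)
```

Under these assumptions, the resulting dictionary would be unpacked into a call along the lines of `client.query_points(collection_name, **params)`.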

Claude Action Investigation

Identified potential causes for reported premature termination in claude.yml:

| Issue | Cause |
| --- | --- |
| MCP server not started | Workflow references mcp__codeweaver__find_code but doesn't start the server |
| Missing tool definitions | mcp__sequential-thinking__*, mcp__github_ci__* not in .mcp.json |

Recommendation: Remove unavailable MCP tools from claude_args or add setup steps.
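One way to catch the second class of problem before a workflow run is to compare the tool names passed in claude_args against the servers declared in .mcp.json. The sketch below assumes tool names follow the `mcp__<server>__<tool>` pattern seen above and that .mcp.json keeps its servers under a top-level `mcpServers` key; the helper name and the sample inputs are hypothetical.

```python
import json


def missing_mcp_servers(allowed_tools: list[str], mcp_config: dict) -> set[str]:
    """Return server names referenced by mcp__<server>__<tool> entries
    that have no matching key under "mcpServers" in the config."""
    configured = set(mcp_config.get("mcpServers", {}))
    referenced = set()
    for tool in allowed_tools:
        # Split into at most three parts: "mcp", server name, tool name or "*".
        parts = tool.split("__", 2)
        if len(parts) == 3 and parts[0] == "mcp":
            referenced.add(parts[1])
    return referenced - configured


# Hypothetical inputs for illustration only.
config = json.loads('{"mcpServers": {"codeweaver": {"command": "codeweaver"}}}')
tools = ["mcp__codeweaver__find_code", "mcp__sequential-thinking__*", "mcp__github_ci__*"]
missing = missing_mcp_servers(tools, config)
```

This check only covers missing definitions. A server that is declared but never started by the workflow (the first row of the table) would still pass it.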

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cas-server.xethub.hf.co
    • Triggering command: /home/REDACTED/work/codeweaver/codeweaver/.venv/bin/pytest pytest tests/integration/ -v --no-cov (dns block)
  • us.i.posthog.com
    • Triggering command: /home/REDACTED/work/codeweaver/codeweaver/.venv/bin/pytest pytest tests/integration/ -v --no-cov (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>Sparse-only vector search returns 0 results (ISSUE-001)</issue_title>
<issue_description>## Description

Sparse-only vector searches using Qdrant return 0 results even when matching sparse embeddings have been stored. Dense-only and hybrid searches work correctly.

Status

70% fixed - API usage corrected, data flow issue remains

Current Investigation

✅ Fixed

  • Changed to correct client.query_points(query=SparseVector(...), using="sparse") API
  • Response handling fixed (properly extracts .points)
  • Verified sparse search works when called directly on Qdrant

❌ Remaining Issue

Despite correct API usage, tests still return 0 results.

Symptoms

  1. ✅ Chunks with sparse embeddings upsert successfully
  2. ✅ Dense-only searches find chunks correctly
  3. ✅ Hybrid searches (dense + sparse) work correctly
  4. ❌ Sparse-only searches return empty results []

Likely Causes

  1. Sparse vectors not stored correctly during upsert
  2. Collection configuration missing sparse index parameters
  3. Search indices not matching stored indices

Next Steps

  1. Debug sparse vector storage: Add logging to verify sparse vectors in upsert
  2. Verify collection config: Check sparse_vectors_config in test collections
  3. Trace data flow: Follow sparse embeddings from creation → storage → retrieval
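The third likely cause (query indices not matching stored indices) can be checked offline before touching the collection. The sketch below assumes sparse similarity is a dot product over the indices the two vectors share, so a query with no shared indices can never score against a stored vector. Vectors are modelled as plain `{index: weight}` dictionaries, and the function names and sample values are hypothetical.

```python
def sparse_dot(query: dict[int, float], stored: dict[int, float]) -> float:
    """Dot product over the indices the two sparse vectors share."""
    return sum(weight * stored[idx] for idx, weight in query.items() if idx in stored)


def diagnose(query: dict[int, float], stored: dict[int, float]) -> str:
    """Describe why a sparse query would or would not match a stored vector."""
    if not stored:
        return "stored vector is empty: check the upsert path"
    shared = query.keys() & stored.keys()
    if not shared:
        return "no shared indices: query and stored vocabularies differ"
    return f"{len(shared)} shared indices, score={sparse_dot(query, stored):.3f}"


# Hypothetical vectors for illustration only.
stored_vec = {3: 0.5, 17: 1.0}
print(diagnose({3: 0.8, 17: 0.2}, stored_vec))
print(diagnose({101: 0.9}, stored_vec))
```

Running this kind of check on the vectors captured at upsert time and at query time helps separate a storage problem (empty stored vector) from an encoding mismatch (no shared indices).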

Code Locations

  • Search implementation: src/codeweaver/providers/vector_stores/qdrant_base.py
    • Query generation: StrategizedQuery.to_query() (lines 400-434)
    • Search execution: _execute_search_query() (lines 354-393)
    • Sparse vector preparation: _prepare_vectors() (lines 475-513)

Tests Affected

  • tests/integration/test_hybrid_storage.py::test_store_hybrid_embeddings (line 91)
  • tests/integration/test_partial_embeddings.py::test_partial_embeddings (line 76)

Both marked with @pytest.mark.xfail(reason="ISSUE-001")

Impact

  • Low user impact: Hybrid search (the default) works correctly
  • Medium dev impact: Cannot test sparse-only fallback scenarios
  • No blocking: Core functionality operational

Workaround

Use hybrid search (dense + sparse) instead of sparse-only.

Reference

See issues/KNOWN_ISSUES.md for complete investigation details.

Source

  • Issue tracker: issues/KNOWN_ISSUES.md (ISSUE-001)
  • Branch: 003-our-aim-to</issue_description>

<agent_instructions>please complete the work Claude started based on their findings</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@bashandbone: @claude please implement your fix

@bashandbone: @claude it seems this action finished prematurely without you finishing. This is happening routinely with claude actions so once you finish your main task and commit, I'd appreciate you looking into what might be happening with the claude action

Copilot AI self-assigned this Dec 1, 2025

Copilot AI commented Dec 1, 2025

@bashandbone Unfortunately, I hit an error while trying to use the custom Copilot setup steps configured for this repository and had to close this PR. The error I am seeing is:

failed to unmarshal copilot-setup-steps.yaml: yaml: line 86: could not find expected ':'

Once you or someone with the necessary access fixes the problem, please unassign and then reassign issue #0 to me and I'll retry. Thanks!

@bashandbone bashandbone assigned Copilot and unassigned Copilot Dec 1, 2025
@bashandbone

@copilot please try again

Copilot AI changed the title from "[WIP] Fix sparse-only vector search returning 0 results" to "Verify sparse-only vector search fix (ISSUE-001)" Dec 1, 2025
Copilot AI requested a review from bashandbone December 1, 2025 18:55
@bashandbone bashandbone marked this pull request as ready for review December 1, 2025 19:00
Copilot AI review requested due to automatic review settings December 1, 2025 19:00

Copilot AI left a comment

Pull request overview

This PR verifies the resolution of sparse-only vector search functionality (ISSUE-001) and removes an auto-generated file that was accidentally tracked in git.

  • Confirms fix for sparse-only vector searches that were returning 0 results
  • Removes src/codeweaver/_version.py from git tracking (auto-generated by uv-dynamic-versioning)
  • Documents Claude action workflow issues for future investigation

@bashandbone bashandbone merged commit 21f389e into main Dec 1, 2025
15 of 16 checks passed
@bashandbone bashandbone deleted the copilot/fix-sparse-only-search-issue branch December 1, 2025 19:03

Development

Successfully merging this pull request may close these issues.

Sparse-only vector search returns 0 results (ISSUE-001)
