
nomic-embed-code for code indexing feature #5027

@christopherowen


App Version

3.21.3

API Provider

OpenAI Compatible

Model Used

nomic-embed-code

🔁 Steps to Reproduce

Roo Code's experimental semantic code indexing and search feature hardcodes the Qdrant search parameter score_threshold to 0.4. That value is too high to be useful with the nomic-embed-code model, whose scores in my codebase never reached it for the test query (see the logs below).

Additionally, nomic-embed-code requires specific instruction text to be included in the query input.

The script below helps with debugging. It prints the query's vector length, runs the same search with no score_threshold, with 0.40, and with 0.15, and finally prints a table of the top 10 scores.

❯ cat qdrant_debug.sh
```bash
#!/usr/bin/env bash
#
# qdrant_debug.sh - collect evidence for a Roo-Code + Qdrant search bug
# Outputs GitHub-flavoured Markdown so you can paste it into an issue.

set -euo pipefail

QUERY=${1:-'user authentication'}
COLL="ws-0093205a6054d427"
EMBED_URL="http://localhost:8080/v1/embeddings"
QDRANT_URL="http://localhost:6333/collections/${COLL}/points/search"

# 1. embed the query ----------------------------------------------------------
#    nomic-embed-code expects the instruction prefix on search queries.
#    jq --arg JSON-escapes the text, so quotes in QUERY cannot break the body.
EMB=$(jq -n --arg input "Represent this query for searching relevant code: ${QUERY}" \
        '{model: "nomic-embed-code", input: $input}' |
      curl -s -X POST "${EMBED_URL}" \
        -H 'Content-Type: application/json' \
        -d @- |
      jq -c '.data[0].embedding')

vec_len=$(echo "${EMB}" | jq 'length')

echo "### 🔍 Query: \`${QUERY}\`"
echo -e "Vector length: **${vec_len}** (should be 3584)\n"

# helper: run a Qdrant search, optionally with a score_threshold --------------
search () {
  local threshold=$1
  local limit=${2:-5}
  local body
  if [[ -z "${threshold}" ]]; then
    body="{\"vector\": ${EMB}, \"limit\": ${limit}}"
  else
    body="{\"vector\": ${EMB}, \"limit\": ${limit}, \"score_threshold\": ${threshold}}"
  fi
  curl -s -X POST "${QDRANT_URL}" \
       -H 'Content-Type: application/json' \
       -d "${body}"
}

# wrap stdin in a Markdown json code fence
md_block () { echo -e '```json'; cat -; echo -e '```'; }

# 2. no threshold ------------------------------------------------------------
echo -e "### ✅ Top 5 results (no score_threshold)\n"
search "" 5 | md_block

# 3. strict threshold 0.40 ---------------------------------------------------
echo -e "\n### 🚫 Results with \`score_threshold = 0.40\`\n"
json=$(search 0.4 50)
count=$(echo "${json}" | jq '.result | length')
echo "**Count:** ${count}"
echo "${json}" | md_block

# 4. lenient threshold 0.15 --------------------------------------------------
echo -e "\n### 🟡 Results with \`score_threshold = 0.15\`\n"
json15=$(search 0.15 50)
count15=$(echo "${json15}" | jq '.result | length')
echo "**Count:** ${count15}"
echo "${json15}" | md_block

# 5. score distribution table ------------------------------------------------
echo -e "\n### 📊 Score distribution (top 10, no threshold)\n"
printf '| Rank | Point ID | Score |\n|------|----------|-------|\n'
search "" 10 | jq -r '.result[] | [.score,(.payload.id//.id)] | @tsv' |
  nl -w2 -s$'\t' | while IFS=$'\t' read -r rank score id; do
    printf '| %s | `%s` | %.4f |\n' "${rank}" "${id}" "${score}"
  done
```

💥 Outcome Summary

With score_threshold fixed at 0.4, this feature does not work with every embedding model.

For nomic-embed-code, a value of around 0.10 to 0.15 seems more appropriate. In my codebase, the highest score for "user authentication" was 0.3347, as shown below, so a threshold of 0.4 returns no results at all.
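One way to see how much a fixed threshold discards is to count, for several candidate values, how many hits from an unfiltered search score at or above each one. The sketch below assumes the scores arrive one per line on stdin (for example from `search "" 50 | jq '.result[].score'` with the debug script in this report); `threshold_report` is a hypothetical helper, not part of Roo Code or Qdrant.

```bash
#!/usr/bin/env bash
# Sketch: report how many hits each candidate score_threshold would keep.
# Input: one similarity score per line on stdin (e.g. jq '.result[].score').
# Arguments: the candidate thresholds to try, e.g. 0.4 0.15 0.10.
# threshold_report is a hypothetical helper name.
set -euo pipefail

threshold_report() {
  awk -v list="$*" '
    { scores[NR] = $1 + 0 }            # force numeric comparison
    END {
      n = split(list, t, " ")
      for (i = 1; i <= n; i++) {
        kept = 0
        for (j = 1; j <= NR; j++)
          if (scores[j] >= t[i] + 0) kept++
        printf "score_threshold=%s kept=%d of %d\n", t[i], kept, NR
      }
    }'
}
```

Fed the five top scores from the logs below with `threshold_report 0.4 0.15`, it reports that 0.4 keeps none of them while 0.15 keeps all five.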

We should consider one of the following:

  • querying for an appropriate threshold.
  • having per-model settings in the code.
  • adding a tunable for the threshold to the settings page.
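The second and third options could also be combined: a per-model default plus a user-settable override. This is a minimal sketch under stated assumptions: the environment variable `SEARCH_SCORE_THRESHOLD` stands in for a hypothetical settings-page value, and the only numbers used are the ones discussed in this report (0.4 as the current hardcoded value, 0.15 as the suggested value for nomic-embed-code). None of these names exist in Roo Code.

```bash
#!/usr/bin/env bash
# Sketch: pick the score_threshold for a search.
# Hypothetical: SEARCH_SCORE_THRESHOLD represents a user setting; the
# per-model values are only those discussed in this issue.
set -euo pipefail

# Per-model defaults; anything unknown falls back to the current 0.4.
model_default_threshold() {
  case "$1" in
    nomic-embed-code) echo 0.15 ;;
    *)                echo 0.4 ;;
  esac
}

resolve_threshold() {
  local model=$1
  # A user-configured value takes precedence over the model default.
  if [[ -n "${SEARCH_SCORE_THRESHOLD:-}" ]]; then
    echo "$SEARCH_SCORE_THRESHOLD"
  else
    model_default_threshold "$model"
  fi
}
```

With the debug script's helper, a call such as `search "$(resolve_threshold nomic-embed-code)" 50` would search with 0.15 unless an override is set.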

ADDITIONALLY:

  • nomic-embed-code requires search queries to start with the prefix "Represent this query for searching relevant code: ". I'm unsure whether this is unique to nomic-embed-code.
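If the prefix turns out to be model-specific, the query text could be chosen per model before it is sent for embedding. A minimal sketch: `prefix_query` is a hypothetical helper, and the assumption that only nomic-embed-code gets a prefix reflects nothing more than what is known from this report.

```bash
#!/usr/bin/env bash
# Sketch: return the text to embed for a search query.
# Assumption: only nomic-embed-code needs an instruction prefix (the one
# reported in this issue); other models receive the query unchanged.
# prefix_query is a hypothetical helper, not part of Roo Code.
set -euo pipefail

prefix_query() {
  local model=$1 query=$2
  case "$model" in
    nomic-embed-code)
      printf '%s\n' "Represent this query for searching relevant code: ${query}" ;;
    *)
      printf '%s\n' "$query" ;;
  esac
}
```

The result can go straight into a correctly escaped request body for an OpenAI-compatible `/v1/embeddings` endpoint, for example `jq -n --arg input "$(prefix_query nomic-embed-code "$QUERY")" '{model: "nomic-embed-code", input: $input}'`.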

📄 Relevant Logs or Errors (Optional)

❯ bash qdrant_debug.sh "user authentication"

🔍 Query: user authentication

Vector length: 3584 (should be 3584)

✅ Top 5 results (no score_threshold)

{"result":[{"id":"38f29157-c61f-5337-83d3-0f5cfb1ac767","version":10,"score":0.33473045},{"id":"a71a1212-8a2a-566f-8a6b-6a239023574f","version":10,"score":0.28562117},{"id":"7e27640d-d2df-573c-b4ed-329878f85ada","version":53,"score":0.2749608},{"id":"3e7866f4-aa60-5296-8aa6-69a11097743c","version":18,"score":0.2680832},{"id":"865812f7-a510-5ee3-9a75-121315534571","version":10,"score":0.26251754}],"status":"ok","time":0.000835419}

🚫 Results with score_threshold = 0.40

Count: 0

{"result":[],"status":"ok","time":0.000705669}

🟑 Results with score_threshold = 0.15

Count: 50

{"result":[{"id":"38f29157-c61f-5337-83d3-0f5cfb1ac767","version":10,"score":0.33473045},{"id":"a71a1212-8a2a-566f-8a6b-6a239023574f","version":10,"score":0.28562117},{"id":"7e27640d-d2df-573c-b4ed-329878f85ada","version":53,"score":0.2749608},{"id":"3e7866f4-aa60-5296-8aa6-69a11097743c","version":18,"score":0.2680832},{"id":"865812f7-a510-5ee3-9a75-121315534571","version":10,"score":0.26251754},{"id":"3a176d41-6774-5f9e-8c0c-780e5b399349","version":53,"score":0.24374674},{"id":"e781d01b-b5e4-5da9-9c25-4cd0205b843d","version":53,"score":0.24362029},{"id":"2658f249-669d-5043-b0f0-c1878a97c377","version":10,"score":0.24203128},{"id":"81f5c4a5-0869-557b-b5fe-7dfdbc2437b1","version":60,"score":0.23435998},{"id":"210cb401-7f8f-5c07-bbe0-982315e0cf94","version":34,"score":0.2278384},{"id":"16a4f811-979e-5427-a40f-27be99968043","version":12,"score":0.22783059},{"id":"86cf471c-23ff-5188-9466-a787876409c7","version":12,"score":0.22453013},{"id":"33660386-ff6c-5b21-a4a9-0876a7d7850b","version":17,"score":0.22215551},{"id":"c8e244d4-e0bf-53ee-a5cb-a63d6a0b09e6","version":53,"score":0.2219722},{"id":"727a9dde-0cba-5db0-bfab-d7b8ed551397","version":55,"score":0.22156988},{"id":"796074c9-7807-5bfa-a03c-bed4e8bbfb4d","version":10,"score":0.22054857},{"id":"5872c93d-b45f-5d0f-b08f-b812e0edf392","version":10,"score":0.21486811},{"id":"bf3e07ff-99a4-5473-bf28-205edc0ba0ce","version":10,"score":0.21298264},{"id":"f56f22ee-6e0d-5e64-a636-db596a337ccb","version":65,"score":0.21070097},{"id":"987a8e8e-f602-5bbb-a5d2-b29ef3fab2f8","version":55,"score":0.21006832},{"id":"adf7e7be-d755-5631-96e6-ed285e897293","version":55,"score":0.21006832},{"id":"74be01c2-1b24-5ae9-8b07-e80a0608b15a","version":55,"score":0.21006832},{"id":"a11b1ca5-4b4e-5681-9cfb-d657c259bdd8","version":55,"score":0.2100571},{"id":"1f76c7e0-a5b1-55cb-a1f4-c9df84cf02a2","version":10,"score":0.20893657},{"id":"83e575c6-9071-58a6-93e9-586e4f50941b","version":30,"score":0.20667195},{"id":"ce8dfd42-903a-549e-a60b-fb73582aad26"
,"version":13,"score":0.2035707},{"id":"e46ca85c-8dee-594b-bb13-2e51dfb1e3f4","version":10,"score":0.19638401},{"id":"cdac8cc4-a9c3-5171-8cff-27e2ff77969b","version":31,"score":0.19503045},{"id":"211701b7-1c46-5bee-a6bc-0dd881c4c239","version":34,"score":0.19351412},{"id":"50ec5ccc-5ba8-52b1-8e2f-c1cee138a89d","version":15,"score":0.19167314},{"id":"1db2fdd0-4921-5ffe-b150-6da2d50ab5fb","version":10,"score":0.19081627},{"id":"f950d64f-6c0b-522b-bdf1-47abf6885645","version":14,"score":0.188555},{"id":"9bea4f7a-a51e-5ad3-b10d-6dfd14a8892d","version":77,"score":0.18814057},{"id":"1391843a-0006-50e5-a436-75944f52b05c","version":15,"score":0.18806493},{"id":"6baad56d-5200-5975-b143-2cf4ee054bab","version":10,"score":0.18738957},{"id":"785c2f5b-eff1-58d1-8864-674a75b044c6","version":30,"score":0.18574986},{"id":"d737af7d-7bbf-513c-acad-66e4e0179357","version":14,"score":0.18165587},{"id":"fa5b994c-e2f2-5e4c-b766-fec83c055dd8","version":14,"score":0.18165587},{"id":"e98ca363-2463-5f3a-ad35-0a6462026399","version":77,"score":0.1810772},{"id":"5adcfc65-6194-559c-9e8e-da2bcb5166d9","version":10,"score":0.1786707},{"id":"01217221-33d9-5218-a1be-eb546fb07f18","version":10,"score":0.1765967},{"id":"7d911de8-9908-5b59-853f-273c7f4de6de","version":53,"score":0.17619362},{"id":"62828bdd-c17f-506f-baac-2209a978c10a","version":15,"score":0.1745193},{"id":"9c2a4051-6665-531e-b484-652dadd2638a","version":15,"score":0.17087604},{"id":"699aad64-5925-5248-8e35-5c4a0fdfa57b","version":17,"score":0.17049417},{"id":"76301559-e095-51be-96e0-e29274a5e8c0","version":57,"score":0.16999713},{"id":"4333c069-295c-592c-8008-a495ff41bb99","version":77,"score":0.16764553},{"id":"0bd52652-2673-5b4f-baa4-7300388e5f05","version":13,"score":0.16714028},{"id":"303f9fe2-b63f-5a3c-930b-65eded701807","version":74,"score":0.1670577},{"id":"deabcbde-fe3b-599f-9133-f52707a3dbb6","version":41,"score":0.1628649}],"status":"ok","time":0.000702127}

📊 Score distribution (top 10, no threshold)

| Rank | Point ID | Score |
|------|----------|-------|
| 1 | 38f29157-c61f-5337-83d3-0f5cfb1ac767 | 0.3347 |
| 2 | a71a1212-8a2a-566f-8a6b-6a239023574f | 0.2856 |
| 3 | 7e27640d-d2df-573c-b4ed-329878f85ada | 0.2750 |
| 4 | 3e7866f4-aa60-5296-8aa6-69a11097743c | 0.2681 |
| 5 | 865812f7-a510-5ee3-9a75-121315534571 | 0.2625 |
| 6 | 3a176d41-6774-5f9e-8c0c-780e5b399349 | 0.2437 |
| 7 | e781d01b-b5e4-5da9-9c25-4cd0205b843d | 0.2436 |
| 8 | 2658f249-669d-5043-b0f0-c1878a97c377 | 0.2420 |
| 9 | 81f5c4a5-0869-557b-b5fe-7dfdbc2437b1 | 0.2344 |
| 10 | 210cb401-7f8f-5c07-bbe0-982315e0cf94 | 0.2278 |

Metadata

Labels

Issue - Unassigned / Actionable (Clear and approved. Available for contributors to pick up.), bug (Something isn't working), enhancement (New feature or request)

Status

Done
