
fix: recall exact slug match as search candidate #976

Open
oolong-tea-2026 wants to merge 2 commits into openclaw:main from oolong-tea-2026:fix/exact-slug-recall


@oolong-tea-2026

Problem

When searching for a skill by its exact slug (e.g. ima-all-ai), the skill can be missing from the first page of results.

Root Cause

The search pipeline has three recall stages:

  1. Vector search — semantic similarity, candidate pool of 75-256
  2. exactMatches — filter vector candidates using matchesExactTokens
  3. lexicalFallbackSkills — includes an exact by_slug index lookup

The exact slug skill can be missed when:

  • The vector candidate pool is too small to include it (step 1)
  • matchesExactTokens uses .some() (by design per #27), so common tokens like "ai" inflate the exactMatches count
  • When exactMatches.length >= limit, lexicalFallbackSkills is skipped entirely — along with its by_slug index lookup that would have found the skill

On "Load more" (which increases limit and re-searches), the larger candidate pool catches the skill, causing it to appear at the top of the list — confusing the user.
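The failure mode described above can be reproduced in a toy model. This is only a sketch: the names `recallCandidates`, `matchesAnyToken`, and `tokenizeQuery`, the sample slugs, and the in-memory index are all hypothetical and are not the code in convex/search.ts. It shows how a page-filling `exactMatches` list causes the by-slug lookup to be skipped. In this toy the larger limit recovers the skill by letting the fallback run, whereas the description above attributes the real recovery to the larger vector candidate pool.

```typescript
// Hypothetical, simplified model of the recall stages; not the real pipeline.
type SkillDoc = { slug: string };

// Assumed tokenizer: lower-case the text and emit [a-z0-9]+ runs.
const tokenizeQuery = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Models the `.some()` behaviour: one shared token is enough to count as a match.
const matchesAnyToken = (skill: SkillDoc, tokens: string[]): boolean => {
  const slugTokens = tokenizeQuery(skill.slug);
  return tokens.some((t) => slugTokens.some((s) => s === t));
};

function recallCandidates(
  query: string,
  vectorPool: SkillDoc[],
  slugIndex: Record<string, SkillDoc | undefined>,
  limit: number,
): SkillDoc[] {
  const tokens = tokenizeQuery(query);
  const exactMatches = vectorPool.filter((s) => matchesAnyToken(s, tokens));
  // Once exactMatches fills the page, the fallback and its by-slug lookup never run.
  if (exactMatches.length >= limit) return exactMatches;
  const bySlug = slugIndex[query];
  return bySlug ? [...exactMatches, bySlug] : exactMatches;
}

const wantedSkill: SkillDoc = { slug: "ima-all-ai" };
// The vector pool holds only unrelated skills that happen to share the token "ai".
const vectorPool: SkillDoc[] = [{ slug: "ai-writer" }, { slug: "chat-ai" }];
const slugIndex: Record<string, SkillDoc | undefined> = { "ima-all-ai": wantedSkill };

const firstPage = recallCandidates("ima-all-ai", vectorPool, slugIndex, 2);
const nextPage = recallCandidates("ima-all-ai", vectorPool, slugIndex, 4);
console.log(firstPage.map((s) => s.slug)); // [ 'ai-writer', 'chat-ai' ]
console.log(nextPage.map((s) => s.slug)); // [ 'ai-writer', 'chat-ai', 'ima-all-ai' ]
```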

Reproduction

  1. Go to https://clawhub.ai/skills
  2. Search for ima-all-ai (an exact slug)
  3. First page of results does not contain ima-all-ai
  4. Scroll down to trigger "Loading more"
  5. ima-all-ai now appears at position #1

Fix

After all existing recall stages, reconstruct the candidate slug from query tokens (queryTokens.join("-")) and check if it exists in merged results. If missing, perform a single O(1) index lookup via the new lookupExactSlug internalQuery.

This is a minimal, additive change:

  • No existing logic is modified — vector search, matchesExactTokens, and lexicalFallbackSkills remain untouched
  • The lookup is only triggered when the slug is missing from results (vast majority of searches skip it)
  • The recalled skill goes through normal scoring, so SLUG_EXACT_BOOST + NAME_EXACT_BOOST naturally rank it first
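The post-merge check can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `recallExactSlug`, `RecallEntry`, and `countingLookup` are hypothetical names, and the lookup is modelled as a synchronous in-memory read where the real one is the `lookupExactSlug` internalQuery.

```typescript
// Hypothetical sketch of the post-merge slug recall; names and types are illustrative.
type RecallEntry = { skill: { slug: string } };

function recallExactSlug(
  queryTokens: string[],
  mergedMatches: RecallEntry[],
  lookupBySlug: (slug: string) => RecallEntry | null,
): RecallEntry[] {
  const candidateSlug = queryTokens.join("-");
  if (candidateSlug === "") return mergedMatches;
  // Early-out: if the slug is already a candidate, skip the lookup entirely.
  if (mergedMatches.some((e) => e.skill.slug === candidateSlug)) return mergedMatches;
  const exactSlugEntry = lookupBySlug(candidateSlug);
  // The recalled entry only joins the pool; scoring and ranking happen later.
  return exactSlugEntry ? [exactSlugEntry, ...mergedMatches] : mergedMatches;
}

// In-memory stand-in for the by_slug index, with a counter to observe lookups.
let lookupCount = 0;
const indexBySlug: Record<string, RecallEntry | undefined> = {
  "ima-all-ai": { skill: { slug: "ima-all-ai" } },
};
const countingLookup = (slug: string): RecallEntry | null => {
  lookupCount += 1;
  return indexBySlug[slug] ?? null;
};

const missed = recallExactSlug(["ima", "all", "ai"], [{ skill: { slug: "chat-ai" } }], countingLookup);
const present = recallExactSlug(["ima", "all", "ai"], missed, countingLookup);
console.log(missed.map((e) => e.skill.slug)); // [ 'ima-all-ai', 'chat-ai' ]
console.log(present === missed, lookupCount); // true 1
```

The second call returns the same array without touching the index, which corresponds to the "skip when already present" case covered by the new tests.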

Changes

  • convex/search.ts: Add lookupExactSlug internalQuery + post-merge slug recall check in searchSkills
  • convex/search.test.ts: Add 2 new tests (recall when missed + skip when already present) + update existing test mocks

Tests

All 18 tests pass (including 2 new ones):

  • recalls exact slug match even when vector search and lexical fallback both miss it
  • skips exact slug lookup when slug is already in merged results

@vercel
Contributor

vercel bot commented Mar 17, 2026

@oolong-tea-2026 is attempting to deploy a commit to the 0xBuns Team on Vercel.

A member of the Team first needs to authorize it.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8380fe4072

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

convex/search.ts Outdated
nonSuspiciousOnly: args.nonSuspiciousOnly,
})) as SkillSearchEntry | null
if (exactSlugEntry) {
mergedMatches = [exactSlugEntry, ...mergedMatches]


P1: Force exact slug hit to survive final top-N truncation

Prepending exactSlugEntry here does not guarantee recall, because the entry is still re-scored/sorted with the rest and can be dropped by .slice(0, limit) when many other matches score higher (especially since this path returns no embeddingId, so vectorScore becomes 0). In small-limit searches (e.g., limit=1) with several strong token-overlap results, the exact slug can still disappear from the returned page, which defeats the stated "always recalled" behavior.


Author


Good catch on the ranking concern. This fix intentionally does not force a specific rank — it ensures the exact slug match enters the candidate pool, then participates in normal scoring. With SLUG_EXACT_BOOST = 1.4, the recalled skill will very likely rank first, but the fix respects the existing scoring rules rather than overriding them.

Updated the PR title and commit message to clarify this is about recall, not guaranteed ranking.
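The distinction between recall and final ranking can be illustrated with a small sketch. The scores below are made-up numbers chosen for the example, not output of `scoreSkillResult`, and `topN` and `ScoredCandidate` are hypothetical names. It shows only that a candidate prepended to the pool is still subject to the final sort and slice.

```typescript
// Illustrative only: invented scores, not produced by the real scoring code.
type ScoredCandidate = { slug: string; score: number };

// Models the final `.sort(...).slice(0, limit)` pass over scored candidates.
const topN = (candidates: ScoredCandidate[], limit: number): ScoredCandidate[] =>
  [...candidates].sort((a, b) => b.score - a.score).slice(0, limit);

const scoredPool: ScoredCandidate[] = [
  { slug: "foo", score: 2.1 }, // recalled exact slug, prepended to the pool
  { slug: "foo-tools", score: 2.4 }, // hypothetical competitor with a higher total
  { slug: "foo-kit", score: 1.9 },
];

console.log(topN(scoredPool, 1).map((c) => c.slug)); // [ 'foo-tools' ]
console.log(topN(scoredPool, 2).map((c) => c.slug)); // [ 'foo-tools', 'foo' ]
```

Position in the candidate array has no effect on the result; only the score does, which is why the fix guarantees candidacy rather than a rank.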

@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR fixes a search recall bug where a skill whose exact slug matches the query could be missing from the first page of results. The root cause is that when exactMatches.length >= limit the lexicalFallbackSkills path (which includes a by_slug lookup) is skipped entirely. The fix adds a minimal, additive post-merge check: it reconstructs the candidate slug from queryTokens.join('-'), and if that slug is absent from the merged candidate pool, fires a single O(1) by_slug index lookup via the new lookupExactSlug internalQuery. The recalled skill then flows through the existing scoring pipeline where SLUG_EXACT_BOOST + NAME_EXACT_BOOST naturally rank it first.

Key points from the review:

  • Logic is correct: the early-out guard (mergedMatches.some(e => e.skill.slug === candidateSlug)) prevents both duplicate entries and unnecessary lookups when the skill is already in results.
  • Filters are correctly re-applied: lookupExactSlug re-checks softDeletedAt, nonSuspiciousOnly, highlightedOnly, and owner existence — consistent with all other recall paths.
  • Scoring is sound: entries returned by lookupExactSlug have no embeddingId, so vectorScore defaults to 0; for an exact slug query the lexical boosts alone (~2.5+) comfortably outrank typical vector-only scores (~0.5).
  • Minor test concern: lookupExactSlugHandler is extracted at module scope in search.test.ts but never directly invoked — it may trigger an unused-variable lint warning.

Confidence Score: 4/5

  • This PR is safe to merge; the change is additive and correctly scoped to the missing-slug recall edge case.
  • The core logic is sound: the early-out check prevents duplicate entries and redundant queries, filters are consistently re-applied in lookupExactSlug, and ranking relies on the proven existing scoring pipeline. The two new tests directly cover the fixed scenario and the skip-when-present optimisation. The only finding is a minor unused-variable concern in the test file that does not affect production behaviour.
  • No files require special attention beyond the minor unused-variable lint issue in convex/search.test.ts.

Last reviewed commit: 8380fe4

Comment on lines +35 to +39
const lookupExactSlugHandler = (
lookupExactSlug as unknown as {
_handler: (ctx: unknown, args: unknown) => Promise<SkillSearchEntry | null>
}
)._handler
Contributor


P2 lookupExactSlugHandler declared but never called directly

lookupExactSlugHandler is extracted and typed at module scope but is never invoked in any test — the new tests exercise lookupExactSlug only indirectly through searchSkillsHandler. This will likely trigger a no-unused-vars / @typescript-eslint/no-unused-vars lint warning.

Either call it in a dedicated unit test (e.g., testing nonSuspiciousOnly/highlightedOnly filtering inside lookupExactSlug directly), or remove the declaration and rely solely on the integration coverage provided by the searchSkillsHandler tests.

Suggested change
const lookupExactSlugHandler = (
lookupExactSlug as unknown as {
_handler: (ctx: unknown, args: unknown) => Promise<SkillSearchEntry | null>
}
)._handler
// lookupExactSlugHandler is tested indirectly through searchSkillsHandler

Author


Thanks — removed the unused lookupExactSlugHandler declaration in the latest force-push. lookupExactSlug is now only tested indirectly through searchSkillsHandler, which covers the actual integration path.

@oolong-tea-2026 oolong-tea-2026 force-pushed the fix/exact-slug-recall branch 2 times, most recently from df87b72 to 6e66ca6, March 17, 2026 12:29
@oolong-tea-2026 oolong-tea-2026 changed the title from "fix: ensure exact slug match is always recalled in search" to "fix: recall exact slug match as search candidate", Mar 17, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e66ca6be2


convex/search.ts Outdated
nonSuspiciousOnly: args.nonSuspiciousOnly,
})) as SkillSearchEntry | null
if (exactSlugEntry) {
mergedMatches = [exactSlugEntry, ...mergedMatches]


P1: Guarantee exact slug candidate cannot be trimmed from top-N

Prepending exactSlugEntry here does not make recall deterministic because the entry is still re-ranked and truncated by the final .sort(...).slice(0, limit) pass; this path also gives it vectorScore = 0 (no embeddingId), so it can lose to other candidates with stronger vector/popularity scores. In small-limit searches (for example limit=1 with a short slug query like foo, where many other results receive similar lexical boosts), the exact slug can still be dropped from the returned page, so the new "always recalled" guarantee is not actually enforced.


Author


To clarify the terminology: "recall" here refers to the candidate retrieval stage of the search pipeline (vector search → exactMatches filter → lexicalFallback → slug lookup), not the final user-facing results. The fix ensures the exact slug skill enters the candidate pool before scoring — it does not guarantee it appears in the final top-N.

The final ranking is entirely up to the existing scoreSkillResult logic. This is by design: we want fair scoring, not a forced override.

Also removed an unrelated .gitignore change from the commit.

When searching for a skill by its exact slug (e.g. 'ima-all-ai'), the
result could be missing from the candidate pool because:

1. The vector search candidate pool (75-256) might not include the skill
2. matchesExactTokens uses `.some()` (by design, see openclaw#27), so unrelated
   skills with common tokens like 'ai' inflate exactMatches count
3. When exactMatches >= limit, lexicalFallbackSkills is skipped entirely,
   along with its exact slug lookup via the by_slug index

Fix: after all recall stages (vector + exactMatches + lexicalFallback),
reconstruct the candidate slug from query tokens and check if it exists
in merged results. If missing, perform a single O(1) index lookup via
the new `lookupExactSlug` internalQuery to ensure it enters the
candidate pool. It then participates in normal scoring and ranking —
SLUG_EXACT_BOOST (1.4) makes it very likely to rank first, but the
fix does not force any particular ranking.

Co-Authored-By: 戴硕 <daishuo@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c39848cdc8


// Ensure an exact slug match is always recalled, even when the vector
// candidate pool and lexical fallback both missed it. The candidate
// slug is reconstructed from query tokens (e.g. "ima all ai" → "ima-all-ai").
const candidateSlug = queryTokens.join("-");


P2: Preserve raw slug spelling when recalling exact slugs

Reconstructing the lookup key with queryTokens.join("-") loses valid slug spellings that publishVersionForUser still accepts in convex/lib/skillPublish.ts:85-89, notably repeated or trailing hyphens. For example, searching the exact slug foo--bar produces candidateSlug === "foo-bar", and foo- produces "foo", so this new fallback still misses the intended skill whenever the vector path saturates exactMatches and skips lexicalFallbackSkills. That means the fix does not actually restore exact-slug recall for part of the slug space the app allows today.


Author


Slugs like foo--bar or foo- are technically accepted by the validation regex, but that is arguably a validation bug — not a recall bug. We sampled ~700 real slugs on ClawHub and found zero instances of double hyphens, trailing hyphens, or any pattern where tokenize().join("-") diverges from the stored slug. Not fixing this here.
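The divergence the reviewer describes can be seen in a short sketch. `tokenizeSketch` is an assumed stand-in for `tokenize()`: the thread states only that it emits `[a-z0-9]+` tokens, so the lower-casing step and the exact implementation here are assumptions, and `rebuildSlug` is a hypothetical helper name.

```typescript
// Assumed stand-in for tokenize(): lower-case the text and emit [a-z0-9]+ runs.
const tokenizeSketch = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Rebuilds the lookup key the same way the fix does: tokens joined with "-".
const rebuildSlug = (rawQuery: string): string => tokenizeSketch(rawQuery).join("-");

console.log(rebuildSlug("ima all ai")); // "ima-all-ai"
console.log(rebuildSlug("ima-all-ai")); // "ima-all-ai"
console.log(rebuildSlug("foo--bar")); // "foo-bar" (repeated hyphen is lost)
console.log(rebuildSlug("foo-")); // "foo" (trailing hyphen is lost)
```

For slugs that use single hyphens between alphanumeric runs, the round trip is lossless, which matches the author's observation about the sampled slugs.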

Comment on lines +206 to +209
const candidateSlug = queryTokens.join("-");
if (
/^[a-z0-9][a-z0-9-]*$/.test(candidateSlug) &&
!mergedMatches.some((e) => e.skill.slug === candidateSlug)


P2: Gate the fallback lookup on the raw query being slug-shaped

This condition is effectively true for every non-empty search, because tokenize() in convex/lib/searchText.ts:1-10 only emits [a-z0-9]+ tokens and queryTokens.join("-") therefore always matches the slug regex. In practice, ordinary queries like weather forecast now incur an extra skills.by_slug lookup unless the merged results already contain weather-forecast, so the hot search path pays an additional Convex read on nearly every request instead of only on actual slug lookups.


Author


This is intentional. A user searching weather forecast (with a space) should recall the skill with slug weather-forecast — that is the whole point of this fix. The by_slug index lookup is O(1) and only fires when the slug is absent from merged results, so the cost is negligible.
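A small sketch of the point under discussion: when tokens contain only `[a-z0-9]`, the joined candidate passes the slug regex for any query that yields at least one token. `queryToTokens` and `candidateFor` are hypothetical names, and the tokenizer is an assumed approximation of `tokenize()`, not its actual source.

```typescript
// Slug-shape guard as quoted in the review comment.
const SLUG_SHAPE = /^[a-z0-9][a-z0-9-]*$/;

// Assumed approximation of tokenize(): lower-case, then emit [a-z0-9]+ runs.
const queryToTokens = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

const candidateFor = (rawQuery: string): string => queryToTokens(rawQuery).join("-");

// An ordinary two-word query still produces a slug-shaped candidate.
console.log(candidateFor("weather forecast")); // "weather-forecast"
console.log(SLUG_SHAPE.test(candidateFor("weather forecast"))); // true

// Only a query with no alphanumeric characters fails the guard.
console.log(SLUG_SHAPE.test(candidateFor("!!!"))); // false
```

So the regex acts as an empty-query guard rather than a slug detector, and the extra index read is avoided only by the "already in merged results" check.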
