Fix multi-word search returning 0 results with JenaText #1932

Open
fvogel wants to merge 1 commit into NatLibFi:main from fvogel:fix/multiword-search-space-escape


fvogel commented Feb 12, 2026

Summary

Fixes #1930: multi-word search queries (e.g. "Siamese cat", "Labrador retriever") return 0 results when sparqlDialect is set to "JenaText".

Root cause: LUCENE_ESCAPE_CHARS includes the space character, so createTextQueryCondition() escapes each space as '\ ' and produces a single-term query such as "Siamese\ cat*". StandardAnalyzer tokenizes labels into individual words, so no indexed token contains a literal space and the query can never match.
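
To make the mismatch concrete, here is a rough sketch in Python (not the project's PHP). The escape set, the `escape` helper, and the `tokenize` function are assumptions for illustration only: the real constant lives in the Skosmos source, and `tokenize` is a crude stand-in for StandardAnalyzer, not its actual behaviour.

```python
import re

# Assumed escape set for illustration; note the leading space character.
ESCAPE_CHARS_OLD = ' +-&|!(){}[]^"~?:\\/'

def escape(term, chars):
    """Backslash-escape every character of `term` that appears in `chars`."""
    return "".join("\\" + c if c in chars else c for c in term)

def tokenize(label):
    """Crude stand-in for an analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", label.lower()) if t]

def prefix_matches(prefix, label):
    """True if any indexed token of `label` starts with `prefix`."""
    return any(tok.startswith(prefix.lower()) for tok in tokenize(label))

# The escaped space glues both words into one term: Siamese\ cat*
old_query = escape("Siamese cat", ESCAPE_CHARS_OLD) + "*"

# As a single term, the prefix is the literal text "Siamese cat",
# but the index only holds the tokens "siamese" and "cat".
single_term_hit = prefix_matches("Siamese cat", "Siamese cat")
per_word_hits = all(prefix_matches(w, "Siamese cat") for w in ["Siamese", "cat"])
```

Under these assumptions the single-term prefix finds no token, while each word on its own does, which is the behaviour described in the issue.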

Fix:

  • Remove space from LUCENE_ESCAPE_CHARS
  • Split multi-word terms on whitespace and prefix each word with + (Lucene "required" operator)
  • "Siamese cat*" becomes "+Siamese +cat*" — each word must match independently
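
The rewrite described above can be sketched as follows, again in Python rather than the project's PHP. The function name `build_text_query`, the escape set, and the choice to leave single-word terms without a `+` prefix are assumptions made for this sketch; the actual change is in the PR diff.

```python
# Assumed escape set for illustration: same idea as before, but without the space.
# '*' is deliberately absent so wildcard expansion keeps working.
ESCAPE_CHARS_NEW = '+-&|!(){}[]^"~?:\\/'

def escape(term, chars=ESCAPE_CHARS_NEW):
    """Backslash-escape every character of `term` that appears in `chars`."""
    return "".join("\\" + c if c in chars else c for c in term)

def build_text_query(term):
    """Turn a user search term into a Lucene query string.

    Multi-word terms are split on whitespace and each word is marked as
    required with '+'. Single-word terms are only escaped (an assumption
    of this sketch).
    """
    words = term.split()
    if len(words) <= 1:
        return escape(term.strip())
    # Escape each word first, then add the unescaped '+' operator in front.
    return " ".join("+" + escape(word) for word in words)

multi = build_text_query("Siamese cat*")   # "+Siamese +cat*"
single = build_text_query("Siamese*")      # "Siamese*"
```

Escaping happens per word before the `+` is prepended, so the operator itself is never escaped while a literal `+` typed by the user still is.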

Test plan

  • Single-word search still works: Siamese* → 1 result
  • Multi-word prefLabel search now works: Siamese cat* → 1 result (was 0)
  • Multi-word altLabel search now works: Domestic cat* → 1 result (was 0)
  • Wildcard expansion preserved: Lab* → 1 result (Labrador retriever)
  • Existing PHPUnit tests pass

Tested against Skosmos 3.1 with Fuseki 5.4.0 (StandardAnalyzer, default config).

Remove space from LUCENE_ESCAPE_CHARS and split multi-word queries
into individual required Lucene terms using the '+' operator.

Previously, spaces were escaped as '\ ', creating a single-token query
like "Siamese\ cat*" that never matches any indexed term because
StandardAnalyzer tokenizes labels into individual words.

Now "Siamese cat*" becomes "+Siamese +cat*", requiring each word to
match independently.

Fixes NatLibFi#1930

Development

Successfully merging this pull request may close these issues.

JenaTextSparql: multi-word search returns 0 results (space escaped as Lucene special character)

1 participant