Add wildcard analyzer #392

MBkkt · 2023-12-15T18:12:43Z

Upstream PRs

3.10:
3.11:
3.12: arangodb/enterprise-preview:devel-nightly

arangodb-docs-automation · 2023-12-15T18:12:46Z

Deploy Preview Available Via
https://deploy-preview-392--docs-hugo.netlify.app

site/content/3.12/index-and-search/analyzers.md

Simran-B · 2023-12-22T17:00:35Z

The Analyzer seems to work differently than the one in Elastic, at least based on what they show in their blog post. They don't seem to create all of the n-grams, e.g. for avocado and n-gram size = 3, they would apparently add avo, oca, ado to the index but our Analyzer returns:

av, avo, voc, oca, cad, ado, do, o.

The first token seems to always be a prefix with ngramSize - 1? Then all the trigrams, and I suppose the rest of the strings down to a size of 1 are for suffix matching. I don't quite understand the first entry. If ngramSize is e.g. 5 but the input is shorter than this, the first two entries are identical. Is this intentional?

MBkkt · 2023-12-22T17:33:57Z

@Simran-B

It returns all n-grams of the specified ngramSize, plus all suffix n-grams shorter than ngramSize.

It also changes the input text to \xFFtext\xFF. \xFF is invalid UTF-8 and cannot appear in any text, so the real output is:
\xFFav, avo, voc, oca, cad, ado, do\xFF, o\xFF

This is necessary to speed up prefix/suffix queries and queries whose longest sub-pattern is shorter than ngramSize.
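
A minimal Python sketch of the tokenization described above, not the actual Analyzer implementation. It assumes that n-grams are counted in characters rather than bytes, that ngramSize is at least 2, and that the lone trailing marker is not emitted as a token (which matches the eight tokens listed). The names `wildcard_tokens` and `visible` are hypothetical.

```python
MARKER = b"\xff"  # invalid in UTF-8, so it cannot occur in real text


def wildcard_tokens(text: str, ngram_size: int) -> list[bytes]:
    """Sketch: wrap the text in markers, then emit one token per start position.

    Assumption: one symbol is one character (not one byte).
    """
    symbols = [MARKER] + [ch.encode("utf-8") for ch in text] + [MARKER]
    # Slices near the end are shorter than ngram_size: these are the
    # suffix n-grams. The last position (marker only) is skipped.
    return [b"".join(symbols[i:i + ngram_size]) for i in range(len(symbols) - 1)]


def visible(tokens: list[bytes]) -> list[str]:
    """Drop the marker bytes, roughly what a client that hides them would show."""
    return [t.replace(MARKER, b"").decode("utf-8") for t in tokens]


tokens = wildcard_tokens("avocado", 3)
print(tokens)
print(visible(tokens))  # ['av', 'avo', 'voc', 'oca', 'cad', 'ado', 'do', 'o']
```

Under these assumptions, an input shorter than ngramSize (for example `abc` with ngramSize 5) yields `\xFFabc\xFF` and `abc\xFF` as the first two tokens, which look identical once the markers are hidden.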

MBkkt · 2023-12-22T17:53:12Z

@Simran-B

> The Analyzer seems to work differently than the one in Elastic, at least based on what they show in their blog post. They don't seem to create all of the n-grams, e.g. for avocado and n-gram size = 3, they would apparently add avo, oca, ado to the index but our Analyzer returns:

That's just a simplification in the blog post.
If the analyzer produced only avo, oca, ado for avocado, it would be impossible to find it quickly with a query like %cad%.

However, it is possible in the search phase to skip all overlapping tokens except the first and last n-gram.
So if you fully n-gram avocado at index time, you can search for avocado with \xFFav oca do\xFF, and you don't need to search for avo, voc, cad, ado.
This can be better in some cases, but in our measurements it is usually slower (because the approximation query has fewer distinct terms).
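
The two points above can be sketched in Python as follows. This is an illustration under assumptions, not the actual search code: n-grams are counted in characters, and the helper names `all_ngrams` and `sparse_ngrams` are hypothetical. `sparse_ngrams` picks every ngramSize-th n-gram of the wrapped term and always adds the last full n-gram.

```python
MARKER = b"\xff"  # invalid in UTF-8, used to mark the start and end of the text


def wrap(text: str) -> list[bytes]:
    # Assumption: one symbol per character, with a marker on both sides.
    return [MARKER] + [ch.encode("utf-8") for ch in text] + [MARKER]


def all_ngrams(text: str, n: int) -> list[bytes]:
    """Index side: every n-gram plus the shorter suffix n-grams."""
    symbols = wrap(text)
    return [b"".join(symbols[i:i + n]) for i in range(len(symbols) - 1)]


def sparse_ngrams(text: str, n: int) -> list[bytes]:
    """Search side: non-overlapping n-grams, plus the last full n-gram."""
    symbols = wrap(text)
    last = max(len(symbols) - n, 0)       # start of the last full n-gram
    starts = list(range(0, last + 1, n))  # stride n, so the picks do not overlap
    if starts[-1] != last:
        starts.append(last)               # the last n-gram may overlap its neighbor
    return [b"".join(symbols[s:s + n]) for s in starts]


indexed = all_ngrams("avocado", 3)
print(sparse_ngrams("avocado", 3))         # [b'\xffav', b'oca', b'do\xff']
print(b"cad" in indexed)                   # True: %cad% can use the index
print(b"cad" in [b"avo", b"oca", b"ado"])  # False: not findable with only these
```

The sparse query needs three term lookups instead of seven, but as noted above, fewer distinct terms in the approximation query was usually slower in the measurements mentioned.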

site/content/3.12/index-and-search/analyzers.md

MBkkt

lgtm

Simran-B · 2024-02-05T13:19:48Z

/generate

Simran-B · 2024-02-06T17:02:41Z

/generate

Simran-B · 2024-02-06T17:09:54Z

The API change described in #447 broke an example. The example erroneously tried to drop a collection that is part of a graph, but dropping the example graph drops all graph collections anyway.

Another issue that surfaced is that curl examples that purposefully trigger an error need to make use of // xpError(...) or the new toolchain complains about an unexpected error. In the old toolchain, the correct example behavior was only ensured by assert() statements.

Simran-B · 2024-02-06T17:44:21Z

/commit

nerpaula

LGTM

Add wildcard analyzer

3cba4fc

cla-bot bot added the cla-signed label Dec 15, 2023

MBkkt commented Dec 15, 2023

site/content/3.12/index-and-search/analyzers.md (outdated)

Update site/content/3.12/index-and-search/analyzers.md

57bac9f

Simran-B self-assigned this Dec 22, 2023

Simran-B added 3 commits January 11, 2024 09:33

Rework

7b73834

Merge branch 'main' of https://github.com/arangodb/docs-hugo into MBkkt-patch-1

ce18718

Merge branch 'main' of https://github.com/arangodb/docs-hugo into MBkkt-patch-1

78a929a

MBkkt commented Jan 29, 2024

site/content/3.12/index-and-search/analyzers.md (outdated)

aMahanna mentioned this pull request Feb 2, 2024

DE-749 | wildcard analyzer [3.12] arangodb/python-arango#324

Merged

Add examples and more release notes

2d1b2f6

MBkkt commented Feb 5, 2024

Simran-B marked this pull request as ready for review February 5, 2024 13:18

Merge branch 'main' into MBkkt-patch-1

c614b7a

Simran-B added the 9 Waiting for CI label Feb 5, 2024

Simran-B requested a review from nerpaula February 5, 2024 13:20

Simran-B added 2 commits February 6, 2024 10:31

Merge branch 'main' of https://github.com/arangodb/docs-hugo into MBkkt-patch-1

d48b193

Don't try to drop collections that dropGraph deletes anyway

7550dd6

Simran-B added 4 commits February 6, 2024 18:30

Fix old headline level 2 markup

dc3dc3e

Replace intentional double hyphen with en dash

cce3d63

Add missing code block markup

e76e63f

Disable automatic typography feature that converts -- to en dash and --- to em dash

This interfered with ArangoSearch wildcard Analyzer examples in result tables where the verbatim -- needs to be displayed

806bfc3

[skip ci] Automatic commit of generated files from CircleCI

3414edb

cla-bot bot removed the cla-signed label Feb 6, 2024

Add line breaks to code to avoid the content getting cut off without Show more button

ebc93fa

Simran-B removed the 9 Waiting for CI label Feb 6, 2024

nerpaula approved these changes Feb 7, 2024

nerpaula merged commit 8515e34 into main Feb 7, 2024

nerpaula deleted the MBkkt-patch-1 branch February 7, 2024 10:48
