Use a more sane tokenizer for source code search

### Feature Description

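As of today, the Elasticsearch code search uses the [default analyzer](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html) when indexing the source code contents. This analyzer breaks text into tokens at Unicode word boundaries, which in practice means whitespace and most punctuation. A period between two letters, however, does not break a token.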

I feel this approach is not particularly suitable for source code search. To illustrate the issue, let us consider the code snippet below:

```
public baz(Foo foo) {
   return foo.bar();
}
```

It is fair to expect that searching for `bar` would return the code above. As of today, however, this is not the case: ES treats `foo.bar` as a single token, so the query `bar` does not match it.
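The mismatch can be reproduced without a running cluster. The sketch below is only an approximation and not what ES actually runs: it assumes tokens are produced by splitting on whitespace alone, and that a query term matches only when it equals an indexed token exactly. The names `whitespace_tokens` and `matches` are made up for illustration.

```python
SNIPPET = """public baz(Foo foo) {
   return foo.bar();
}"""


def whitespace_tokens(text: str) -> list[str]:
    """Split on runs of whitespace and lowercase each token (simplifying assumption)."""
    return [token.lower() for token in text.split()]


def matches(query: str, tokens: list[str]) -> bool:
    """A term matches only if it is equal to one of the indexed tokens."""
    return query.lower() in tokens


tokens = whitespace_tokens(SNIPPET)
print(tokens)                  # 'foo.bar();' is indexed as one token
print(matches("bar", tokens))  # the plain term 'bar' is never indexed on its own
```

Under these assumptions `return` is found, while `bar` is not, because it only ever appears inside a longer token.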

I suggest we use the [pattern tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html) instead. It uses a regular expression to separate tokens. By default, it treats any run of non-word characters (`\W+`) as a token separator. In that case, the snippet `foo.bar()` would yield two tokens, `foo` and `bar`, and the second token would match the given criterion.
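To make the proposal concrete, here is a minimal sketch of what splitting on `\W+` does to the snippet, plus one possible index settings body. It is an illustration under assumptions, not a patch: Python's `re` stands in for the Java regex engine that ES uses, and the analyzer name `code_analyzer` is hypothetical. The `type`, `pattern` and `lowercase` keys are the documented parameters of the pattern analyzer.

```python
import re

SNIPPET = """public baz(Foo foo) {
   return foo.bar();
}"""

# Same separator as the pattern analyzer's default: one or more non-word characters.
NON_WORD = re.compile(r"\W+")


def pattern_tokens(text: str) -> list[str]:
    """Split on non-word characters, drop empty pieces, lowercase the rest."""
    return [piece.lower() for piece in NON_WORD.split(text) if piece]


tokens = pattern_tokens(SNIPPET)
print(tokens)
print("bar" in tokens)

# Hypothetical settings body for index creation; "code_analyzer" is an assumed name.
CODE_ANALYSIS_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "code_analyzer": {
                    "type": "pattern",
                    "pattern": r"\W+",
                    "lowercase": True,
                }
            }
        }
    }
}
```

One consequence worth keeping in mind: `_` counts as a word character, so an identifier like `foo_bar` would still be indexed as a single token, and camelCase names are not split either.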

What do you guys think? 



### Screenshots

_No response_

Use a more sane tokenizer for source code search #32220

Feature Description

Screenshots

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Use a more sane tokenizer for source code search #32220

Description

Feature Description

Screenshots

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions