
API proposal: radical alternative search (FT.SEARCH, FT.AGGREGATE, FT.HYBRID) #455

@mjgaux

(this is actually @mgravell, using a non-work device; blame org security! I had the same problems when at MSFT)

Statement: RediSearch is hard. The API requires knowing lots of awkward syntax and dialects, and the differences between FT.SEARCH, FT.AGGREGATE and FT.HYBRID. Additionally, the type model is super weak, meaning: the queries are all strings, and any model mapping is manual.

C# generalized querying way back in .NET 3.5, with LINQ and the "expression" (expression tree) API.

Proposal: we could overhaul RediSearch with a LINQ-like API (not IQueryable<T>; something bespoke).

Consider defining your model like:

[Index("ft_myindex")]
public class Customer
{
    [Key("id")] public int Id { get; set; }
    [Text("name")] public string? Name { get; set; }
    [Text("url")] public string? Url { get; set; }
    [Text("country")] public string? Country { get; set; }
    [Text("title")] public string? Title { get; set; }
    [Timestamp("timestamp")] public DateTime Date { get; set; }
    [Tags("cat")] public string[] Categories { get; set; } = [];
}
...
var query = db.Query<Customer>();
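
None of these attributes exist today. As a rough sketch of what the metadata side might look like, the block below declares them and reads the index name and field mapping via reflection. `IndexAttribute`, `FieldAttribute` and its subclasses, and `IndexMetadata` are all hypothetical names invented for illustration.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Hypothetical: names the RediSearch index a model type maps to.
[AttributeUsage(AttributeTargets.Class)]
public sealed class IndexAttribute : Attribute
{
    public IndexAttribute(string name) => Name = name;
    public string Name { get; }
}

// Hypothetical common base: maps a property to an index field name.
[AttributeUsage(AttributeTargets.Property)]
public abstract class FieldAttribute : Attribute
{
    protected FieldAttribute(string name) => Name = name;
    public string Name { get; }
}

public sealed class KeyAttribute : FieldAttribute { public KeyAttribute(string name) : base(name) { } }
public sealed class TextAttribute : FieldAttribute { public TextAttribute(string name) : base(name) { } }
public sealed class TimestampAttribute : FieldAttribute { public TimestampAttribute(string name) : base(name) { } }
public sealed class TagsAttribute : FieldAttribute { public TagsAttribute(string name) : base(name) { } }

[Index("ft_myindex")]
public class Customer
{
    [Key("id")] public int Id { get; set; }
    [Text("name")] public string? Name { get; set; }
    [Text("url")] public string? Url { get; set; }
    [Text("country")] public string? Country { get; set; }
    [Text("title")] public string? Title { get; set; }
    [Timestamp("timestamp")] public DateTime Date { get; set; }
    [Tags("cat")] public string[] Categories { get; set; } = Array.Empty<string>();
}

public static class IndexMetadata
{
    // Reads the index name from [Index]; throws if the type is not annotated.
    public static string IndexName(Type type) =>
        type.GetCustomAttribute<IndexAttribute>()?.Name
        ?? throw new InvalidOperationException($"{type.Name} has no [Index] attribute.");

    // Returns (property name, index field name) for every annotated property.
    public static (string Property, string Field)[] Fields(Type type) =>
        type.GetProperties()
            .Select(p => (Property: p, Attr: p.GetCustomAttribute<FieldAttribute>()))
            .Where(x => x.Attr is not null)
            .Select(x => (x.Property.Name, x.Attr!.Name))
            .ToArray();
}
```

With this in place, `IndexMetadata.IndexName(typeof(Customer))` yields the index name and `IndexMetadata.Fields(typeof(Customer))` yields the property-to-field pairs that a query translator would need.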

Now imagine we can do things like:

// FT.AGGREGATE ft_myindex * GROUPBY 0 REDUCE count 0
var count = query.Aggregate().Count();

// FT.AGGREGATE ft_myindex * GROUPBY 0 REDUCE max 1 @timestamp
var maxDate = query.Aggregate().Max(x => x.Date);

// FT.AGGREGATE ft_myindex * GROUPBY 0 REDUCE count 0 AS count REDUCE min 1 @timestamp AS min REDUCE max 1 @timestamp AS max
var composite =
    (from row in query
    group row by Index.All()
    into agg
    select (
        Count: agg.Count(),
        Min: agg.Min(x => x.Date),
        Max: agg.Max(x => x.Date)
    )).Single(); 

That was fun, but we can imagine more exotic things:

// FT.SEARCH ft_myindex @title:$x0 RETURN 2 id name PARAMS 2 x0 dogs DIALECT 2
var rows =
    from x in query
    where x.Title == "dogs"
    select new { x.Id, x.Name, Score = Index.Score() };

// FT.SEARCH ft_myindex @cat:{$x0} RETURN 1 id PARAMS 2 x0 foo DIALECT 2
var rows2 =
    from x in query
    where x.Categories.Contains("foo")
    select x.Id;

// FT.SEARCH ft_myindex @cat:{$x0} RETURN 1 id LIMIT 10 20 PARAMS 2 x0 foo DIALECT 2
var page = rows2.Skip(10).Take(20);

// FT.SEARCH ft_myindex @title:$x0 RETURN 2 id name SORTBY id DESC PARAMS 2 x0 dogs DIALECT 2
var ordered = from row in rows
    orderby row.Id descending
    select row;

// FT.SEARCH ft_myindex @title:$x0 RETURN 2 id name SCORER TFIDF PARAMS 2 x0 dogs DIALECT 2
var withScorers = rows.WithScorer(Scorers.TfIdf);

var urls = from row in query
    where row.Url == "about.html"
    group row by (row.Country, row.Date.Date)
    into grp
    orderby grp.Key.Date descending
    select (grp.Key, visits: grp.Count());

var grouped = from row in rows
    group row by row.Name
    into g
    select new { g.Key, Count = g.Count() };
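
To give a feel for the translation step behind the comments above, here is a minimal sketch that turns the simplest kind of predicate, `x => x.Prop == value`, into parameterized FT.SEARCH arguments. `TextAttribute` and `SearchTranslator` are hypothetical names, the model is trimmed to one property, and RETURN / LIMIT / SORTBY handling is omitted; a real implementation would need a full expression visitor.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical attribute: maps a property to a TEXT field in the index.
[AttributeUsage(AttributeTargets.Property)]
public sealed class TextAttribute : Attribute
{
    public TextAttribute(string name) => Name = name;
    public string Name { get; }
}

public sealed class Customer
{
    [Text("title")] public string? Title { get; set; }
}

public static class SearchTranslator
{
    // Handles only `x => x.Prop == value`; everything else is rejected.
    public static string[] Translate<T>(string index, Expression<Func<T, bool>> predicate)
    {
        if (predicate.Body is not BinaryExpression { NodeType: ExpressionType.Equal } eq
            || eq.Left is not MemberExpression { Member: PropertyInfo prop })
        {
            throw new NotSupportedException("Only `x => x.Prop == value` is handled in this sketch.");
        }

        var field = prop.GetCustomAttribute<TextAttribute>()?.Name
            ?? throw new NotSupportedException($"{prop.Name} is not marked [Text].");

        // Evaluate the right-hand side (a constant or a captured local) to its value.
        var value = Expression.Lambda(eq.Right).Compile().DynamicInvoke();

        // The value travels as a parameter, never spliced into the query text.
        return new[]
        {
            "FT.SEARCH", index, $"@{field}:$x0",
            "PARAMS", "2", "x0", Convert.ToString(value) ?? "",
            "DIALECT", "2",
        };
    }
}
```

Calling `SearchTranslator.Translate<Customer>("ft_myindex", x => x.Title == "dogs")` produces the arguments `FT.SEARCH ft_myindex @title:$x0 PARAMS 2 x0 dogs DIALECT 2`. Sending values through PARAMS sidesteps query-syntax escaping entirely.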

I think we can also build FT.HYBRID into the same, but I haven't scratched through the API yet; presumably something like:

var hybrid = query.Where(x => x.Name == "foo").CombineVector(x => x.VectorField, vectorValue) ... blah

We would also presumably consider "prepared" queries, meaning: we can parameterize everything and pre-process the expressions once rather than processing the LINQ each time.
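
One possible shape for the "prepared" idea, sketched under heavy assumptions: the expression is inspected once in `Create`, and each later call only binds a value into the PARAMS slot. `PreparedSearch<T>` is a hypothetical type, it supports a single string parameter, and it guesses the field name by lower-casing the property name rather than reading attributes.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;

public sealed class Customer
{
    public string? Title { get; set; }
}

// Hypothetical prepared query: expression analysis happens once, binding happens per call.
public sealed class PreparedSearch<T>
{
    private readonly string _index;
    private readonly string _filter;

    private PreparedSearch(string index, string filter)
    {
        _index = index;
        _filter = filter;
    }

    // Accepts only `(x, arg) => x.Prop == arg`, where `arg` is the lambda's second parameter.
    public static PreparedSearch<T> Create(string index, Expression<Func<T, string, bool>> predicate)
    {
        if (predicate.Body is not BinaryExpression { NodeType: ExpressionType.Equal } eq
            || eq.Left is not MemberExpression member
            || !ReferenceEquals(eq.Right, predicate.Parameters[1]))
        {
            throw new NotSupportedException("Only `(x, arg) => x.Prop == arg` is handled in this sketch.");
        }

        // Assumption: the index field name is the lower-cased property name.
        var field = member.Member.Name.ToLowerInvariant();
        return new PreparedSearch<T>(index, $"@{field}:$x0");
    }

    // No expression processing here: just slot the value into the cached template.
    public string[] Bind(string value) => new[]
    {
        "FT.SEARCH", _index, _filter, "PARAMS", "2", "x0", value, "DIALECT", "2",
    };
}
```

Usage would be something like `var byTitle = PreparedSearch<Customer>.Create("ft_myindex", (x, title) => x.Title == title);` stored in a static field, followed by `byTitle.Bind("dogs")` on each request.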

Advantages:

  • the users are now using a strong type model that matches their data
  • the users haven't needed to learn a new .NET API, or the various FT.blah syntax
  • the library would deal with all data binding into their projections etc

The effort involved here isn't trivial, but also isn't infeasible - I'm pretty experienced with LINQ metaprogramming.
