(this is actually @mgravell, using a non-work device; blame org security! I had the same problems when at MSFT)
Statement: RediSearch is hard. The API requires knowing lots of awkward syntax and dialects, and the differences between FT.SEARCH, FT.AGGREGATE and FT.HYBRID. Additionally, the type model is super weak, meaning: the queries are all strings, and any model mapping is manual.
C# generalized querying way back in .NET 3.5, with LINQ and the expression-tree API.
Proposal: we could overhaul RediSearch with a LINQ-like API (not IQueryable<T> - something bespoke).
Consider defining your model like:
[Index("ft_myindex")]
public class Customer
{
    [Key("id")] public int Id { get; set; }
    [Text("name")] public string? Name { get; set; }
    [Text("url")] public string? Url { get; set; }
    [Text("country")] public string? Country { get; set; }
    [Text("title")] public string? Title { get; set; }
    [Timestamp("timestamp")] public DateTime Date { get; set; }
    [Tags("cat")] public string[] Categories { get; set; } = [];
}
...
var query = db.Query<Customer>();
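To make the attribute model above concrete, here is a minimal sketch of how such attributes could be declared and read back via reflection. Everything in it is an assumption for illustration: `IndexAttribute`, `FieldAttribute` and its subclasses, the `IndexSchema` helper, and the trimmed `CustomerSketch` model are hypothetical names, not an existing API.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute carrying the RediSearch index name for a model type.
[AttributeUsage(AttributeTargets.Class)]
public sealed class IndexAttribute : Attribute
{
    public IndexAttribute(string name) => Name = name;
    public string Name { get; }
}

// Hypothetical base for all field attributes; stores the index field name.
[AttributeUsage(AttributeTargets.Property)]
public abstract class FieldAttribute : Attribute
{
    protected FieldAttribute(string name) => Name = name;
    public string Name { get; }
}

public sealed class KeyAttribute : FieldAttribute { public KeyAttribute(string name) : base(name) { } }
public sealed class TextAttribute : FieldAttribute { public TextAttribute(string name) : base(name) { } }
public sealed class TagsAttribute : FieldAttribute { public TagsAttribute(string name) : base(name) { } }

// Trimmed stand-in for the Customer model, just enough to exercise the lookup.
[Index("ft_myindex")]
public sealed class CustomerSketch
{
    [Key("id")] public int Id { get; set; }
    [Text("name")] public string Name { get; set; } = "";
    [Tags("cat")] public string[] Categories { get; set; } = Array.Empty<string>();
}

public static class IndexSchema
{
    // Resolves the index name declared on the model type.
    public static string IndexName(Type type) =>
        type.GetCustomAttribute<IndexAttribute>()?.Name
        ?? throw new InvalidOperationException($"{type.Name} has no [Index] attribute.");

    // Resolves the index field name declared on a mapped property.
    public static string FieldName(MemberInfo member) =>
        member.GetCustomAttribute<FieldAttribute>()?.Name
        ?? throw new InvalidOperationException($"{member.Name} is not a mapped field.");
}
```

With this in place, `IndexSchema.IndexName(typeof(CustomerSketch))` yields `ft_myindex` and the `Categories` property resolves to the field name `cat`, which is the lookup a query translator would need when turning member accesses into `@field` references.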
Now imagine we can do things like:
// FT.AGGREGATE ft_myindex GROUPBY 0 REDUCE count 0
var count = query.Aggregate().Count();
// FT.AGGREGATE ft_myindex GROUPBY 0 REDUCE max 1 @timestamp
var maxDate = query.Aggregate().Max(x => x.Date);
// FT.AGGREGATE ft_myindex GROUPBY 0 REDUCE count 0 AS count REDUCE min 1 @timestamp AS min REDUCE max 1 @timestamp AS max
var composite =
    (from row in query
     group row by Index.All()
     into agg
     select (
         Count: agg.Count(),
         Min: agg.Min(x => x.Date),
         Max: agg.Max(x => x.Date)
     )).Single();
That was fun, but we can imagine more exotic things:
// FT.SEARCH ft_myindex @title:$x0 RETURN 2 id name PARAMS 2 x0 dogs DIALECT 2
var rows =
    from x in query
    where x.Title == "dogs"
    select new { x.Id, x.Name, Score = Index.Score() };
// FT.SEARCH ft_myindex @cat:{$x0} RETURN 1 id PARAMS 2 x0 foo DIALECT 2
var rows2 =
    from x in query
    where x.Categories.Contains("foo")
    select x.Id;
// FT.SEARCH ft_myindex @cat:{$x0} RETURN 1 id LIMIT 10 20 PARAMS 2 x0 foo DIALECT 2
var page = rows2.Skip(10).Take(20);
// FT.SEARCH ft_myindex @title:$x0 RETURN 2 id name SORTBY id DESC PARAMS 2 x0 dogs DIALECT 2
var ordered = from row in rows
              orderby row.Id descending
              select row;
// FT.SEARCH ft_myindex @title:$x0 RETURN 2 id name SCORER TFIDF PARAMS 2 x0 dogs DIALECT 2
var withScorers = rows.WithScorer(Scorers.TfIdf);
var urls = from row in query
           where row.Url == "about.html"
           group row by (row.Country, row.Date.Date)
           into grp
           orderby grp.Key.Date descending
           select (grp.Key, visits: grp.Count());
var grouped = from row in rows
              group row by row.Name
              into g
              select new { g.Key, Count = g.Count() };
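As a rough illustration of how the `where` clauses above could become parameterized FT.SEARCH queries, here is a minimal expression-tree walker. It is a sketch under stated assumptions, not the proposed implementation: `PredicateTranslator`, `ToCommand` and the `Article` model are hypothetical, the lower-cased property name stands in for the attribute-based field mapping, and only `==`, `&&` and `Contains` are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical model; lower-cased property names double as index field names here.
public sealed class Article
{
    public string Title { get; set; } = "";
    public string[] Tags { get; set; } = Array.Empty<string>();
}

public static class PredicateTranslator
{
    // Returns the query text plus a flat name/value list for the PARAMS clause.
    public static (string Query, List<string> Params) Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var args = new List<string>();
        var query = Visit(predicate.Body, args);
        return (query, args);
    }

    // Renders the whole command as a single string, for inspection only.
    public static string ToCommand<T>(string index, Expression<Func<T, bool>> predicate)
    {
        var (query, args) = Translate(predicate);
        return $"FT.SEARCH {index} {query} PARAMS {args.Count} {string.Join(" ", args)} DIALECT 2";
    }

    private static string Visit(Expression node, List<string> args)
    {
        switch (node)
        {
            // a && b: a space between clauses means intersection in the query syntax
            case BinaryExpression { NodeType: ExpressionType.AndAlso } both:
                return $"({Visit(both.Left, args)} {Visit(both.Right, args)})";

            // x.Field == value: text match against a parameter
            case BinaryExpression { NodeType: ExpressionType.Equal } eq
                when Strip(eq.Left) is MemberExpression field:
                return "@" + FieldName(field) + ":" + AddParam(args, eq.Right);

            // x.Field.Contains(value): tag match against a parameter
            case MethodCallExpression call
                when call.Method.Name == "Contains"
                     && call.Arguments.Count == 2
                     && Strip(call.Arguments[0]) is MemberExpression tags:
                return "@" + FieldName(tags) + ":{" + AddParam(args, call.Arguments[1]) + "}";

            default:
                throw new NotSupportedException($"Cannot translate: {node}");
        }
    }

    // Removes conversion wrappers the compiler may place around a member access.
    private static Expression Strip(Expression e)
    {
        while (e is UnaryExpression { NodeType: ExpressionType.Convert } u) e = u.Operand;
        return e;
    }

    // Assumption: field name is the lower-cased property name.
    private static string FieldName(MemberExpression m) => m.Member.Name.ToLowerInvariant();

    // Evaluates the value expression, registers it as xN, and returns "$xN".
    private static string AddParam(List<string> args, Expression value)
    {
        object raw = value is ConstantExpression c
            ? c.Value
            : Expression.Lambda(value).Compile().DynamicInvoke();
        var name = "x" + (args.Count / 2).ToString(CultureInfo.InvariantCulture);
        args.Add(name);
        args.Add(Convert.ToString(raw, CultureInfo.InvariantCulture) ?? "");
        return "$" + name;
    }
}
```

For example, `PredicateTranslator.ToCommand<Article>("ft_myindex", x => x.Title == "dogs")` produces `FT.SEARCH ft_myindex @title:$x0 PARAMS 2 x0 dogs DIALECT 2`. Passing values through PARAMS rather than splicing them into the query text avoids having to escape user input.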
I think we can also build FT.HYBRID into the same, but I haven't scratched through the API yet; presumably something like:
var hybrid = query.Where(x => x.Name == "foo").CombineVector(x => x.VectorField, vectorValue) ... blah
We would also presumably consider "prepared" queries, meaning: we can parameterize everything and pre-process the expressions once rather than processing the LINQ each time.
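One way the "prepared" idea could look: inspect the expression once, keep the query text and a compiled value getter, and then only bind arguments per call. This is a deliberately tiny sketch; `PreparedSearch`, `Prepare`, `Bind` and the `Page` model are hypothetical names, only a single `model.Property == argument` predicate is handled, and the lower-cased property name again stands in for the real field mapping.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical model for the example.
public sealed class Page
{
    public string Title { get; set; } = "";
}

// Holds everything derived from the expression, so binding is cheap.
public sealed class PreparedSearch<TArg>
{
    private readonly string _index;
    private readonly string _query;
    private readonly Func<TArg, object> _getValue;

    internal PreparedSearch(string index, string query, Func<TArg, object> getValue)
    {
        _index = index;
        _query = query;
        _getValue = getValue;
    }

    // Per call: no expression processing, only argument formatting.
    public string[] Bind(TArg arg)
    {
        var value = Convert.ToString(_getValue(arg), CultureInfo.InvariantCulture) ?? "";
        return new[] { "FT.SEARCH", _index, _query, "PARAMS", "2", "x0", value, "DIALECT", "2" };
    }
}

public static class PreparedSearch
{
    // One-time work: walk the expression and compile the argument getter.
    public static PreparedSearch<TArg> Prepare<TModel, TArg>(
        string index, Expression<Func<TModel, TArg, bool>> predicate)
    {
        if (predicate.Body is not BinaryExpression
            { NodeType: ExpressionType.Equal, Left: MemberExpression field } eq)
        {
            throw new NotSupportedException("Only 'model.Property == argument' is handled in this sketch.");
        }

        // The right-hand side may only depend on the argument parameter.
        var getValue = Expression
            .Lambda<Func<TArg, object>>(Expression.Convert(eq.Right, typeof(object)), predicate.Parameters[1])
            .Compile();

        var query = "@" + field.Member.Name.ToLowerInvariant() + ":$x0";
        return new PreparedSearch<TArg>(index, query, getValue);
    }
}
```

Usage would be along the lines of `var byTitle = PreparedSearch.Prepare<Page, string>("ft_myindex", (p, title) => p.Title == title);` once at startup, then `byTitle.Bind("dogs")` per request, which returns the arguments `FT.SEARCH ft_myindex @title:$x0 PARAMS 2 x0 dogs DIALECT 2`.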
Advantages:
- the users are now using a strong type model that matches their data
- the users don't need to learn a new .NET API, or the various FT.blah syntax
- the library would deal with all data binding into their projections etc
The effort involved here isn't trivial, but also isn't infeasible - I'm pretty experienced with LINQ metaprogramming.