Hi @coady, thanks for all your hard work on lupyne, it's been super helpful for me! I used your Dockerfile as a basis for compiling JCC and PyLucene to wheel files in my own non-Docker environment. I've now been able to successfully run some of the examples, set up my own 14 GB corpus, index it to a directory, and do some basic searches based on the examples in the docs.
Right now I'm trying to write a slightly more complex query, but I'm having some trouble and was hoping you might be able to point me in the right direction.
I have a fairly simple index with four stored fields: a text field containing the article text, a text field containing the name of the company (the list of company names is finite and each document is associated with exactly one company), a datetime field containing the date the article was published, and an article id.
I'm trying to write a query that finds all documents that contain the phrase "lupyne is great", were published within an arbitrary date range, and have a company_name value of 'company a', 'company b', or 'company c'.
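To pin down the intended matching logic, it can be modeled in plain Python over in-memory records. This is only a sketch of the semantics (phrase AND date range AND one-of-several companies). It does not use lupyne or Lucene at all. `Article`, `contains_phrase`, and `matches` are hypothetical names. The lowercase/whitespace tokenizer and the half-open date interval are assumptions, not Lucene's behavior.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Article:
    # Hypothetical record mirroring the four stored fields described above.
    article_id: str
    company_name: str
    published: date
    text: str


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's tokens occur consecutively in text.

    Assumes a naive lowercase/whitespace tokenizer, not a Lucene analyzer.
    """
    tokens = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def matches(article: Article, phrase: str, start: Optional[date],
            stop: Optional[date], companies: Iterable[str]) -> bool:
    """phrase AND start <= published < stop AND company in the allowed set."""
    return (
        contains_phrase(article.text, phrase)
        and (start is None or article.published >= start)  # None = open-ended
        and (stop is None or article.published < stop)     # assumed half-open upper bound
        and article.company_name in set(companies)
    )


articles = [
    Article('1', 'company a', date(2020, 3, 1), 'We think lupyne is great for search'),
    Article('2', 'company d', date(2020, 3, 1), 'lupyne is great'),  # wrong company
    Article('3', 'company b', date(2019, 1, 1), 'lupyne is great'),  # too early
    Article('4', 'company c', date(2020, 5, 5), 'is lupyne great'),  # words out of order
]
companies = ['company a', 'company b', 'company c']
hits = [a.article_id for a in articles
        if matches(a, 'lupyne is great', date(2020, 2, 14), None, companies)]
print(hits)  # ['1']
```

In Lucene terms, this corresponds to three required clauses, where the company clause is itself a disjunction over the allowed names.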
I've tried the following:
```python
import lucene
from lupyne import engine
from datetime import date

assert lucene.getVMEnv() or lucene.initVM()

index_path: str = r'myindexdir'
query_str: str = 'lupyne is great'
start_date: date = date(year=2020, month=2, day=14)
companies: [str] = ['company a', 'company b', 'company c']

indexer = engine.Indexer(index_path, mode='r', nrt=True)
indexer.set('article_id', stored=True)
indexer.set('company_name', stored=True)
indexer.set('date', engine.DateTimeField, stored=True)
indexer.set('text', engine.Field.Text, stored=True)
query_engine = engine.Query
date_field = indexer.fields['date']  # the DateTimeField registered above

# The following works with the query string 'lupyne'
query_str: str = 'lupyne'
query = indexer.fields['date'].range(start_date, None) & query_engine.term('text', query_str)

# This does not work with the query string 'lupyne is great'
query_str: str = 'lupyne is great'
query = indexer.fields['date'].range(start_date, None) & query_engine.phrase('text', query_str)
# TypeError: unsupported operand type(s) for &: 'Query' and 'MultiPhraseQuery'

# This also does not work
range_query = query_engine.range('date', date_field.timestamp(start_date), None)
# java.lang.IncompatibleClassChangeError
# at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:84)

# This will also break
range_query = query_engine.range('date', start_date, None)
# lucene.InvalidArgsError: (<class 'org.apache.lucene.util.BytesRef'>, '__init__', (datetime.date(2021, 2, 2),))
```

Any suggestions on how I might go about this? Thanks again for all the hard work!
EDIT: It looks like this might be because `Query.ranges()` doesn't return a lupyne `Query` object, as seen here, but instead returns a PyLucene query object directly. Is there a good way to get around this?
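The difference between the `term` case and the `phrase` case is consistent with Python's binary-operator dispatch. `a & b` first tries `type(a).__and__`, then falls back to `type(b).__rand__`, and raises `TypeError` only if neither is defined (or both return `NotImplemented`). The sketch below uses hypothetical stand-in classes, not real lupyne or PyLucene types. It assumes (unverified) that `query_engine.term()` returns a lupyne subclass defining the reflected operator, while both `.range()` and `.phrase()` return plain PyLucene objects that define no `&` at all.

```python
class RawJavaQuery:
    """Hypothetical stand-in for a plain PyLucene query wrapper: no & operator defined."""

    def __init__(self, name: str):
        self.name = name


class LupyneStyleQuery:
    """Hypothetical stand-in for a lupyne Query subclass defining & and its reflected form."""

    def __init__(self, name: str):
        self.name = name

    def __and__(self, other):
        # Called when this object is the LEFT operand.
        return ("AND", self.name, other.name)

    def __rand__(self, other):
        # Called when the left operand has no __and__ and this object is the RIGHT operand.
        return ("AND", other.name, self.name)


def try_and(left, right):
    """Evaluate left & right, returning the TypeError message instead of raising."""
    try:
        return left & right
    except TypeError as exc:
        return str(exc)


raw_range = RawJavaQuery("range")
raw_phrase = RawJavaQuery("phrase")
term = LupyneStyleQuery("term")

print(try_and(raw_range, term))        # ('AND', 'range', 'term') via term.__rand__
print(try_and(raw_range, raw_phrase))  # unsupported operand type(s) for &: 'RawJavaQuery' and 'RawJavaQuery'
```

Under that assumption, combining the two plain queries with an explicit conjunction (for example, Lucene's `BooleanQuery.Builder` with `BooleanClause.Occur.MUST` clauses) would avoid relying on `&`.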