Combining Querys with BooleanQuerys #7

@ZeroCool2u

Hi @coady, thanks for all your hard work on lupyne, it's been super helpful for me! I used your Dockerfile as a basis for compiling JCC & PyLucene to wheel files in my own non-Docker environment, and I've now been able to run some of the examples, set up my own 14 GB corpus, index it to a directory, and do some basic searches based on the examples you provided in the docs.

Right now I'm trying to write a slightly more complex query, but I'm having some trouble and am hoping you might be able to point me in the right direction.

I have a fairly simple index with 4 stored fields: a text field containing the article text, a text field containing the name of the company (the list of company names is finite and each document is associated with exactly one company), a datetime field containing the date the article was published, and an article id.

I'm trying to write a query that does the following: find all documents that contain the phrase "lupyne is great", fall within some arbitrary date range, and have a company_name field value of 'company a', 'company b', or 'company c'.

I've tried the following:

import lucene
from lupyne import engine
from datetime import date

assert lucene.getVMEnv() or lucene.initVM()

index_path: str = r'myindexdir'

query_str: str = 'lupyne is great'
start_date: date = date(year=2020, month=2, day=14)
companies: list[str] = ['company a', 'company b', 'company c']

indexer = engine.Indexer(index_path, mode='r', nrt=True)

indexer.set('article_id', stored=True)
indexer.set('company_name', stored=True)
indexer.set('date', engine.DateTimeField, stored=True)
indexer.set('text', engine.Field.Text, stored=True)

query_engine = engine.Query

# The following works with the query string 'lupyne'
query_str: str = 'lupyne'
query = indexer.fields['date'].range(start_date, None) & query_engine.term('text', query_str)

# This does not work with the query string 'lupyne is great'
query_str: str = 'lupyne is great'
query = indexer.fields['date'].range(start_date, None) & query_engine.phrase('text', query_str)
# TypeError: unsupported operand type(s) for &: 'Query' and 'MultiPhraseQuery'

# This also does not work
date_field = indexer.fields['date']
range_query = query_engine.range('date', date_field.timestamp(start_date), None)
# java.lang.IncompatibleClassChangeError
#        at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:84)

# This will also break
range_query = query_engine.range('date', start_date, None)
# lucene.InvalidArgsError: (<class 'org.apache.lucene.util.BytesRef'>, '__init__', (datetime.date(2021, 2, 2),))

Any suggestions on how I might go about this? Thanks again for all the hard work!

EDIT: So, it looks like this might be because Query.ranges() doesn't return a lupyne Query object as seen here, but instead directly returns a pylucene query object. Any good way to get around this?
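
One possible workaround, sketched here rather than taken from lupyne's documentation, is to skip the `&` operator and combine the queries through Lucene's own `BooleanQuery.Builder`, which accepts Lucene query objects directly. The helper names `combine` and `article_query` are hypothetical. The builder class and occur flags are passed in as arguments, and the sketch assumes PyLucene exposes them as `BooleanQuery.Builder`, `BooleanClause.Occur.MUST`, and `BooleanClause.Occur.SHOULD` from `org.apache.lucene.search`.

```python
from typing import Any, Callable, Iterable


def combine(builder_cls: Callable[[], Any], occur: Any, queries: Iterable[Any]) -> Any:
    """Add every query to a fresh builder with the same occur flag, then build it."""
    builder = builder_cls()
    for query in queries:
        builder.add(query, occur)
    return builder.build()


def article_query(builder_cls, must, should, date_query, phrase_query, company_queries):
    """Build (date range) AND (phrase) AND (company a OR company b OR ...)."""
    # A boolean query made only of SHOULD clauses matches when at least one clause matches.
    any_company = combine(builder_cls, should, company_queries)
    # Nesting it as a MUST clause makes "one of these companies" mandatory.
    return combine(builder_cls, must, [date_query, phrase_query, any_company])


# Assumed usage with PyLucene, after lucene.initVM() (not verified against lupyne):
#   from org.apache.lucene.search import BooleanClause, BooleanQuery
#   query = article_query(
#       BooleanQuery.Builder,
#       BooleanClause.Occur.MUST,
#       BooleanClause.Occur.SHOULD,
#       indexer.fields['date'].range(start_date, None),
#       query_engine.phrase('text', query_str),
#       [query_engine.term('company_name', name) for name in companies],
#   )
```

Passing the builder and flags in keeps the helpers importable without a running JVM. Whether the combined query can then be handed straight to `indexer.search` has not been checked here.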
