Add mixture of queries #10

@nicolo-rinaldi

Description

When the dataset_generator module generates queries, a parameter called max_query_terms sets the length (in words) of the generated queries. This value is then used to build the prompt that is sent to the LLM. Because the same length setting is applied to every query, the queries in the evaluation dataset end up less diverse than real user queries.

Idea

Add a mixture of queries feature: instead of fixing the number of words for every query, randomly sample the value from {1, ..., n, None} for each query. None means that no restriction is placed on the number of words, which should produce long and detailed natural language queries.
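
A minimal sketch of the sampling step, to make the idea concrete. The function names `sample_max_query_terms` and `build_length_instruction`, the uniform sampling, and the prompt wording are all assumptions for illustration, not the existing dataset_generator code. It also assumes max_query_terms is treated as an upper bound on the word count.

```python
import random
from typing import Optional


def sample_max_query_terms(n: int, rng: random.Random) -> Optional[int]:
    """Draw uniformly from {1, ..., n, None}; None means no length limit.

    Hypothetical helper: uniform weights are an assumption, the issue
    does not say how the values should be weighted.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    choices = list(range(1, n + 1)) + [None]
    return rng.choice(choices)


def build_length_instruction(max_query_terms: Optional[int]) -> str:
    """Turn the sampled value into the length part of the LLM prompt.

    Hypothetical wording; the real prompt template may differ.
    """
    if max_query_terms is None:
        # No restriction: ask for a long, detailed natural language query.
        return "Write a detailed natural language query of any length."
    return f"Write a query of at most {max_query_terms} words."


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a generated dataset is reproducible
    for _ in range(5):
        limit = sample_max_query_terms(n=4, rng=rng)
        print(limit, "->", build_length_instruction(limit))
```

Sampling once per query, rather than once per dataset, is what gives the mixture. If short queries should be more common than unrestricted ones, `rng.choices` with explicit weights could replace `rng.choice`.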

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)
