monkeSearch

A prototype system that brings natural language search capabilities to your file system (macOS only for now), allowing you to search for files using everyday language like "python scripts from last week" or "photos from yesterday". Nothing leaves your pc, offline inference and can even run on potato PCs. You don't need a massive GPU rig to run the small model backing the intelligence.

⚠️ Prototype: This is an initial proof-of-concept implementation. Expect rough edges and limited functionality. Currently aimed at macOS but the logic is independent for cross platform adaptations. (In the works!) visit discussions

Developer note:

I've been working on this project since long and this idea had many versions. Future plans include finetuning Gemma 3 270M and adding more smart features like temporal expressions and operators + smarter aggregation (See future plans and please help me in implementing them! ).
The current turnaround time for this tool to recieve a query and give out files is around 1 second and doesn't exceed it, the largest bottleneck is model inference. This is under active development and any new suggestions + PRs are welcome. My goal for this tool is to be open source, safe and cross platform. So developers experienced in Windows/Linux Indexing are very welcome to collaborate and develop their versions together.

please star the repo too, if you've read it till here :P

Overview

shows zero results because i don't have any videos related to "party"

This system combines:

AI-powered query parsing using a local LLM (Qwen 0.6B) via llama.cpp to understand natural language
Native macOS Spotlight integration for fast, efficient file searching. (cross platform support is very welcome for development!)
Intelligent file type recognition that understands context (e.g., "resume" → PDF/DOCX files)
Temporal expression parsing for time-based searches. (3 weeks ago, 10 months ago, etc.)

Implementation versions

There are multiple implementations in different branches written in achieving the same task, for testing purposes. Rigorous evals and testing will be done before finalizing on a single one for the main release.

Initial implementation using LangExtract (Both Ollama and local llama cpp server support)
llama.cpp rewrite to remove dependency on LangExtract (this branch)
llama.cpp feature branch with more detailed response model. Currently being worked upon, and evals are being done.

Example Queries

You can convert any natural language query to 3 major constituents: File type, temporal data (time related), and miscellaneous (file name, path etc.) I used this idea as base to build the whole project, and yes it is that simple.

Natural Language Query	What It Finds
`"photos from yesterday"`	Image files modified in the last day
`"python scripts from three days ago"`	.py and .ipynb files from 3 days ago
`"old music files"`	Audio files with "old" in name or content
`"pdf invoices from 2023"`	PDF files from 2023 with "invoices" keyword
`"resume from last week"`	Recent DOC/DOCX/PDF files with "resume"
`"code files"`	Source code files of any language

Requirements

macOS (required for Spotlight integration)
Python 3.8+
llama-cpp-python with Qwen3-0.6B GGUF (local LLM inference)

Currently planning to fine tune Gemma 3 270M for a smaller and faster model for this use case.

Installation

1. Clone the Repository

git clone https://github.com/monkesearch/monkesearch
cd monkeSearch

2. Install dependencies using the requirements file:

pip install -r requirements.txt

3. Install and Setup llama-cpp-python

See the llama-cpp-python installation guide for detailed instructions.

You'll need to download Qwen3-0.6B GGUF model file and place it in your project directory.

4. Verify Setup

# Test the parser
python parser.py "python files from yesterday"

Usage

Command Line

cd app/
# Basic search
python parser.py "photos from last week"

# More examples
python parser.py "python scripts modified yesterday"
python parser.py "pdf invoices from 2023"
python parser.py "music files"
python parser.py "old presentations"

As a Module

from parser import FileSearchParser

# Initialize the parser
searcher = FileSearchParser()

# Perform a search
results, parsed_data = searcher.search("python files from last week")

# results contains file paths
for path in results:
    print(path)

Limitations

Indexed Files Only: Only searches files indexed by Spotlight
Local Model Limitations: The small AI model may misunderstand very complex queries
Basic Temporal Parsing: Currently supports simple time expressions (More features to be added soon! See technical for planned features)

License

Apache-2.0 license

Star History

Acknowledgments

Big thanks to utitools
llama-cpp-python for local LLM inference
Uses Apple's Spotlight and Foundation frameworks.

Note: This is an experimental prototype created to explore natural language file searching on macOS. It's not production-ready and should be used for experimentation and learning purposes.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
app		app
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

monkeSearch

Developer note:

Overview

Implementation versions

Example Queries

You can convert any natural language query to 3 major constituents: File type, temporal data (time related), and miscellaneous (file name, path etc.) I used this idea as base to build the whole project, and yes it is that simple.

Requirements

Installation

1. Clone the Repository

2. Install dependencies using the requirements file:

3. Install and Setup llama-cpp-python

4. Verify Setup

Usage

Command Line

As a Module

Limitations

License

Star History

Acknowledgments

About

Uh oh!

Contributors 2

Uh oh!

Languages

License

monkesearch/monkeSearch

Folders and files

Latest commit

History

Repository files navigation

monkeSearch

Developer note:

Overview

Implementation versions

Example Queries

You can convert any natural language query to 3 major constituents: File type, temporal data (time related), and miscellaneous (file name, path etc.) I used this idea as base to build the whole project, and yes it is that simple.

Requirements

Installation

1. Clone the Repository

2. Install dependencies using the requirements file:

3. Install and Setup llama-cpp-python

4. Verify Setup

Usage

Command Line

As a Module

Limitations

License

Star History

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors 2

Uh oh!

Languages