A prototype system that brings natural language search to your file system (macOS only for now), letting you search for files using everyday language like "python scripts from last week" or "photos from yesterday". Nothing leaves your machine: inference runs fully offline and can even run on potato PCs, so you don't need a massive GPU rig for the small model backing the intelligence.
⚠️ Prototype: This is an initial proof-of-concept implementation. Expect rough edges and limited functionality. It currently targets macOS, but the core logic is platform-independent and cross-platform adaptations are in the works. Visit the discussions for more.
I've been working on this project for a long time, and the idea has gone through many versions. Future plans include fine-tuning Gemma 3 270M and adding smarter features like temporal expressions and operators, plus smarter aggregation (see future plans, and please help me implement them!).
The current turnaround time from receiving a query to returning files is at most about 1 second, and the largest bottleneck is model inference. This is under active development, and new suggestions and PRs are welcome. My goal is for this tool to be open source, safe, and cross-platform, so developers experienced in Windows/Linux indexing are very welcome to collaborate and develop their versions together. Please star the repo too, if you've read this far :P
Shows zero results because I don't have any videos related to "party".
This system combines:
- AI-powered query parsing using a local LLM (Qwen 0.6B) via llama.cpp to understand natural language
- Native macOS Spotlight integration for fast, efficient file searching (contributions toward cross-platform support are very welcome!)
- Intelligent file type recognition that understands context (e.g., "resume" → PDF/DOCX files)
- Temporal expression parsing for time-based searches (3 weeks ago, 10 months ago, etc.)
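
To make the temporal parsing idea concrete, here is a minimal sketch of turning an expression like "3 weeks ago" into a date range. It is not the project's actual parser: the function name `parse_relative_time`, the rough 30-day month, and the choice to treat "N units ago" as "from N units back until now" are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Approximate day counts per unit (assumption: months/years are deliberately rough).
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

# A few spelled-out counts so "three days ago" also works.
_NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

_PATTERN = re.compile(r"\b(\d+|[a-z]+)\s+(day|week|month|year)s?\s+ago\b")


def parse_relative_time(text, now=None):
    """Return (start, end) datetimes for '<N> <unit>s ago', or None if absent."""
    now = now or datetime.now()
    match = _PATTERN.search(text.lower())
    if not match:
        return None
    raw_count, unit = match.groups()
    count = int(raw_count) if raw_count.isdigit() else _NUMBER_WORDS.get(raw_count)
    if count is None:
        # The word before the unit was not a number we recognise.
        return None
    # Hypothetical semantics: everything from N units back up to now.
    start = now - timedelta(days=count * _UNIT_DAYS[unit])
    return start, now
```

Calling `parse_relative_time("python scripts from 3 weeks ago")` returns a `(start, end)` pair that a search backend could use as a modification-date filter; queries without a relative expression return `None`.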
There are multiple implementations of the same task in different branches, kept for testing purposes. Rigorous evals and testing will be done before settling on a single one for the main release.
- Initial implementation using LangExtract (Both Ollama and local llama cpp server support)
- llama.cpp rewrite to remove dependency on LangExtract (this branch)
- llama.cpp feature branch with more detailed response model. Currently being worked upon, and evals are being done.
You can break any natural language query down into 3 major constituents: file type, temporal data (time-related), and miscellaneous data (file name, path, etc.). I used this idea as the base for the whole project, and yes, it is that simple.
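
As a rough illustration of that three-part split, the sketch below shows one way the model's structured output could be held in code. The `ParsedQuery` class, its field names, and the JSON shape are hypothetical and chosen for this example; the project's real response model may look different.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedQuery:
    """Hypothetical container for the three constituents of a query."""
    file_types: List[str] = field(default_factory=list)  # e.g. ["pdf", "docx"]
    temporal: Optional[str] = None                        # e.g. "last week"
    misc: List[str] = field(default_factory=list)         # name/path keywords


def parsed_query_from_json(raw: str) -> ParsedQuery:
    """Build a ParsedQuery from a JSON string (assumed LLM output format)."""
    data = json.loads(raw)
    return ParsedQuery(
        # Normalise extensions: lowercase, no leading dot.
        file_types=[str(t).lower().lstrip(".") for t in data.get("file_types", [])],
        # Treat an empty string or a missing key as "no time constraint".
        temporal=data.get("temporal") or None,
        misc=[str(m) for m in data.get("misc", [])],
    )
```

For "pdf invoices from 2023", a model following this assumed schema would emit something like `{"file_types": ["pdf"], "temporal": "2023", "misc": ["invoices"]}`, and each field can then be translated into its own search filter.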
| Natural Language Query | What It Finds |
|---|---|
| "photos from yesterday" | Image files modified in the last day |
| "python scripts from three days ago" | .py and .ipynb files from 3 days ago |
| "old music files" | Audio files with "old" in name or content |
| "pdf invoices from 2023" | PDF files from 2023 with "invoices" keyword |
| "resume from last week" | Recent DOC/DOCX/PDF files with "resume" |
| "code files" | Source code files of any language |
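
The context-aware file type recognition in these examples (e.g. "resume" mapping to document formats) is handled by the LLM in this project. As a simplified stand-in, the sketch below uses a plain lookup table to show the kind of mapping involved; `FILE_TYPE_HINTS` and `extensions_for` are hypothetical names, and the extension lists are illustrative rather than exhaustive.

```python
# Hypothetical keyword -> extension hints; the real project relies on the model.
FILE_TYPE_HINTS = {
    "resume": ["pdf", "doc", "docx"],
    "photo": ["jpg", "jpeg", "png", "heic"],
    "python": ["py", "ipynb"],
    "music": ["mp3", "wav", "flac", "m4a"],
    "invoice": ["pdf"],
}


def extensions_for(query):
    """Collect candidate extensions for every hinted word in the query."""
    found = []
    for word in query.lower().split():
        word = word.strip("\"'.,")
        # Crude singularisation so "photos" matches the "photo" hint.
        singular = word[:-1] if word.endswith("s") else word
        hints = FILE_TYPE_HINTS.get(word) or FILE_TYPE_HINTS.get(singular) or []
        for ext in hints:
            if ext not in found:  # keep order, drop duplicates
                found.append(ext)
    return found
```

A static table like this cannot cope with phrasing it has never seen, which is one reason to let a small language model do the recognition instead.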
- macOS (required for Spotlight integration)
- Python 3.8+
- llama-cpp-python with Qwen3-0.6B GGUF (local LLM inference)
Currently planning to fine-tune Gemma 3 270M to get a smaller and faster model for this use case.
git clone https://github.com/monkesearch/monkesearch
cd monkeSearch
pip install -r requirements.txt
See the llama-cpp-python installation guide for detailed instructions.
You'll need to download the Qwen3-0.6B GGUF model file and place it in your project directory.
# Test the parser
python parser.py "python files from yesterday"
cd app/
# Basic search
python parser.py "photos from last week"
# More examples
python parser.py "python scripts modified yesterday"
python parser.py "pdf invoices from 2023"
python parser.py "music files"
python parser.py "old presentations"
from parser import FileSearchParser
# Initialize the parser
searcher = FileSearchParser()
# Perform a search
results, parsed_data = searcher.search("python files from last week")
# results contains file paths
for path in results:
    print(path)
- Indexed Files Only: Only searches files indexed by Spotlight
- Local Model Limitations: The small AI model may misunderstand very complex queries
- Basic Temporal Parsing: Currently supports simple time expressions (More features to be added soon! See technical for planned features)
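
The "indexed files only" limitation follows from searches going through Spotlight. To show roughly what a Spotlight query looks like, here is a sketch that builds a raw query string and runs it with the `mdfind` command-line tool. This is an assumption-laden illustration: the project itself may talk to Spotlight through Apple's frameworks rather than `mdfind`, and the helper names `build_spotlight_query` and `run_mdfind` are made up for this example.

```python
import subprocess
from typing import List, Optional


def build_spotlight_query(
    content_type: Optional[str] = None,
    days_back: Optional[int] = None,
    name_keyword: Optional[str] = None,
) -> str:
    """Combine optional filters into one raw Spotlight query string."""
    clauses = []
    if content_type:
        # Matches any file whose type tree includes this UTI, e.g. "public.image".
        clauses.append(f'kMDItemContentTypeTree == "{content_type}"')
    if days_back is not None:
        # $time.today(-N) is the start of the day N days ago.
        clauses.append(f"kMDItemFSContentChangeDate >= $time.today({-days_back})")
    if name_keyword:
        # Trailing "c" makes the match case-insensitive. The keyword is not
        # escaped here, so quotes inside it would break the query.
        clauses.append(f'kMDItemFSName == "*{name_keyword}*"c')
    return " && ".join(clauses)


def run_mdfind(query: str) -> List[str]:
    """Run mdfind (macOS only) and return the matching file paths."""
    completed = subprocess.run(
        ["mdfind", query], capture_output=True, text=True, check=True
    )
    return [line for line in completed.stdout.splitlines() if line]
```

On a Mac, `run_mdfind(build_spotlight_query("public.image", days_back=1))` would list images changed since the start of yesterday, but only those Spotlight has already indexed.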
Apache-2.0 license
- Big thanks to utitools
- llama-cpp-python for local LLM inference
- Uses Apple's Spotlight and Foundation frameworks.
Note: This is an experimental prototype created to explore natural language file searching on macOS. It's not production-ready and should be used for experimentation and learning purposes.