This project evaluates multiple local AI models for generating Kotlin code and code diffs for Android development tasks. It uses LangChain4j to connect to local models (via LM Studio) and validates the generated code using tree-sitter parsing.
More resources for this project:
- Screen Recording: Which Local Model is the “Best” for Android Development?
- Presentation: Google Slide
- Result Data: Google Sheet
The project consists of two main components:
- Code Generation (`notebooks/kotlin/code-generation.ipynb`): A Kotlin Jupyter notebook that generates code using multiple AI models
- Validation (`run_validation.py`): A Python script that validates the generated code against predefined criteria
- Python 3.x with pip
- Jupyter Notebook
- Kotlin Jupyter kernel
- LM Studio (or compatible local model server) running on `http://127.0.0.1:1234`
- Tree-sitter binaries (already included in `build/tree-sitter-binaries/`)
Install the Python dependencies:

```shell
pip install -r requirements.txt
```

The `code-generation.ipynb` notebook generates code for 5 different tasks:
- test1-preview: Generate Kotlin Preview composables
- test2-unit-test: Generate unit tests
- test3-instrumentation-test: Generate instrumentation tests
- test4-deprecated-material: Generate diff for Material deprecation migration
- test5-deprecated-plugin: Generate diff for plugin migration
The following models are tested:
- microsoft/phi-4
- openai/gpt-oss-20b
- mistralai/devstral-small-2-2512
- google/gemma-3-27b
- qwen/qwen3-coder-30b
- Start LM Studio and load a model
- Open `notebooks/kotlin/code-generation.ipynb` in Jupyter
- Execute all cells sequentially
The notebook will:
- Connect to the local model server
- Generate code for each task using each model
- Monitor resource usage (RAM/VRAM)
- Save results to `build/` directories
- Generate execution metrics CSV files
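Under the hood, LM Studio serves an OpenAI-compatible chat-completions endpoint at the address above. A minimal Python sketch of the request the notebook's LangChain4j client effectively sends (the prompt strings and temperature here are illustrative assumptions, not the notebook's actual settings):

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234/v1/chat/completions"

def build_request(model: str, system_prompt: str, user_prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local model."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        # Low temperature for more reproducible model comparisons (assumed).
        "temperature": 0.0,
    }

def generate(model: str, system_prompt: str, user_prompt: str) -> str:
    """POST the payload to the local server and return the generated text."""
    payload = build_request(model, system_prompt, user_prompt)
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Build (but don't send) a sample request for one of the tested models.
payload = build_request(
    "microsoft/phi-4",
    "You are a Kotlin expert.",
    "Write a @Preview composable.",
)
```

Calling `generate(...)` requires LM Studio to be running with the named model loaded.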
```
build/
├── test1-preview/
│   ├── result1-microsoft_phi-4.kt
│   ├── result2-openai_gpt-oss-20b.kt
│   ├── ...
│   └── execution-results.csv
├── test2-unit-test/
│   └── ...
└── ...
```
Each task directory contains:
- Generated code files (`.kt` or `.diff`)
- `execution-results.csv` with metrics (duration, token counts, RAM/VRAM usage)
The `run_validation.py` script validates generated code using tree-sitter parsing to check:
- Kotlin files: Syntax correctness, required annotations, imports, function counts
- Diff files: Correct syntax, required deletions/additions
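The diff check can be approximated in plain Python. The real script uses tree-sitter; this simplified sketch only scans unified-diff markers, and the required-change strings are illustrative:

```python
def validate_diff(diff_text: str,
                  required_removals: list[str],
                  required_additions: list[str]) -> list[str]:
    """Return a list of validation failures for a unified diff (empty = pass)."""
    failures = []
    lines = diff_text.splitlines()
    # Minimal syntax check: a unified diff needs at least one hunk header.
    if not any(line.startswith("@@") for line in lines):
        failures.append("no hunk header (@@) found")
    removed = [l[1:] for l in lines if l.startswith("-") and not l.startswith("---")]
    added = [l[1:] for l in lines if l.startswith("+") and not l.startswith("+++")]
    for needle in required_removals:
        if not any(needle in l for l in removed):
            failures.append(f"missing required removal: {needle}")
    for needle in required_additions:
        if not any(needle in l for l in added):
            failures.append(f"missing required addition: {needle}")
    return failures

# Illustrative Material -> Material3 migration diff.
sample = """--- a/src/App.kt
+++ b/src/App.kt
@@ -1,2 +1,2 @@
-import androidx.compose.material.Text
+import androidx.compose.material3.Text
"""
print(validate_diff(sample, ["material.Text"], ["material3.Text"]))  # → []
```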
```shell
python run_validation.py \
  --mappings resources/config/file-mappings-target.json \
  --output build/validation-results/results-target.csv
```

Each task is checked against these criteria:
- test1-preview: Valid syntax, `@Preview` and `@Composable` annotations
- test2-unit-test: Valid syntax, required imports, use case constructor, exactly 2 test functions
- test3-instrumentation-test: Valid syntax, TopicEntity import, DatabaseTest implementation, at least 5 test functions
- test4-deprecated-material: Valid diff syntax, required Material API changes
- test5-deprecated-plugin: Valid diff syntax, required plugin migration changes
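As an illustration, the test1-preview criteria can be approximated with a regex scan (the real script parses the Kotlin source with tree-sitter, so treat this as a sketch only):

```python
import re

def check_preview(kotlin_source: str) -> dict:
    """Approximate the test1-preview criteria: both annotations must appear."""
    return {
        "has_preview": bool(re.search(r"@Preview\b", kotlin_source)),
        "has_composable": bool(re.search(r"@Composable\b", kotlin_source)),
    }

# Illustrative generated output, not a real model result.
snippet = """
@Preview
@Composable
fun GreetingPreview() {
    Greeting(name = "Android")
}
"""
print(check_preview(snippet))  # → {'has_preview': True, 'has_composable': True}
```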
The output CSV includes validation results for each model-task combination with success/failure details.
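That CSV can be summarized with the standard library alone; note the column names `model` and `success` below are assumptions about the actual output schema, not confirmed by the script:

```python
import csv
from collections import Counter
from io import StringIO

def summarize(csv_text: str) -> Counter:
    """Count passing validations per model (column names are assumed)."""
    reader = csv.DictReader(StringIO(csv_text))
    passes = Counter()
    for row in reader:
        if row["success"].lower() == "true":
            passes[row["model"]] += 1
    return passes

# Illustrative rows, not real results.
sample = """model,task,success
microsoft/phi-4,test1-preview,true
microsoft/phi-4,test2-unit-test,false
google/gemma-3-27b,test1-preview,true
"""
print(summarize(sample))
```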
```
.
├── notebooks/kotlin/
│   └── code-generation.ipynb          # Main code generation notebook
├── resources/
│   ├── prompts/                       # System prompts and task prompts
│   └── config/
│       └── file-mappings-target.json  # File mappings for validation
├── build/                             # Generated outputs
├── run_validation.py                  # Validation script
└── requirements.txt                   # Python dependencies
```