|
| 1 | +Additional context on this repository for AI tools & coding agents |
| 2 | + |
| 3 | +- Python 3.12+ code, unless otherwise specified |
| 4 | +- Python code uses single outer quotes, including triple single quotes for e.g. docstrings |
| 5 | +- prefer absolute imports to relative imports |
| 6 | +- Use a decent amount of comments |
| 7 | + - not *too* many, just enough that anybody familiar with the code can use them as a reference point. Not meant to teach somebody new every intricacy of the code, just help keep the reader oriented. |
| 8 | +- if it saves a line, put a comment after a line rather than above it |
| 9 | + - use the standard two spaces before the comment character, eg. `CODE # COMMENT` |
| 10 | +- Try to stick to 120 characters per line |
| 11 | + - if one of those comments would break this guideline, just put that comment above the line instead, as is standard convention |
| 12 | +- If there is a pyproject.toml in place, use it as a reference for builds, installs, etc. The basic packaging and dev preference, including if you have to supply your own pyproject.toml, is as follows: |
| 13 | + - Use pyproject.toml with hatchling, not e.g. setup.py |
| 14 | + - Reusable Python code modules are developed in the `pylib` folder, and installed using e.g. `uv pip install -U .`, which includes proper mapping to Python library package namespace via `tool.hatch.build.sources`. The `__init__.py` and other modules in the top-level package go directly in `pylib`, though submodules can use subdirectories, e.g. `pylib/a/b` becomes `installed_library_name.a.b`. Ultimately this will mean the installed package is importable as `from installed_library_name.etc import …` |
| 15 | + - Yes this means editable and "dev mode" environments are NOT desirable, nor are shenanigans adding pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed. |
| 16 | + - The ethos is to always develop keeping things properly installable. No dev mode shortcuts |
| 17 | + - Prefer hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. Use `[tool.hatch.build.sources]` to map source directories to package namespaces (e.g., `"pylib" = "installed_library_name"`). |
| 18 | + - Use `[tool.hatch.build.targets.wheel]` with `only-include = ["pylib"]` to ensure the pylib directory structure gets included properly in the wheel, avoiding the duplication issue that can occur with sources mapping |
| 19 | +- **Debugging package issues**: When modules aren't importing correctly after installation, check: |
| 20 | + - That you are in the correct virtualenv (you may have to ask the developer) |
| 21 | + - Package structure in site-packages (e.g., `ls -la /path/to/site-packages/package_name/`) |
| 22 | +- Use uv, but pay attention to the above |
| 23 | + - Again always use `uv pip install -U .` for full installation, never editable installs (`pip install -e`). This ensures proper testing of the actual distribution. |
| 24 | +- Use async (e.g. asyncio) wherever it makes sense. Avoid multithreading, though multiprocessing is OK. Multiprocess for CPU-bound concurrency, and asyncIO for I/O bound, cooperative etc. |
| 25 | +- Be pythonic. Avoid e.g. complex abstract class hierarchies for the sake of them, though classes are also fine in many usage patterns. We love dictionaries, dynamic dispatch, etc. |
| 26 | + - I don't consider Pydantic very Pythonic, so we can tolerate it if need be (e.g. we're using a toolkit that strictly works with Pydantic), but otherwise, simple dataclasses are better. |
| 27 | +- Type hints are OK in moderation, but avoid absolutely littering the code with them. |
| 28 | + - No excess imports & symbols, e.g. Use type | None rather than Optional[type] |
| 29 | +- use iterator patterns as much as practical. Also functional programming approaches, including partials (currying) and decorators |
| 30 | +- Prefereed tools: |
| 31 | + - Logging: structlog |
| 32 | + - Retries on failure: tenacity |
| 33 | + - CLI argument processing: fire—avoid argparse except for truly trivial usage |
| 34 | + - CLI formatting: rich |
| 35 | + - HTTP client: httpx (async) |
| 36 | + - HTML/XML parsing: selectolax (though for now we're using html5-modern as the base implementation for our html5 features) |
| 37 | + - Browser-like Web crawling/scraping: Python playwright (with playwright_stealth if needed) |
| 38 | + - pytest, as well as pytest-mock, pytest-httpx, pytest-asyncio |
| 39 | + - rapidfuzz for fuzzy text matching |
| 40 | +- AVOID the following unless explicitly requested or otherwise unavoidable: |
| 41 | + - langchain |
| 42 | + |
| 43 | +- Once again PREFER SINGLE QUOTES |
0 commit comments