- AI-First Context: Deliver precise codebase context for agent requests
- Simple AI Interface: Agents don't need to create requests that fit complex data structures. Tool calls should be in plain language with no more than one required parameter.
- Transparency: Clear, understandable processes and results for developers
- Simple Modularity: Extensible yet intuitive - purpose should be obvious
- Ecosystem Alignment: Leverage pydantic ecosystem (
pydantic,pydantic-settings,pydantic-ai,FastMCP(uses pydantic)) over reinvention - Proven Patterns: Follow established abstractions - simple, powerful, well-validated interfaces
Why: Familiar patterns reduce learning curve, plus it's clean and effective.
Study pydantic ecosystem patterns: pydantic, FastAPI, FastMCP, pydantic-ai.
- Flexible Generics: Few broadly-defined models/protocols/ABCs for wide reuse
- Smart Decorators: Extend class/function roles cleanly (think: '
@pydantic.after_validatoror basically all ofFastAPI) - Dependency Injection: Explicit dependencies, organized pipelines
- Flat Structure: Group closely related modules in packages (e.g. all providers), otherwise keep root-level
- Core types in
__init__.pyif a subpackage (not root init.py) - Foundations in private mirror modules (
chunking.py←_chunking.py) - Extensive internal logic in an
_internalssubpackage.
- Core types in
- Types as Functionality: Types are the behavior, not separate from it
- Docstrings: Google convention, plain language, active voice, present tense
- Start with verbs: "Adds numbers" not "This function adds numbers"
- But not exacting: Don't waste space explaining the obvious. We have strong typing that makes args/returns clear. A brief sentence may be enough.
- Line length: 100 characters
- Auto-formatting: Ruff configuration enabled
- Python typing: Modern (≥3.12) -
typing.Self,typing.Literal, piped unions (int | str), constructors as types (list[str]),typekeyword fortyping.TypeAliasType.
Why: Performance + memory efficiency + fewer debugging headaches
- Sequences: Use
Generator/AsyncGenerator,tuple/NamedTupleover lists. Usefrozensetfor set-like objects - Dicts: Read-only dicts use
types.MappingProxyType - Models: Use
frozen=Truefor dataclasses/models set at instantiation- Need modifications? Create new instances rather than mutating 1
- Need computed properties? Use factory functions or classmethods
Why: Maintainability, self-documentation, easier debugging. Get it right upfront.
- Strict typing with opinionated
tyrules - Structured data: Use
TypedDict,Protocol,NamedTuple,enum.Enum,typing_extensions.TypeIs(similar to typing.TypeGuard but more flexible, typing.TypeGuard also OK)- Use the project's derivative for these:
dataclass->pydantic.dataclasses.dataclassandcodeweaver.core.types.models.DataclassSerializationMixinpydantic.BaseModel->codeweaver.core.types.models.BasedModelpydantic.ConfigDict->enum.Enum->codeweaver.core.types.enum.BaseEnum
- Use the project's derivative for these:
- Define structures: Don't be lazy - use
TypedDict,NamedTuple,dataclassorBasedModelto define structured data. Only use vague/generic types likedict[str, Any]when the types/structure are truly unknown or have many possibilities. As a rule: if you know what the keys to an object will be, then define it as a class.- Complex objects:
dataclassorBaseModeldescendants - New pattern for complex member-like objects: dataclass enums where each member has a dataclass value -- current plan is to replace most of the complex string enums with this pattern. See
codeweaver.semantic.classification.AgentTaskfor an implementation example. - Simple objects:
NamedTupleif the object would benefit from methods or will be nested;TypedDictotherwise.
- Complex objects:
- Generics: Define proper generic types/protocols/guards
- Use newer python parameterized generics syntax:
class SomeClass[SomeGeneric]:-- don't usetyping.Generic - Use newer
typekeyword for aliases:type MyAlias = tuple[Literal["like this"]]notTypeAlias
- Use newer python parameterized generics syntax:
- Avoid string literals: For most cases, favor
enum.Enum(usingBaseEnum) overtyping.Literal(CLI:cycloptshandles enum parsing).- Exception: If the type will only be used once in one small section, and there are only 1-3 valid values,
Literalis OK, but must be typed withLiteral - Keep properties with their objects. Use
enummethods to keep logic related to members with the class. You shouldn't add/define properties or attributes for members elsewhere (seecodeweaver.language.SemanticSearchLanguagefor an extreme example).
- Exception: If the type will only be used once in one small section, and there are only 1-3 valid values,
from typing import Annotated
from pydantic import ConfigDict, Field
from codeweaver.core.types import BasedModel
class MyModel(BasedModel):
model_config = ConfigDict(extra="allow")
name: Annotated[str, Field(description="Item name")]
value: Annotated[int, Field(ge=0, description="Non-negative value")] = 0- Use
ConfigDictfor configuration (extra="allow"for plugins,extra="forbid"for strict) - Prefer domain-specific subclasses (codeweaver's
BasedModelfor most,BaseNodefor pydantic graph) over rawBaseModel
- No f-strings in log statements: Use
%sformatting orextra={"key": value} - No print statements: Use logging in production
- Use logging.warning for most errors: Using logging.warning allows CodeWeaver's UI to handle displaying the error based on severity and user configuration (like the
--verboseflag).logging.exceptionbypasses the UI handler.
-
Specify exception types: No bare
except: -
Use
elsefor returns: Aftertryblocks for clarity -
Use
raise from: Maintain exception context -
Use
contextlib.suppress: For intentional exception suppression (not try-except-pass/continue) -
If raising an exception, raise to a specific codeweaver exception (
codeweaver.exceptions)
- Type all call arguments/parameters and returns: Including
-> None - Boolean kwargs only: Use
*separator for boolean parameters (booleans should not be positional arguments)
def my_function(arg1: str, *, flag: bool = False) -> None:
passFollow Google Python Style Guide. See auto-fix script.
- Use
|syntax, not tuple:isinstance(value, str | int | MyClass)(not(str, int, MyClass)) (yes, it really is valid python, I promise.)
Effectiveness over coverage. We prioritize meaningful tests over metrics.
- Doesn't improve user experience
- Doesn't prevent important bugs
- Tests implementation details, not behavior
- Creates barriers to innovation and change ("ugh, we have to update all those tests...")
- Wastes time maintaining low-value tests
- Critical behavior affecting user experience
- Realistic integration scenarios
- Input/output validation for important functions
One solid, realistic test is better than 10 implementation detail tests
Integration testing > unit testing for most cases.
We have a long (probably too long) list of pytest marks that allow us to run granular tests. If you write a new test, review the list, and apply all applicable marks.
Footnotes
-
But be reasonable. If you really need to make a lot of incremental updates to an object, then use a mutable type -- it's a guideline, not a rule. ↩