Skip to content

Latest commit

 

History

History
155 lines (108 loc) · 8.22 KB

File metadata and controls

155 lines (108 loc) · 8.22 KB

Code Style Guide

Design Principles

  1. AI-First Context: Deliver precise codebase context for agent requests
  2. Simple AI Interface: Agents don't need to create requests that fit complex data structures. Tool calls should be in plain language with no more than one required parameter.
  3. Transparency: Clear, understandable processes and results for developers
  4. Simple Modularity: Extensible yet intuitive - purpose should be obvious
  5. Ecosystem Alignment: Leverage pydantic ecosystem (pydantic, pydantic-settings, pydantic-ai, FastMCP (uses pydantic)) over reinvention
  6. Proven Patterns: Follow established abstractions - simple, powerful, well-validated interfaces

Pydantic Architecture

Why: Familiar patterns reduce learning curve, plus it's clean and effective.

Study pydantic ecosystem patterns: pydantic, FastAPI, FastMCP, pydantic-ai.

Key Patterns

  • Flexible Generics: Few broadly-defined models/protocols/ABCs for wide reuse
  • Smart Decorators: Extend class/function roles cleanly (think: '@pydantic.after_validator or basically all of FastAPI)
  • Dependency Injection: Explicit dependencies, organized pipelines
  • Flat Structure: Group closely related modules in packages (e.g. all providers), otherwise keep root-level
    • Core types in __init__.py if a subpackage (not root init.py)
    • Foundations in private mirror modules (chunking.py_chunking.py)
    • Extensive internal logic in an _internals subpackage.
  • Types as Functionality: Types are the behavior, not separate from it

Style Standards

  • Docstrings: Google convention, plain language, active voice, present tense
    • Start with verbs: "Adds numbers" not "This function adds numbers"
    • But not exacting: Don't waste space explaining the obvious. We have strong typing that makes args/returns clear. A brief sentence may be enough.
  • Line length: 100 characters
  • Auto-formatting: Ruff configuration enabled
  • Python typing: Modern (≥3.12) - typing.Self, typing.Literal, piped unions (int | str), constructors as types (list[str]), type keyword for typing.TypeAliasType.

Lazy Evaluation & Immutability

Why: Performance + memory efficiency + fewer debugging headaches

  • Sequences: Use Generator/AsyncGenerator, tuple/NamedTuple over lists. Use frozenset for set-like objects
  • Dicts: Read-only dicts use types.MappingProxyType
  • Models: Use frozen=True for dataclasses/models set at instantiation
    • Need modifications? Create new instances rather than mutating 1
    • Need computed properties? Use factory functions or classmethods

Typing Guidelines 🏴‍☠️

Why: Maintainability, self-documentation, easier debugging. Get it right upfront.

  • Strict typing with opinionated ty rules
  • Structured data: Use TypedDict, Protocol, NamedTuple, enum.Enum, typing_extensions.TypeIs (similar to typing.TypeGuard but more flexible, typing.TypeGuard also OK)
    • Use the project's derivative for these:
      • dataclass -> pydantic.dataclasses.dataclass and codeweaver.core.types.models.DataclassSerializationMixin
      • pydantic.BaseModel -> codeweaver.core.types.models.BasedModel
      • pydantic.ConfigDict ->
      • enum.Enum -> codeweaver.core.types.enum.BaseEnum
  • Define structures: Don't be lazy - use TypedDict, NamedTuple, dataclass or BasedModel to define structured data. Only use vague/generic types like dict[str, Any] when the types/structure are truly unknown or have many possibilities. As a rule: if you know what the keys to an object will be, then define it as a class.
    • Complex objects: dataclass or BaseModel descendants
    • New pattern for complex member-like objects: dataclass enums where each member has a dataclass value -- current plan is to replace most of the complex string enums with this pattern. See codeweaver.semantic.classification.AgentTask for an implementation example.
    • Simple objects: NamedTuple if the object would benefit from methods or will be nested; TypedDict otherwise.
  • Generics: Define proper generic types/protocols/guards
    • Use newer python parameterized generics syntax: class SomeClass[SomeGeneric]: -- don't use typing.Generic
    • Use newer type keyword for aliases: type MyAlias = tuple[Literal["like this"]] not TypeAlias
  • Avoid string literals: For most cases, favor enum.Enum (using BaseEnum) over typing.Literal (CLI: cyclopts handles enum parsing).
    • Exception: If the type will only be used once in one small section, and there are only 1-3 valid values, Literal is OK, but must be typed with Literal
    • Keep properties with their objects. Use enum methods to keep logic related to members with the class. You shouldn't add/define properties or attributes for members elsewhere (see codeweaver.language.SemanticSearchLanguage for an extreme example).

Pydantic Models

from typing import Annotated
from pydantic import ConfigDict, Field

from codeweaver.core.types import BasedModel

class MyModel(BasedModel):
    model_config = ConfigDict(extra="allow")

    name: Annotated[str, Field(description="Item name")]
    value: Annotated[int, Field(ge=0, description="Non-negative value")] = 0
  • Use ConfigDict for configuration (extra="allow" for plugins, extra="forbid" for strict)
  • Prefer domain-specific subclasses (codeweaver's BasedModel for most, BaseNode for pydantic graph) over raw BaseModel

Common Linting Issues

Logging

  • No f-strings in log statements: Use %s formatting or extra={"key": value}
  • No print statements: Use logging in production
  • Use logging.warning for most errors: Using logging.warning allows CodeWeaver's UI to handle displaying the error based on severity and user configuration (like the --verbose flag). logging.exception bypasses the UI handler.

Exception Handling

  • Specify exception types: No bare except:

  • Use else for returns: After try blocks for clarity

  • Use raise from: Maintain exception context

  • Use contextlib.suppress: For intentional exception suppression (not try-except-pass/continue)

  • If raising an exception, raise to a specific codeweaver exception (codeweaver.exceptions)

Functions

  • Type all call arguments/parameters and returns: Including -> None
  • Boolean kwargs only: Use * separator for boolean parameters (booleans should not be positional arguments)
def my_function(arg1: str, *, flag: bool = False) -> None:
    pass

Follow Google Python Style Guide. See auto-fix script.

IsInstance

  • Use | syntax, not tuple: isinstance(value, str | int | MyClass) (not (str, int, MyClass)) (yes, it really is valid python, I promise.)

Testing Philosophy

Effectiveness over coverage. We prioritize meaningful tests over metrics.

Why Not 100% Coverage?

  • Doesn't improve user experience
  • Doesn't prevent important bugs
  • Tests implementation details, not behavior
  • Creates barriers to innovation and change ("ugh, we have to update all those tests...")
  • Wastes time maintaining low-value tests

Focus Instead On

  • Critical behavior affecting user experience
  • Realistic integration scenarios
  • Input/output validation for important functions

One solid, realistic test is better than 10 implementation detail tests

Integration testing > unit testing for most cases.

Make Sure to Include Appropriate Marks for New Tests

We have a long (probably too long) list of pytest marks that allow us to run granular tests. If you write a new test, review the list, and apply all applicable marks.

Footnotes

  1. But be reasonable. If you really need to make a lot of incremental updates to an object, then use a mutable type -- it's a guideline, not a rule.