Derived from the unpythonic style guide, adapted for the Raven codebase. Documents actual patterns observed in the code.
Raven inherits unpythonic's governing principle — "find pythonic ways to do unpythonic things" — but sits at a different point on the spectrum. Where unpythonic is a language extension library with deep metaprogramming, Raven is an application project that uses unpythonic idioms pragmatically:
- Be correct. Handle edge cases. Report errors clearly.
- Be concise but readable. No code golf, but no unnecessary ceremony either.
- Closures over classes when the state is simple. Classes when the state or interface is complex.
- Keep it working. Raven is built quickly and pragmatically. Polish where it matters (architecture, user-facing behavior), tolerate roughness elsewhere.
- No macros. Raven uses `mcpyrate` only for its `colorizer` utility. All logic is pure Python.
- No currying. `unpythonic.curry` is not used. Standard parameter ordering applies.
Modules follow a consistent layout:
"""Short module description.
Longer explanation where useful.
"""
__all__ = ["public_name1", "public_name2"]
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# stdlib imports
import collections
import threading
from typing import Callable, Dict, List, Optional
# third-party imports
import numpy as np
# unpythonic imports
from unpythonic import sym, box, unbox
from unpythonic.env import env
# internal imports (relative)
from ..common import bgtask
from ..common import utils as common_utils
from . import config as librarian_config
Key points:
- `__all__` is mandatory and placed immediately after the module docstring, before imports. Populated explicitly.
- Logging setup immediately after `__all__`. The three-line `logging.basicConfig` / `logger = ...` pattern is standard.
- Imports use `from ... import ...` style (not bare `import ...`), except for large namespaces like `numpy`, `torch`, `dearpygui`, and `json`.
- Internal imports use relative paths (`.module`, `..module`).
- No star imports.
- `as` renaming is used sparingly and consistently: `env as envcls` (when `env` is also a parameter name), `config as librarian_config` (disambiguation), `utils as common_utils` / `utils as guiutils` (disambiguation).
For top-level app modules (visualizer/app.py, server/app.py, librarian/app.py), a heavier startup pattern is used:
logger.info(f"App-name version {__version__} starting.")
logger.info("Loading libraries...")
from unpythonic import timer
with timer() as tim:
    import argparse
    import threading
    # ... all remaining imports ...
logger.info(f"Libraries loaded in {tim.dt:0.6g}s.")
This wraps all imports in a timer block to measure and log startup time. Imports go inside the `with` block.
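The same startup pattern can be sketched with the standard library alone. The `Timer` class below is a hypothetical stand-in that mimics only the `dt` attribute used above; it is not unpythonic's implementation, and the two imported modules are arbitrary placeholders.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class Timer:
    """Hypothetical stand-in for `unpythonic.timer`: stores elapsed seconds in `dt`."""
    def __enter__(self):
        self._t0 = time.perf_counter()
        self.dt = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.dt = time.perf_counter() - self._t0
        return False  # never swallow exceptions raised by an import


logger.info("Loading libraries...")
with Timer() as tim:
    # Names imported inside the block are still bound at module level.
    import json
    import statistics
logger.info(f"Libraries loaded in {tim.dt:0.6g}s.")
```

A `with` block does not create a new scope, which is why the imported modules remain usable after the block ends.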
- Target ~100–500 SLOC per module (docstrings, comments, and blanks don't count).
- Rough upper bound ~800 lines total for library modules.
- App modules (`app.py`) are currently larger: `visualizer/app.py` is 4400+ lines and is the primary refactoring target.
- The librarian component (~8000 lines across 10 modules) is the target architecture: clean layered design with each module at ~300–800 lines.
- Functions: `lowercase_with_underscores`.
- Classes: `PascalCase`, including exception classes.
- Module-internal symbols: single underscore prefix (`_update_annotation`, `_macrosteps_count`).
- "Constants": lowercase, following Lisp/unpythonic tradition. (Python's `SCREAMING_CASE` is not used.)
- Sentinel values: `sym("name")` for human-readable sentinels:
    action_ack = sym("ack")
    action_stop = sym("stop")
    status_pending = sym("pending")
- Nonce objects: `gensym("label")` when you need unique identity with readability.
- Config modules: Module-level variables, lowercase, with detailed comments.
- DPG widget tags: String literals, `snake_case`, commented with `# tag` on the same line for searchability.
Docstrings use reStructuredText format. They are extensive for public API and pragmatic for internals:
def submit(self, function: Callable, env: env) -> Symbol:
    """Submit a new task.

    `function`: callable, must take one positional argument.
    `env`: `unpythonic.env.env`, passed to `function` as the only argument.

    When `submit` returns, `env` will contain two new attributes:
        `task_name`: str, unique name of the task, for use in log messages.
        `cancelled`: bool. This flag signals task cancellation.
    """
Patterns:
- One-line summary, then blank line, then details.
- Parameters documented inline with backtick-quoted names and indented descriptions.
- NOTE / CAUTION markers for gotchas.
- Reference external resources (URLs, other modules) directly in docstrings.
- Module docstrings list what the module contains and where it sits in the architecture.
- Having no docstring is better than having a placeholder — make the absence explicit.
Comments read like prose and explain why, not what. The style has personality:
# We do this as early as possible, because before the startup is complete,
# trying to `dpg.add_xxx` or `with dpg.xxx:` anything will segfault the app.
# But display at least one entry from each cluster.
if max_n is not None:
    ...
import io  # we occasionally need one of Jupiter's moons
Recognized comment markers:
- `# TODO:` for known improvements, often with explanation of tradeoffs.
- `# HACK:` for acknowledged workarounds, with context on why.
- `# tag` on lines containing DPG widget tag string literals.
- `# pragma: no cover` always accompanied by an explanation.
Major sections within a module are separated by:
# --------------------------------------------------------------------------------
# Section title
This is used consistently throughout the codebase to visually group related functionality. A shorter variant without a title:
# ----------------------------------------
is sometimes used for minor sub-sections within a major section.
- Line width: ~110 characters. Can locally go a few characters over for a more pleasing layout.
- No line breaks in URLs, even if over 110 characters. URLs must be copy-pasteable.
- Blank lines: Play the role of paragraph breaks in prose. Insert when the topic changes.
- One blank line after most function and class definitions.
- Two blank lines when the topic changes across a major boundary (before a horizontal separator, between classes).
- f-strings for all string formatting (not `%` or `.format()`).
- European punctuation: One space between full stop and next sentence.
- Timing values formatted with g-format: `f"{tim.dt:0.6g}s"`.
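To illustrate what the g-format does to timing values, here is a small sketch. The helper name `format_duration` is made up for this example and is not part of Raven.

```python
def format_duration(seconds: float) -> str:
    """Format a duration with at most six significant digits, as in the timing logs."""
    return f"{seconds:0.6g}s"


print(format_duration(0.123456789))  # 0.123457s
print(format_duration(12.5))         # 12.5s
print(format_duration(1234567.0))    # 1.23457e+06s
```

The `g` presentation type drops trailing zeros and switches to exponent notation for very large or very small values, so log lines stay short regardless of magnitude.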
Parameters that need explanation are documented with backtick-quoted names:
def ai_turn(llm_settings: env,
            datastore: chattree.Forest,
            retriever: hybridir.HybridIR,
            head_node_id: str,
            ...):
    """Run the AI's response turn.

    `llm_settings`: Obtain this by calling `raven.librarian.llmclient.setup` at app start time.
    `datastore`: The chat datastore.
    `head_node_id`: Current HEAD node of the chat.
    """
Type hints from `typing` should be used wherever they aid readability, on both public and internal functions. Common patterns:
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
def create_node(self,
                payload: Any = None,
                parent_id: Optional[str] = None,
                timestamp: Optional[int] = None) -> str:
    ...
Arguments without a standard ordering, or flags, use keyword-only syntax:
def get_entries_for_selection(data_idxs, *, sort_field="title", max_n=None):
    ...
Parameters prefixed with `_` indicate internal use and should not be passed by normal callers:
def reset_undo_history(_update_gui=True):
    ...
- Error messages report what was expected and what was actually received:
    raise ValueError(f"Unknown mode '{mode}'; valid values: 'concurrent', 'sequential'.")
- EAFP (try/except) for performance-critical paths and thread-safety. Normal logic uses `if`/`elif`/`else`.
- Custom exceptions inherit from the most appropriate base.
- Logging of unexpected situations via `logger.error()` / `logger.warning()` before raising.
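The error-handling rules above can be combined into one short sketch. `TaskManagerError` and `validate_mode` are hypothetical names invented for illustration; only the error message text comes from the example above.

```python
import logging

logger = logging.getLogger(__name__)

valid_modes = ("concurrent", "sequential")  # lowercase "constant", per the naming rules


class TaskManagerError(ValueError):
    """Hypothetical custom exception; inherits from the closest built-in base."""


def validate_mode(mode: str) -> str:
    """Return `mode` unchanged if valid; otherwise log and raise."""
    if mode not in valid_modes:
        # Report both what was received and what was expected.
        msg = f"Unknown mode '{mode}'; valid values: 'concurrent', 'sequential'."
        logger.error(f"validate_mode: {msg}")  # log before raising
        raise TaskManagerError(msg)
    return mode
```

Because the custom exception derives from `ValueError`, existing callers that catch `ValueError` keep working.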
State is captured in closure variables, not on objects, when the interface is simple:
def make_copy_entry_to_clipboard(item):
    """Closure factory: create a callback that copies `item` to clipboard."""
    def copy_entry_to_clipboard():
        ...  # uses `item` from enclosing scope
    return copy_entry_to_clipboard
This pattern is ubiquitous for DPG button callbacks and event handlers.
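A runnable variant of the closure factory, with the clipboard replaced by a plain list so it works without a GUI. The `clipboard` list and the sample titles are assumptions made for this sketch.

```python
clipboard = []  # stand-in for a real clipboard


def make_copy_entry_to_clipboard(item):
    """Closure factory: create a callback that copies `item` to `clipboard`."""
    def copy_entry_to_clipboard():
        clipboard.append(item)  # `item` is captured from the enclosing scope
    return copy_entry_to_clipboard


# One callback per entry; each remembers its own `item`.
callbacks = [make_copy_entry_to_clipboard(title) for title in ("first", "second")]
callbacks[1]()
callbacks[0]()
print(clipboard)  # ['second', 'first']
```

Going through a factory also sidesteps Python's late-binding pitfall: a bare `lambda: clipboard.append(title)` created in the loop would see only the final value of `title`.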
unpythonic.call is used to limit the scope of temporary variables in script-style modules:
from unpythonic import call
@call
def _():
    """Set up some config that requires temporary computation."""
    temp_value = expensive_computation()
    global_state.setting = transform(temp_value)
# `temp_value` does not leak into module scope
`env` from unpythonic replaces ad-hoc dictionaries and simple data classes:
from unpythonic.env import env
llm_settings = env(model="Qwen3-VL-30B-A3B",
                   backend_url="http://localhost:5000",
                   personas={"assistant": "Aria"})
# Access as attributes
print(llm_settings.model)
Used throughout for passing related settings as a bundle. Particularly heavy in `llmclient` and `scaffold`.
When you need to rebind a value wholesale (for example, swap in a new numpy array) from inside a closure or across module boundaries:
from unpythonic import box, unbox
selection_data_idxs_box = box(make_blank_index_array())
# Read
current = unbox(selection_data_idxs_box)
# Write (replace contents)
selection_data_idxs_box << new_array
Human-readable sentinel values that are distinct from any data value:
from unpythonic import sym
action_continue = sym("continue")
action_done = sym("done")
status_pending = sym("pending")
status_running = sym("running")
These compare by identity (`is`) and print readably.
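To make the identity semantics concrete, here is a minimal sketch of an interned symbol type using only the standard library. The `Symbol` class is hypothetical and is not how unpythonic implements `sym`; it only demonstrates the property relied on above, namely that the same name always yields the same object.

```python
import threading


class Symbol:
    """Hypothetical minimal interned symbol; not unpythonic's implementation."""
    _instances = {}
    _lock = threading.Lock()

    def __new__(cls, name: str):
        with cls._lock:
            try:
                return cls._instances[name]  # same name, same object
            except KeyError:
                instance = super().__new__(cls)
                instance.name = name
                cls._instances[name] = instance
                return instance

    def __repr__(self):
        return f'sym("{self.name}")'


status_pending = Symbol("pending")
status_running = Symbol("running")

state = Symbol("pending")
if state is status_pending:  # identity comparison, no string matching
    print(f"state is {state}")
```

Unlike a bare string, such a sentinel cannot collide with a data value that happens to contain the same text.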
- `gensym("label")`: unique identifiers with readable names (e.g. for tree node IDs)
- `timer()`: benchmarking context manager (startup timing, pipeline stages)
- `partition(pred, iterable)`: split iterable by predicate
- `ETAEstimator`: progress tracking in long-running pipelines
- `flatten`: flatten nested iterables
- `memoize`: function result caching
- `dyn` (dynamic variables): implicit parameter passing through call chains (used in `importer.py` for status callbacks)
- `Values`: multiple named return values
- `islice`: lazy slicing
- `window`: sliding window over iterables
OOP is used when the state or interface demands it:
- Data structures: `Forest`, `PersistentForest` (tree storage with persistence)
- Infrastructure: `TaskManager` (background task scheduling), `HybridIR` (search index)
- GUI components: `DPGChatController`, `DPGChatMessage`, `Animator`, `Animation`
- Server-side AI modules: Each module in `raven/server/modules/` follows a consistent pattern with `init_module()`, `is_available()`, and task-specific functions.
class TaskManager:
    def __init__(self, name: str, mode: str, executor: concurrent.futures.Executor):
        """..."""
        self.name = name
        self.mode = mode
        self.executor = executor
        self.tasks = {}
        self.lock = threading.RLock()
- `__repr__`/`__str__` implemented for debugging where useful.
- ABCs and metaclasses used only when needed, with detailed comments explaining why.
Configuration uses Python modules (config-as-code), not YAML/JSON:
# raven/visualizer/config.py
vis_method = "tsne" # good quality, fast (recommended)
extract_keywords = True
clusters_keyword_method = "frequencies"
# clusters_keyword_method = "llm"
Patterns:
- Module-level variables with descriptive comments.
- Commented-out alternatives show available options.
- `devices` dicts map task names to hardware settings (device string, dtype).
- Config imports flow downward: `raven.config` (global) → component configs (`librarian.config`, `visualizer.config`) → modules.
- A shorthand alias is common: `gui_config = librarian_config.gui_config`.
- Prompt templates use `textwrap.dedent("""...""").strip()`.
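As a sketch of the prompt-template idiom from the last bullet: the function name `make_system_prompt` and the prompt wording are invented for this example.

```python
import textwrap


def make_system_prompt(char_name: str) -> str:
    """Hypothetical helper: build a prompt from an indented triple-quoted template."""
    return textwrap.dedent(f"""
        You are {char_name}, a helpful assistant.

        Answer concisely, and say so when you do not know.
        """).strip()


prompt = make_system_prompt("Aria")
print(prompt)
```

`dedent` removes the common leading whitespace so the template can be indented along with the surrounding code, and `strip` drops the blank first and last lines that the triple quotes introduce.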
All shared mutable state uses threading.RLock():
self.lock = threading.RLock()
def some_operation(self):
    with self.lock:
        ...
`RLock` (reentrant) is preferred over `Lock` to allow the same thread to enter nested critical sections.
For caches and registries:
try:
    return self._cache[key]
except KeyError:
    with self._lock:
        if key not in self._cache:
            self._cache[key] = compute(key)
        return self._cache[key]
Both the tooltip and info panel build new content in a hidden DPG group, then swap atomically:
- Create new content in a hidden group (background thread)
- Acquire content lock
- Hide old group, show new group
- `dpg.split_frame()` (wait for DPG to render)
- Delete old group
- Release lock
Each build gets a unique build number (appended to DPG tags as _buildN) for uniqueness.
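The build-then-swap steps can be sketched without DPG. Everything below is a simplified stand-in: groups are plain dicts, `SwappablePanel` and the tag prefix `panel_content` are hypothetical, and the `dpg.split_frame()` step is only marked by a comment.

```python
import itertools
import threading


class SwappablePanel:
    """Hypothetical stand-in for a DPG panel; groups are plain dicts here."""
    def __init__(self):
        self.content_lock = threading.RLock()
        self.build_numbers = itertools.count()
        self.groups = {}  # tag -> {"items": [...], "visible": bool}
        self.current_tag = None

    def rebuild(self, items):
        # Build new content in a hidden group; the unique build number keeps tags distinct.
        tag = f"panel_content_build{next(self.build_numbers)}"
        self.groups[tag] = {"items": list(items), "visible": False}
        with self.content_lock:  # acquire content lock
            old_tag = self.current_tag
            if old_tag is not None:
                self.groups[old_tag]["visible"] = False  # hide old group
            self.groups[tag]["visible"] = True  # show new group
            self.current_tag = tag
            # Real DPG code would wait for a rendered frame here before deleting.
            if old_tag is not None:
                del self.groups[old_tag]  # delete old group
        return tag  # lock is released on leaving the `with` block


panel = SwappablePanel()
first = panel.rebuild(["a"])
second = panel.rebuild(["b", "c"])
print(first, second)  # panel_content_build0 panel_content_build1
```

Readers of the panel never observe a half-built group, because the new content becomes visible only after it is complete.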
Background tasks monitor a cancelled flag set by the task manager:
def my_background_work(task_env):
    for item in items:
        if task_env.cancelled:
            return
        process(item)
The standard pattern for background work in GUI apps:
from ..common import bgtask
executor = concurrent.futures.ThreadPoolExecutor()  # default worker count: min(32, CPU count + 4) on Python 3.8+
# "sequential" mode: new task cancels previous one (for GUI updates)
info_panel_task_manager = bgtask.TaskManager("info_panel", mode="sequential", executor=executor)
# "concurrent" mode: tasks run independently
indexing_task_manager = bgtask.TaskManager("indexing", mode="concurrent", executor=executor)
Tasks are submitted with an `env` that receives `task_name` and `cancelled` attributes:
task_env = env(data=my_data, callback=my_callback)
info_panel_task_manager.submit(update_info_panel_worker, task_env)
High-level operations take optional callbacks for progress reporting:
def ai_turn(llm_settings, datastore, ...,
            on_docs_start=None, on_docs_done=None,
            on_llm_start=None, on_llm_progress=None, on_llm_done=None,
            on_tools_start=None, on_tools_done=None,
            on_nomatch_done=None,
            on_prompt_ready=None):
The controller passes closures that update GUI state. This keeps the orchestration layer GUI-agnostic.
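A reduced sketch of the optional-callback style. `run_pipeline` and its three callbacks are hypothetical and much simpler than `ai_turn`; the point is only that the orchestration function never touches the GUI directly.

```python
from typing import Callable, List, Optional


def run_pipeline(items: List[str], *,
                 on_start: Optional[Callable] = None,
                 on_progress: Optional[Callable] = None,
                 on_done: Optional[Callable] = None) -> List[str]:
    """Hypothetical orchestration function; knows nothing about the GUI."""
    if on_start is not None:
        on_start()
    results = []
    for k, item in enumerate(items, start=1):
        results.append(item.upper())
        if on_progress is not None:
            on_progress(k, len(items))
    if on_done is not None:
        on_done(results)
    return results


# The "controller" side supplies closures that update its own state.
log = []
run_pipeline(["a", "b"],
             on_progress=lambda k, n: log.append(f"{k}/{n}"),
             on_done=lambda results: log.append("done"))
print(log)  # ['1/2', '2/2', 'done']
```

Callbacks that are left at `None` are skipped, so a command-line caller can run the same function with no callbacks at all.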
All widget tags are string literals (not integer IDs), using snake_case:
dpg.add_button(label="Undo", tag="selection_undo_button")  # tag
The `# tag` comment marks lines containing widget tag references for searchability.
DPG's container stack is global and not thread-safe. Background threads must always use explicit parent=:
# Good: explicit parent, safe from any thread
dpg.add_text("hello", parent=my_group)
# Bad: uses implicit container stack, not thread-safe
with dpg.group():
    dpg.add_text("hello")
The `with` block style is fine in the main thread during GUI setup.
DPG invokes button callbacks with a fixed argument list, so closure factories are used to bind custom arguments:
def make_select_cluster(cluster_id):
    def select_cluster():
        update_selection(get_data_idxs_for_cluster(cluster_id), mode="replace")
    return select_cluster
# In GUI setup:
dpg.add_button(label=f"Select #{cid}", callback=make_select_cluster(cid))
DPG widgets store metadata in their `user_data` field as `(kind, data)` tuples:
dpg.add_group(user_data=("entry_title_container", data_idx), parent=...)
Predicate functions check the kind with a constant-time tuple comparison:
def is_entry_title_container_group(item):
    ud = dpg.get_item_user_data(item)
    return ud is not None and ud[0] == "entry_title_container"
Dependencies flow strictly downward through layers:
Layer 5 - Applications: app.py
Layer 4 - Controller: chat_controller.py
Layer 3 - Orchestration: scaffold.py
Layer 2 - Backends: llmclient.py, hybridir.py
Layer 1 - Utilities: chatutil.py, appstate.py
Layer 0 - Foundation: config.py, chattree.py
Each layer only imports from layers below it. No circular dependencies. This pattern (demonstrated in raven/librarian/) is the target architecture for all components.
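The layering rule lends itself to a mechanical check. The sketch below encodes the layer table above as a dict and flags any import that does not point strictly downward. The function `find_layer_violations` and the sample import edges are invented for illustration and do not describe Raven's actual imports.

```python
# Layer numbers taken from the table above.
layers = {"config": 0, "chattree": 0,
          "chatutil": 1, "appstate": 1,
          "llmclient": 2, "hybridir": 2,
          "scaffold": 3,
          "chat_controller": 4,
          "app": 5}


def find_layer_violations(imports):
    """Return the (importer, imported) pairs that do not point strictly downward."""
    return [(src, dst) for src, dst in imports
            if layers[dst] >= layers[src]]


# Made-up import edges: the last one points upward and should be reported.
sample_imports = [("scaffold", "llmclient"),
                  ("app", "chat_controller"),
                  ("chatutil", "scaffold")]
print(find_layer_violations(sample_imports))  # [('chatutil', 'scaffold')]
```

Since every allowed edge goes to a strictly lower layer number, a clean result also rules out circular imports among the listed modules.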
All ML inference runs in raven/server/modules/. Client apps call the server via raven/client/api.py. Local fallback is available via raven/client/mayberemote.py when the server is not running.
Tests use pytest and live in tests/ subdirectories within each component:
# raven/librarian/tests/test_chattree.py
import pytest
from raven.librarian.chattree import Forest, PersistentForest
@pytest.fixture
def forest():
    return Forest()

@pytest.fixture
def chain(forest):
    """A -> B -> C linear chain."""
    a = forest.create_node(payload="A")
    b = forest.create_node(payload="B", parent_id=a)
    c = forest.create_node(payload="C", parent_id=b)
    return forest, a, b, c

class TestCreateNode:
    def test_create_root_node(self, forest):
        node_id = forest.create_node(payload="root")
        assert forest.nodes[node_id]["parent"] is None

    def test_create_child_node(self, forest):
        parent_id = forest.create_node(payload="parent")
        child_id = forest.create_node(payload="child", parent_id=parent_id)
        assert forest.nodes[child_id]["parent"] == parent_id
Patterns:
- Fixtures for common setups (bare forest, linear chain, branching tree).
- Test classes group related tests by feature area.
- Tests use the public API, not internal state (except for verification assertions).
- `pytest.raises` for expected exceptions; `pytest.mark.xfail(strict=True)` for known bugs.
- Test file naming: `test_<module_name>.py`.
Raven has many dependencies (ML frameworks, GUI toolkit, web server, etc.) — it's an application, not a library. However:
- Don't add dependencies without a reason. Prefer stdlib when reasonable.
- `unpythonic` is a core dependency used throughout.
- `mcpyrate` is used only for its `colorizer` utility (terminal colors). No macros.
- Heavy ML dependencies (`torch`, `transformers`, `sentence-transformers`, `spacy`) are confined to specific modules.
- Vendored dependencies live in `raven/vendor/` with attribution and modification notes.