Open sourcing ether0 rewards, prompts, data utilities #2

Merged
whitead merged 8 commits into main from open-sourcing on Jun 5, 2025

@jamesbraza (Collaborator)

Also includes:

  • Docs on use of our reasoning tokens
  • Code quality in the form of unit tests and passing mypy 🥳

Copilot AI left a comment

Pull Request Overview

This PR open-sources the ether0 package and its remotes extension, adds comprehensive unit tests, and ensures type and lint checks pass via mypy and CI updates.

  • Introduce fixtures and unit tests covering clients, rewards, and core utilities.
  • Add retrying dataset loader, text‐validation helpers, prompt templates, data/models, chat formatting, and client/server implementations.
  • Update pyproject.toml and CI workflows to include the new ether0.remotes package, its extras, and environment setup.

Reviewed Changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/conftest.py | Add ether0_test fixture to load the Hugging Face dataset. |
| src/ether0/utils.py | New retry loader, invalid-character/language checks. |
| src/ether0/problem_prompts.py | Lots of new prompt template lists for problems. |
| src/ether0/models.py | Data models, enums, and filtering logic added. |
| src/ether0/model_prompts.py | XML-based answer extraction and prompt enums. |
| src/ether0/data.py | SMILES parsing, drawing helpers, and ring/fingerprint checks. |
| src/ether0/clients.py | HTTPX-based remote clients with retry logic. |
| src/ether0/chat.py | Chat conversation formatting for SFT/RL. |
| pyproject.toml | Pin dependencies, add ether0.remotes, mypy overrides. |
| packages/remotes/... | New ether0.remotes server, client tests, docs, and packaging. |
| docs/updated_mistral_chat_template.jinja | Jinja template for chat rendering with new tokens. |
| docs/adding_tokens.ipynb | Notebook demonstrating how to add reasoning tokens. |

Comments suppressed due to low confidence (4)

src/ether0/utils.py:48

  • Add a docstring to load_dataset_retrying summarizing the retry behavior, exceptions handled, and retry parameters, so future readers immediately understand why and how retries are configured.
def load_dataset_retrying(
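
As a rough illustration of the kind of docstring suggested here, the sketch below documents a generic retry wrapper. The real signature and retry policy of load_dataset_retrying are not shown in this thread, so `call_with_retries` and all of its parameters are hypothetical stand-ins, not the ether0 API.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


# Hypothetical helper for illustration only; not the ether0 implementation.
def call_with_retries(
    loader: Callable[[], T],
    retry_on: tuple[type[Exception], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
) -> T:
    """Call `loader`, retrying on transient failures.

    Only the exception types in `retry_on` are retried; anything else
    propagates immediately. The wait before retry number `n` (0-indexed) is
    `base_delay_s * 2**n` seconds. After `max_attempts` failed attempts the
    last exception is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return loader()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            time.sleep(base_delay_s * 2**attempt)  # Exponential backoff
    raise AssertionError("unreachable")
```

A docstring along these lines answers the three questions the comment raises: which exceptions are retried, how many times, and how long the caller may end up waiting.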

src/ether0/data.py:72

  • The comment notes that counterion-containing SMILES currently fail. Add a unit test exercising that pattern (e.g., [Cl-] or multi-fragment SMILES) to catch regressions or document the limitation explicitly.
SMILES_PATTERN = re.compile(
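
One way to document the limitation explicitly is a pair of tests that pin down the current behavior. The sketch below uses a deliberately simplified stand-in regex, since the actual SMILES_PATTERN is truncated in this thread; `STANDIN_SMILES_PATTERN` and `is_pattern_match` are hypothetical names, and the real tests would import the pattern from ether0.data instead.

```python
import re

# Hypothetical stand-in, NOT the real ether0 SMILES_PATTERN. It accepts common
# SMILES characters but omits ".", the fragment separator, so multi-fragment
# (salt/counterion) SMILES do not match.
STANDIN_SMILES_PATTERN = re.compile(r"[A-Za-z0-9@+\-\[\]()\\/%=#$]+")


def is_pattern_match(smiles: str) -> bool:
    """Return True if the whole string matches the stand-in pattern."""
    return STANDIN_SMILES_PATTERN.fullmatch(smiles) is not None


def test_single_fragment_smiles_match() -> None:
    for smiles in ("CCO", "c1ccccc1", "[Cl-]"):
        assert is_pattern_match(smiles), smiles


def test_multi_fragment_smiles_are_a_known_limitation() -> None:
    # If this test starts failing, the pattern now accepts counterions:
    # flip the assertion and drop the limitation note in the source comment.
    for smiles in ("[Na+].[Cl-]", "CC(=O)[O-].[Na+]"):
        assert not is_pattern_match(smiles), smiles
```

Asserting the failure, instead of skipping the case, means a later fix to the pattern shows up as a test change in review.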

src/ether0/clients.py:17

  • This global Counter tracks errors across calls and may not be thread-safe. Consider moving it into a thread-local or function-scoped context, or use an atomic structure if concurrency is expected.
SERVER_ERRORS_COUNTER = Counter({
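
If concurrent access does turn out to matter, one option is to guard the counter with a lock, because `counter[key] += 1` is a read-modify-write and not a single atomic step. The sketch below is an assumption-laden illustration: `LockedCounter` and its methods are hypothetical names, and whether ether0's clients are ever called from multiple threads is not established in this thread.

```python
import threading
from collections import Counter


class LockedCounter:
    """A Counter wrapper whose updates and reads are serialized by a lock.

    Hypothetical helper for illustration; not part of ether0.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter[str] = Counter()

    def increment(self, key: str, amount: int = 1) -> None:
        # Hold the lock across the whole read-modify-write.
        with self._lock:
            self._counts[key] += amount

    def snapshot(self) -> dict[str, int]:
        # Return a copy so callers never observe a half-updated mapping.
        with self._lock:
            return dict(self._counts)


# Module-level instance, mirroring how a global error counter might be used.
server_errors = LockedCounter()
```

Callers would then use `server_errors.increment("timeout")` in place of mutating a bare global Counter, and `server_errors.snapshot()` when reporting.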

docs/updated_mistral_chat_template.jinja:20

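  • raise_exception is not built into Jinja; it resolves only when the rendering environment registers it as a global, which Hugging Face transformers does when rendering chat templates. Jinja also has no {% raise %} tag. If this template may be rendered with a plain Jinja environment, register an equivalent raise_exception function there so unsupported roles still trigger an error at render time.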
{{- raise_exception("Only user, system and assistant roles are supported!") }}

@jamesbraza force-pushed the open-sourcing branch 2 times, most recently from db4fa0b to 55da377 (June 5, 2025 00:10)
@jamesbraza force-pushed the open-sourcing branch 2 times, most recently from 26298a8 to 008bda9 (June 5, 2025 00:48)
@whitead self-requested a review (June 5, 2025 01:41)

@whitead (Contributor) left a comment

huge win!

@jamesbraza force-pushed the open-sourcing branch 2 times, most recently from dff6d07 to 7029c60 (June 5, 2025 05:26)
@whitead merged commit 7826884 into main (Jun 5, 2025); 3 checks passed
@whitead deleted the open-sourcing branch (June 5, 2025 15:13)
Labels

enhancement New feature or request

4 participants