Conversation
… training
- Implemented streaming tests for reasoning models in `reasoning_streaming_test.go`.
- Developed comprehensive reasoning tests in `reasoning_test.go` to validate logical thinking prompts.
- Created a streaming API in `streaming.go` to handle reasoning content in responses.
- Introduced utility functions in `reasoning.go` and `think_tags.go` for managing reasoning content and tags.
- Added tests for utility functions to ensure correct handling of think tags.
- Developed `DualDatasetWriter` in `dual_dataset.go` to write both regular and reasoning-aware datasets simultaneously.
- Enhanced error handling and logging throughout the new implementations.
…upport and increase HTTP timeout settings for long-form generation
…r checkpoint resume
- Add use_streaming config option to ModelConfig
- Integrate ChatCompletionStreaming into orchestrator worker for chosen/rejected generation
- Apply HTTP timeout settings to streaming requests
- Update example config with streaming documentation
- Enable streaming in config.sft.toml for long-form generation

Streaming keeps the connection alive with data flowing, bypassing 504 gateway timeouts that occur with long-running requests (16k+ tokens taking 5-10+ minutes).
- Add Writer interface for unified dataset writer abstraction
- Update DatasetWriter and DualDatasetWriter to implement Writer interface
- Integrate DualDatasetWriter into orchestrator based on enable_reasoning_capture config
- Capture ReasoningContent from API responses (both chosen and rejected)
- Pass reasoning content through to write methods for dual dataset output
- Update config.sft.toml to enable dual dataset mode with streaming
- Document dual dataset feature in example configs

Dual dataset mode generates two files simultaneously:
1. dataset.jsonl - Regular responses (content only)
2. dataset_reasoning.jsonl - Responses with <think> tags containing reasoning

Reasoning capture requires streaming mode (use_streaming=true) to access the reasoning_content field from reasoning models like Kimi-K2-Thinking.
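For illustration only (these example records are not from the PR), a matched pair of lines in the two output files might look like:

```jsonl
{"instruction": "Write a short story about a lighthouse.", "output": "The lamp turned for the last time that night..."}
{"instruction": "Write a short story about a lighthouse.", "output": "<think>Open on the keeper's final shift, then reveal the light is being automated.</think>The lamp turned for the last time that night..."}
```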
…esence

The worker was incorrectly trying to use the rejected model in SFT mode if it was configured, even though SFT only needs chosen responses. Added an explicit check for DatasetModeSFT to skip rejected model generation.

This fixes a regression where SFT mode would try to connect to localhost:8080 for rejected responses that weren't needed.
…ed development workflow
- Updated .gitignore to include environment files for different providers.
- Simplified .golangci.yml by removing unused linters and settings.
- Added functions to validate API response completeness in validation.go.
- Integrated validation checks into worker.go to handle incomplete outputs and finish reasons.
- Created example environment files for chutes and nahcrof APIs.
- Generated detailed stress test results in RESULTS.md, highlighting performance comparisons between nahcrof and chutes.
- Expanded TEST_README.md with comprehensive setup instructions and analysis of test results.
- Implemented analyze_results.py to process and summarize test outcomes.
- Developed run_test.sh script to automate testing and analysis for both API providers.
…sion and rejected response regeneration
…nd rejected response regeneration with checkpoint support
…rm struct formatting
…ntegrated with progress bars during transformation and a dedicated cleanup script.
@codex review
7 issues found across 41 files
Confidence score: 3/5
- There are a couple of concrete user-impacting risks: resume behavior in `internal/dataset/transform.go` can silently create new outputs and skip completed jobs, leading to incomplete datasets. `internal/config/loader.go` appears to override an explicit `http_timeout_seconds = 0`, which is a functional regression for users who rely on disabling timeouts.
- Overall this is a sizable set of behavioral concerns, so risk is moderate despite no critical blockers called out.
- Pay close attention to `internal/dataset/transform.go`, `internal/config/loader.go`, and `tests/api-reliability-test/run_test.sh`: resume correctness, timeout semantics, and test failure propagation.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="tests/api-reliability-test/run_test.sh">
<violation number="1" location="tests/api-reliability-test/run_test.sh:2">
P2: `set -e` doesn’t catch failures inside pipelines; the `timeout ... | tee` command can fail without stopping the script. Enable `pipefail` (and optionally `-u`) so test failures propagate correctly.</violation>
</file>
<file name="internal/dataset/transform.go">
<violation number="1" location="internal/dataset/transform.go:197">
P2: Resuming should not create a new output dataset when the original file is missing; it will skip completed jobs from the checkpoint and produce an incomplete dataset. Fail fast when the resume output file does not exist.</violation>
<violation number="2" location="internal/dataset/transform.go:438">
P2: When resuming reasoning output, silently creating a new file will cause the checkpoint to skip already completed jobs and produce an incomplete reasoning dataset. Resume should error if the reasoning output file is missing.</violation>
</file>
<file name="tests/api-reliability-test/analyze_results.py">
<violation number="1" location="tests/api-reliability-test/analyze_results.py:62">
P2: Avoid a bare except when parsing dataset lines; it hides unexpected errors and can mask incorrect analysis results. Catch the expected JSON decode failure explicitly.</violation>
</file>
<file name="internal/config/loader.go">
<violation number="1" location="internal/config/loader.go:103">
P2: This default overrides an explicit `http_timeout_seconds = 0`, even though the config docs specify that 0 disables timeouts. As written, users can no longer set "no timeout" behavior, which is a functional regression for configs relying on that setting.</violation>
</file>
<file name="internal/api/reasoning_stream_test.go">
<violation number="1" location="internal/api/reasoning_stream_test.go:81">
P2: Increase the scanner buffer size for streaming responses so large SSE chunks don't trigger `bufio.ErrTooLong`.</violation>
</file>
<file name="internal/api/streaming.go">
<violation number="1" location="internal/api/streaming.go:102">
P3: The marshal → unmarshal → add key → re-marshal round-trip to inject `"stream": true` is inefficient and fragile. Consider adding a `Stream bool` field (with `json:"stream,omitempty"`) to `ChatCompletionRequest` instead, which avoids the double-serialization overhead and potential floating-point representation issues.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
@@ -0,0 +1,92 @@
#!/bin/bash
set -e
P2: set -e doesn’t catch failures inside pipelines; the timeout ... | tee command can fail without stopping the script. Enable pipefail (and optionally -u) so test failures propagate correctly.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/api-reliability-test/run_test.sh, line 2:
<comment>`set -e` doesn’t catch failures inside pipelines; the `timeout ... | tee` command can fail without stopping the script. Enable `pipefail` (and optionally `-u`) so test failures propagate correctly.</comment>
<file context>
@@ -0,0 +1,92 @@
+#!/bin/bash
+set -e
+
+# API Reliability Stress Test Runner
</file context>
if opts.OutputReasoningPath != "" {
    if opts.Resume {
        reasoningFile, err = os.OpenFile(opts.OutputReasoningPath, os.O_APPEND|os.O_WRONLY, 0o644)
P2: When resuming reasoning output, silently creating a new file will cause the checkpoint to skip already completed jobs and produce an incomplete reasoning dataset. Resume should error if the reasoning output file is missing.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/dataset/transform.go, line 438:
<comment>When resuming reasoning output, silently creating a new file will cause the checkpoint to skip already completed jobs and produce an incomplete reasoning dataset. Resume should error if the reasoning output file is missing.</comment>
<file context>
@@ -0,0 +1,957 @@
+
+ if opts.OutputReasoningPath != "" {
+ if opts.Resume {
+ reasoningFile, err = os.OpenFile(opts.OutputReasoningPath, os.O_APPEND|os.O_WRONLY, 0o644)
+ if err != nil {
+ if os.IsNotExist(err) {
</file context>
// Open output file (append on resume, create otherwise).
var outputFile *os.File
if opts.Resume {
    outputFile, err = os.OpenFile(opts.OutputPath, os.O_APPEND|os.O_WRONLY, 0o644)
P2: Resuming should not create a new output dataset when the original file is missing; it will skip completed jobs from the checkpoint and produce an incomplete dataset. Fail fast when the resume output file does not exist.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/dataset/transform.go, line 197:
<comment>Resuming should not create a new output dataset when the original file is missing; it will skip completed jobs from the checkpoint and produce an incomplete dataset. Fail fast when the resume output file does not exist.</comment>
<file context>
@@ -0,0 +1,957 @@
+ // Open output file (append on resume, create otherwise).
+ var outputFile *os.File
+ if opts.Resume {
+ outputFile, err = os.OpenFile(opts.OutputPath, os.O_APPEND|os.O_WRONLY, 0o644)
+ if err != nil {
+ if os.IsNotExist(err) {
</file context>
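A sketch of the fail-fast behavior both transform.go findings ask for; the function name and error wording are assumptions, while the open flags mirror the quoted context:

```go
package dataset // hypothetical placement

import (
	"fmt"
	"os"
)

// openResumeOutput refuses to create a fresh file on resume: a missing file
// means the checkpoint would skip jobs it believes were already written.
func openResumeOutput(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, fmt.Errorf("resume requested but output file %q does not exist", path)
		}
		return nil, fmt.Errorf("open output file %q: %w", path, err)
	}
	return f, nil
}
```

The same guard would apply to both the regular and the reasoning output paths.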
'length': len(output_text),
'ending': output_text[-60:]
})
except:
P2: Avoid a bare except when parsing dataset lines; it hides unexpected errors and can mask incorrect analysis results. Catch the expected JSON decode failure explicitly.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/api-reliability-test/analyze_results.py, line 62:
<comment>Avoid a bare except when parsing dataset lines; it hides unexpected errors and can mask incorrect analysis results. Catch the expected JSON decode failure explicitly.</comment>
<file context>
@@ -0,0 +1,430 @@
+ 'length': len(output_text),
+ 'ending': output_text[-60:]
+ })
+ except:
+ pass
+
</file context>
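One way to narrow the handler, sketched as a standalone helper (the function and counter are assumptions, not the PR's code):

```python
import json

def parse_jsonl(lines):
    """Parse JSONL lines, skipping and counting malformed ones."""
    records, skipped = [], 0
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Only the expected failure is caught; anything else surfaces.
            skipped += 1
    return records, skipped
```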
}
// Default HTTP timeout: 120 seconds
// For very long-form generation (>4000 tokens), set this higher (e.g., 600-1200)
if model.HTTPTimeoutSeconds == 0 {
P2: This default overrides an explicit http_timeout_seconds = 0, even though the config docs specify that 0 disables timeouts. As written, users can no longer set "no timeout" behavior, which is a functional regression for configs relying on that setting.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/config/loader.go, line 103:
<comment>This default overrides an explicit `http_timeout_seconds = 0`, even though the config docs specify that 0 disables timeouts. As written, users can no longer set "no timeout" behavior, which is a functional regression for configs relying on that setting.</comment>
<file context>
@@ -93,6 +98,11 @@ func applyDefaults(cfg *Config) {
}
+ // Default HTTP timeout: 120 seconds
+ // For very long-form generation (>4000 tokens), set this higher (e.g., 600-1200)
+ if model.HTTPTimeoutSeconds == 0 {
+ model.HTTPTimeoutSeconds = 120 // 2 minutes default
+ }
</file context>
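A common remedy is to make the field a pointer so that "unset" and "explicit 0" are distinguishable; this is a sketch under that assumption, not the PR's actual types:

```go
package config // hypothetical placement; the real ModelConfig has more fields

// nil means "not set in TOML"; an explicit 0 means "disable the timeout".
type ModelConfig struct {
	HTTPTimeoutSeconds *int `toml:"http_timeout_seconds"`
}

func applyTimeoutDefault(m *ModelConfig) {
	if m.HTTPTimeoutSeconds == nil {
		def := 120 // 2 minutes default, as in the PR
		m.HTTPTimeoutSeconds = &def
	}
}
```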
t.Logf("Response status: %d", resp.StatusCode)

// Read streaming response
scanner := bufio.NewScanner(resp.Body)
P2: Increase the scanner buffer size for streaming responses so large SSE chunks don't trigger bufio.ErrTooLong.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/api/reasoning_stream_test.go, line 81:
<comment>Increase the scanner buffer size for streaming responses so large SSE chunks don't trigger `bufio.ErrTooLong`.</comment>
<file context>
@@ -0,0 +1,300 @@
+ t.Logf("Response status: %d", resp.StatusCode)
+
+ // Read streaming response
+ scanner := bufio.NewScanner(resp.Body)
+ chunkCount := 0
+ hasReasoningContent := false
</file context>
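The standard fix is to raise the scanner's maximum token size; the sizes below are assumptions, mirroring the suggestion made for internal/api/streaming.go further down:

```go
package api // hypothetical placement

import (
	"bufio"
	"io"
)

// newSSEScanner returns a scanner whose buffer can grow large enough that
// big SSE `data:` lines don't trigger bufio.ErrTooLong.
func newSSEScanner(r io.Reader) *bufio.Scanner {
	s := bufio.NewScanner(r)
	s.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // 64 KiB initial, 10 MiB max
	return s
}
```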
}

// Add stream parameter (not part of ChatCompletionRequest struct)
reqMap := make(map[string]interface{})
P3: The marshal → unmarshal → add key → re-marshal round-trip to inject "stream": true is inefficient and fragile. Consider adding a Stream bool field (with json:"stream,omitempty") to ChatCompletionRequest instead, which avoids the double-serialization overhead and potential floating-point representation issues.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/api/streaming.go, line 102:
<comment>The marshal → unmarshal → add key → re-marshal round-trip to inject `"stream": true` is inefficient and fragile. Consider adding a `Stream bool` field (with `json:"stream,omitempty"`) to `ChatCompletionRequest` instead, which avoids the double-serialization overhead and potential floating-point representation issues.</comment>
<file context>
@@ -0,0 +1,372 @@
+ }
+
+ // Add stream parameter (not part of ChatCompletionRequest struct)
+ reqMap := make(map[string]interface{})
+ reqBytes, err := json.Marshal(req)
+ if err != nil {
</file context>
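A sketch of the suggested shape; only the Stream field comes from the review, the other fields are assumed stand-ins for the real ChatCompletionRequest:

```go
package api // hypothetical placement

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatCompletionRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	// Set to true for SSE streaming; omitted from JSON otherwise, so
	// non-streaming requests serialize exactly as before.
	Stream bool `json:"stream,omitempty"`
}
```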
Pull request overview
This PR expands vellumforge2 with (1) streaming + reasoning capture support (dual dataset outputs), and (2) an offline dataset transform command for SFT→DPO conversion / rejected-response regeneration, alongside updated build tooling and new API reliability stress-test assets.
Changes:
- Add streaming support + reasoning capture plumbing (reasoning_content parsing, dual dataset writing, config flags).
- Introduce `vellumforge2 transform` for offline dataset conversion/regeneration with checkpoint/resume support.
- Modernize developer tooling/docs (Makefile targets, golangci config, README updates, stress-test scripts/docs).
Reviewed changes
Copilot reviewed 40 out of 41 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/api-reliability-test/run_test.sh | Runner script for reliability stress tests + analysis invocation |
| tests/api-reliability-test/config.nahcrof.toml | Stress-test config for nahcrof provider |
| tests/api-reliability-test/analyze_results.py | Parses sessions/logs to summarize reliability + write results.json |
| tests/api-reliability-test/TEST_README.md | Detailed setup/run instructions for the reliability test package |
| tests/api-reliability-test/RESULTS.md | Captured stress-test results writeup |
| tests/api-reliability-test/.env.nahcrof.example | Example env template for nahcrof API key |
| tests/api-reliability-test/.env.chutes.example | Example env template for chutes API key |
| pkg/models/types.go | Adds SFT format options + expands SFTRecord and GenerationResult for reasoning fields |
| internal/writer/session.go | Adds reasoning dataset output path helper |
| internal/writer/interface.go | Introduces writer.Writer interface supporting single/dual dataset modes |
| internal/writer/dual_dataset_test.go | Unit tests for applying reasoning into Alpaca/ShareGPT SFT records |
| internal/writer/dual_dataset.go | DualDatasetWriter writing both plain + reasoning JSONL outputs |
| internal/writer/dataset.go | Updates DatasetWriter method signatures to match new interface |
| internal/util/think_tags_test.go | Tests for think-tag detection/extraction/stripping helpers |
| internal/util/think_tags.go | Implements think-tag detection/extraction/stripping helpers |
| internal/util/reasoning.go | Helpers for wrapping/combining/stripping reasoning via <think> tags |
| internal/util/meta_cleanup_test.go | Tests for stripping meta chatter from LLM outputs |
| internal/util/meta_cleanup.go | Implements meta cleanup heuristic for rejected generations |
| internal/orchestrator/worker.go | Switches to streaming when enabled, captures reasoning, adds completion validation |
| internal/orchestrator/validation.go | Adds finish_reason validation + incomplete-output detection; refines refusal detection |
| internal/orchestrator/sft_format_test.go | Tests for writing SFT records in Alpaca vs ShareGPT formats |
| internal/orchestrator/orchestrator.go | Uses writer.Writer interface + resume progress rendering changes |
| internal/hfhub/uploader.go | Includes optional reasoning dataset in HF uploads (skips if missing) |
| internal/dataset/transform.go | Implements offline dataset transform modes with worker pool + checkpointing |
| internal/config/loader.go | Adds defaults for SFT format + per-model HTTP timeout |
| internal/config/config.go | Adds new generation/model config fields + secret mapping for new providers |
| internal/checkpoint/manager.go | Minor import formatting |
| internal/api/types.go | Extends API types to include reasoning_content + reasoning token usage fields |
| internal/api/streaming.go | Adds SSE streaming client to assemble content + reasoning_content |
| internal/api/reasoning_test.go | Integration tests for reasoning capabilities and reasoning_content parsing |
| internal/api/reasoning_streaming_test.go | Integration tests for reasoning_content in streaming mode |
| internal/api/reasoning_stream_test.go | Raw HTTP streaming experiments for reasoning_content field discovery |
| internal/api/reasoning_debug_test.go | Debug integration tests for raw responses / temperature variation |
| internal/api/client.go | Moves timeout control to per-request contexts; adds per-model timeout enforcement |
| configs/config.example.toml | Documents new config options (sft_format, streaming, timeouts, reasoning capture) |
| configs/.env.example | Adds CHUTES_API_KEY and NAHCROF_API_KEY templates |
| cmd/vellumforge2/main.go | Adds transform command + env-file support on resume; dual-writer selection |
| README.md | Documents transform usage + resume flags + timeout tuning; bumps version to v1.6.0 |
| Makefile | Adds new targets, formatting checks, CI helpers, colorized output; bumps version |
| .golangci.yml | Adds repository lint configuration |
| .gitignore | Improves env ignore patterns; ignores results.json; un-ignores config.nahcrof.toml |
├── analyze_results.py       # Results analysis script
├── config.nahcrof.toml      # nahcrof test config
├── config.chutes.toml       # chutes test config
├── .env.nahcrof.example     # API key template (nahcrof)
├── .env.chutes.example      # API key template (chutes)
Copilot AI (Feb 8, 2026):
The README and runner script reference config.chutes.toml, but that file isn’t present in tests/api-reliability-test/ in this PR. As-is, run_test.sh chutes will fail. Either add the missing config file or update the docs/script to match the files that are actually checked in.
transformCmd.Flags().StringVar(&configPath, "config", "config.toml", "Path to configuration file")
transformCmd.Flags().StringVar(&envFile, "env-file", ".env", "Path to environment file")
transformCmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Enable verbose logging")
transformCmd.Flags().StringVar(&transformMode, "mode", string(dataset.TransformSFTToDPO), "Transform mode: 'sft-to-dpo' or 'regen-rejected'")
Copilot AI (Feb 8, 2026):
--mode is given a default (sft-to-dpo), but later this flag is marked required. That combination is confusing for users because the help implies it’s optional while Cobra will still error unless it’s explicitly set. Consider removing the default or dropping the required constraint.
Suggested change:
-transformCmd.Flags().StringVar(&transformMode, "mode", string(dataset.TransformSFTToDPO), "Transform mode: 'sft-to-dpo' or 'regen-rejected'")
+transformCmd.Flags().StringVar(&transformMode, "mode", "", "Transform mode: 'sft-to-dpo' or 'regen-rejected'")
if err := parentCtx.Err(); err != nil && err != context.Canceled {
    return err
}
Copilot AI (Feb 8, 2026):
In both transform modes, Ctrl+C / context cancellation will currently return nil because cancellation is explicitly ignored (err != nil && err != context.Canceled). That makes interrupted transforms look successful and can hide partial output. Consider returning context.Canceled (or ctx.Err()) after saving the checkpoint so the CLI can report an interrupted run correctly.
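A sketch of that flow; the save callback stands in for the PR's checkpoint persistence:

```go
package dataset // hypothetical placement

import "context"

// finishTransform persists the checkpoint, then surfaces cancellation
// instead of mapping an interrupted run to success.
func finishTransform(ctx context.Context, save func() error) error {
	if err := save(); err != nil {
		return err
	}
	return ctx.Err() // nil on normal completion, context.Canceled after Ctrl+C
}
```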
// Very short outputs are likely incomplete (already caught by refusal check)
if len(trimmed) < 100 {
    return true, "output too short (< 100 chars)"
}
Copilot AI (Feb 8, 2026):
isIncompleteOutput treats any output < 100 chars as incomplete. This will hard-fail legitimate short completions (especially outside story-generation use cases) and overlaps with the refusal check (<50 chars). Consider removing the length-based rejection or making it conditional (e.g., only when streaming is enabled, or only when max_output_tokens is large / long-form prompts).
Suggested change (delete these lines):
-// Very short outputs are likely incomplete (already caught by refusal check)
-if len(trimmed) < 100 {
-    return true, "output too short (< 100 chars)"
-}
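If the floor is kept at all, one hedged variant gates it on long-form settings; the threshold and parameters here are assumptions:

```go
package orchestrator // hypothetical placement

import "strings"

// tooShortForLongForm applies the length floor only when the run is
// configured for long-form streaming generation.
func tooShortForLongForm(output string, useStreaming bool, maxOutputTokens int) (bool, string) {
	trimmed := strings.TrimSpace(output)
	if useStreaming && maxOutputTokens >= 4000 && len(trimmed) < 100 {
		return true, "output too short (< 100 chars) for long-form run"
	}
	return false, ""
}
```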
except:
    pass
Copilot AI (Feb 8, 2026):
Avoid bare except: here; it will silently swallow unexpected errors and can hide malformed JSONL lines (skewing the stats). Catch specific exceptions (e.g., json.JSONDecodeError) and consider counting/reporting skipped lines.
define print_step
	@echo -e "$(COLOR_BOLD)$(COLOR_BLUE)==> $(1)$(COLOR_RESET)"
endef
Copilot AI (Feb 8, 2026):
echo -e is not portable across /bin/sh implementations (e.g., dash prints -e literally). Since print_step is used by most targets, consider using printf for ANSI color output to keep the Makefile working consistently across environments.
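A portable variant, assuming the COLOR_* variables hold backslash escape sequences (which the echo -e usage implies); the recipe line must stay tab-indented:

```make
define print_step
	@printf '%b\n' "$(COLOR_BOLD)$(COLOR_BLUE)==> $(1)$(COLOR_RESET)"
endef
```

printf '%b' expands backslash escapes in its argument the same way under dash, bash, and zsh.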
## ci: Full CI pipeline (format check + all validations + coverage)
ci: fmt-check vet lint test-coverage build
	$(call print_step,"✅ CI pipeline completed successfully!")
Copilot AI (Feb 8, 2026):
The ci target currently runs test-coverage, which runs the full test suite (including integration tests that can make real API calls). This can be costly/flaky when CI has credentials available. Consider using the test (short) target in ci and keeping integration tests behind an explicit target/job.
var responseCreated int64
var finishReason string

scanner := bufio.NewScanner(httpResp.Body)
Copilot AI (Feb 8, 2026):
bufio.Scanner has a relatively small max token size by default. Some providers may emit large SSE data: lines, which can cause scanner errors and truncated streaming. Consider calling scanner.Buffer(...) with a larger max (similar to the JSONL scanner usage elsewhere).
Suggested change:
 scanner := bufio.NewScanner(httpResp.Body)
+// Increase scanner buffer to handle large SSE data: lines
+scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)
# Run the test
timeout 1800 ./bin/vellumforge2 run --config "$config" --env-file "$env_file" 2>&1 | \
    tee "tests/api-reliability-test/test_${provider}.log"
Copilot AI (Feb 8, 2026):
This script assumes the timeout utility is available. On macOS it’s typically missing (users often have gtimeout via coreutils instead). Consider adding a small portability shim (detect timeout vs gtimeout) and/or documenting the dependency so the script works across the platforms you build for.
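A small shim along the lines suggested, as a sketch against the quoted runner snippet:

```bash
# Prefer GNU timeout; fall back to gtimeout (macOS: brew install coreutils).
if command -v timeout >/dev/null 2>&1; then
    TIMEOUT_BIN=timeout
elif command -v gtimeout >/dev/null 2>&1; then
    TIMEOUT_BIN=gtimeout
else
    echo "error: timeout/gtimeout not found; install coreutils" >&2
    exit 1
fi

"$TIMEOUT_BIN" 1800 ./bin/vellumforge2 run --config "$config" --env-file "$env_file" 2>&1 | \
    tee "tests/api-reliability-test/test_${provider}.log"
```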
import json
import os
import glob
from pathlib import Path
Copilot AI (Feb 8, 2026):
Import of 'Path' is not used.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d1d7786d6f
func (dw *DualDatasetWriter) Close() error {
    var errs []error

    if err := dw.regularFile.Close(); err != nil {
        errs = append(errs, fmt.Errorf("regular dataset: %w", err))
Flush buffered records before closing dual writer
DualDatasetWriter.Close closes the files directly without persisting dw.records. In MO-DPO runs, writeMODPORecord writes through WriteRecord/UpdateRecord (buffered path), so enabling generation.enable_reasoning_capture switches to this writer and silently drops all buffered rows at shutdown, leaving dataset.jsonl (and no reasoning equivalent) incomplete or empty despite successful generation logs.
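A sketch of the shape of the fix; the buffered-rows field and flush helper are assumptions based on the finding, not the PR's actual code:

```go
package writer // hypothetical placement

import (
	"errors"
	"fmt"
	"os"
)

type DualDatasetWriter struct {
	regularFile   *os.File
	reasoningFile *os.File
	buffered      []string // JSONL rows queued by WriteRecord/UpdateRecord
}

// flushBuffered persists queued rows; a full version would also write the
// reasoning variants to reasoningFile.
func (dw *DualDatasetWriter) flushBuffered() error {
	for _, row := range dw.buffered {
		if _, err := dw.regularFile.WriteString(row + "\n"); err != nil {
			return err
		}
	}
	dw.buffered = nil
	return nil
}

// Close flushes buffered records before closing the files, so MO-DPO rows
// are not silently dropped at shutdown.
func (dw *DualDatasetWriter) Close() error {
	var errs []error
	if err := dw.flushBuffered(); err != nil {
		errs = append(errs, fmt.Errorf("flush buffered records: %w", err))
	}
	if err := dw.regularFile.Close(); err != nil {
		errs = append(errs, fmt.Errorf("regular dataset: %w", err))
	}
	if err := dw.reasoningFile.Close(); err != nil {
		errs = append(errs, fmt.Errorf("reasoning dataset: %w", err))
	}
	return errors.Join(errs...)
}
```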
| "model", modelCfg.ModelName, | ||
| "timeout", timeout) | ||
| } | ||
| ctx, cancel = context.WithTimeout(ctx, timeout) |
Apply HTTP timeout per attempt, not across queue/retries
Creating a timeout-scoped context at function entry makes the timeout budget cover rate-limiter waiting plus every retry/backoff, not just an HTTP attempt. In low-RPM or high-concurrency runs, Wait can exhaust the 120s budget before any request is sent; and when an attempt times out, subsequent retries immediately hit ctx.Done() and return, so configured max_retries cannot recover from timeout failures.
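A sketch of per-attempt scoping; the helper and its parameters are assumptions, not the PR's client code:

```go
package api // hypothetical placement

import (
	"context"
	"time"
)

// doWithRetries gives each HTTP attempt its own timeout budget, while the
// parent context continues to govern rate-limiter waits and backoff.
func doWithRetries(parent context.Context, timeout time.Duration, maxRetries int,
	attempt func(context.Context) error) error {
	var err error
	for i := 0; i <= maxRetries; i++ {
		attemptCtx, cancel := context.WithTimeout(parent, timeout)
		err = attempt(attemptCtx)
		cancel()
		if err == nil || parent.Err() != nil {
			return err
		}
	}
	return err
}
```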
This pull request introduces significant improvements to the build pipeline, dataset transformation capabilities, and documentation for `vellumforge2`. The most notable changes are the addition of a robust offline dataset transformation command, enhanced support for dual dataset generation, and a modernized `Makefile` for streamlined development and CI workflows. Documentation is updated to reflect new features, usage, and configuration tips.

Build & CI Improvements:
- `Makefile` with new targets for quality checks, coverage, linting, and CI, including colorized output and improved help descriptions. Adds targets for dependency verification, pre-commit checks, and security scanning. [1] [2]
- `.golangci.yml` for consistent linting configuration across environments.

Dataset Transformation Features:
- `transform` command to convert SFT datasets to DPO and regenerate rejected responses, supporting checkpoint/resume and dual dataset (reasoning/plain) workflows. [1] [2] [3]

Documentation Updates:
- `README.md` with detailed instructions for the new `transform` command, checkpoint/resume with custom config and env files, and guidance for handling long-form generation timeouts and retries. [1] [2] [3]
- Version bump to `v1.6.0`. [1] [2]

Session Resume Enhancements:

Miscellaneous:
Summary by cubic
Adds streaming generation and dual dataset output with reasoning capture, plus a new transform CLI to convert SFT→DPO and regenerate rejected responses. Improves long-run reliability with per-model HTTP timeouts, better resume, and output validation; updates Makefile, linting, and docs.
New Features
Build & CI
Written for commit d1d7786. Summary will update on new commits.