
Conversation


@lemon07r lemon07r commented Feb 8, 2026

This pull request introduces significant improvements to the build pipeline, dataset transformation capabilities, and documentation for vellumforge2. The most notable changes are the addition of a robust offline dataset transformation command, enhanced support for dual dataset generation, and a modernized Makefile for streamlined development and CI workflows. Documentation is updated to reflect new features, usage, and configuration tips.

Build & CI Improvements:

  • Modernized Makefile with new targets for quality checks, coverage, linting, and CI, including colorized output and improved help descriptions; adds targets for dependency verification, pre-commit checks, and security scanning.
  • Added .golangci.yml for consistent linting configuration across environments.

Dataset Transformation Features:

  • Introduced a new transform command to convert SFT datasets to DPO and regenerate rejected responses, supporting checkpoint/resume and dual dataset (reasoning/plain) workflows.
  • Enhanced dataset writer logic to support dual output when reasoning capture is enabled, both for generation and resume flows.

Documentation Updates:

  • Expanded README.md with detailed instructions for the new transform command, checkpoint/resume with custom config and env files, and guidance for handling long-form generation timeouts and retries.
  • Updated version references in documentation to v1.6.0.

Session Resume Enhancements:

  • Improved resume logic to support loading environment variables from a file, enabling more flexible configuration for checkpoint resumption.

Miscellaneous:

  • Minor import ordering and refactoring for clarity and maintainability.

Summary by cubic

Adds streaming generation and dual dataset output with reasoning capture, plus a new transform CLI to convert SFT→DPO and regenerate rejected responses. Improves long-run reliability with per-model HTTP timeouts, better resume, and output validation; updates Makefile, linting, and docs.

  • New Features

    • Transform CLI: SFT→DPO and rejected regeneration with checkpoint/resume and optional dual outputs (reasoning/plain).
    • Streaming chat completions to bypass gateway timeouts; captures reasoning_content; per-model HTTP timeouts via config.
    • Dual dataset writer and HF uploader: writes dataset.jsonl and dataset_reasoning.jsonl; utils for tags and meta cleanup.
    • SFT format option (alpaca or sharegpt) for SFT mode outputs.
    • Resume and validation: --env-file support for checkpoints; stronger checks for incomplete/refusal; stress-test suite and provider configs.
  • Build & CI

    • Modernized Makefile with deps/fmt/vet/lint/test/coverage/pre-commit/security/CI targets and colorized help.
    • Added golangci-lint configuration.
    • Docs updated for transform, streaming, dual dataset, and resume flags; version bumped to v1.6.0.

Written for commit d1d7786. Summary will update on new commits.

… training

- Implemented streaming tests for reasoning models in `reasoning_streaming_test.go`.
- Developed comprehensive reasoning tests in `reasoning_test.go` to validate logical thinking prompts.
- Created a streaming API in `streaming.go` to handle reasoning content in responses.
- Introduced utility functions in `reasoning.go` and `think_tags.go` for managing reasoning content and tags.
- Added tests for utility functions to ensure correct handling of think tags.
- Developed `DualDatasetWriter` in `dual_dataset.go` to write both regular and reasoning-aware datasets simultaneously.
- Enhanced error handling and logging throughout the new implementations.
…upport and increase HTTP timeout settings for long-form generation
- Add use_streaming config option to ModelConfig
- Integrate ChatCompletionStreaming into orchestrator worker for chosen/rejected generation
- Apply HTTP timeout settings to streaming requests
- Update example config with streaming documentation
- Enable streaming in config.sft.toml for long-form generation

Streaming keeps the connection alive with data flowing, bypassing 504 gateway timeouts
that occur with long-running requests (16k+ tokens taking 5-10+ minutes).
- Add Writer interface for unified dataset writer abstraction
- Update DatasetWriter and DualDatasetWriter to implement Writer interface
- Integrate DualDatasetWriter into orchestrator based on enable_reasoning_capture config
- Capture ReasoningContent from API responses (both chosen and rejected)
- Pass reasoning content through to write methods for dual dataset output
- Update config.sft.toml to enable dual dataset mode with streaming
- Document dual dataset feature in example configs

Dual dataset mode generates two files simultaneously:
1. dataset.jsonl - Regular responses (content only)
2. dataset_reasoning.jsonl - Responses with <think> tags containing reasoning

Reasoning capture requires streaming mode (use_streaming=true) to access the
reasoning_content field from reasoning models like Kimi-K2-Thinking.
…esence

The worker was incorrectly trying to use the rejected model in SFT mode if one was
configured, even though SFT only needs chosen responses. Added an explicit check
for DatasetModeSFT to skip rejected-model generation.

This fixes a regression where SFT mode would try to connect to localhost:8080
for rejected responses that weren't needed.
- Updated .gitignore to include environment files for different providers.
- Simplified .golangci.yml by removing unused linters and settings.
- Added functions to validate API response completeness in validation.go.
- Integrated validation checks into worker.go to handle incomplete outputs and finish reasons.
- Created example environment files for chutes and nahcrof APIs.
- Generated detailed stress test results in RESULTS.md, highlighting performance comparisons between nahcrof and chutes.
- Expanded TEST_README.md with comprehensive setup instructions and analysis of test results.
- Implemented analyze_results.py to process and summarize test outcomes.
- Developed run_test.sh script to automate testing and analysis for both API providers.
…nd rejected response regeneration with checkpoint support
…ntegrated with progress bars during transformation and a dedicated cleanup script.

lemon07r commented Feb 8, 2026

@codex review

@cubic-dev-ai cubic-dev-ai bot left a comment

7 issues found across 41 files

Confidence score: 3/5

  • There are a couple of concrete user-impacting risks: resume behavior in internal/dataset/transform.go can silently create new outputs and skip completed jobs, leading to incomplete datasets.
  • internal/config/loader.go appears to override an explicit http_timeout_seconds = 0, which is a functional regression for users who rely on disabling timeouts.
  • Overall this is a sizable set of behavioral concerns, so risk is moderate despite no critical blockers called out.
  • Pay close attention to internal/dataset/transform.go, internal/config/loader.go, and tests/api-reliability-test/run_test.sh: resume correctness, timeout semantics, and test failure propagation.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="tests/api-reliability-test/run_test.sh">

<violation number="1" location="tests/api-reliability-test/run_test.sh:2">
P2: `set -e` doesn’t catch failures inside pipelines; the `timeout ... | tee` command can fail without stopping the script. Enable `pipefail` (and optionally `-u`) so test failures propagate correctly.</violation>
</file>

<file name="internal/dataset/transform.go">

<violation number="1" location="internal/dataset/transform.go:197">
P2: Resuming should not create a new output dataset when the original file is missing; it will skip completed jobs from the checkpoint and produce an incomplete dataset. Fail fast when the resume output file does not exist.</violation>

<violation number="2" location="internal/dataset/transform.go:438">
P2: When resuming reasoning output, silently creating a new file will cause the checkpoint to skip already completed jobs and produce an incomplete reasoning dataset. Resume should error if the reasoning output file is missing.</violation>
</file>

<file name="tests/api-reliability-test/analyze_results.py">

<violation number="1" location="tests/api-reliability-test/analyze_results.py:62">
P2: Avoid a bare except when parsing dataset lines; it hides unexpected errors and can mask incorrect analysis results. Catch the expected JSON decode failure explicitly.</violation>
</file>

<file name="internal/config/loader.go">

<violation number="1" location="internal/config/loader.go:103">
P2: This default overrides an explicit `http_timeout_seconds = 0`, even though the config docs specify that 0 disables timeouts. As written, users can no longer set "no timeout" behavior, which is a functional regression for configs relying on that setting.</violation>
</file>

<file name="internal/api/reasoning_stream_test.go">

<violation number="1" location="internal/api/reasoning_stream_test.go:81">
P2: Increase the scanner buffer size for streaming responses so large SSE chunks don't trigger `bufio.ErrTooLong`.</violation>
</file>

<file name="internal/api/streaming.go">

<violation number="1" location="internal/api/streaming.go:102">
P3: The marshal → unmarshal → add key → re-marshal round-trip to inject `"stream": true` is inefficient and fragile. Consider adding a `Stream bool` field (with `json:"stream,omitempty"`) to `ChatCompletionRequest` instead, which avoids the double-serialization overhead and potential floating-point representation issues.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@@ -0,0 +1,92 @@
#!/bin/bash
set -e
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P2: set -e doesn’t catch failures inside pipelines; the timeout ... | tee command can fail without stopping the script. Enable pipefail (and optionally -u) so test failures propagate correctly.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/api-reliability-test/run_test.sh, line 2:

<comment>`set -e` doesn’t catch failures inside pipelines; the `timeout ... | tee` command can fail without stopping the script. Enable `pipefail` (and optionally `-u`) so test failures propagate correctly.</comment>

<file context>
@@ -0,0 +1,92 @@
+#!/bin/bash
+set -e
+
+# API Reliability Stress Test Runner
</file context>


if opts.OutputReasoningPath != "" {
if opts.Resume {
reasoningFile, err = os.OpenFile(opts.OutputReasoningPath, os.O_APPEND|os.O_WRONLY, 0o644)
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P2: When resuming reasoning output, silently creating a new file will cause the checkpoint to skip already completed jobs and produce an incomplete reasoning dataset. Resume should error if the reasoning output file is missing.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/dataset/transform.go, line 438:

<comment>When resuming reasoning output, silently creating a new file will cause the checkpoint to skip already completed jobs and produce an incomplete reasoning dataset. Resume should error if the reasoning output file is missing.</comment>

<file context>
@@ -0,0 +1,957 @@
+
+	if opts.OutputReasoningPath != "" {
+		if opts.Resume {
+			reasoningFile, err = os.OpenFile(opts.OutputReasoningPath, os.O_APPEND|os.O_WRONLY, 0o644)
+			if err != nil {
+				if os.IsNotExist(err) {
</file context>

// Open output file (append on resume, create otherwise).
var outputFile *os.File
if opts.Resume {
outputFile, err = os.OpenFile(opts.OutputPath, os.O_APPEND|os.O_WRONLY, 0o644)
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P2: Resuming should not create a new output dataset when the original file is missing; it will skip completed jobs from the checkpoint and produce an incomplete dataset. Fail fast when the resume output file does not exist.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/dataset/transform.go, line 197:

<comment>Resuming should not create a new output dataset when the original file is missing; it will skip completed jobs from the checkpoint and produce an incomplete dataset. Fail fast when the resume output file does not exist.</comment>

<file context>
@@ -0,0 +1,957 @@
+	// Open output file (append on resume, create otherwise).
+	var outputFile *os.File
+	if opts.Resume {
+		outputFile, err = os.OpenFile(opts.OutputPath, os.O_APPEND|os.O_WRONLY, 0o644)
+		if err != nil {
+			if os.IsNotExist(err) {
</file context>
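Both transform.go comments describe the same fix: when --resume is set, a missing output file should be an error rather than a silent recreate. A minimal sketch under that assumption (the helper name and error text are illustrative, not the PR's actual code):

package dataset

import (
	"errors"
	"fmt"
	"os"
)

// openTransformOutput is a hypothetical helper showing the fail-fast behaviour:
// on resume it only appends to an existing file and errors out if it is missing.
func openTransformOutput(path string, resume bool) (*os.File, error) {
	if resume {
		f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0o644)
		if errors.Is(err, os.ErrNotExist) {
			return nil, fmt.Errorf("resume requested but %s does not exist; restore it or rerun without --resume", path)
		}
		return f, err
	}
	return os.Create(path)
}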

'length': len(output_text),
'ending': output_text[-60:]
})
except:
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P2: Avoid a bare except when parsing dataset lines; it hides unexpected errors and can mask incorrect analysis results. Catch the expected JSON decode failure explicitly.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/api-reliability-test/analyze_results.py, line 62:

<comment>Avoid a bare except when parsing dataset lines; it hides unexpected errors and can mask incorrect analysis results. Catch the expected JSON decode failure explicitly.</comment>

<file context>
@@ -0,0 +1,430 @@
+                                'length': len(output_text),
+                                'ending': output_text[-60:]
+                            })
+                except:
+                    pass
+    
</file context>

}
// Default HTTP timeout: 120 seconds
// For very long-form generation (>4000 tokens), set this higher (e.g., 600-1200)
if model.HTTPTimeoutSeconds == 0 {
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P2: This default overrides an explicit http_timeout_seconds = 0, even though the config docs specify that 0 disables timeouts. As written, users can no longer set "no timeout" behavior, which is a functional regression for configs relying on that setting.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/config/loader.go, line 103:

<comment>This default overrides an explicit `http_timeout_seconds = 0`, even though the config docs specify that 0 disables timeouts. As written, users can no longer set "no timeout" behavior, which is a functional regression for configs relying on that setting.</comment>

<file context>
@@ -93,6 +98,11 @@ func applyDefaults(cfg *Config) {
 		}
+		// Default HTTP timeout: 120 seconds
+		// For very long-form generation (>4000 tokens), set this higher (e.g., 600-1200)
+		if model.HTTPTimeoutSeconds == 0 {
+			model.HTTPTimeoutSeconds = 120 // 2 minutes default
+		}
</file context>
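One common way to keep the documented "0 disables the timeout" behaviour while still defaulting unset values is to make "unset" and "explicit 0" distinguishable. The sketch below assumes a pointer-typed field, which is not how the PR currently declares it:

package config

// ModelConfig here is a pared-down stand-in; only the timeout field is shown.
type ModelConfig struct {
	HTTPTimeoutSeconds *int `toml:"http_timeout_seconds"`
}

const defaultHTTPTimeoutSeconds = 120 // 2 minutes; raise for long-form generation

func applyTimeoutDefault(m *ModelConfig) {
	if m.HTTPTimeoutSeconds == nil {
		v := defaultHTTPTimeoutSeconds
		m.HTTPTimeoutSeconds = &v
	}
	// An explicit http_timeout_seconds = 0 survives and keeps meaning "no timeout".
}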

t.Logf("Response status: %d", resp.StatusCode)

// Read streaming response
scanner := bufio.NewScanner(resp.Body)
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P2: Increase the scanner buffer size for streaming responses so large SSE chunks don't trigger bufio.ErrTooLong.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/api/reasoning_stream_test.go, line 81:

<comment>Increase the scanner buffer size for streaming responses so large SSE chunks don't trigger `bufio.ErrTooLong`.</comment>

<file context>
@@ -0,0 +1,300 @@
+	t.Logf("Response status: %d", resp.StatusCode)
+
+	// Read streaming response
+	scanner := bufio.NewScanner(resp.Body)
+	chunkCount := 0
+	hasReasoningContent := false
</file context>

}

// Add stream parameter (not part of ChatCompletionRequest struct)
reqMap := make(map[string]interface{})
@cubic-dev-ai cubic-dev-ai bot Feb 8, 2026

P3: The marshal → unmarshal → add key → re-marshal round-trip to inject "stream": true is inefficient and fragile. Consider adding a Stream bool field (with json:"stream,omitempty") to ChatCompletionRequest instead, which avoids the double-serialization overhead and potential floating-point representation issues.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/api/streaming.go, line 102:

<comment>The marshal → unmarshal → add key → re-marshal round-trip to inject `"stream": true` is inefficient and fragile. Consider adding a `Stream bool` field (with `json:"stream,omitempty"`) to `ChatCompletionRequest` instead, which avoids the double-serialization overhead and potential floating-point representation issues.</comment>

<file context>
@@ -0,0 +1,372 @@
+	}
+
+	// Add stream parameter (not part of ChatCompletionRequest struct)
+	reqMap := make(map[string]interface{})
+	reqBytes, err := json.Marshal(req)
+	if err != nil {
</file context>
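A minimal sketch of that suggestion, assuming an OpenAI-compatible request shape (the real ChatCompletionRequest in internal/api/types.go has more fields than shown here):

package api

import "encoding/json"

type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatCompletionRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
	// Stream lives on the struct, so the streaming path sets it and marshals once.
	Stream bool `json:"stream,omitempty"`
}

func marshalStreamingRequest(req ChatCompletionRequest) ([]byte, error) {
	req.Stream = true // no map round-trip, no re-encoding of other fields
	return json.Marshal(req)
}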

Copilot AI left a comment

Pull request overview

This PR expands vellumforge2 with (1) streaming + reasoning capture support (dual dataset outputs), and (2) an offline dataset transform command for SFT→DPO conversion / rejected-response regeneration, alongside updated build tooling and new API reliability stress-test assets.

Changes:

  • Add streaming support + reasoning capture plumbing (reasoning_content parsing, dual dataset writing, config flags).
  • Introduce vellumforge2 transform for offline dataset conversion/regeneration with checkpoint/resume support.
  • Modernize developer tooling/docs (Makefile targets, golangci config, README updates, stress-test scripts/docs).

Reviewed changes

Copilot reviewed 40 out of 41 changed files in this pull request and generated 12 comments.

Summary per file:

  • tests/api-reliability-test/run_test.sh: Runner script for reliability stress tests + analysis invocation
  • tests/api-reliability-test/config.nahcrof.toml: Stress-test config for nahcrof provider
  • tests/api-reliability-test/analyze_results.py: Parses sessions/logs to summarize reliability + write results.json
  • tests/api-reliability-test/TEST_README.md: Detailed setup/run instructions for the reliability test package
  • tests/api-reliability-test/RESULTS.md: Captured stress-test results writeup
  • tests/api-reliability-test/.env.nahcrof.example: Example env template for nahcrof API key
  • tests/api-reliability-test/.env.chutes.example: Example env template for chutes API key
  • pkg/models/types.go: Adds SFT format options + expands SFTRecord and GenerationResult for reasoning fields
  • internal/writer/session.go: Adds reasoning dataset output path helper
  • internal/writer/interface.go: Introduces writer.Writer interface supporting single/dual dataset modes
  • internal/writer/dual_dataset_test.go: Unit tests for applying reasoning into Alpaca/ShareGPT SFT records
  • internal/writer/dual_dataset.go: DualDatasetWriter writing both plain + reasoning JSONL outputs
  • internal/writer/dataset.go: Updates DatasetWriter method signatures to match new interface
  • internal/util/think_tags_test.go: Tests for think-tag detection/extraction/stripping helpers
  • internal/util/think_tags.go: Implements think-tag detection/extraction/stripping helpers
  • internal/util/reasoning.go: Helpers for wrapping/combining/stripping reasoning via <think> tags
  • internal/util/meta_cleanup_test.go: Tests for stripping meta chatter from LLM outputs
  • internal/util/meta_cleanup.go: Implements meta cleanup heuristic for rejected generations
  • internal/orchestrator/worker.go: Switches to streaming when enabled, captures reasoning, adds completion validation
  • internal/orchestrator/validation.go: Adds finish_reason validation + incomplete-output detection; refines refusal detection
  • internal/orchestrator/sft_format_test.go: Tests for writing SFT records in Alpaca vs ShareGPT formats
  • internal/orchestrator/orchestrator.go: Uses writer.Writer interface + resume progress rendering changes
  • internal/hfhub/uploader.go: Includes optional reasoning dataset in HF uploads (skips if missing)
  • internal/dataset/transform.go: Implements offline dataset transform modes with worker pool + checkpointing
  • internal/config/loader.go: Adds defaults for SFT format + per-model HTTP timeout
  • internal/config/config.go: Adds new generation/model config fields + secret mapping for new providers
  • internal/checkpoint/manager.go: Minor import formatting
  • internal/api/types.go: Extends API types to include reasoning_content + reasoning token usage fields
  • internal/api/streaming.go: Adds SSE streaming client to assemble content + reasoning_content
  • internal/api/reasoning_test.go: Integration tests for reasoning capabilities and reasoning_content parsing
  • internal/api/reasoning_streaming_test.go: Integration tests for reasoning_content in streaming mode
  • internal/api/reasoning_stream_test.go: Raw HTTP streaming experiments for reasoning_content field discovery
  • internal/api/reasoning_debug_test.go: Debug integration tests for raw responses / temperature variation
  • internal/api/client.go: Moves timeout control to per-request contexts; adds per-model timeout enforcement
  • configs/config.example.toml: Documents new config options (sft_format, streaming, timeouts, reasoning capture)
  • configs/.env.example: Adds CHUTES_API_KEY and NAHCROF_API_KEY templates
  • cmd/vellumforge2/main.go: Adds transform command + env-file support on resume; dual-writer selection
  • README.md: Documents transform usage + resume flags + timeout tuning; bumps version to v1.6.0
  • Makefile: Adds new targets, formatting checks, CI helpers, colorized output; bumps version
  • .golangci.yml: Adds repository lint configuration
  • .gitignore: Improves env ignore patterns; ignores results.json; un-ignores config.nahcrof.toml


Comment on lines +288 to +292
├── analyze_results.py # Results analysis script
├── config.nahcrof.toml # nahcrof test config
├── config.chutes.toml # chutes test config
├── .env.nahcrof.example # API key template (nahcrof)
├── .env.chutes.example # API key template (chutes)
Copilot AI Feb 8, 2026

The README and runner script reference config.chutes.toml, but that file isn’t present in tests/api-reliability-test/ in this PR. As-is, run_test.sh chutes will fail. Either add the missing config file or update the docs/script to match the files that are actually checked in.

transformCmd.Flags().StringVar(&configPath, "config", "config.toml", "Path to configuration file")
transformCmd.Flags().StringVar(&envFile, "env-file", ".env", "Path to environment file")
transformCmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Enable verbose logging")
transformCmd.Flags().StringVar(&transformMode, "mode", string(dataset.TransformSFTToDPO), "Transform mode: 'sft-to-dpo' or 'regen-rejected'")
Copilot AI Feb 8, 2026

--mode is given a default (sft-to-dpo), but later this flag is marked required. That combination is confusing for users because the help implies it’s optional while Cobra will still error unless it’s explicitly set. Consider removing the default or dropping the required constraint.

Suggested change
transformCmd.Flags().StringVar(&transformMode, "mode", string(dataset.TransformSFTToDPO), "Transform mode: 'sft-to-dpo' or 'regen-rejected'")
transformCmd.Flags().StringVar(&transformMode, "mode", "", "Transform mode: 'sft-to-dpo' or 'regen-rejected'")

Comment on lines +313 to +316
if err := parentCtx.Err(); err != nil && err != context.Canceled {
return err
}

Copilot AI Feb 8, 2026

In both transform modes, Ctrl+C / context cancellation will currently return nil because cancellation is explicitly ignored (err != nil && err != context.Canceled). That makes interrupted transforms look successful and can hide partial output. Consider returning context.Canceled (or ctx.Err()) after saving the checkpoint so the CLI can report an interrupted run correctly.

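A sketch of the suggested handling; saveCheckpoint stands in for whatever the transform loop actually calls to persist progress and is not the PR's real helper:

// If the parent context was cancelled, persist progress and surface the
// cancellation instead of returning nil.
if err := parentCtx.Err(); err != nil {
	if saveErr := saveCheckpoint(); saveErr != nil {
		return fmt.Errorf("saving checkpoint after cancellation: %w", saveErr)
	}
	return err // context.Canceled (or DeadlineExceeded) reaches the CLI
}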
Comment on lines +68 to +72
// Very short outputs are likely incomplete (already caught by refusal check)
if len(trimmed) < 100 {
return true, "output too short (< 100 chars)"
}

Copilot AI Feb 8, 2026

isIncompleteOutput treats any output < 100 chars as incomplete. This will hard-fail legitimate short completions (especially outside story-generation use cases) and overlaps with the refusal check (<50 chars). Consider removing the length-based rejection or making it conditional (e.g., only when streaming is enabled, or only when max_output_tokens is large / long-form prompts).

Suggested change
// Very short outputs are likely incomplete (already caught by refusal check)
if len(trimmed) < 100 {
return true, "output too short (< 100 chars)"
}

Comment on lines +62 to +63
except:
pass
Copilot AI Feb 8, 2026

Avoid bare except: here; it will silently swallow unexpected errors and can hide malformed JSONL lines (skewing the stats). Catch specific exceptions (e.g., json.JSONDecodeError) and consider counting/reporting skipped lines.

Comment on lines +19 to +21
define print_step
@echo -e "$(COLOR_BOLD)$(COLOR_BLUE)==> $(1)$(COLOR_RESET)"
endef
Copilot AI Feb 8, 2026

echo -e is not portable across /bin/sh implementations (e.g., dash prints -e literally). Since print_step is used by most targets, consider using printf for ANSI color output to keep the Makefile working consistently across environments.

Comment on lines +35 to +37
## ci: Full CI pipeline (format check + all validations + coverage)
ci: fmt-check vet lint test-coverage build
$(call print_step,"✅ CI pipeline completed successfully!")
Copilot AI Feb 8, 2026

The ci target currently runs test-coverage, which runs the full test suite (including integration tests that can make real API calls). This can be costly/flaky when CI has credentials available. Consider using the test (short) target in ci and keeping integration tests behind an explicit target/job.

var responseCreated int64
var finishReason string

scanner := bufio.NewScanner(httpResp.Body)
Copilot AI Feb 8, 2026

bufio.Scanner has a relatively small max token size by default. Some providers may emit large SSE data: lines, which can cause scanner errors and truncated streaming. Consider calling scanner.Buffer(...) with a larger max (similar to the JSONL scanner usage elsewhere).

Suggested change
scanner := bufio.NewScanner(httpResp.Body)
scanner := bufio.NewScanner(httpResp.Body)
// Increase scanner buffer to handle large SSE data: lines
scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)

Comment on lines +45 to +47
# Run the test
timeout 1800 ./bin/vellumforge2 run --config "$config" --env-file "$env_file" 2>&1 | \
tee "tests/api-reliability-test/test_${provider}.log"
Copilot AI Feb 8, 2026

This script assumes the timeout utility is available. On macOS it’s typically missing (users often have gtimeout via coreutils instead). Consider adding a small portability shim (detect timeout vs gtimeout) and/or documenting the dependency so the script works across the platforms you build for.

import json
import os
import glob
from pathlib import Path
Copilot AI Feb 8, 2026

Import of 'Path' is not used.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1d7786d6f


Comment on lines +265 to +269
func (dw *DualDatasetWriter) Close() error {
var errs []error

if err := dw.regularFile.Close(); err != nil {
errs = append(errs, fmt.Errorf("regular dataset: %w", err))


P1: Flush buffered records before closing dual writer

DualDatasetWriter.Close closes the files directly without persisting dw.records. In MO-DPO runs, writeMODPORecord writes through WriteRecord/UpdateRecord (buffered path), so enabling generation.enable_reasoning_capture switches to this writer and silently drops all buffered rows at shutdown, leaving dataset.jsonl (and no reasoning equivalent) incomplete or empty despite successful generation logs.

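A sketch of what Close could look like with the buffered rows flushed first; flushRecords and the reasoningFile field are assumptions standing in for however the writer actually buffers MO-DPO records:

func (dw *DualDatasetWriter) Close() error {
	var errs []error

	// Persist any buffered records before the underlying files are closed.
	if err := dw.flushRecords(); err != nil {
		errs = append(errs, fmt.Errorf("flushing buffered records: %w", err))
	}
	if err := dw.regularFile.Close(); err != nil {
		errs = append(errs, fmt.Errorf("regular dataset: %w", err))
	}
	if err := dw.reasoningFile.Close(); err != nil {
		errs = append(errs, fmt.Errorf("reasoning dataset: %w", err))
	}
	return errors.Join(errs...)
}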

"model", modelCfg.ModelName,
"timeout", timeout)
}
ctx, cancel = context.WithTimeout(ctx, timeout)


P1: Apply HTTP timeout per attempt, not across queue/retries

Creating a timeout-scoped context at function entry makes the timeout budget cover rate-limiter waiting plus every retry/backoff, not just an HTTP attempt. In low-RPM or high-concurrency runs, Wait can exhaust the 120s budget before any request is sent; and when an attempt times out, subsequent retries immediately hit ctx.Done() and return, so configured max_retries cannot recover from timeout failures.

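A sketch of scoping the timeout to each attempt rather than to the whole call, so rate-limiter waits and backoff do not consume the budget and each retry starts fresh; doRequest, maxRetries, and backoff are illustrative names, not the client's real API:

var lastErr error
for attempt := 0; attempt <= maxRetries; attempt++ {
	// Each attempt gets its own timeout derived from the uncancelled parent.
	attemptCtx, cancel := context.WithTimeout(parentCtx, timeout)
	resp, err := doRequest(attemptCtx, req)
	cancel()
	if err == nil {
		return resp, nil
	}
	lastErr = err
	if parentCtx.Err() != nil {
		return nil, parentCtx.Err() // caller cancelled; stop retrying
	}
	backoff(attempt) // later attempts are not poisoned by an expired context
}
return nil, lastErr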
