
Using auxiliary models in experience pipeline & OpenAI API supports stream mode #513

Merged
pan-x-c merged 12 commits into agentscope-ai:main from pan-x-c:feature/exp_pipeline_auxiliary_model on Mar 3, 2026

Conversation

@pan-x-c
Collaborator

@pan-x-c pan-x-c commented Mar 2, 2026

Description

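As the title says: this PR uses auxiliary models in the experience pipeline and adds stream mode support to the OpenAI API.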

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


Copilot AI left a comment


Pull request overview

This PR wires auxiliary (judge) models into the experience processing pipeline by making operators async-capable and providing them access to auxiliary model OpenAI clients, while also refactoring ModelWrapper to infer engine type from the underlying model actor.

Changes:

  • Add an async operator interface (ExperienceOperatorV1) and update ExperiencePipeline to prepare operators asynchronously and await operator processing/cleanup.
  • Add auxiliary model wrapper discovery (get_auxiliary_model_wrappers) and pass auxiliary model OpenAI clients into experience operators.
  • Make ModelWrapper fetch engine type from the model actor (get_engine_type) and remove config-passed engine_type.
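The async operator flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `AsyncOperatorSketch`, `LegacyOperatorWrapper`, `RewardBonus`, `run_pipeline`, the dict-based experiences, and the `auxiliary_clients` argument are all hypothetical stand-ins, and the real `ExperienceOperatorV1` signatures may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

Experience = Dict[str, float]  # hypothetical stand-in for the real experience type


class AsyncOperatorSketch(ABC):
    """Hypothetical async operator interface (not the actual ExperienceOperatorV1)."""

    async def prepare(self, auxiliary_clients: List[Any]) -> None:
        # Assumption: operators receive auxiliary model clients at prepare time.
        self.auxiliary_clients = auxiliary_clients

    @abstractmethod
    async def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict[str, float]]:
        """Return the processed experiences and a metrics dict."""

    async def close(self) -> None:
        """Release resources; no-op by default."""


class LegacyOperator:
    """A synchronous operator written against an older, non-async interface."""

    def process(self, exps: List[Experience]) -> Tuple[List[Experience], Dict[str, float]]:
        kept = [e for e in exps if e["reward"] > 0]
        return kept, {"filtered": float(len(exps) - len(kept))}

    def close(self) -> None:
        pass


class LegacyOperatorWrapper(AsyncOperatorSketch):
    """Adapts a synchronous operator so the pipeline can await it."""

    def __init__(self, op: LegacyOperator) -> None:
        self.op = op

    async def process(self, exps):
        return self.op.process(exps)

    async def close(self) -> None:
        self.op.close()


class RewardBonus(AsyncOperatorSketch):
    """Native async operator; the sleep marks where an auxiliary model call would be awaited."""

    async def process(self, exps):
        await asyncio.sleep(0)
        out = [{**e, "reward": e["reward"] + 1.0} for e in exps]
        return out, {"bonus_applied": float(len(out))}


async def run_pipeline(operators: List[AsyncOperatorSketch], exps: List[Experience]):
    """Prepare every operator, await each process() in order, then always close."""
    metrics: Dict[str, float] = {}
    for op in operators:
        await op.prepare(auxiliary_clients=[])
    try:
        for op in operators:
            exps, op_metrics = await op.process(exps)
            metrics.update(op_metrics)
    finally:
        for op in operators:
            await op.close()
    return exps, metrics
```

A caller would drive this with something like `asyncio.run(run_pipeline([LegacyOperatorWrapper(LegacyOperator()), RewardBonus()], exps))`. Wrapping legacy operators keeps the pipeline loop uniform, so it only ever awaits `process` and `close`.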

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `trinity/explorer/workflow_runner.py` | Stop passing engine_type into ModelWrapper; rely on model-reported engine type. |
| `trinity/explorer/explorer.py` | Reorder preparation so models are prepared before the experience pipeline; add node-affinity comment for the pipeline actor. |
| `trinity/common/models/vllm_model.py` | Implement get_engine_type() for vLLM-backed inference models. |
| `trinity/common/models/tinker_model.py` | Implement get_engine_type() for Tinker-backed inference models. |
| `trinity/common/models/model.py` | Add InferenceModel.get_engine_type() abstractmethod and fetch it in ModelWrapper.prepare(). |
| `trinity/common/models/__init__.py` | Rename auxiliary actor names to optionally include config name; add get_auxiliary_model_wrappers() helper. |
| `trinity/common/config.py` | Make DataProcessorConfig.experience_pipeline non-optional with a default factory. |
| `trinity/buffer/pipelines/experience_pipeline.py` | Defer operator creation to prepare(), inject auxiliary model clients, and make operator execution/close async. |
| `trinity/buffer/operators/experience_operator.py` | Introduce ExperienceOperatorV1 async interface + wrapper for legacy operators; add create_operators() helper. |
| `trinity/buffer/operators/__init__.py` | Export the new operator interface and factory. |
| `tests/explorer/workflow_test.py` | Update ModelWrapper construction after removing engine_type parameter. |
| `tests/explorer/scheduler_test.py` | Update dummy inference models to implement get_engine_type(). |
| `tests/explorer/explorer_test.py` | Add a test operator that uses auxiliary models via OpenAI async clients; configure an auxiliary model name. |
| `tests/common/vllm_test.py` | Update ModelWrapper construction after removing engine_type parameter. |
| `tests/buffer/reward_shaping_mapper_test.py` | Switch to async test and use create_operators() with await op.process(...). |
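The engine-type change listed for `model.py` could look roughly like the sketch below: the model declares an abstract `get_engine_type()`, and the wrapper queries it during `prepare()` instead of taking it as a constructor argument. This is an illustration under assumptions, not the PR's code. The class names carry a `Sketch` suffix to mark them as hypothetical, the returned strings `"vllm"` and `"tinker"` are guesses, and the real wrapper talks to a model actor rather than a local object.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class InferenceModelSketch(ABC):
    """Hypothetical stand-in for an inference model base class."""

    @abstractmethod
    def get_engine_type(self) -> str:
        """Report which backend engine serves this model."""


class VLLMModelSketch(InferenceModelSketch):
    def get_engine_type(self) -> str:
        return "vllm"  # assumed value, for illustration only


class TinkerModelSketch(InferenceModelSketch):
    def get_engine_type(self) -> str:
        return "tinker"  # assumed value, for illustration only


class ModelWrapperSketch:
    """Takes no engine_type argument; learns it from the model in prepare()."""

    def __init__(self, model: InferenceModelSketch) -> None:
        self.model = model
        self.engine_type: Optional[str] = None

    async def prepare(self) -> None:
        # In the real system this would presumably be a remote call to the
        # model actor; here it is a plain local method call.
        self.engine_type = self.model.get_engine_type()
```

Because the method is abstract, a model class that omits it (for example a test dummy) fails at instantiation with `TypeError`, which is consistent with the dummy inference models in `tests/explorer/scheduler_test.py` needing an update.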


@pan-x-c changed the title from "Using auxiliary models in experience pipeline" to "Using auxiliary models in experience pipeline & OpenAI API supports stream mode" on Mar 2, 2026
@pan-x-c
Collaborator Author

pan-x-c commented Mar 2, 2026

/unittest-diff

@github-actions

github-actions bot commented Mar 2, 2026

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 158 | 157 | 0 | 1 | 0 | 0 | 26m 19s |

Skipped

| Tests | Status |
| --- | --- |
| tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async | skipped ⏭️ |

Tests

Test Name Status Flaky Duration
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 10.7s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 6.3s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 2.6s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 4.3s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 4.8s
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 147ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 1.5s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 552ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 476ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 841ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 978ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 725ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 228ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 6.3s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 2.1s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 4.0s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 3.3s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 3.1s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 3.6s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 760ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 7ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 2.0s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 1.7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 1.6s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 4.3s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 1.6s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 3.2s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 5.5s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 2.1s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 2.6s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 71ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 57ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 90ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 89ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 89ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 92ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 106ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 46ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 272ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 2.5s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 40ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 2.5s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 41ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 3.0s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 45.5s
tests/cli/launcher_test.py::TestLauncherMain::test_log_mode 154ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 5.5s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 1.2s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 688ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 13.9s
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 21.6s
tests/common/config_test.py::TestConfig::test_chat_template_path 77ms
tests/common/config_test.py::TestConfig::test_config_flatten 32ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 161ms
tests/common/config_test.py::TestConfig::test_default_workflow 77ms
tests/common/config_test.py::TestConfig::test_load_default_config 11.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 78ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 77ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 143ms
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 57.4s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 39.6s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 38.6s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 32.3s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 27.0s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 28.1s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 28.2s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 26.5s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 26.6s
tests/common/vllm_test.py::TestAPIServer::test_api 29.1s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 26.5s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 28.7s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 255ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 236ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 32.1s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 32.3s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 1m 25s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 40.9s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 1m 39s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 1m 15s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 1m 1s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 3m 1s
tests/explorer/explorer_test.py::ServeTest::test_serve 1m 2s
tests/explorer/proxy_test.py::RecorderTest::test_recorder 85ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 4.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 12.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 28.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 4.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 4.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 4.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 4.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 5.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 12.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 14.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 9.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 7.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 25.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 7.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 13.5s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 9.8s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 1.1s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 11ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 16ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 127ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 3ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 11ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 7ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 100ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 23.2s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 23.0s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 729ms
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 14ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 138ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 26.8s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 45.6s

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator Author

pan-x-c commented Mar 2, 2026

/unittest-module-trainer

@github-actions

github-actions bot commented Mar 2, 2026

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 27 | 24 | 0 | 3 | 0 | 0 | 47m 42s |

Skipped

| Tests | Status |
| --- | --- |
| tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class | skipped ⏭️ |

Tests

Test Name Status Flaky Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 4m 8s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 5m 9s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 1m 40s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 1m 8s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 1m 3s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 1m 8s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 1m 11s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer ⏭️ 1ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 33.3s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 32.0s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 32.0s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 1m 38s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 1m 38s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 2m 28s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 2m 54s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 5m 53s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 1m 57s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 1m 46s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer 2m 33s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer 1m 7s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 3m 19s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 1m 10s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 45.5s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 1ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 1ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner 1m 18s
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer 1m 59s

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator Author

pan-x-c commented Mar 3, 2026

/gemini review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@pan-x-c
Collaborator Author

pan-x-c commented Mar 3, 2026

/unittest-module-trainer

@chenyushuo
Collaborator

/unittest-module-trainer

@github-actions

github-actions bot commented Mar 3, 2026

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 27 | 24 | 0 | 3 | 0 | 0 | 49m 43s |

Skipped

| Tests | Status |
| --- | --- |
| tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class | skipped ⏭️ |

Tests

Test Name Status Flaky Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 4m 2s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 5m 17s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 1m 47s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 1m 14s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 1m 6s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 1m 17s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 1m 22s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer ⏭️ 1ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 39.2s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 35.4s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 35.0s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 1m 45s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 1m 45s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 2m 30s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 2m 54s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 5m 53s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 2m 6s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 1m 54s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer 2m 41s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer 1m 6s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 3m 25s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 1m 14s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 48.5s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 1ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 1ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner 1m 29s
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer 2m 7s

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator Author

pan-x-c commented Mar 3, 2026

/unittest-module-common

@github-actions

github-actions bot commented Mar 3, 2026

Summary

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Other ❓ Flaky 🍂 Duration ⏱️
55 54 0 1 0 0 12m 33s

Skipped

| Tests | Status |
| --- | --- |
| tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async | skipped ⏭️ |

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 23.0s
tests/common/config_test.py::TestConfig::test_chat_template_path 77ms
tests/common/config_test.py::TestConfig::test_config_flatten 31ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 161ms
tests/common/config_test.py::TestConfig::test_default_workflow 353ms
tests/common/config_test.py::TestConfig::test_load_default_config 14.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 79ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 77ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 1.7s
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 1m 1s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 44.7s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 50.1s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 28.5s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 27.3s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 27.3s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 27.3s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 38.0s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 27.2s
tests/common/vllm_test.py::TestAPIServer::test_api 30.0s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 27.0s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 29.7s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 554ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 829ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 32.2s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 31.7s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 3m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 44.3s

Github Test Reporter by CTRF 💚

@pan-x-c pan-x-c merged commit 31f9b79 into agentscope-ai:main Mar 3, 2026
2 checks passed