Tunix v0.1.6 — Agentic RL & VLM

@jiangyangmu released this 13 Mar 22:57

Highlights

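# The uppercase constants, rl_cluster, reward_fns, MyAgentClass, MyEnv,
# chat_parser, and train_dataset are placeholders the caller must define.
from tunix import AgenticGRPOConfig
from tunix import AgenticGRPOLearner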

agentic_grpo_config = AgenticGRPOConfig(
    num_generations=NUM_GENERATIONS,
    num_iterations=NUM_ITERATIONS,
    max_response_length=MAX_RESPONSE_LENGTH,
    beta=BETA,
    epsilon=EPSILON,
    system_prompt=SWE_SYSTEM_PROMPT,
    max_concurrency=1,
    epsilon_high=0.28,
    off_policy_steps=0,
)

agentic_grpo_learner = AgenticGRPOLearner(
    rl_cluster=rl_cluster,
    reward_fns=reward_fns,
    agent_class=MyAgentClass,
    agent_kwargs={},
    env_class=MyEnv,
    env_kwargs={"max_steps": MAX_STEPS},
    algo_config=agentic_grpo_config,
    chat_parser=chat_parser,
)

agentic_grpo_learner.train(train_dataset=train_dataset)
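
For readers unfamiliar with the `epsilon` and `epsilon_high` settings above, here is a minimal, library-independent sketch of how a GRPO-style objective commonly uses them: rewards are normalized within a group of sampled responses, and the policy ratio is clipped to an asymmetric range. This is an illustration under stated assumptions, not Tunix's implementation. The helper names `group_relative_advantages` and `clipped_objective` are hypothetical, and the default `epsilon=0.2` is assumed (the snippet above leaves `EPSILON` unspecified).

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: assumes mean/std normalization, as in common
    GRPO formulations.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(logp_new, logp_old, advantage,
                      epsilon=0.2, epsilon_high=0.28):
    """Per-token surrogate objective with asymmetric clipping.

    The probability ratio is limited to [1 - epsilon, 1 + epsilon_high].
    epsilon=0.2 is an assumed default; 0.28 matches the config above.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon_high)
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(advantages)
    print(clipped_objective(math.log(1.5), 0.0, advantage=1.0))
```

With a larger upper bound than lower bound, tokens with positive advantage can raise their probability somewhat further before clipping takes effect, while the lower bound stays tighter.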

What's Changed

New Contributors

Full Changelog: v0.1.5...v0.1.6