# Trinity-Mini-DrugProt-Think
RLVR (GRPO) + LoRA post-training on Arcee Trinity Mini for DrugProt relation classification.
📝 Report | AWS deployment guide | Model
This repo contains two distinct tracks:
- Experiments: RL training configs and artifacts for DrugProt.
- Serving: Deploy a Hugging Face base model + LoRA adapter to a SageMaker real-time endpoint (SageMaker SDK v3).
## Stack

- uv (Python project + tool runner)
- SageMaker Python SDK v3 (`ModelBuilder`, `InferenceSpec`; see `v3-examples/`)
- Prime Intellect (Prime CLI + RL runs; docs: https://docs.primeintellect.ai/)
- Weights & Biases (experiment tracking)
- Hugging Face PEFT (LoRA)
## Repository layout

- `experiments/configs/rl/`: RL experiment configs (`*.toml`).
- `index.html`: blog post (static HTML + Chart.js, served from the repo root).
- `data/`: exported metrics CSVs used by the blog post charts.
- `experiments/reports/`: supplementary writeups (deployment guide, etc.).
- `serving/lora_inference/`: SageMaker v3 `InferenceSpec` + container requirements.
- `serving/scripts/`: deploy/test/delete endpoint scripts.
## Experiments

Prerequisites:

- Prime CLI installed (`prime --version`), e.g. `uv tool install prime`.
- Logged in (`prime login`).
- W&B API key available.

Configs use `env_files = ["secrets.env"]`, so put the secrets file next to the configs at `experiments/configs/rl/secrets.env` (gitignored):

```
WANDB_API_KEY=your_key_here
```

- Baseline config: `experiments/configs/rl/w1_alpha16_baseline.toml`
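For orientation: "verifiable rewards" (the VR in RLVR) means each rollout is scored by a programmatic check rather than a learned reward model. Below is a hypothetical sketch of such a check for DrugProt relation labels; the function name, the `<think>` tag convention, and the exact-match scheme are illustrative assumptions, not the repo's actual reward code.

```python
import re

def drugprot_reward(completion: str, gold_label: str) -> float:
    """Hypothetical verifiable reward: 1.0 iff the predicted relation matches gold."""
    # Assume the model reasons inside <think>...</think> and then emits a bare
    # DrugProt relation label (e.g. INHIBITOR, SUBSTRATE, AGONIST).
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    return 1.0 if answer.upper() == gold_label.upper() else 0.0
```

GRPO then normalizes these rewards within each group of rollouts for the same prompt to form the advantage signal.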
Run the baseline:

```bash
prime rl run experiments/configs/rl/w1_alpha16_baseline.toml
```

Monitor a run:

```bash
prime rl progress <run_id>
prime rl logs <run_id> -f
prime rl metrics <run_id>
prime rl distributions <run_id> --type rewards
prime rl rollouts <run_id> --step <step> -n 50
```

## Serving

Deploy a Hugging Face base model with a LoRA adapter to a SageMaker real-time endpoint using SageMaker SDK v3 (`ModelBuilder` + `InferenceSpec`).
- `serving/lora_inference/spec.py`: loads the base model + LoRA adapter and runs generation (see the sketch below).
- `serving/lora_inference/requirements.txt`: container-time inference dependencies.
- `serving/scripts/deploy_lora_endpoint.py`: deploy/update the endpoint.
- `serving/scripts/test_lora_endpoint.py`: invoke the endpoint.
- `serving/scripts/delete_lora_endpoint.py`: delete the endpoint.
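For orientation, here is a minimal sketch of what an `InferenceSpec` like `spec.py`'s can look like, assuming the SDK's documented `load`/`invoke` hooks plus `transformers` and `peft`. The class name and inline model IDs are placeholders; `serving/lora_inference/spec.py` is the authoritative version.

```python
from peft import PeftModel
from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "<hf-base-model-id>"      # placeholder; the real script can resolve this
ADAPTER_ID = "<adapter path or repo id>"  # placeholder

class LoraInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        # Load the frozen base model, then attach the LoRA adapter on top of it.
        base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, device_map="auto")
        model = PeftModel.from_pretrained(base, ADAPTER_ID)
        tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
        return {"model": model, "tokenizer": tokenizer}

    def invoke(self, input_object: dict, model: dict):
        tok, lm = model["tokenizer"], model["model"]
        inputs = tok(input_object["inputs"], return_tensors="pt").to(lm.device)
        output_ids = lm.generate(
            **inputs,
            max_new_tokens=input_object.get("max_new_tokens", 256),
            temperature=input_object.get("temperature", 0.7),
            top_p=input_object.get("top_p", 0.95),
            do_sample=input_object.get("do_sample", True),
        )
        text = tok.decode(output_ids[0], skip_special_tokens=True)
        return {"generated_text": text}
```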
Prerequisites:

- AWS credentials configured locally (`~/.aws/credentials` or env vars).
- SageMaker execution role ARN with model/endpoint permissions.
- Optional (private Hugging Face repos): `HF_TOKEN` env var.

Setup:

```bash
uv sync
export SAGEMAKER_ROLE_ARN="arn:aws:iam::<account-id>:role/<sagemaker-role>"
export HF_TOKEN="hf_xxx"  # optional, for private repos
```
Deploy with an adapter from the Hugging Face Hub:

```bash
uv run python serving/scripts/deploy_lora_endpoint.py \
  --endpoint-name trinity-mini-drugprot-think \
  --adapter-id lokahq/Trinity-Mini-DrugProt-Think \
  --role-arn "$SAGEMAKER_ROLE_ARN" \
  --instance-type ml.g5.2xlarge
```

Or deploy with a local adapter directory:

```bash
uv run python serving/scripts/deploy_lora_endpoint.py \
  --endpoint-name trinity-mini-drugprot-think-local \
  --adapter-id ./adapter \
  --role-arn "$SAGEMAKER_ROLE_ARN" \
  --instance-type ml.g5.2xlarge
```

If `./adapter/adapter_config.json` exists, the server resolves the base model from the adapter metadata. You can always override it with `--base-model-id <hf-model-id>`.
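A sketch of how that resolution typically works: `base_model_name_or_path` is the key PEFT writes into `adapter_config.json` when saving an adapter. The function name and CLI-override plumbing here are hypothetical.

```python
import json
from pathlib import Path

def resolve_base_model_id(adapter_dir: str, override: str | None = None) -> str:
    """Prefer an explicit --base-model-id; otherwise read the adapter metadata."""
    if override:
        return override
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open() as f:
        return json.load(f)["base_model_name_or_path"]
```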
To update an existing endpoint in place:

```bash
uv run python serving/scripts/deploy_lora_endpoint.py \
  --endpoint-name trinity-mini-drugprot-think \
  --adapter-id lokahq/Trinity-Mini-DrugProt-Think \
  --role-arn "$SAGEMAKER_ROLE_ARN" \
  --update-endpoint
```
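Under the hood, a v3 deploy script of this shape typically hands the spec to `ModelBuilder` and deploys the built model. A hedged sketch, assuming the `LoraInferenceSpec` from the earlier sketch and the `SchemaBuilder` sample-payload pattern from the SDK's ModelBuilder examples (see `v3-examples/` for the repo's exact usage):

```python
import os

from sagemaker.serve import ModelBuilder, SchemaBuilder

# Sample payloads let SchemaBuilder derive the endpoint's (de)serialization.
sample_input = {"inputs": "string prompt", "max_new_tokens": 256}
sample_output = {"generated_text": "..."}

builder = ModelBuilder(
    inference_spec=LoraInferenceSpec(),  # the spec class sketched earlier
    schema_builder=SchemaBuilder(sample_input, sample_output),
    role_arn=os.environ["SAGEMAKER_ROLE_ARN"],
)
model = builder.build()
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="trinity-mini-drugprot-think",
)
```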
Test the endpoint:

```bash
uv run python serving/scripts/test_lora_endpoint.py \
  --endpoint-name trinity-mini-drugprot-think \
  --prompt "Give me one practical use-case of LoRA adapters in production." \
  --max-new-tokens 120 \
  --temperature 0.7 \
  --top-p 0.95
```

Request JSON:
```json
{
  "inputs": "string prompt",
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_p": 0.95,
  "do_sample": true
}
```

Response JSON:
```json
{
  "generated_text": "...",
  "full_text": "...",
  "model_id": "<hf-base-model-id>",
  "adapter_id": "<adapter path or repo id>",
  "model_name": "Trinity-Mini-DrugProt-Think"
}
```
## Acknowledgments

- Model: Arcee AI (with Prime Intellect and Datalogy) for releasing the Trinity family.
- Training: Prime Intellect for hosted training infrastructure.
- Environment: OpenMed for DrugProt dataset packaging.
- Deployment: AWS for deployment and hosting.
