
Commit 909c7d6

qew21, you-n-g, v-jianwan
authored
feat: enable finetune llm (#1055)
* feat: start with previous workspace
* feat: finetune llm
* add PrevModelLoadEvaluator

---------

Co-authored-by: Young <[email protected]>
Co-authored-by: v-jianwan <[email protected]>
Co-authored-by: you-n-g <[email protected]>
1 parent 6d01e3e commit 909c7d6

File tree

25 files changed: +806 -17 lines changed

docs/scens/catalog.rst

Lines changed: 1 addition & 0 deletions
@@ -43,3 +43,4 @@ The supported scenarios are listed below:
    model_agent_fin
    model_copilot_general
    data_science
+   finetune

docs/scens/finetune.rst

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
.. _finetune_agent:


=============================
Fine-tuning an Existing Model
=============================


🎯 Scenario: Continue Training on a Pre-trained Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this workflow the **Data Science Agent** starts from a *previously trained* model (and its training script), performs additional fine-tuning on new data, and then re-uses the updated weights for subsequent inference runs.

🚧 Directory Structure
~~~~~~~~~~~~~~~~~~~~~~

Your competition folder (here called ``custom_data``) must contain **one extra sub-directory** named ``prev_model`` where you keep the old weights and the code that produced them:

.. code-block:: text

    ds_data
    └── custom_data
        ├── train.csv
        ├── test.csv
        ├── sample_submission.csv   # optional
        ├── description.md          # optional
        ├── sample.py               # optional
        └── prev_model              # ← NEW
            ├── models/             # previous checkpoints (e.g. *.bin, *.pt, *.ckpt)
            └── main.py             # training/inference scripts you used before

If your competition provides custom grading/validation scripts, keep them under ``ds_data/eval/custom_data`` exactly as before.
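A quick way to sanity-check this layout before launching the loop is a small script such as the one below (a minimal sketch, not part of RD-Agent; the paths follow the structure above and assume you run it from the directory that contains ``ds_data``):

.. code-block:: python

    from pathlib import Path

    root = Path("ds_data") / "custom_data"
    prev_model = root / "prev_model"

    assert (root / "train.csv").exists(), "train.csv is required"
    assert (root / "test.csv").exists(), "test.csv is required"
    assert (prev_model / "main.py").exists(), "previous training/inference script goes here"
    assert any((prev_model / "models").glob("*")), "put previous checkpoints under prev_model/models/"
    print("Folder layout looks good.")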
🔧 Environment Setup
~~~~~~~~~~~~~~~~~~~~

Add or update the following variables in **.env** (examples shown):

.. code-block:: sh

    # required for all Data-Science runs
    dotenv set DS_LOCAL_DATA_PATH <your local path>/ds_data

    # optional: choose docker / conda, etc.
    dotenv set DS_CODER_COSTEER_ENV_TYPE docker
🚀 How It Works at Runtime
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **First run**

   * `rdagent` detects `prev_model/models`.
   * It loads the latest checkpoint and prepares the fine-tuning based on the code found under `prev_model/*.py` (or your own pipeline if you override it).
   * Fine-tuned weights are written to `./workspace_input/models`.

2. **Subsequent runs**

   * When you execute `python ./workspace_input/main.py`, the script first looks for a checkpoint in `./workspace_input/models`.
   * If found, it **skips fine-tuning** and goes straight to prediction / submission generation, as sketched below.
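A minimal sketch of that "load if present, otherwise fine-tune" check (illustrative only; it assumes a PyTorch pipeline and ``*.pt`` checkpoints, neither of which is mandated by the agent):

.. code-block:: python

    from pathlib import Path

    import torch  # assumption: the generated pipeline is PyTorch-based

    MODEL_DIR = Path("./workspace_input/models")

    def load_or_finetune(finetune_fn):
        """Reuse an existing checkpoint when one is found; otherwise fine-tune and save one."""
        checkpoints = sorted(MODEL_DIR.glob("*.pt"))
        if checkpoints:                                   # subsequent runs: skip fine-tuning
            return torch.load(checkpoints[-1], map_location="cpu")
        state_dict = finetune_fn()                        # first run: fine-tune starting from prev_model
        MODEL_DIR.mkdir(parents=True, exist_ok=True)
        torch.save(state_dict, MODEL_DIR / "model.pt")
        return state_dict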
⏰ Managing Timeouts
~~~~~~~~~~~~~~~~~~~~

By default:

* **Debug loop**: 1 hour (``DS_DEBUG_TIMEOUT=3600`` seconds)
* **Full run**: 3 hours (``DS_FULL_TIMEOUT=10800`` seconds)

Override either value in **.env**:

.. code-block:: sh

    # give the debug loop 45 min and the full loop 6 h
    dotenv set DS_DEBUG_TIMEOUT 2700
    dotenv set DS_FULL_TIMEOUT 21600

- 🚀 **Run the Application**

  - You can directly run the application by using the following command:

    .. code-block:: sh

        dotenv run -- python rdagent/app/finetune/data_science/loop.py --competition <Competition ID>

  - Then you can compute the test-set score for each round of the loop:

    .. code-block:: sh

        dotenv run -- python rdagent/log/mle_summary.py grade <url_to_log>

    Here, ``<url_to_log>`` refers to the parent directory of the log folder generated during the run.

- 📥 **Visualize the R&D Process**

  - We provide a web UI to visualize the log. You just need to run:

    .. code-block:: sh

        streamlit run rdagent/log/ui/dsapp.py

  - Then you can input the log path and visualize the R&D process.
🔍 MLE-bench Guide: Running ML Engineering via MLE-bench
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- 📝 **MLE-bench Overview**

  - MLE-bench is a comprehensive benchmark designed to evaluate the ML engineering capabilities of AI systems using real-world scenarios. The dataset comprises 75 Kaggle competitions. Since Kaggle does not provide held-out test sets for these competitions, the benchmark includes preparation scripts that split the publicly available training data into new training and test sets, and grading scripts are provided for each competition to accurately evaluate submission scores.

- 🔧 **Set up Environment for MLE-bench**

  - Running R&D-Agent on MLE-bench is designed for full automation. There is no need for manual downloads or data preparation. Simply set the environment variable ``DS_IF_USING_MLE_DATA`` to True.

  - At runtime, R&D-Agent will automatically build the Docker image specified at ``rdagent/scenarios/kaggle/docker/mle_bench_docker/Dockerfile``. This image is responsible for downloading the required datasets and grading files for MLE-bench.

  - Note: The first run may take longer than subsequent runs because the Docker image and data are downloaded and set up for the first time.

  .. code-block:: sh

      dotenv set DS_LOCAL_DATA_PATH <your local directory>/ds_data
      dotenv set DS_IF_USING_MLE_DATA True

- 🔨 **Configuring the Kaggle API**

  - Downloading Kaggle competition data requires the Kaggle API. You can set it up by following these steps:

    - Register and log in on the `Kaggle <https://www.kaggle.com/>`_ website.
    - Click on the avatar (usually in the top right corner of the page) -> ``Settings`` -> ``Create New Token``. A file called ``kaggle.json`` will be downloaded.
    - Move ``kaggle.json`` to ``~/.config/kaggle/``.
    - Modify the permissions of the ``kaggle.json`` file:

      .. code-block:: sh

          chmod 600 ~/.config/kaggle/kaggle.json

  - For more information about Kaggle API settings, refer to the `Kaggle API <https://github.com/Kaggle/kaggle-api>`_.

- 🔩 **Setting the Environment Variables for MLE-bench**

  - In addition to auto-downloading the benchmark data, you must also configure the runtime environment for executing the competition code.
  - Use the environment variable ``DS_CODER_COSTEER_ENV_TYPE`` to select the execution mode:

    - When set to ``docker`` (the default), RD-Agent uses the official Kaggle Docker image (``gcr.io/kaggle-gpu-images/python:latest``) to ensure that all required packages are available.
    - If you prefer a custom Docker setup, you can modify the configuration using ``DS_DOCKER_IMAGE`` or ``DS_DOCKERFILE_FOLDER_PATH``.
    - Alternatively, if your competition work only demands basic libraries, you may set ``DS_CODER_COSTEER_ENV_TYPE`` to ``conda``. In this mode, you must create a local conda environment named "kaggle" and pre-install the necessary packages. RD-Agent will execute the competition code within this "kaggle" conda environment.

  .. code-block:: sh

      # Configure the runtime environment: choose 'docker' (default) or 'conda'
      dotenv set DS_CODER_COSTEER_ENV_TYPE docker

- **Additional Guidance**

  - **Combine different LLM models at the R&D stage**

    - You can combine different LLM models at the R&D stage.
    - By default, the environment variable ``CHAT_MODEL`` covers both the Research and Development stages. To customize the model used in the development stage, set:

      .. code-block:: sh

          # This example sets the model to "o3-mini". For some models, the reasoning effort should be set to "None".
          dotenv set LITELLM_CHAT_MODEL_MAP '{"coding":{"model":"o3-mini","reasoning_effort":"high"},"running":{"model":"o3-mini","reasoning_effort":"high"}}'
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
import os

from pydantic_settings import SettingsConfigDict

from rdagent.app.data_science.conf import DS_RD_SETTING
from rdagent.core.conf import RD_AGENT_SETTINGS, ExtendedBaseSettings


class DSFinetuneScen(ExtendedBaseSettings):
    model_config = SettingsConfigDict(env_prefix="FT_", protected_namespaces=())
    scen: str = "rdagent.app.finetune.data_science.scen.DSFinetuneScen"
    """
    Scenario class for data science tasks.
    - For Kaggle competitions, use: "rdagent.scenarios.data_science.scen.KaggleScen"
    - For custom data science scenarios, use: "rdagent.scenarios.data_science.scen.DataScienceScen"
    - For LLM finetune scenarios, use: "rdagent.app.finetune.llm.scen.LLMFinetuneScen"
    - For data science finetune scenarios, use: "rdagent.app.finetune.data_science.scen.DSFinetuneScen"
    """

    debug_timeout: int = 3600
    """The timeout limit for running on debugging data"""
    full_timeout: int = 10800
    """The timeout limit for running on full data"""

    coder_on_whole_pipeline: bool = True
    enable_model_dump: bool = True
    app_tpl: str = "app/finetune/data_science/tpl"


def update_settings(competition: str):
    """
    Update the RD_AGENT_SETTINGS with the values from DS_FINETUNE_SETTINGS.
    """
    DS_FINETUNE_SETTINGS = DSFinetuneScen()
    RD_AGENT_SETTINGS.app_tpl = DS_FINETUNE_SETTINGS.app_tpl
    os.environ["DS_CODER_COSTEER_EXTRA_EVALUATOR"] = '["rdagent.app.finetune.share.eval.PrevModelLoadEvaluator"]'
    for field_name, new_value in DS_FINETUNE_SETTINGS.model_dump().items():
        if hasattr(DS_RD_SETTING, field_name):
            setattr(DS_RD_SETTING, field_name, new_value)
    DS_RD_SETTING.competition = competition
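Because model_config sets env_prefix="FT_", these fields can presumably also be overridden through FT_-prefixed environment variables before update_settings copies them onto DS_RD_SETTING (assuming ExtendedBaseSettings follows standard pydantic-settings behaviour). A minimal, self-contained sketch of that mechanism, using plain BaseSettings instead of RD-Agent's ExtendedBaseSettings, with field names and defaults mirroring the class above:

    import os

    from pydantic_settings import BaseSettings, SettingsConfigDict


    class DemoFinetuneSettings(BaseSettings):
        # Same prefix as DSFinetuneScen above; fields map to FT_DEBUG_TIMEOUT / FT_FULL_TIMEOUT.
        model_config = SettingsConfigDict(env_prefix="FT_", protected_namespaces=())
        debug_timeout: int = 3600
        full_timeout: int = 10800


    os.environ["FT_DEBUG_TIMEOUT"] = "2700"      # e.g. set via `dotenv set FT_DEBUG_TIMEOUT 2700`
    print(DemoFinetuneSettings().debug_timeout)  # -> 2700, read from the FT_-prefixed variable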
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
import asyncio
from pathlib import Path

import fire

from rdagent.app.data_science.conf import DS_RD_SETTING
from rdagent.app.finetune.data_science.conf import update_settings
from rdagent.core.utils import import_class
from rdagent.log import rdagent_logger as logger
from rdagent.scenarios.data_science.loop import DataScienceRDLoop


def main(
    model: str | None = None,
    competition: str | None = None,
):
    """
    Parameters
    ----------
    competition :
        Competition name.

    Auto R&D evolving loop for model finetuning.
    You can continue running a session by using the command:

    .. code-block:: bash

        dotenv run -- python rdagent/app/finetune/data_science/loop.py --competition aerial-cactus-identification
    """
    if not competition:
        raise Exception("Please specify competition name.")

    model_folder = Path(DS_RD_SETTING.local_data_path) / competition / "prev_model"
    if not model_folder.exists():
        raise Exception(f"Please put the previous model under {model_folder}.")
    update_settings(competition)
    rd_loop: DataScienceRDLoop = DataScienceRDLoop(DS_RD_SETTING)
    asyncio.run(rd_loop.run())


if __name__ == "__main__":
    fire.Fire(main)
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
from pathlib import Path

from rdagent.app.data_science.conf import DS_RD_SETTING
from rdagent.core.scenario import Scenario
from rdagent.log import rdagent_logger as logger
from rdagent.scenarios.data_science.scen import DataScienceScen
from rdagent.scenarios.data_science.scen.utils import describe_data_folder_v2
from rdagent.utils.agent.tpl import T


class DSFinetuneScen(DataScienceScen):
    """DSFinetuneScen Scenario"""

    def _get_data_folder_description(self) -> str:
        folder_desc = describe_data_folder_v2(
            Path(DS_RD_SETTING.local_data_path) / self.competition,
            show_nan_columns=DS_RD_SETTING.show_nan_columns,
            max_length=20000,  # more context for model script
        )
        return folder_desc
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
pipeline_coder:
  system: |-
    {% include "rdagent.components.coder.data_science.pipeline.prompts:pipeline_coder.system" %}
    NOTE: Ensure that the base model from `{% include "scenarios.data_science.share:scen.input_path" %}prev_model` is correctly loaded; you are supposed to finetune the base model.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
task_gen:
  system: |-
    {% include "rdagent.scenarios.data_science.proposal.exp_gen.prompts_v2:task_gen.system" %}
    NOTE: You MUST load the base model from `{% include "scenarios.data_science.share:scen.input_path" %}prev_model`. Your main goal is to finetune it.
rdagent/app/finetune/llm/conf.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
import os

from pydantic_settings import SettingsConfigDict

from rdagent.app.data_science.conf import DS_RD_SETTING
from rdagent.core.conf import RD_AGENT_SETTINGS, ExtendedBaseSettings


class LLMFinetuneScen(ExtendedBaseSettings):
    model_config = SettingsConfigDict(env_prefix="FT_", protected_namespaces=())
    scen: str = "rdagent.app.finetune.llm.scen.LLMFinetuneScen"
    """
    Scenario class for data science tasks.
    - For Kaggle competitions, use: "rdagent.scenarios.data_science.scen.KaggleScen"
    - For custom data science scenarios, use: "rdagent.scenarios.data_science.scen.DataScienceScen"
    - For LLM finetune scenarios, use: "rdagent.app.finetune.llm.scen.LLMFinetuneScen"
    - For data science finetune scenarios, use: "rdagent.app.finetune.data_science.scen.DSFinetuneScen"
    """

    hypothesis_gen: str = "rdagent.app.finetune.llm.proposal.FinetuneExpGen"
    """Hypothesis generation class"""

    debug_timeout: int = 36000
    """The timeout limit for running on debugging data"""
    full_timeout: int = 360000
    """The timeout limit for running on full data"""

    coder_on_whole_pipeline: bool = True
    enable_model_dump: bool = True
    app_tpl: str = "app/finetune/llm/tpl"


def update_settings(competition: str):
    """
    Update the RD_AGENT_SETTINGS with the values from LLM_FINETUNE_SETTINGS.
    """
    LLM_FINETUNE_SETTINGS = LLMFinetuneScen()
    RD_AGENT_SETTINGS.app_tpl = LLM_FINETUNE_SETTINGS.app_tpl
    os.environ["DS_CODER_COSTEER_EXTRA_EVALUATOR"] = '["rdagent.app.finetune.share.eval.PrevModelLoadEvaluator"]'
    for field_name, new_value in LLM_FINETUNE_SETTINGS.model_dump().items():
        if hasattr(DS_RD_SETTING, field_name):
            setattr(DS_RD_SETTING, field_name, new_value)
    DS_RD_SETTING.competition = competition

rdagent/app/finetune/llm/loop.py

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
import asyncio
from pathlib import Path

import fire

from rdagent.app.data_science.conf import DS_RD_SETTING
from rdagent.app.finetune.llm.conf import update_settings
from rdagent.core.utils import import_class
from rdagent.log import rdagent_logger as logger
from rdagent.scenarios.data_science.loop import DataScienceRDLoop


def main(
    model: str | None = None,
    dataset: str | None = None,
):
    """
    Parameters
    ----------
    dataset :
        Dataset name, used for finetuning.

    Auto R&D evolving loop for model finetuning.
    You can continue running a session by using the command:

    .. code-block:: bash

        dotenv run -- python rdagent/app/finetune/llm/loop.py --dataset shibing624/alpaca-zh
    """
    if not dataset:
        raise Exception("Please specify dataset name.")

    model_folder = Path(DS_RD_SETTING.local_data_path) / dataset / "prev_model"
    if not model_folder.exists():
        raise Exception(f"Please put the previous model under {model_folder}.")
    update_settings(dataset)
    rd_loop: DataScienceRDLoop = DataScienceRDLoop(DS_RD_SETTING)
    asyncio.run(rd_loop.run())


if __name__ == "__main__":
    fire.Fire(main)

rdagent/app/finetune/llm/prompts.yaml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
scenario_description: |-
  ------Background of the scenario------
  You are a world-class machine learning engineer. Your task is to finetune a model on the given dataset using the QLoRA method.
  ------Dataset Description------
  {{ raw_description }}

competition_background: |-
  ## QLoRA Fine-Tuning
  You are a world-class machine learning engineer and prompt engineer specializing in parameter-efficient fine-tuning of large language models using **QLoRA**. Your expertise includes 4-bit quantization, low-rank adaptation, and maximizing performance on GPU clusters. You are committed to building accurate, resource-efficient, and robust LLMs.

  - **Fine-Tuning Method**: QLoRA (4-bit quantized LoRA)
  - **Training Dataset**:
    > {{ raw_description }}
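For reference, a typical QLoRA setup matching this prompt's description (a 4-bit quantized base model plus low-rank adapters) looks roughly like the sketch below. It is illustrative only, not code from this commit; it assumes the transformers/peft/bitsandbytes stack, and the checkpoint path is a placeholder based on the prev_model layout described in the docs:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # 4-bit NF4 quantization keeps the frozen base model small in GPU memory.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        "./workspace_input/prev_model/models",  # assumption: location of the previous checkpoint
        quantization_config=bnb_config,
    )
    base = prepare_model_for_kbit_training(base)

    # Low-rank adapters are the only trainable parameters.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(base, lora)
    model.print_trainable_parameters()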
