Commit b0cc474

Add moirai agent replication scripts (#244)
1 parent 8d2a08d commit b0cc474

35 files changed: +4849 -0 lines

project/moirai-agent/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
This folder temporarily hosts the scripts to replicate early Moirai-agent results on contextual and non-contextual forecasting tasks.
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
## GIFT-CTX Replication
This directory contains instructions to replicate the results of Moirai Agent on the GIFT-CTX benchmark.

### Set up environment
First, set up the environment by running:
```
pip install -r requirement.txt
```
Then export your OpenAI API key:
```
export OPENAI_API_KEY="..."
```
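Before launching any runs, it can save a wasted call to confirm that the key is actually visible from Python in the shell you will run from. A minimal check, using nothing beyond the standard library:
```
import os

# Fail fast if the key was not exported in this shell
key = os.environ.get("OPENAI_API_KEY")
assert key, "OPENAI_API_KEY is not set in this environment"
print("OpenAI key detected, length:", len(key))
```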

### Prepare the dataset
- Download the `gift_ctx.parquet` data here: `https://huggingface.co/datasets/Salesforce/GIFT-CTX`
- Plot the historical data before runtime:
```
python gen_image.py --in_file gift_ctx.parquet --out_file gift_ctx_image.parquet --img_root img
```
Adjust the arguments to match your setup:
```
--in_file: path to the original GIFT-CTX dataset
--out_file: path to the new dataset with the image path added
--img_root: path to the image folder
```
Note that Moirai Agent works fine with text input alone. To maximize its performance, we recommend also providing a plot of the historical data, which can be rendered beforehand to reduce runtime.
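As a quick sanity check before launching the agent, you can confirm that the generated parquet contains the `image_path` column added by `gen_image.py` and that the images were written to disk. This is a minimal sketch, assuming `pandas` (pinned in `requirement.txt`) and the output file produced by the command above:
```
import os

import pandas as pd

# Load the dataset augmented with image paths by gen_image.py
df = pd.read_parquet("gift_ctx_image.parquet")
assert "image_path" in df.columns, "image_path column missing; re-run gen_image.py"

# Verify that every referenced plot actually exists on disk
missing = [p for p in df["image_path"] if not os.path.exists(p)]
print(f"{len(df)} rows, {len(missing)} missing images")
```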
### Run Moirai Agent
To run Moirai Agent, first start the tools with:
```
bash run_tools.sh
```
For now, we provide the agent with two tools: a Python sandbox and a time-series foundation model.

To evaluate Moirai Agent on the GIFT-CTX benchmark with the default settings and replicate the reported results, run:
```
bash run.sh
```
Note that the results reported in the blog post were obtained with GPT-5.1 at medium reasoning effort on Jan 07 2026. Expect minor differences from the officially reported numbers because of the non-deterministic nature of LLMs. We provide our log files in `./results`.

To change the input file path, config file, output directory, input mode (text only, or text with image), and parallelism, run with positional arguments:
```
bash run.sh [your_parquet_path] [your_config_path] [output_dir] [input_mode] [#jobs]
```

Several Moirai Agent settings, such as the LLM and its parameters, the tools, and the system prompt, can be adjusted in `src/ctx_forecast/config.py`.
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
import argparse
import os

import pandas as pd
from src.utils.image import plot_historical_and_save
from tqdm import tqdm


def main(in_file, out_file, img_root):
    df = pd.read_parquet(in_file)
    hist_values = df["history_values"].tolist()
    img_paths = []

    # Make sure the image folder exists before saving plots into it
    os.makedirs(img_root, exist_ok=True)

    for i, v in enumerate(tqdm(hist_values)):
        # History values are stored as a comma-separated string
        v = [float(x) for x in v.split(",")]

        start = df["history_start"].iloc[i]
        freq = df["frequency"].iloc[i]
        timestamps = pd.date_range(start=start, periods=len(v), freq=freq)
        ts_list = timestamps.strftime("%Y-%m-%d %H:%M:%S").tolist()

        item = {"Date": ts_list, "Value": v}

        # Render the historical series to an image and record its path
        img_path = os.path.join(img_root, f"image_{i}.jpg")
        img_paths.append(img_path)
        plot_historical_and_save(item, img_path)
    df["image_path"] = img_paths

    df.to_parquet(out_file)
    return


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="A simple script to plot time series")
    parser.add_argument("--in_file", default="./gift_ctx.parquet")
    parser.add_argument("--out_file", default="./gift_ctx_image.parquet")
    parser.add_argument("--img_root", default="./images/")

    args = parser.parse_args()
    main(args.in_file, args.out_file, args.img_root)
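The helper `plot_historical_and_save` lives in `src/utils/image`, which is not included in this diff. If you need a stand-in while wiring things up, a minimal matplotlib-based sketch is below; the function name and the `{"Date": ..., "Value": ...}` item layout follow the call site above, while everything else is an assumption and may differ from the repo's real implementation:
```
# Hypothetical stand-in for src.utils.image.plot_historical_and_save
import os

import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_historical_and_save(item, img_path):
    # item: {"Date": list of timestamp strings, "Value": list of floats}
    dates = pd.to_datetime(item["Date"])
    values = item["Value"]

    os.makedirs(os.path.dirname(img_path) or ".", exist_ok=True)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(dates, values)
    ax.set_xlabel("Date")
    ax.set_ylabel("Value")
    fig.autofmt_xdate()
    fig.savefig(img_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```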
Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
accelerate==1.10.1
aiohappyeyeballs==2.6.1
aiohttp==3.13.0
aiosignal==1.4.0
annotated-types==0.7.0
anyio==4.11.0
argon2-cffi==25.1.0
argon2-cffi-bindings==25.1.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.5
attrs==25.4.0
babel==2.17.0
beautifulsoup4==4.14.2
bleach==6.2.0
blinker==1.9.0
boto3==1.42.16
botocore==1.42.16
certifi==2025.10.5
cffi==2.0.0
charset-normalizer==3.4.3
chronos-forecasting==2.2.2
click==8.3.0
comm==0.2.3
contourpy==1.3.3
cycler==0.12.1
datasets==4.1.1
debugpy==1.8.17
decorator==5.2.1
defusedxml==0.7.1
dill==0.4.0
distro==1.9.0
einops==0.8.1
executing==2.2.1
fastjsonschema==2.21.2
filelock==3.19.1
Flask==3.1.2
fonttools==4.60.1
fqdn==1.5.1
frozenlist==1.8.0
fsspec==2025.9.0
gluonts==0.16.2
h11==0.16.0
hf-xet==1.1.10
httpcore==1.0.9
httpx==0.28.1
httpx-sse==0.4.2
huggingface-hub==0.35.3
idna==3.10
inquirerpy==0.3.4
ipdb==0.13.13
ipykernel==6.30.1
ipython==9.6.0
ipython_pygments_lexers==1.1.1
ipywidgets==8.1.7
isoduration==20.11.0
itsdangerous==2.2.0
jaxtyping==0.3.3
jedi==0.19.2
Jinja2==3.1.6
jiter==0.11.0
jmespath==1.0.1
joblib==1.5.3
json5==0.12.1
jsonpointer==3.0.0
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
jupyter-events==0.12.0
jupyter-lsp==2.3.0
jupyter_client==8.6.3
jupyter_core==5.8.1
jupyter_server==2.17.0
jupyter_server_terminals==0.5.3
jupyterlab==4.4.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.15
jupyterthemes==0.20.0
kiwisolver==1.4.9
lark==1.3.0
lesscpy==0.15.1
lightning==2.5.5
lightning-utilities==0.15.2
MarkupSafe==3.0.3
matplotlib==3.10.6
matplotlib-inline==0.1.7
mcp==1.16.0
mistune==3.1.4
mpmath==1.3.0
multidict==6.7.0
multiprocess==0.70.16
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.5
notebook==7.4.7
notebook_shim==0.2.4
numpy==2.1.3
openai==2.2.0
orjson==3.11.3
packaging==25.0
pandas==2.3.3
pandocfilters==1.5.1
parso==0.8.5
pexpect==4.9.0
pfzy==0.3.4
pillow==11.3.0
platformdirs==4.4.0
ply==3.11
prometheus_client==0.23.1
prompt_toolkit==3.0.52
propcache==0.4.0
psutil==7.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==21.0.0
pycparser==2.23
pydantic==2.11.10
pydantic-settings==2.11.0
pydantic_core==2.33.2
Pygments==2.19.2
pyparsing==3.2.5
python-dateutil==2.9.0.post0
python-dotenv==1.1.1
python-json-logger==4.0.0
python-multipart==0.0.20
pytorch-lightning==2.5.5
pytz==2025.2
PyYAML==6.0.3
pyzmq==27.1.0
referencing==0.36.2
regex==2025.9.18
requests==2.32.5
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rfc3987-syntax==1.1.0
rpds-py==0.27.1
s3transfer==0.16.0
safetensors==0.6.2
scikit-learn==1.8.0
scipy==1.16.2
Send2Trash==1.8.3
setuptools==80.9.0
six==1.17.0
sniffio==1.3.1
soupsieve==2.8
sse-starlette==3.0.2
stack-data==0.6.3
starlette==0.48.0
sympy==1.14.0
terminado==0.18.1
threadpoolctl==3.6.0
tinycss2==1.4.0
tokenizers==0.22.1
toolz==0.12.1
torch==2.8.0
torchmetrics==1.8.2
tornado==6.5.2
tqdm==4.67.1
traitlets==5.14.3
transformers==4.57.0
triton==3.4.0
types-python-dateutil==2.9.0.20251008
typing-inspection==0.4.2
typing_extensions==4.15.0
tzdata==2025.2
uri-template==1.3.0
urllib3==2.5.0
uvicorn==0.37.0
wadler_lindig==0.1.7
wcwidth==0.2.14
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.9.0
Werkzeug==3.1.3
widgetsnbextension==4.0.14
xxhash==3.6.0
yarl==1.22.0
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
#!/bin/bash
set -e

# --------------------------------------------------
# Positional args with defaults
# --------------------------------------------------
TASK_FILE=${1:-"gift_ctx_images.parquet"}
CONFIG_FILE=${2:-"src/ctx_forecast/config.py"}
OUTPUT_DIR=${3:-"results/moirai_agent"}
INPUT_MODE=${4:-"text+image"}
JOBS=${5:-$(nproc 2>/dev/null || echo 8)}
NUM_PARTS=${6:-245}

# --------------------------------------------------
# Run all parts in parallel
# --------------------------------------------------
seq 0 $((NUM_PARTS - 1)) | parallel -j "$JOBS" \
    python -m src.ctx_forecast.tsf_agent \
    --config_file "$CONFIG_FILE" \
    --config_name CONFIG \
    --input_mode "$INPUT_MODE" \
    --task_file "$TASK_FILE" \
    --part_idx {} \
    --num_parts "$NUM_PARTS" \
    --output_dir "$OUTPUT_DIR"

echo "Finished all parts"
echo "Calculating metrics"

# --------------------------------------------------
# Gather metrics
# --------------------------------------------------
python -m src.ctx_forecast.metrics_gather \
    --results_dir "${OUTPUT_DIR}/${INPUT_MODE}/results"

echo "Done! Results are available at ${OUTPUT_DIR}/${INPUT_MODE}"
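The script above fans the benchmark out over `NUM_PARTS` shards, with GNU parallel launching one `tsf_agent` process per `--part_idx`. The agent module itself is not included in this diff; one common way to carve a task file into such shards, purely illustrative and not necessarily how `src.ctx_forecast.tsf_agent` does it, is:
```
import pandas as pd


def load_part(task_file: str, part_idx: int, num_parts: int) -> pd.DataFrame:
    """Return the contiguous slice of rows handled by one worker."""
    df = pd.read_parquet(task_file)
    part_size = -(-len(df) // num_parts)  # ceiling division
    start = part_idx * part_size
    return df.iloc[start:start + part_size]


# Example: the rows that worker 3 of 245 would process
part = load_part("gift_ctx_images.parquet", part_idx=3, num_parts=245)
print(len(part))
```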
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
#!/bin/bash

# build and launch python-sandbox docker
bash src/tools/python_sandbox/start_background.sh


# launch time series forecasting tools
CUDA_VISIBLE_DEVICES=7 python3 -m src.tools.tsf_services --model_name chronos_v2 &
# CUDA_VISIBLE_DEVICES=0 .venv/bin/python3 -m src.tools.tsf_services --model_name moirai &
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
max_iterations = 6

# Default configuration for Moirai Agent: LLM settings, tool servers, and system prompt.
CONFIG = {
    "llm": {
        "model_name": "gpt-5.1",
        "model_params_type": {
            "temperature": 1.0,
            "top_p": 1.0,
            "reasoning": {"effort": "medium", "summary": "auto"},
            "max_output_tokens": 8192,
        },
    },
    # MCP servers that expose tools to the agent
    "servers": {
        "forecast": {
            "command": "python",
            "args": ["-m", "src.ctx_forecast.mcp_server"],
            "enabled_tools": ["chronos_forecast_service", "python_sandbox_service"],
        },
    },
    "max_iterations": max_iterations,
    "system_prompt": "You are an intelligent assistant that can solve complex problems by thinking step-by-step and using available tools when needed. "
    "\n## Your Task "
    "\n- Solve a contextual time-series forecasting problem, where historical values and contextual information are provided. "
    "\n- The future values depend on both the observed values and the contextual information. "
    "\n- Contextual information may specify or imply unexpected effects in the future that will alter the results normally reasoned from the history values. In that case, combine the forecasting tool's results with the contextual information to reason about the future. "
    "\n- Contextual information may specify or imply occasional or abnormal factors in the history that will not persist in the future. In that case, identify and eliminate the misleading factors from the history; the future should be reasoned from the modified history values and relevant knowledge. "
    "\n- Contextual information may totally dominate the future when the history values are less informative. In that case, discover the underlying correlation or mathematical structure defined by the context and history values, and use it to make predictions. "
    "\n- You should figure out the real cause of the future first, then make predictions accordingly. "
    "\n## How You Work "
    "\n 1. **Think First**: Analyze the problem and determine what information or actions you need "
    "\n 2. **Use Tools When Needed**: Call appropriate functions/tools to perform numerical modeling. "
    "\n 3. **Reason with Results**: Process the tool outputs and use them to inform your next steps "
    "\n 4. **Iterate**: Continue thinking and using tools until you can provide a complete answer "
    "\n ## Important Guidelines "
    "\n - Only call the forecasting tools when the original or modified history values show clear patterns. Otherwise, fatal numerical errors will happen. "
    "\n - Only write and execute code in the python-sandbox when exact mathematical structures are inferred from the context. Always include a print statement in your code to return valid messages. "
    f"\n - Ensure that all reasoning and tool usage is complete within a maximum of {max_iterations} steps. Each step should either advance your understanding or gather necessary information. Be systematic and thorough in your approach. A final and fully cited answer has to be output before step {max_iterations}. ",
}
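run.sh points the agent at this file via `--config_file` and `--config_name CONFIG`. The loader inside `src.ctx_forecast.tsf_agent` is not part of this diff; a module-level dict like `CONFIG` is typically picked up by importing the file path and reading the named attribute, roughly as in the sketch below (illustrative only; the helper name and behavior are assumptions):
```
import importlib.util


def load_config(config_file: str, config_name: str = "CONFIG") -> dict:
    """Load a config dict (e.g. CONFIG) from a Python file path."""
    spec = importlib.util.spec_from_file_location("agent_config", config_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, config_name)


config = load_config("src/ctx_forecast/config.py")
print(config["llm"]["model_name"], config["max_iterations"])
```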
