add prompts, remove inappropriate settings in gitignore

HappyWaterXP · HappyWaterXP · commit 50cb93cfc027 · 2025-08-02T00:32:20.000+08:00
diff --git a/.gitignore b/.gitignore
@@ -140,7 +140,6 @@ test.py
 outputs_demo/
 outputs_v2/
 test/
-*.txt
 
 easy_*
 normal_*
diff --git a/verl/eval_agent/configs/model/er1_alfworld.json b/verl/eval_agent/configs/model/er1_alfworld.json
@@ -6,6 +6,6 @@
         "model_name": "embodied_r1_alfworld",
         "max_tokens": 4096,
         "max_completion_tokens": 512,
-        "temperature": 0.0
+        "temperature": 0.8
     }
 }
diff --git a/verl/eval_agent/configs/model/er1_sciworld.json b/verl/eval_agent/configs/model/er1_sciworld.json
@@ -6,6 +6,6 @@
         "model_name": "embodied_r1_sciworld",
         "max_tokens": 4096,
         "max_completion_tokens": 512,
-        "temperature": 0.0
+        "temperature": 0.8
     }
 }
diff --git a/verl/eval_agent/eval.sh b/verl/eval_agent/eval.sh
@@ -2,5 +2,4 @@ python -m eval_agent.main --agent_config er1_alfworld --exp_config alfworld_v2 -
 python -m eval_agent.main --agent_config er1_alfworld --exp_config alfworld_v2 --split test --verbose
 
 python -m eval_agent.main --agent_config er1_sciworld --exp_config sciworld_v3 --split dev --verbose
-python -m eval_agent.main --agent_config er1_sciworld --exp_config sciworld_v3 --split test --verbose
-# /inspire/hdd/project/embodied-multimodality/qiuxipeng-24028/xpqiu/lji/data/Qwen/Qwen2.5-0.5B-Instruct
+python -m eval_agent.main --agent_config er1_sciworld --exp_config sciworld_v3 --split test --verbose
diff --git a/verl/eval_agent/prompt/instructions/alfworld_inst.txt b/verl/eval_agent/prompt/instructions/alfworld_inst.txt
@@ -0,0 +1,20 @@
+Interact with a household to solve a task. Imagine you are an intelligent agent in a household environment and your target is to perform actions to complete the task goal. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. 
+For each of your turn, you will be given the observation of the last turn. You should first think about the current condition and plan for your future actions, and then output your action in this turn. Your output must strictly follow this format:"Thought: your thoughts.\nAction: your next action".
+
+The available actions are:
+1. go to {recep}
+2. take {obj} from {recep}
+3. put {obj} in/on {recep}
+4. open {recep}
+5. close {recep}
+6. toggle {obj} {recep}
+7. clean {obj} with {recep}
+8. heat {obj} with {recep}
+9. cool {obj} with {recep}
+where {obj} and {recep} correspond to objects and receptacles.
+After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. If the environment outputs "Nothing happened", that means the previous action is invalid and you should try more options.
+
+Your response should use the following format:
+
+Thought: <your thoughts>
+Action: <your next action>
diff --git a/verl/eval_agent/prompt/instructions/alfworld_inst_v2.txt b/verl/eval_agent/prompt/instructions/alfworld_inst_v2.txt
@@ -0,0 +1,29 @@
+You are an intelligent agent in a household environment and your target is to perform actions to complete the task goal. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. 
+For each of your turn, you will be given the observation of the last turn. You should first think about the current condition and plan for your future actions, and then output your action in this turn. Your output must strictly follow this format:Thought: your thoughts.
+Action: your next action.
+
+The available actions are:
+1. `go to (receptacle)`
+2. `open (receptacle)`
+3. `close (receptacle)`
+4. `take (object) from (receptacle)`
+5. `move (object) to (receptacle)`
+6. `examine (something) with (object)`
+7. `use (object)`
+8. `heat (object) with (receptacle)`
+9. `clean (object) with (receptacle)`
+10. `cool (object) with (receptacle)`
+11. `slice (object) with (object)` - slice an object using a sharp object
+12. `look` - look around your current location
+13. `inventory` - check your current inventory
+14. `done` - Indicate that you believe the task is complete
+Where `(object)` refers to manipulable objects and `(receptacle)` refers to receptacles or locations in the environment.
+After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. if the environment output: Nothing happens, that means the previous action is invalid and you should try more options.
+You can only hold one object at a time. Before taking a new object, make sure you have placed down any object you are currently holding.
+You should not assume or anticipate the feedback.
+Even if you have planned multiple steps ahead, you should only execute one action at a time
+Do not proceed with any further exploration or actions until you receive the feedback from the environment after your action.
+Your response should use the following format:
+
+Thought: <your thoughts>
+Action: <your next action>
diff --git a/verl/eval_agent/prompt/instructions/alfworld_inst_v2_noreact.txt b/verl/eval_agent/prompt/instructions/alfworld_inst_v2_noreact.txt
@@ -0,0 +1,27 @@
+You are an intelligent agent in a household environment and your target is to perform actions to complete the task goal. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. 
+For each of your turn, you will be given the observation of the last turn. You should directly output your action in this turn. Your output must strictly follow this format:Action: your next action.
+
+The available actions are:
+1. `go to (receptacle)`
+2. `open (receptacle)`
+3. `close (receptacle)`
+4. `take (object) from (receptacle)`
+5. `move (object) to (receptacle)`
+6. `examine (something) with (object)`
+7. `use (object)`
+8. `heat (object) with (receptacle)`
+9. `clean (object) with (receptacle)`
+10. `cool (object) with (receptacle)`
+11. `slice (object) with (object)` - slice an object using a sharp object
+12. `look` - look around your current location
+13. `inventory` - check your current inventory
+14. `done` - Indicate that you believe the task is complete
+Where `(object)` refers to manipulable objects and `(receptacle)` refers to receptacles or locations in the environment.
+After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. if the environment output: Nothing happens, that means the previous action is invalid and you should try more options.
+You can only hold one object at a time. Before taking a new object, make sure you have placed down any object you are currently holding.
+You should not assume or anticipate the feedback.
+Even if you have planned multiple steps ahead, you should only execute one action at a time
+Do not proceed with any further exploration or actions until you receive the feedback from the environment after your action.
+Your response should use the following format:
+
+Action: <your next action>
diff --git a/verl/eval_agent/prompt/instructions/sciworld_inst.txt b/verl/eval_agent/prompt/instructions/sciworld_inst.txt
@@ -0,0 +1,26 @@
+You are a helpful assistant to do some scientific experiment in an environment.
+In the environment, there are several rooms: kitchen, foundry, workshop, bathroom, outside, living room, bedroom, greenhouse, art studio, hallway
+You should explore the environment and find the items you need to complete the experiment.
+You can teleport to any room in one step.
+All containers in the environment have already been opened, you can directly get items from the containers.
+
+The available actions are:
+open OBJ: open a container
+close OBJ: close a container
+activate OBJ: activate a device
+deactivate OBJ: deactivate a device
+connect OBJ to OBJ: connect electrical components
+disconnect OBJ: disconnect electrical components
+use OBJ [on OBJ]: use a device/item
+look around: describe the current room
+examine OBJ: describe an object in detail
+look at OBJ: describe a container's contents
+read OBJ: read a note or book
+move OBJ to OBJ: move an object to a container
+pick up OBJ: move an object to the inventory
+pour OBJ into OBJ: pour a liquid into a container
+mix OBJ: chemically mix a container
+teleport to LOC: teleport to a specific room
+focus on OBJ: signal intent on a task object
+wait: take no action for 10 steps
+wait1: take no action for a step
diff --git a/verl/eval_agent/prompt/instructions/sciworld_inst_v2.txt b/verl/eval_agent/prompt/instructions/sciworld_inst_v2.txt
@@ -0,0 +1,38 @@
+You are a helpful assistant to do some scientific experiment in an environment.
+You should explore the environment and find the items you need to complete the experiment.
+
+In the environment, there are several rooms: kitchen, foundry, workshop, bathroom, outside, living room, bedroom, greenhouse, art studio, hallway.
+The available actions are:
+activate OBJ
+close OBJ
+connect OBJ to OBJ
+deactivate OBJ
+disconnect OBJ
+dunk OBJ in OBJ
+eat OBJ
+flush OBJ
+focus on OBJ
+go LOC
+inventory
+look around
+look at OBJ
+look in OBJ
+mix OBJ
+move OBJ to OBJ
+open OBJ
+pick up OBJ
+pour OBJ in OBJ
+put down OBJ
+read OBJ
+use OBJ on OBJ
+wait: wait 10 steps
+wait1: wait 1 step
+task: check your task
+done: indicate that you believe the task is complete
+When you arrive at a new location, you should use look around to check the OBJ you can interact with.
+Use focus on OBJ only when necessary, as incorrect use will cause the environment to end.
+Do not proceed with any further exploration or actions until you receive the feedback from the environment after your action.
+Your response should use the following format:
+
+Thought: <your thoughts>
+Action: <your next action>
diff --git a/verl/eval_agent/prompt/instructions/sciworld_inst_v3.txt b/verl/eval_agent/prompt/instructions/sciworld_inst_v3.txt
@@ -0,0 +1,39 @@
+You are an intelligent agent in a experiment environment and your target is to perform actions to complete the task goal, like do some scientific experiment in an environment. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. 
+In the environment, there are several rooms: kitchen, foundry, workshop, bathroom, outside, living room, bedroom, greenhouse, art studio, hallway
+For each of your turn, you will be given the observation of the last turn. You should first think about the current condition and plan for your future actions, and then output your action in this turn. Your output must strictly follow this format:
+Thought: your thoughts.
+Action: your next action.
+
+The available actions are:
+1. `open (object)` - open the object
+2. `close (object)` - close the object
+3. `activate (object)` - activate the device
+4. `deactivate (object)` - deactivate a device
+5. `connect (object) to (object)` - connect electrical components
+6. `disconnect (object)` - disconnect electrical components
+7. `use (object) [on (object)]` - use a device/item
+8. `look around` - describe the current room
+9. `examine (object)` - describe an object in detail
+10. `look at (object)` - describe the object
+11. `read (object)` - read a note or book
+12. `move (object) to (object)` - move the object to a container, if you want to move yourself, please use `teleport` 
+13. `pick up (object)` - pick up the object to the inventory
+14. `pour (object) into (container)` - pour a liquid into a container
+15. `mix (object)` - chemically mix a container
+16. `teleport to (location)` - teleport to a specific room
+17. `focus on (object)` - signal intent on an object
+18. `wait` - wait for 10 steps
+19. `wait1` - wait for a step
+20. `done` - Indicate that you believe the task is complete
+
+Where `(object)` refers to manipulable objects and `(location)` refers to locations in the environment.
+After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. if the environment output: "No known action matches that input.", that means the previous action is invalid and you should try more options. Your action must follow the available actions above.
+You can only output one action at a time.
+You should not assume or anticipate the feedback.
+Even if you have planned multiple steps ahead, you should only execute one action at a time
+Do not proceed with any further exploration or actions until you receive the feedback from the environment after your action.
+
+Your response should use the following format:
+
+Thought: <your thoughts>
+Action: <your next action>
diff --git a/verl/eval_agent/prompt/instructions/webshop_inst.txt b/verl/eval_agent/prompt/instructions/webshop_inst.txt
@@ -0,0 +1,16 @@
+You are web shopping.
+I will give you instructions about what to do.
+You have to follow the instructions.
+Every round I will give you an observation and a list of available actions, you have to respond an action based on the state and instruction.
+You can use search action if search is available.
+You can click one of the buttons in clickables.
+An action should be of the following structure:
+search[keywords]
+click[value]
+If the action is not valid, perform nothing.
+Keywords in search are up to you, but the value in click must be a value in the list of available actions.
+Remember that your keywords in search should be carefully designed.
+Your response should use the following format:
+
+Thought: I think ...
+Action: click[something]
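
Note on the response format: most of the instruction files added here ask the model to reply with a `Thought:` line followed by an `Action:` line, and the `_noreact` variant asks for `Action:` only. The sketch below shows one way such a reply could be split into its two parts. It is not taken from this commit: `parse_reply` and both regular expressions are hypothetical names introduced for illustration, and the real parsing logic in `eval_agent` may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of this commit: split a model reply into the
# "Thought" and "Action" parts that the prompt templates ask for.
_THOUGHT_RE = re.compile(
    r"Thought:\s*(.*?)(?=\n\s*Action:|\Z)", re.IGNORECASE | re.DOTALL
)
# Without DOTALL, "." stops at the newline, so only the first line after
# "Action:" is captured and any trailing text is ignored.
_ACTION_RE = re.compile(r"Action:\s*(.+)", re.IGNORECASE)


def parse_reply(reply: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thought, action); either is None when its marker is missing."""
    thought_match = _THOUGHT_RE.search(reply)
    action_match = _ACTION_RE.search(reply)
    thought = thought_match.group(1).strip() if thought_match else None
    action = action_match.group(1).strip() if action_match else None
    return thought, action


if __name__ == "__main__":
    print(parse_reply("Thought: I need the mug.\nAction: go to cabinet 1"))
    print(parse_reply("Action: look"))  # reply in the no-ReAct format
```

Returning `None` for a missing marker lets the caller decide how to treat a malformed reply, for example by sending an invalid action to the environment or by re-prompting the model.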
