diff --git a/README.md b/README.md
index c8c50999f7..6631937d6c 100644
--- a/README.md
+++ b/README.md
@@ -283,7 +283,7 @@ For more detailed examples about how to use Trinity-RFT, please refer to the fol
 + [Advanced data processing / human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)
-
+For frequently asked questions, check the [FAQ](./docs/sphinx_doc/source/tutorial/faq.md) for answers.

 ## Advanced usage and full configurations

diff --git a/docs/sphinx_doc/source/index.rst b/docs/sphinx_doc/source/index.rst
index 4b4cab2aa9..fc085215b0 100644
--- a/docs/sphinx_doc/source/index.rst
+++ b/docs/sphinx_doc/source/index.rst
@@ -33,6 +33,12 @@ Welcome to Trinity-RFT's documentation!
    tutorial/trinity_configs.md
    tutorial/example_mix_algo.md

+.. toctree::
+   :maxdepth: 2
+   :caption: FAQ
+
+   tutorial/faq.md
+
 .. toctree::
    :maxdepth: 1
    :glob:

diff --git a/docs/sphinx_doc/source/tutorial/example_mix_algo.md b/docs/sphinx_doc/source/tutorial/example_mix_algo.md
index b106293eed..59f7036f46 100644
--- a/docs/sphinx_doc/source/tutorial/example_mix_algo.md
+++ b/docs/sphinx_doc/source/tutorial/example_mix_algo.md
@@ -15,9 +15,9 @@
 $$ \left[ \frac{1}{T'_b} \sum_{t=1}^{T'_b} \log \pi_\theta(o'_{b,t} \mid q'_b, o'_{b,<t}) \right] $$

diff --git a/docs/sphinx_doc/source/tutorial/faq.md b/docs/sphinx_doc/source/tutorial/faq.md
new file mode 100644
--- /dev/null
+++ b/docs/sphinx_doc/source/tutorial/faq.md
+**Error:**
+```bash
+  import flash_attn_2_cuda as flash_attn_gpu
+ImportError: ...
+```
+
+**A:** The `flash-attn` package is not properly installed. Try fixing it with `pip install flash-attn` or `pip install flash-attn -v --no-build-isolation`.
+
+---
+
+**Error:**
+```bash
+UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
+```
+
+**A:** Log in to WandB before starting Ray and running the experiment. One way to do this is to run `export WANDB_API_KEY=[your_api_key]`.
+
+---
+
+**Error:**
+```bash
+ValueError: Failed to look up actor with name 'explorer' ...
+```
+
+**A:** Make sure Ray is started before running the experiment. If Ray is already running, you can restart it with the following commands:
+
+```bash
+ray stop
+ray start --head
+```
+
+---
+
+**Error:** Out-of-Memory (OOM) error
+
+**A:** The following parameters may be helpful (see the sketch after this list):
+
+- For the trainer, adjust `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` when `actor_rollout_ref.actor.use_dynamic_bsz=false`; adjust `actor_rollout_ref.actor.ppo_max_token_len_per_gpu` and `actor_rollout_ref.actor.ulysses_sequence_parallel_size` when `actor_rollout_ref.actor.use_dynamic_bsz=true`.
+- For the explorer, adjust `explorer.rollout_model.tensor_parallel_size`.
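+
+A sketch of where these settings live, with illustrative values rather than tuned recommendations. In the verl trainer config (e.g., `train_gsm8k.yaml`):
+
+```yaml
+actor_rollout_ref:
+  actor:
+    use_dynamic_bsz: True
+    # Used when use_dynamic_bsz=false:
+    ppo_micro_batch_size_per_gpu: 2    # lower this to reduce peak memory
+    # Used when use_dynamic_bsz=true:
+    ppo_max_token_len_per_gpu: 8192    # lower this to reduce peak memory
+    ulysses_sequence_parallel_size: 2  # shard long sequences across GPUs
+```
+
+And in the Trinity config, for the explorer:
+
+```yaml
+explorer:
+  rollout_model:
+    tensor_parallel_size: 2  # shard the rollout model across more GPUs
+```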
+
+
+## Part 3: Debugging Methods [Coming Soon]
+
+To see the full logs of all processes and save them to `debug.log`:
+
+```bash
+export RAY_DEDUP_LOGS=0
+trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
+```
+
+
+## Part 4: Other Questions
+
+**Q:** What's the purpose of `buffer.trainer_input.experience_buffer.path`?
+
+**A:** This parameter specifies the path to the SQLite database storing the generated experiences. You may comment out this line if you don't want to use the SQLite database.
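+
+A sketch of how this field might look in a config file (the value below is a hypothetical SQLAlchemy database URL):
+
+```yaml
+buffer:
+  trainer_input:
+    experience_buffer:
+      path: sqlite:///outputs/experiences.db  # hypothetical path
+```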
+
+To see the experiences in the database, you can use the following Python script:
+
+```python
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+
+from trinity.common.schema import ExperienceModel
+
+# Connect to the database configured as buffer.trainer_input.experience_buffer.path
+engine = create_engine("sqlite:///outputs/experiences.db")
+Session = sessionmaker(bind=engine)
+session = Session()
+
+# Fetch a few rows for inspection
+MAX_EXPERIENCES = 4
+experiences = (
+    session.query(ExperienceModel)
+    .limit(MAX_EXPERIENCES)
+    .all()
+)
+
+# Convert the ORM rows back into Experience objects
+exp_list = [ExperienceModel.to_experience(exp) for exp in experiences]
+
+# Print the experiences
+for exp in exp_list:
+    print(f"{exp.prompt_text=}", f"{exp.response_text=}")
+```
+
+---
+
+**Q:** How can I load checkpoints outside of the Trinity-RFT framework?
+
+**A:** You need to specify the model path and the checkpoint path. The following code snippet gives an example with `transformers`.
+
+```python
+import os
+
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+from trinity.common.models.utils import load_state_dict_from_verl_checkpoint
+
+# Assume we need the checkpoint at step 780;
+# model_path, checkpoint_root_dir, project, and name are already defined
+model = AutoModelForCausalLM.from_pretrained(model_path)
+ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor")
+model.load_state_dict(load_state_dict_from_verl_checkpoint(ckp_path))
+```
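+
+After loading, you can export the merged model with the standard `transformers` API so it can be used without Trinity-RFT (the output directory name is illustrative):
+
+```python
+# Save the merged weights and the tokenizer to a standalone directory
+model.save_pretrained("./merged_model")
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+tokenizer.save_pretrained("./merged_model")
+```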

diff --git a/docs/sphinx_doc/source/tutorial/trinity_configs.md b/docs/sphinx_doc/source/tutorial/trinity_configs.md
index 88d925f786..6c2497c5d5 100644
--- a/docs/sphinx_doc/source/tutorial/trinity_configs.md
+++ b/docs/sphinx_doc/source/tutorial/trinity_configs.md
@@ -399,7 +399,7 @@ data_processor:
 For advanced users working with the `verl` trainer backend. This includes fine-grained settings for actor/critic models, optimizer parameters, and training loops.

-> For full parameter meanings, refer to the [veRL documentation](https://github.com/volcengine/verl/blob/v0.3.0.post1/docs/examples/config.rst).
+> For full parameter meanings, refer to the [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html).

 ```yaml

diff --git a/examples/async_gsm8k/verl_config.yaml b/examples/async_gsm8k/verl_config.yaml
index fc44fdad94..f773f9a0ae 100644
--- a/examples/async_gsm8k/verl_config.yaml
+++ b/examples/async_gsm8k/verl_config.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True # False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True # False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/dpo_humanlike/train_dpo.yaml b/examples/dpo_humanlike/train_dpo.yaml
index d5074848b0..28c687322c 100644
--- a/examples/dpo_humanlike/train_dpo.yaml
+++ b/examples/dpo_humanlike/train_dpo.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 32
     ppo_micro_batch_size_per_gpu: 2 # NOTE
     use_dynamic_bsz: False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_alfworld/alfworld.yaml b/examples/grpo_alfworld/alfworld.yaml
index 8323ef8591..281008ae46 100644
--- a/examples/grpo_alfworld/alfworld.yaml
+++ b/examples/grpo_alfworld/alfworld.yaml
@@ -13,7 +13,7 @@ cluster:
   gpu_per_node: 8
 buffer:
   total_epochs: 20
-  batch_size: 4
+  batch_size: 32
   max_retry_times: 3
   max_retry_interval: 1
   explorer_input:

diff --git a/examples/grpo_alfworld/train_alfworld.yaml b/examples/grpo_alfworld/train_alfworld.yaml
index 5b73ec7403..063abd768a 100644
--- a/examples/grpo_alfworld/train_alfworld.yaml
+++ b/examples/grpo_alfworld/train_alfworld.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 1536
     ppo_micro_batch_size_per_gpu: 1
     use_dynamic_bsz: False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_gsm8k/train_gsm8k.yaml b/examples/grpo_gsm8k/train_gsm8k.yaml
index fc44fdad94..f773f9a0ae 100644
--- a/examples/grpo_gsm8k/train_gsm8k.yaml
+++ b/examples/grpo_gsm8k/train_gsm8k.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True # False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True # False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_gsm8k_experience_pipeline/train_gsm8k.yaml b/examples/grpo_gsm8k_experience_pipeline/train_gsm8k.yaml
index fc44fdad94..f773f9a0ae 100644
--- a/examples/grpo_gsm8k_experience_pipeline/train_gsm8k.yaml
+++ b/examples/grpo_gsm8k_experience_pipeline/train_gsm8k.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True # False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True # False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_gsm8k_task_pipeline/train_gsm8k.yaml b/examples/grpo_gsm8k_task_pipeline/train_gsm8k.yaml
index fc44fdad94..f773f9a0ae 100644
--- a/examples/grpo_gsm8k_task_pipeline/train_gsm8k.yaml
+++ b/examples/grpo_gsm8k_task_pipeline/train_gsm8k.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True # False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True # False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_math/README.md b/examples/grpo_math/README.md
index 649cc5272f..5b3c2c3ea2 100644
--- a/examples/grpo_math/README.md
+++ b/examples/grpo_math/README.md
@@ -1,6 +1,6 @@
 # Example: PPO on MATH dataset

-This example shows the usage of PPO on the MATH dataset.
+This example shows the usage of PPO on the MATH dataset, adapted from [simpleRL](https://github.com/hkust-nlp/simpleRL-reason/tree/v0).

 For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_reasoning_basic.md).

diff --git a/examples/grpo_math/train_math.yaml b/examples/grpo_math/train_math.yaml
index 0a46bd1788..ee94163eed 100644
--- a/examples/grpo_math/train_math.yaml
+++ b/examples/grpo_math/train_math.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True # False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True # False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_sciworld/train_sciworld.yaml b/examples/grpo_sciworld/train_sciworld.yaml
index 5b73ec7403..063abd768a 100644
--- a/examples/grpo_sciworld/train_sciworld.yaml
+++ b/examples/grpo_sciworld/train_sciworld.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 1536
     ppo_micro_batch_size_per_gpu: 1
     use_dynamic_bsz: False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/grpo_webshop/train_webshop.yaml b/examples/grpo_webshop/train_webshop.yaml
index 5b73ec7403..063abd768a 100644
--- a/examples/grpo_webshop/train_webshop.yaml
+++ b/examples/grpo_webshop/train_webshop.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 1536
     ppo_micro_batch_size_per_gpu: 1
     use_dynamic_bsz: False
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/mix_math/train_mix_math.yaml b/examples/mix_math/train_mix_math.yaml
index ca072b78f6..7d32c1d756 100644
--- a/examples/mix_math/train_mix_math.yaml
+++ b/examples/mix_math/train_mix_math.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True # False
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True # False
     ppo_max_token_len_per_gpu: 25600 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/opmd_gsm8k/train_opmd_gsm8k.yaml b/examples/opmd_gsm8k/train_opmd_gsm8k.yaml
index 5ddd5124ee..cf2f06cf70 100644
--- a/examples/opmd_gsm8k/train_opmd_gsm8k.yaml
+++ b/examples/opmd_gsm8k/train_opmd_gsm8k.yaml
@@ -31,7 +31,6 @@ actor_rollout_ref:
     use_remove_padding: True
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}

diff --git a/examples/ppo_countdown/README.md b/examples/ppo_countdown/README.md
index fa08b375a7..04c14d6241 100644
--- a/examples/ppo_countdown/README.md
+++ b/examples/ppo_countdown/README.md
@@ -1,6 +1,6 @@
 # Example: PPO on Countdown dataset

-This example shows the usage of PPO on the Countdown dataset.
+This example shows the usage of PPO on the Countdown dataset, adapted from [TinyZero](https://github.com/Jiayi-Pan/TinyZero).

 For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_reasoning_basic.md).

diff --git a/examples/ppo_countdown/train_countdown.yaml b/examples/ppo_countdown/train_countdown.yaml
index 191c345b90..7b1ef8eccf 100644
--- a/examples/ppo_countdown/train_countdown.yaml
+++ b/examples/ppo_countdown/train_countdown.yaml
@@ -7,7 +7,6 @@ actor_rollout_ref:
     use_remove_padding: True
   actor:
     strategy: fsdp # This is for backward-compatibility
-    ppo_mini_batch_size: 128
     ppo_micro_batch_size_per_gpu: 4
     use_dynamic_bsz: True
     ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}
@@ -61,7 +60,6 @@ critic:
         # transformer_layer_cls_to_wrap: None
         min_num_params: 0
       fsdp_size: -1
-  ppo_mini_batch_size: ${actor_rollout_ref.actor.ppo_mini_batch_size}
   ppo_micro_batch_size_per_gpu: 8
   forward_micro_batch_size_per_gpu: ${critic.ppo_micro_batch_size_per_gpu}
   use_dynamic_bsz: ${actor_rollout_ref.actor.use_dynamic_bsz}