change the autopdl tutorial (#1159)

vazirim · web-flow · commit 235431c7453f · 2025-09-08T10:39:51.000-04:00
Signed-off-by: Mandana Vaziri &lt;mvaziri@us.ibm.com&gt;
diff --git a/docs/autopdl.md b/docs/autopdl.md
@@ -50,7 +50,8 @@ An AutoPDL configuration file describes the state-space and parameters for the s
 --8<-- "./examples/optimizer/grammar_correction.yaml"
 ```
 
-Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it's an object with paths for train/validation/test splits. In general, `dataset` could be a string pointing to Huggingface dataset (that would then be automatically downloaded). `demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` holds the field with the ground truth (in this case `output`). `eval_pdl` is the path of the PDL program that encapsulates the loss function.
+Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it's an object with paths for train/validation/test splits. 
+`demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` holds the field with the ground truth (in this case `output`). `eval_pdl` is the path of the PDL program that encapsulates the loss function.
 
 `initial_validation_set_size` is the initial size of the validation set (i.e., the number of tests used initially to validate candidates). `max_validation_set_size` indicates the maximum to which this validation set will grow. For more details on the successive halving algorithm used in AutoPDL see [here](https://arxiv.org/abs/2504.04365). `max_test_set_size: 10` is the maximum of the test set used to evaluate at the end of the evaluation run. `num_candidates` indicates the number of candidates to consider (sampled randomly). `parallelism` indicates the level of parallelism used by the optimizer.