Commit ef749ab

feat: add mask inference in debug mode (#1154)
1 parent fd039f1 commit ef749ab

File tree: 1 file changed (+20, −0 lines)


rdagent/components/coder/data_science/pipeline/prompts.yaml

Lines changed: 20 additions & 0 deletions
````diff
@@ -135,6 +135,26 @@ pipeline_coder:
     else:
         sample_size = len(train_dataset)
     ```
+    In debug mode, to increase efficiency, you only need to perform inference on the first batch of the test set to generate valid predictions for `submission.csv`. For all other samples in the test set, you should use a placeholder value (e.g., 0 or a default value) to fill the prediction column. This ensures that the generated `submission.csv` has the same number of rows as the full run and passes the format check.
+    Example code:
+    ```python
+    all_preds = []
+    for i, batch in enumerate(test_loader):
+        # In debug mode, use placeholders for all batches after the first one to improve efficiency.
+        if args.debug and i > 0:
+            # The shape and data type of the placeholder must match the model's actual output.
+            # Here, `predictions` is the NumPy array produced for the first batch.
+            placeholder = np.zeros_like(predictions)
+            all_preds.append(placeholder)
+            continue
+
+        # In full mode, or for the first batch in debug mode, perform actual model inference.
+        predictions = model.predict(batch)
+        all_preds.append(predictions)
+
+    # final_predictions = np.concatenate(all_preds)
+    # ... then create and save submission.csv
+    ```
     You should be very careful about the number of label classes in debug mode. It must be the same as in the full run even when you are in debug mode, because the number of label classes is often used to build the model.
     {% endif %}
````
