Commit 1a2fe6c

feat: added README & updated scripts
1 parent bea5e43 commit 1a2fe6c

3 files changed, +145 -44 lines changed

scripts/experiments/README.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+## Overview
+This script automates the process of evaluating datasets with multiple metrics, either in a multilabel or multiclass configuration. It iterates through datasets in a specified directory and applies a set of metrics to each dataset. The script is designed to work with `autointent` and updates configuration files before processing each dataset.
+
+---
+
+## Features
+- Processes datasets for **multilabel** or **multiclass** scenarios based on user input.
+- Supports multiple metrics:
+  - **Multilabel metrics**:
+    - `scoring_accuracy`
+    - `scoring_f1`
+    - `scoring_log_likelihood`
+    - `scoring_precision`
+    - `scoring_recall`
+    - `scoring_roc_auc`
+    - `scoring_neg_ranking_loss`
+    - `scoring_neg_coverage`
+    - `scoring_hit_rate`
+  - **Multiclass metrics**:
+    - `scoring_accuracy`
+    - `scoring_f1`
+    - `scoring_log_likelihood`
+    - `scoring_precision`
+    - `scoring_recall`
+    - `scoring_roc_auc`
+- Automatically handles configuration updates using `update_metric.sh`.
+- Logs processing results and skips datasets gracefully on errors.
+
+---
+
+## Requirements
+- **Dependencies**:
+  - `autointent` must be installed and available in the PATH.
+  - `yq` is required for processing YAML files. Ensure it is installed and available in the PATH.
+
+### Installing `yq`
+
+#### Linux
+```
+wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq
+chmod +x /usr/bin/yq
+```
+#### macOS
+```
+brew install yq
+```
+
+- **Input Files**:
+  - JSON files located in the directory specified by `<DATA_PATH>`.
+
+---
+
+## Usage
+From root repo:
+```
+sh scripts/experiments/generate_experiments.sh <DATA_PATH> <LOG_PATH> <USE_MULTILABEL>
+```
+Parameters
+
+<DATA_PATH>: Path to the directory containing dataset JSON files.
+<LOG_PATH>: Directory where logs for each dataset will be saved.
+<USE_MULTILABEL>: Boolean flag (true or false) indicating whether to use multilabel metrics.
+
+## Example
+```
+sh scripts/experiments/generate_experiments.sh data/intent_records_regexp/ experiments/dnnc/ false
+```
+
+This command processes all JSON files in `data/intent_records_regexp/` using multiclass metrics, saving logs in `experiments/dnnc/`.
+
+## Notes
+
+- Ensure the path to update_metric.sh is correct. Adjust the CONFIG_SCRIPT_PATH variable if needed.
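
The README's Requirements section asks for `autointent` and `yq` to be available in the PATH. A minimal sketch of a pre-flight check follows. It assumes a hypothetical helper named `check_dependencies`, which is not part of this commit, and it relies only on the standard `command -v` lookup.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): verify that every
# command named in the arguments can be found on the PATH.
check_dependencies() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: warn before starting a long experiment run.
if check_dependencies autointent yq; then
    echo "All dependencies found."
else
    echo "Install the missing tools before running generate_experiments.sh." >&2
fi
```

Running a check like this before the metric loop would surface a missing tool once, rather than partway through the first dataset.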
Lines changed: 70 additions & 42 deletions
@@ -1,46 +1,74 @@
 #!/bin/bash
 
-DATA_PATH="experiments/intent_description"
-LOG_PATH="experiments/intent_description/multilabel"
-METRIC="scoring_hit_rate"
-USE_MULTILABEL=true
-CONFIG_SCRIPT_PATH="./update_metric.sh"
-
-for FILE in "$DATA_PATH"/*.json; do
-    FILENAME=$(basename "$FILE" .json)
-    DATASET_NAME=$(echo "$FILENAME" | sed 's/_fix.*//')
-
-    # Determine the appropriate multilabel flag for the metric update script
-    if [ "$USE_MULTILABEL" = true ]; then
-        MULTILABEL_ARG="true"
-    else
-        MULTILABEL_ARG="false"
-    fi
-
-    # Update the metric in the configuration file
-    echo "Updating metric for dataset: $DATASET_NAME"
-    $CONFIG_SCRIPT_PATH "$METRIC" "$MULTILABEL_ARG"
-    if [ $? -ne 0 ]; then
-        echo "Error updating metric for $DATASET_NAME. Exiting."
-        exit 1
-    fi
-
-    rm -rf runs/
-
-    echo "Processing dataset: $DATASET_NAME"
-    autointent data.train_path="$FILE" \
-        logs.dirpath="$LOG_PATH/${DATASET_NAME}_${METRIC}" \
-        seed=42 \
-        vector_index.device=cuda \
-        hydra.job_logging.root.level=INFO \
-        data.force_multilabel="$USE_MULTILABEL"
-
-    if [ $? -ne 0 ]; then
-        echo "Error encountered while processing $FILE. Exiting."
-        exit 1
-    else
-        echo "Successfully processed $FILE"
-    fi
+# Check for the required arguments
+if [ "$#" -ne 3 ]; then
+    echo "Usage: $0 <DATA_PATH> <LOG_PATH> <USE_MULTILABEL>"
+    exit 1
+fi
+
+# Read arguments
+DATA_PATH="$1"
+LOG_PATH="$2"
+USE_MULTILABEL="$3"
+CONFIG_SCRIPT_PATH="scripts/experiments/update_metric.sh"
+
+# Define metrics for multilabel and multiclass
+if [ "$USE_MULTILABEL" = true ]; then
+    METRICS=(
+        "scoring_accuracy"
+        "scoring_f1"
+        "scoring_log_likelihood"
+        "scoring_precision"
+        "scoring_recall"
+        "scoring_roc_auc"
+        "scoring_neg_ranking_loss"
+        "scoring_neg_coverage"
+        "scoring_hit_rate"
+    )
+else
+    METRICS=(
+        "scoring_accuracy"
+        "scoring_f1"
+        "scoring_log_likelihood"
+        "scoring_precision"
+        "scoring_recall"
+        "scoring_roc_auc"
+    )
+fi
+
+# Iterate through each metric
+for METRIC in "${METRICS[@]}"; do
+    echo "Processing with metric: $METRIC"
+
+    for FILE in "$DATA_PATH"/*.json; do
+        FILENAME=$(basename "$FILE" .json)
+        DATASET_NAME=$(echo "$FILENAME" | sed 's/_fix.*//')
+
+        # Update the metric in the configuration file
+        echo "Updating metric for dataset: $DATASET_NAME"
+        $CONFIG_SCRIPT_PATH "$METRIC" "$USE_MULTILABEL"
+        if [ $? -ne 0 ]; then
+            echo "Error updating metric for $DATASET_NAME with metric: $METRIC. Exiting."
+            exit 1
+        fi
+
+        rm -rf runs/
+
+        echo "Processing dataset: $DATASET_NAME with metric: $METRIC"
+        autointent data.train_path="$FILE" \
+            logs.dirpath="$LOG_PATH/${DATASET_NAME}_${METRIC}" \
+            seed=42 \
+            vector_index.device=cuda \
+            hydra.job_logging.root.level=INFO \
+            data.force_multilabel="$USE_MULTILABEL"
+
+        if [ $? -ne 0 ]; then
+            echo "Error encountered while processing $FILE with metric: $METRIC. Exiting."
+            exit 1
+        else
+            echo "Successfully processed $FILE with metric: $METRIC"
+        fi
+    done
 done
 
-echo "All datasets processed successfully."
+echo "All datasets processed successfully for all metrics."
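
The updated script picks a metric list from `USE_MULTILABEL` and derives each dataset name by stripping everything from `_fix` onward. The sketch below reproduces those two steps without bash arrays. That may matter because the README invokes the script with `sh`, while `METRICS=( ... )` is bash syntax and is likely to fail on systems where `sh` is not bash. `select_metrics` and `dataset_name` are hypothetical helper names, and `banking77_fix_v2.json` is an invented file name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the metric names for the given flag as a
# space-separated list (metric names are taken from the commit).
select_metrics() {
    common="scoring_accuracy scoring_f1 scoring_log_likelihood scoring_precision scoring_recall scoring_roc_auc"
    case "$1" in
        true)  echo "$common scoring_neg_ranking_loss scoring_neg_coverage scoring_hit_rate" ;;
        false) echo "$common" ;;
        *)     echo "USE_MULTILABEL must be 'true' or 'false', got: $1" >&2
               return 1 ;;
    esac
}

# Hypothetical helper: strip the directory, the .json extension, and
# everything from the first "_fix" onward (same effect as sed 's/_fix.*//').
dataset_name() {
    filename=$(basename "$1" .json)
    echo "${filename%%_fix*}"
}

# Word splitting on the unquoted command substitution is intentional:
# metric names contain no spaces.
for METRIC in $(select_metrics false); do
    echo "$(dataset_name data/banking77_fix_v2.json)_${METRIC}"
done
```

Each printed line has the same shape as the `${DATASET_NAME}_${METRIC}` suffix the script appends to `LOG_PATH`.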

scripts/experiments/update_metric.sh

File mode changed: 100644 → 100755
Lines changed: 2 additions & 2 deletions
@@ -11,9 +11,9 @@ MULTILABEL="$2"
 
 # Determine the correct configuration file based on the multilabel argument
 if [ "$MULTILABEL" == "true" ]; then
-    CONFIG_PATH="../../autointent/datafiles/default-multilabel-config.yaml"
+    CONFIG_PATH="autointent/datafiles/default-multilabel-config.yaml"
 elif [ "$MULTILABEL" == "false" ]; then
-    CONFIG_PATH="../../autointent/datafiles/default-multiclass-config.yaml"
+    CONFIG_PATH="autointent/datafiles/default-multiclass-config.yaml"
 else
     echo "Invalid value for <multilabel>. Use 'true' or 'false'."
     exit 1
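
With this change the config paths are relative to the current working directory, so `update_metric.sh` appears to expect being run from the repository root, which matches the README's "From root repo" instruction. A minimal sketch of a variant that does not depend on the working directory is shown below. `config_path_for` is a hypothetical helper, and the assumption that the script sits two levels below the repository root comes from the `scripts/experiments/` path in this commit.

```shell
#!/bin/sh
# Hypothetical helper: map the multilabel flag to the config file used in
# this commit, prefixed by a caller-supplied repository root.
config_path_for() {
    repo_root="$1"
    case "$2" in
        true)  echo "$repo_root/autointent/datafiles/default-multilabel-config.yaml" ;;
        false) echo "$repo_root/autointent/datafiles/default-multiclass-config.yaml" ;;
        *)     echo "Invalid value for <multilabel>. Use 'true' or 'false'." >&2
               return 1 ;;
    esac
}

# Assumption: the script lives in scripts/experiments/, two levels below the root.
SCRIPT_DIR=$(cd "$(dirname -- "$0")" && pwd)
REPO_ROOT=$(cd "$SCRIPT_DIR/../.." && pwd)
CONFIG_PATH=$(config_path_for "$REPO_ROOT" true)
echo "Would update: $CONFIG_PATH"
```

Resolving the root from the script's own location trades a little extra code for not having to remember which directory to run from.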
