
Commit e6ffbeb

Author: marwan37
Message: fix broken links and format
Parent: 7ae4cfa

File tree: 6 files changed, 12 additions, 16 deletions

research-radar/README.md

Lines changed: 1 addition & 1 deletion

@@ -230,7 +230,7 @@ The project follows the recommended ZenML project structure:
 The project includes detailed documentation in various subdirectories:
 - **[Data Documentation](data/README.md)**: Details on dataset storage and processing.
 - **[Classification Results Documentation](classification_results/README.md)**: Explanation of classification outputs, metrics, and the checkpoint system.
-- **[Model Comparison Documentation](model_comparison/README.md)**: Details on the model comparison.
+- **[Model Comparison Metrics Documentation](model_compare_metrics/README.md)**: Details on the model comparison.
 - **[Pipelines Documentation](pipelines/README.md)**: Details on the pipeline definitions.
 - **[Prompts Documentation](prompts/README.md)**: Details on the prompts used in the pipeline.
 - **[Schemas Documentation](schemas/README.md)**: Details on data models and validation.
research-radar/pipelines/README.md

Lines changed: 2 additions & 4 deletions

@@ -1,13 +1,12 @@
 # Pipeline Usage
 
 - For detailed implementation of each step, see the individual Python files
-- For pipeline configurations and settings, refer to [`base_config.yaml`](../base_config.yaml)
+- For pipeline configurations and settings, refer to [`base_config.yaml`](../configs/base_config.yaml)
 
 ## Classification Pipeline
 
 Runs the following steps:
 
-- [`load_classification_dataset`](../steps/load_classification_dataset.py) - Loads articles based on classification mode
 - [`classify_articles`](../steps/classify_articles.py) - Classifies articles using DeepSeek R1
 - [`save_classifications`](../steps/save_classifications.py) - Saves classification results to JSON
 - [`merge_classifications`](../steps/merge_classifications.py) - Merges new classifications with existing dataset (augmentation mode)

@@ -20,7 +19,6 @@ Runs the following steps:
 
 Runs the following steps:
 
-- [`load_training_dataset`](../steps/load_training_dataset.py) - Automatically selects augmented dataset if available, otherwise uses composite dataset
 - [`data_preprocessor`](../steps/data_preprocessor.py) - Prepares text for model training
 - [`data_splitter`](../steps/data_splitter.py) - Creates train/validation/test splits
 - [`save_test_set`](../steps/save_test_set.py) - Optionally saves test set for later evaluation

@@ -37,6 +35,6 @@ Runs the following step:
 
 Runs the following steps:
 
-- [`load_test_set_from_artifact`](../steps/load_test_set_from_artifact.py)
+- [`load_test_set`](../steps/load_test_set.py)
 - [`compare_models`](../steps/compare_models.py)
 - [`save_comparison_metrics`](../steps/save_comparison_metrics.py)
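
For orientation, here is a minimal sketch of how the renamed evaluation steps could be wired together with ZenML's `@pipeline`/`@step` decorators. The step names mirror the README, but the signatures, bodies, and the pipeline name are placeholder assumptions, not the repository's actual code:

```python
# Hypothetical wiring of the evaluation pipeline; signatures and bodies
# are placeholders, not the repository's actual implementations.
from datasets import Dataset
from zenml import pipeline, step


@step
def load_test_set() -> Dataset:
    # Placeholder: the real step loads from disk or a ZenML artifact.
    return Dataset.from_dict({"text": ["example article"], "label": [0]})


@step
def compare_models(test_set: Dataset) -> dict:
    # Placeholder: the real step evaluates each candidate model.
    return {"accuracy": {"model_a": 0.0, "model_b": 0.0}}


@step
def save_comparison_metrics(metrics: dict) -> None:
    # Placeholder: the real step writes metrics to JSON.
    print(metrics)


@pipeline
def evaluation_pipeline():
    test_set = load_test_set()
    metrics = compare_models(test_set)
    save_comparison_metrics(metrics)


if __name__ == "__main__":
    # Requires an initialized ZenML stack to actually run.
    evaluation_pipeline()
```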

research-radar/schemas/README.md

Lines changed: 1 addition & 1 deletion

@@ -31,7 +31,7 @@ Two-part schema for article data:
 - `InputArticle`: Article text with metadata and validation rules
 - Ensures text is non-empty with field validation
 
-### [`training_arguments_config.py`](training_arguments_config.py)
+### [`training_config.py`](training_config.py)
 
 Configuration schema for Hugging Face `TrainingArguments`:
 
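The diff only renames this file, so as a rough illustration of the pattern it describes, a Pydantic model can validate the fields it needs and hand them to Hugging Face directly. Every field name below is an assumption; the actual contents of `training_config.py` are not shown in this commit:

```python
# Illustrative sketch only; the real fields in training_config.py are
# not part of this diff, so everything here is an assumption.
from pydantic import BaseModel, field_validator
from transformers import TrainingArguments


class TrainingConfig(BaseModel):
    output_dir: str
    num_train_epochs: int = 3
    per_device_train_batch_size: int = 8
    learning_rate: float = 5e-5

    @field_validator("learning_rate")
    @classmethod
    def validate_learning_rate(cls, v):
        """Reject non-positive learning rates."""
        if v <= 0:
            raise ValueError("learning_rate must be positive")
        return v


# A validated config feeds straight into Hugging Face.
config = TrainingConfig(output_dir="./results")
args = TrainingArguments(**config.model_dump())
```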
research-radar/schemas/config_models.py

Lines changed: 6 additions & 5 deletions

@@ -15,9 +15,7 @@
 # limitations under the License.
 #
 
-"""
-Pydantic models for configuration validation.
-"""
+"""Pydantic models for configuration validation."""
 
 from typing import Dict, List, Literal, Optional
 
@@ -41,6 +39,7 @@ class BatchProcessingConfig(BaseModel):
     @field_validator("batch_size")
     @classmethod
     def validate_batch_size(cls, v):
+        """Validate the batch size."""
         if v <= 0:
             raise ValueError("batch_size must be greater than 0")
         return v
@@ -57,6 +56,7 @@ class ParallelProcessingConfig(BaseModel):
     @field_validator("workers")
     @classmethod
     def validate_workers(cls, v):
+        """Validate the number of workers."""
         if v < 1:
             raise ValueError("Number of workers must be at least 1")
         return v
@@ -82,6 +82,7 @@ class InferenceParamsConfig(BaseModel):
     @field_validator("temperature", "top_p")
     @classmethod
     def validate_probability_params(cls, v):
+        """Validate the probability parameters."""
         if not 0.0 <= v <= 1.0:
             raise ValueError(
                 "Probability parameters must be between 0.0 and 1.0"
@@ -90,6 +91,7 @@ def validate_probability_params(cls, v):
 
     @model_validator(mode="after")
     def validate_token_lengths(self):
+        """Validate the token lengths."""
        if self.max_new_tokens >= self.max_sequence_length:
            raise ValueError(
                "max_new_tokens must be less than max_sequence_length"
@@ -330,8 +332,7 @@ class AppConfig(BaseModel):
 
 
 def validate_config(config: Dict) -> AppConfig:
-    """
-    Validate configuration dictionary against Pydantic models.
+    """Validate configuration dictionary against Pydantic models.
 
     Args:
         config: Raw configuration dictionary loaded from base_config.yaml
research-radar/steps/data_splitter.py

Lines changed: 1 addition & 2 deletions

@@ -33,8 +33,7 @@ def data_splitter(
     Annotated[Dataset, "validation_set"],
     Annotated[Dataset, "test_set"],
 ]:
-    """
-    Performs stratified dataset splitting.
+    """Performs stratified dataset splitting.
 
     Args:
         dataset: Input dataset to split
research-radar/steps/load_test_set.py

Lines changed: 1 addition & 3 deletions

@@ -45,12 +45,10 @@ def load_test_set(
         source_type: Type of data source ('disk' or 'artifact')
         path: Path to dataset on disk (required when source_type is 'disk')
         artifact_name: Name of the ZenML artifact (required when source_type is 'artifact')
+        version: Version of the ZenML artifact (optional, defaults to latest)
 
     Returns:
         Dataset: The loaded dataset
-
-    Raises:
-        ValueError: If the parameters are invalid or the dataset cannot be loaded
     """
     source_type = source_type.lower()
     if source_type not in ["disk", "artifact"]:
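
A rough sketch of the disk/artifact branching the docstring describes. The surrounding logic is inferred from the docstring alone; only `datasets.load_from_disk` and ZenML's `Client.get_artifact_version` are known APIs here, and the function body is not the repository's actual implementation:

```python
# Inferred from the docstring; not the repository's actual implementation.
from typing import Optional

from datasets import Dataset, load_from_disk
from zenml.client import Client


def load_test_set_sketch(
    source_type: str,
    path: Optional[str] = None,
    artifact_name: Optional[str] = None,
    version: Optional[str] = None,
) -> Dataset:
    """Load a test set from disk or from a ZenML artifact."""
    source_type = source_type.lower()
    if source_type == "disk":
        if path is None:
            raise ValueError("path is required when source_type is 'disk'")
        return load_from_disk(path)
    if source_type == "artifact":
        if artifact_name is None:
            raise ValueError(
                "artifact_name is required when source_type is 'artifact'"
            )
        # version=None resolves to the latest artifact version.
        artifact = Client().get_artifact_version(artifact_name, version)
        return artifact.load()
    raise ValueError("source_type must be 'disk' or 'artifact'")
```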
