Commit 9639156

Merge branch 'main' into copilot/fix-custom-metric-function-error

2 parents 75337a1 + c64eeb5
File tree: 9 files changed (+631, -4 lines)

.github/copilot-instructions.md

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@

# GitHub Copilot Instructions for FLAML

## Project Overview

FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows based on large language models, machine learning models, etc., and optimizes their performance.

**Key Components:**

- `flaml/automl/`: AutoML functionality for classification and regression
- `flaml/tune/`: Generic hyperparameter tuning
- `flaml/default/`: Zero-shot AutoML with default configurations
- `flaml/autogen/`: Legacy autogen code (note: AutoGen has moved to a separate repository)
- `flaml/fabric/`: Microsoft Fabric integration
- `test/`: Comprehensive test suite

## Build and Test Commands
### Installation

```bash
# Basic installation
pip install -e .

# Install with test dependencies
pip install -e .[test]

# Install with automl dependencies
pip install -e .[automl]

# Install with forecast dependencies (Linux only)
pip install -e .[forecast]
```

### Running Tests

```bash
# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10

# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml

# Check dependencies
python test/check_dependency.py
```

### Linting and Formatting

```bash
# Run pre-commit hooks
pre-commit run --all-files

# Format with black (line length: 120)
black . --line-length 120

# Run ruff for linting and auto-fix
ruff check . --fix
```
## Code Style and Formatting

### Python Style

- **Line length:** 120 characters (configured in both Black and Ruff)
- **Formatter:** Black (v23.3.0+)
- **Linter:** Ruff with Pyflakes and pycodestyle rules
- **Import sorting:** isort (via Ruff)
- **Python version:** Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12; Python 3.13 is tested, but some optional dependencies may have limited compatibility)

### Code Quality Rules

- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401); these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Complexity: max McCabe complexity of 10
- Use type hints where appropriate
- Write clear docstrings for public APIs

### Pre-commit Hooks

The repository uses pre-commit hooks for:

- Checking for large files, AST syntax, and YAML/TOML/JSON validity
- Detecting merge conflicts and private keys
- Trailing-whitespace and end-of-file fixes
- pyupgrade for Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix
## Testing Strategy

### Test Organization

- Tests are in the `test/` directory, organized by module
- `test/automl/`: AutoML feature tests
- `test/tune/`: Hyperparameter tuning tests
- `test/default/`: Zero-shot AutoML tests
- `test/nlp/`: NLP-related tests
- `test/spark/`: Spark integration tests

### Test Requirements

- Write tests for new functionality
- Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12, 3.13)
- Tests should work on both Ubuntu and Windows
- Use pytest markers for platform-specific tests (e.g., `@pytest.mark.spark`); see the sketch after this list
- Tests should be idempotent and not depend on external state
- Use `--reruns 2 --reruns-delay 10` for flaky tests
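As a minimal sketch of the marker convention (the test name, body, and `pyspark` guard here are hypothetical, not taken from this repo):

```python
import pytest


@pytest.mark.spark
def test_tuning_with_spark_backend():
    # Platform-specific test: skipped cleanly when pyspark is absent, and
    # selectable at run time with `-m spark` or excludable with `-m "not spark"`.
    pyspark = pytest.importorskip("pyspark")
    assert pyspark.__version__
```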
### Coverage

- Aim for good test coverage on new code
- Coverage reports are generated for Python 3.11 builds
- Coverage reports are uploaded to Codecov
## Git Workflow and Best Practices

### Branching

- Main branch: `main`
- Create feature branches from `main`
- PR reviews are required before merging

### Commit Messages

- Use clear, descriptive commit messages
- Reference issue numbers when applicable

### Pull Requests

- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in `.github/PULL_REQUEST_TEMPLATE.md`
## Project Structure

```
flaml/
├── automl/      # AutoML functionality
├── tune/        # Hyperparameter tuning
├── default/     # Zero-shot AutoML
├── autogen/     # Legacy autogen (deprecated, moved to separate repo)
├── fabric/      # Microsoft Fabric integration
├── onlineml/    # Online learning
└── version.py   # Version information

test/            # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/

notebook/        # Example notebooks
website/         # Documentation website
```
## Dependencies and Package Management

### Core Dependencies

- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12; Python 3.13 is tested in CI but may have limited compatibility with some optional dependencies)

### Optional Dependencies

- `[automl]`: lightgbm, xgboost, scipy, pandas, scikit-learn
- `[test]`: Full test-suite dependencies
- `[spark]`: PySpark and joblib dependencies
- `[forecast]`: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
- `[hf]`: Hugging Face transformers and datasets
- See `setup.py` for the complete list

### Version Constraints

- Be mindful of Python-version-specific dependencies (check `setup.py`)
- XGBoost versions differ based on the Python version
- NumPy 2.0+ only for Python >= 3.13
- Some features (like vowpalwabbit) only work with older Python versions
## Boundaries and Restrictions

### Do NOT Modify

- `.git/` directory and Git configuration
- `LICENSE` file
- Version information in `flaml/version.py` (unless explicitly updating the version)
- GitHub Actions workflows without careful consideration
- Existing test files, unless fixing bugs or adding coverage

### Be Cautious With

- `setup.py`: changes to dependencies should be carefully reviewed
- `pyproject.toml`: linting and testing configuration
- `.pre-commit-config.yaml`: pre-commit hook configuration
- Backward compatibility: FLAML is a library with external users

### Security Considerations

- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations
## Special Notes

### AutoGen Migration

- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The `flaml/autogen/` directory contains legacy code
- Tests in `test/autogen/` are ignored in the main test suite
- Direct users to the new AutoGen repository for AutoGen-related issues

### Platform-Specific Considerations

- Some tests only run on Linux (e.g., forecast tests with prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS support exists but requires special libomp setup for lgbm/xgboost

### Performance

- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible

## Documentation

- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features

## Contributing

- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP

flaml/automl/automl.py

Lines changed: 14 additions & 0 deletions
@@ -180,6 +180,11 @@ def custom_metric(
                 and 'final_estimator' to specify the passthrough and
                 final_estimator in the stacker. The dict can also contain
                 'n_jobs' as the key to specify the number of jobs for the stacker.
+                Note: The hyperparameters of a custom 'final_estimator' are NOT
+                automatically tuned. If you provide an estimator instance (e.g.,
+                CatBoostClassifier()), it will use the parameters you specified
+                or their defaults. If 'final_estimator' is not provided, the best
+                model found during the search will be used as the final estimator.
             eval_method: A string of resampling strategy, one of
                 ['auto', 'cv', 'holdout'].
             split_ratio: A float of the validation data percentage for holdout.
@@ -1859,6 +1864,11 @@ def custom_metric(
                 and 'final_estimator' to specify the passthrough and
                 final_estimator in the stacker. The dict can also contain
                 'n_jobs' as the key to specify the number of jobs for the stacker.
+                Note: The hyperparameters of a custom 'final_estimator' are NOT
+                automatically tuned. If you provide an estimator instance (e.g.,
+                CatBoostClassifier()), it will use the parameters you specified
+                or their defaults. If 'final_estimator' is not provided, the best
+                model found during the search will be used as the final estimator.
             eval_method: A string of resampling strategy, one of
                 ['auto', 'cv', 'holdout'].
             split_ratio: A float of the validation data percentage for holdout.
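Both hunks sit in the `fit()` docstring, next to its `custom_metric` example. For context, a custom metric in FLAML is a callable returning a metric to minimize plus a dict of metrics to log; the sketch below follows the pattern from FLAML's documentation and is not part of this diff:

```python
from sklearn.metrics import log_loss


def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    *args,
):
    # First return value is minimized by the search; the dict is logged.
    y_pred = estimator.predict_proba(X_val)
    val_loss = log_loss(y_val, y_pred, labels=labels, sample_weight=weight_val)
    return val_loss, {"val_loss": val_loss}
```

It is then passed to the search as `metric=custom_metric` in `AutoML.fit()`.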
@@ -3182,6 +3192,10 @@ def _search(self):
                 # the total degree of parallelization = parallelization degree per estimator * parallelization degree of ensemble
             )
         if isinstance(self._ensemble, dict):
+            # Note: If a custom final_estimator is provided, it is used as-is without
+            # hyperparameter tuning. The user is responsible for setting appropriate
+            # parameters or using defaults. If not provided, the best model found
+            # during the search (self._trained_estimator) is used.
             final_estimator = self._ensemble.get("final_estimator", self._trained_estimator)
             passthrough = self._ensemble.get("passthrough", True)
             ensemble_n_jobs = self._ensemble.get("n_jobs", ensemble_n_jobs)
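A minimal usage sketch of the documented `ensemble` dict, mirroring the new test further below (assumes `X_train`/`y_train` are already loaded):

```python
from flaml import AutoML
from sklearn.linear_model import LogisticRegression

automl = AutoML()
automl.fit(
    X_train=X_train,
    y_train=y_train,
    task="classification",
    time_budget=60,
    estimator_list=["rf", "lgbm"],
    ensemble={
        # Used as-is: C and max_iter below are NOT tuned by the search.
        "final_estimator": LogisticRegression(C=0.5, max_iter=200),
        "passthrough": False,
        "n_jobs": 1,
    },
)
```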

flaml/default/estimator.py

Lines changed: 21 additions & 0 deletions
@@ -95,6 +95,27 @@ def suggest_hyperparams(self, X, y):
     def fit(self, X, y, *args, **params):
         hyperparams, estimator_name, X, y_transformed = self.suggest_hyperparams(X, y)
         self.set_params(**hyperparams)
+
+        # Transform eval_set if present
+        if "eval_set" in params and params["eval_set"] is not None:
+            transformed_eval_set = []
+            for eval_X, eval_y in params["eval_set"]:
+                # Transform features
+                eval_X_transformed = self._feature_transformer.transform(eval_X)
+                # Transform labels if applicable
+                if self._label_transformer and estimator_name in [
+                    "rf",
+                    "extra_tree",
+                    "xgboost",
+                    "xgb_limitdepth",
+                    "choose_xgb",
+                ]:
+                    eval_y_transformed = self._label_transformer.transform(eval_y)
+                    transformed_eval_set.append((eval_X_transformed, eval_y_transformed))
+                else:
+                    transformed_eval_set.append((eval_X_transformed, eval_y))
+            params["eval_set"] = transformed_eval_set
+
         if self._label_transformer and estimator_name in [
             "rf",
             "extra_tree",

flaml/tune/searcher/search_thread.py

Lines changed: 26 additions & 1 deletion
@@ -25,6 +25,31 @@
 logger = logging.getLogger(__name__)


+def _recursive_dict_update(target: Dict, source: Dict) -> None:
+    """Recursively update target dictionary with source dictionary.
+
+    Unlike dict.update(), this function merges nested dictionaries instead of
+    replacing them entirely. This is crucial for configurations with nested
+    structures (e.g., XGBoost params).
+
+    Args:
+        target: The dictionary to be updated (modified in place).
+        source: The dictionary containing values to merge into target.
+
+    Example:
+        >>> target = {'params': {'eta': 0.1, 'max_depth': 3}}
+        >>> source = {'params': {'verbosity': 0}}
+        >>> _recursive_dict_update(target, source)
+        >>> target
+        {'params': {'eta': 0.1, 'max_depth': 3, 'verbosity': 0}}
+    """
+    for key, value in source.items():
+        if isinstance(value, dict) and key in target and isinstance(target[key], dict):
+            _recursive_dict_update(target[key], value)
+        else:
+            target[key] = value
+
+
 class SearchThread:
     """Class of global or local search thread."""

@@ -65,7 +90,7 @@ def suggest(self, trial_id: str) -> Optional[Dict]:
         try:
             config = self._search_alg.suggest(trial_id)
             if isinstance(self._search_alg._space, dict):
-                config.update(self._const)
+                _recursive_dict_update(config, self._const)
             else:
                 # define by run
                 config, self.space = unflatten_hierarchical(config, self._space)
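To see the behavior difference behind the one-line change, a standalone sketch (the nested keys are illustrative):

```python
def _recursive_dict_update(target, source):
    # Same merge logic as the new helper above.
    for key, value in source.items():
        if isinstance(value, dict) and key in target and isinstance(target[key], dict):
            _recursive_dict_update(target[key], value)
        else:
            target[key] = value


const = {"params": {"verbosity": 0}}

# dict.update() replaces the whole nested subtree, losing tuned values:
config = {"params": {"eta": 0.1, "max_depth": 3}}
config.update(const)
assert config == {"params": {"verbosity": 0}}

# The recursive merge preserves sibling keys inside the nested dict:
config = {"params": {"eta": 0.1, "max_depth": 3}}
_recursive_dict_update(config, const)
assert config == {"params": {"eta": 0.1, "max_depth": 3, "verbosity": 0}}
```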

test/automl/test_multiclass.py

Lines changed: 43 additions & 0 deletions
@@ -181,6 +181,49 @@ def test_ensemble(self):
         }
         automl.fit(X_train=X_train, y_train=y_train, **settings)

+    def test_ensemble_final_estimator_params_not_tuned(self):
+        """Test that final_estimator parameters in ensemble are not automatically tuned.
+
+        This test verifies that when a custom final_estimator is provided with specific
+        parameters, those parameters are used as-is without any hyperparameter tuning.
+        """
+        from sklearn.linear_model import LogisticRegression
+
+        automl = AutoML()
+        X_train, y_train = load_wine(return_X_y=True)
+
+        # Create a LogisticRegression with specific non-default parameters
+        custom_params = {
+            "C": 0.5,  # Non-default value
+            "max_iter": 50,  # Non-default value
+            "random_state": 42,
+        }
+        final_est = LogisticRegression(**custom_params)
+
+        settings = {
+            "time_budget": 5,
+            "estimator_list": ["rf", "lgbm"],
+            "task": "classification",
+            "ensemble": {
+                "final_estimator": final_est,
+                "passthrough": False,
+            },
+            "n_jobs": 1,
+        }
+        automl.fit(X_train=X_train, y_train=y_train, **settings)
+
+        # Verify that the final estimator in the stacker uses the exact parameters we specified
+        if hasattr(automl.model, "final_estimator_"):
+            # The model is a StackingClassifier
+            fitted_final_estimator = automl.model.final_estimator_
+            assert (
+                abs(fitted_final_estimator.C - custom_params["C"]) < 1e-9
+            ), f"Expected C={custom_params['C']}, but got {fitted_final_estimator.C}"
+            assert (
+                fitted_final_estimator.max_iter == custom_params["max_iter"]
+            ), f"Expected max_iter={custom_params['max_iter']}, but got {fitted_final_estimator.max_iter}"
+            print("✓ Final estimator parameters were preserved (not tuned)")
+
     def test_dataframe(self):
         self.test_classification(True)
