Commit 9639156

Merge branch 'main' into copilot/fix-custom-metric-function-error

2 parents 75337a1 + c64eeb5
File tree: 9 files changed (+631, -4 lines)

.github/copilot-instructions.md

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@

# GitHub Copilot Instructions for FLAML

## Project Overview

FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows based on large language models, machine learning models, etc., and optimizes their performance.

**Key Components:**

- `flaml/automl/`: AutoML functionality for classification and regression
- `flaml/tune/`: Generic hyperparameter tuning
- `flaml/default/`: Zero-shot AutoML with default configurations
- `flaml/autogen/`: Legacy autogen code (note: AutoGen has moved to a separate repository)
- `flaml/fabric/`: Microsoft Fabric integration
- `test/`: Comprehensive test suite

## Build and Test Commands
### Installation

```bash
# Basic installation
pip install -e .

# Install with test dependencies
pip install -e .[test]

# Install with automl dependencies
pip install -e .[automl]

# Install with forecast dependencies (Linux only)
pip install -e .[forecast]
```

### Running Tests

```bash
# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10

# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml

# Check dependencies
python test/check_dependency.py
```

### Linting and Formatting

```bash
# Run pre-commit hooks
pre-commit run --all-files

# Format with black (line length: 120)
black . --line-length 120

# Run ruff for linting and auto-fix
ruff check . --fix
```
## Code Style and Formatting

### Python Style

- **Line length:** 120 characters (configured in both Black and Ruff)
- **Formatter:** Black (v23.3.0+)
- **Linter:** Ruff with Pyflakes and pycodestyle rules
- **Import sorting:** isort (via Ruff)
- **Python version:** Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12; Python 3.13 is tested, but some optional dependencies may have limited compatibility)

### Code Quality Rules

- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401); these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Complexity: max McCabe complexity of 10
- Use type hints where appropriate
- Write clear docstrings for public APIs

### Pre-commit Hooks

The repository uses pre-commit hooks for:

- Checking for large files, AST syntax, and YAML/TOML/JSON validity
- Detecting merge conflicts and private keys
- Trailing-whitespace and end-of-file fixes
- pyupgrade for Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix
## Testing Strategy

### Test Organization

- Tests are in the `test/` directory, organized by module
- `test/automl/`: AutoML feature tests
- `test/tune/`: Hyperparameter tuning tests
- `test/default/`: Zero-shot AutoML tests
- `test/nlp/`: NLP-related tests
- `test/spark/`: Spark integration tests

### Test Requirements

- Write tests for new functionality
- Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12, 3.13)
- Tests should work on both Ubuntu and Windows
- Use pytest markers for platform-specific tests (e.g., `@pytest.mark.spark`); see the sketch after this list
- Tests should be idempotent and not depend on external state
- Use `--reruns 2 --reruns-delay 10` for flaky tests
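As a minimal sketch of the marker convention (the test name, body, and `pyspark` guard here are hypothetical, not taken from this repo):

```python
import pytest


@pytest.mark.spark
def test_tuning_with_spark_backend():
    # Platform-specific test: skipped cleanly when pyspark is absent, and
    # selectable at run time with `-m spark` or excludable with `-m "not spark"`.
    pyspark = pytest.importorskip("pyspark")
    assert pyspark.__version__
```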
### Coverage

- Aim for good test coverage on new code
- Coverage reports are generated for Python 3.11 builds
- Coverage reports are uploaded to Codecov
## Git Workflow and Best Practices

### Branching

- Main branch: `main`
- Create feature branches from `main`
- PR reviews are required before merging

### Commit Messages

- Use clear, descriptive commit messages
- Reference issue numbers when applicable

### Pull Requests

- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in `.github/PULL_REQUEST_TEMPLATE.md`
## Project Structure

```
flaml/
├── automl/      # AutoML functionality
├── tune/        # Hyperparameter tuning
├── default/     # Zero-shot AutoML
├── autogen/     # Legacy autogen (deprecated, moved to separate repo)
├── fabric/      # Microsoft Fabric integration
├── onlineml/    # Online learning
└── version.py   # Version information

test/            # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/

notebook/        # Example notebooks
website/         # Documentation website
```
## Dependencies and Package Management

### Core Dependencies

- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12; Python 3.13 is tested in CI but may have limited compatibility with some optional dependencies)

### Optional Dependencies

- `[automl]`: lightgbm, xgboost, scipy, pandas, scikit-learn
- `[test]`: Full test-suite dependencies
- `[spark]`: PySpark and joblib dependencies
- `[forecast]`: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
- `[hf]`: Hugging Face transformers and datasets
- See `setup.py` for the complete list

### Version Constraints

- Be mindful of Python-version-specific dependencies (check `setup.py`)
- XGBoost versions differ based on the Python version
- NumPy 2.0+ only for Python >= 3.13
- Some features (like vowpalwabbit) only work with older Python versions
## Boundaries and Restrictions

### Do NOT Modify

- `.git/` directory and Git configuration
- `LICENSE` file
- Version information in `flaml/version.py` (unless explicitly updating the version)
- GitHub Actions workflows without careful consideration
- Existing test files, unless fixing bugs or adding coverage

### Be Cautious With

- `setup.py`: changes to dependencies should be carefully reviewed
- `pyproject.toml`: linting and testing configuration
- `.pre-commit-config.yaml`: pre-commit hook configuration
- Backward compatibility: FLAML is a library with external users

### Security Considerations

- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations
## Special Notes

### AutoGen Migration

- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The `flaml/autogen/` directory contains legacy code
- Tests in `test/autogen/` are ignored in the main test suite
- Direct users to the new AutoGen repository for AutoGen-related issues

### Platform-Specific Considerations

- Some tests only run on Linux (e.g., forecast tests with prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS support exists but requires special libomp setup for lgbm/xgboost

### Performance

- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible

## Documentation

- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features

## Contributing

- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP

flaml/automl/automl.py

Lines changed: 14 additions & 0 deletions
@@ -180,6 +180,11 @@ def custom_metric(
                 and 'final_estimator' to specify the passthrough and
                 final_estimator in the stacker. The dict can also contain
                 'n_jobs' as the key to specify the number of jobs for the stacker.
+                Note: The hyperparameters of a custom 'final_estimator' are NOT
+                automatically tuned. If you provide an estimator instance (e.g.,
+                CatBoostClassifier()), it will use the parameters you specified
+                or their defaults. If 'final_estimator' is not provided, the best
+                model found during the search will be used as the final estimator.
             eval_method: A string of resampling strategy, one of
                 ['auto', 'cv', 'holdout'].
             split_ratio: A float of the validation data percentage for holdout.
@@ -1859,6 +1864,11 @@ def custom_metric(
                 and 'final_estimator' to specify the passthrough and
                 final_estimator in the stacker. The dict can also contain
                 'n_jobs' as the key to specify the number of jobs for the stacker.
+                Note: The hyperparameters of a custom 'final_estimator' are NOT
+                automatically tuned. If you provide an estimator instance (e.g.,
+                CatBoostClassifier()), it will use the parameters you specified
+                or their defaults. If 'final_estimator' is not provided, the best
+                model found during the search will be used as the final estimator.
             eval_method: A string of resampling strategy, one of
                 ['auto', 'cv', 'holdout'].
             split_ratio: A float of the validation data percentage for holdout.
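Both hunks sit in the `fit()` docstring, next to its `custom_metric` example. For context, a custom metric in FLAML is a callable returning a metric to minimize plus a dict of metrics to log; the sketch below follows the pattern from FLAML's documentation and is not part of this diff:

```python
from sklearn.metrics import log_loss


def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    *args,
):
    # First return value is minimized by the search; the dict is logged.
    y_pred = estimator.predict_proba(X_val)
    val_loss = log_loss(y_val, y_pred, labels=labels, sample_weight=weight_val)
    return val_loss, {"val_loss": val_loss}
```

It is then passed to the search as `metric=custom_metric` in `AutoML.fit()`.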
@@ -3182,6 +3192,10 @@ def _search(self):
                 # the total degree of parallelization = parallelization degree per estimator * parallelization degree of ensemble
             )
         if isinstance(self._ensemble, dict):
+            # Note: If a custom final_estimator is provided, it is used as-is without
+            # hyperparameter tuning. The user is responsible for setting appropriate
+            # parameters or using defaults. If not provided, the best model found
+            # during the search (self._trained_estimator) is used.
             final_estimator = self._ensemble.get("final_estimator", self._trained_estimator)
             passthrough = self._ensemble.get("passthrough", True)
             ensemble_n_jobs = self._ensemble.get("n_jobs", ensemble_n_jobs)
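A minimal usage sketch of the documented `ensemble` dict, mirroring the new test further below (assumes `X_train`/`y_train` are already loaded):

```python
from flaml import AutoML
from sklearn.linear_model import LogisticRegression

automl = AutoML()
automl.fit(
    X_train=X_train,
    y_train=y_train,
    task="classification",
    time_budget=60,
    estimator_list=["rf", "lgbm"],
    ensemble={
        # Used as-is: C and max_iter below are NOT tuned by the search.
        "final_estimator": LogisticRegression(C=0.5, max_iter=200),
        "passthrough": False,
        "n_jobs": 1,
    },
)
```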

flaml/default/estimator.py

Lines changed: 21 additions & 0 deletions
@@ -95,6 +95,27 @@ def suggest_hyperparams(self, X, y):
     def fit(self, X, y, *args, **params):
         hyperparams, estimator_name, X, y_transformed = self.suggest_hyperparams(X, y)
         self.set_params(**hyperparams)
+
+        # Transform eval_set if present
+        if "eval_set" in params and params["eval_set"] is not None:
+            transformed_eval_set = []
+            for eval_X, eval_y in params["eval_set"]:
+                # Transform features
+                eval_X_transformed = self._feature_transformer.transform(eval_X)
+                # Transform labels if applicable
+                if self._label_transformer and estimator_name in [
+                    "rf",
+                    "extra_tree",
+                    "xgboost",
+                    "xgb_limitdepth",
+                    "choose_xgb",
+                ]:
+                    eval_y_transformed = self._label_transformer.transform(eval_y)
+                    transformed_eval_set.append((eval_X_transformed, eval_y_transformed))
+                else:
+                    transformed_eval_set.append((eval_X_transformed, eval_y))
+            params["eval_set"] = transformed_eval_set
+
         if self._label_transformer and estimator_name in [
             "rf",
             "extra_tree",

flaml/tune/searcher/search_thread.py

Lines changed: 26 additions & 1 deletion
@@ -25,6 +25,31 @@
 logger = logging.getLogger(__name__)


+def _recursive_dict_update(target: Dict, source: Dict) -> None:
+    """Recursively update target dictionary with source dictionary.
+
+    Unlike dict.update(), this function merges nested dictionaries instead of
+    replacing them entirely. This is crucial for configurations with nested
+    structures (e.g., XGBoost params).
+
+    Args:
+        target: The dictionary to be updated (modified in place).
+        source: The dictionary containing values to merge into target.
+
+    Example:
+        >>> target = {'params': {'eta': 0.1, 'max_depth': 3}}
+        >>> source = {'params': {'verbosity': 0}}
+        >>> _recursive_dict_update(target, source)
+        >>> target
+        {'params': {'eta': 0.1, 'max_depth': 3, 'verbosity': 0}}
+    """
+    for key, value in source.items():
+        if isinstance(value, dict) and key in target and isinstance(target[key], dict):
+            _recursive_dict_update(target[key], value)
+        else:
+            target[key] = value
+
+
 class SearchThread:
     """Class of global or local search thread."""

@@ -65,7 +90,7 @@ def suggest(self, trial_id: str) -> Optional[Dict]:
         try:
             config = self._search_alg.suggest(trial_id)
             if isinstance(self._search_alg._space, dict):
-                config.update(self._const)
+                _recursive_dict_update(config, self._const)
             else:
                 # define by run
                 config, self.space = unflatten_hierarchical(config, self._space)
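To see the behavior difference behind the one-line change, a standalone sketch (the nested keys are illustrative):

```python
def _recursive_dict_update(target, source):
    # Same merge logic as the new helper above.
    for key, value in source.items():
        if isinstance(value, dict) and key in target and isinstance(target[key], dict):
            _recursive_dict_update(target[key], value)
        else:
            target[key] = value


const = {"params": {"verbosity": 0}}

# dict.update() replaces the whole nested subtree, losing tuned values:
config = {"params": {"eta": 0.1, "max_depth": 3}}
config.update(const)
assert config == {"params": {"verbosity": 0}}

# The recursive merge preserves sibling keys inside the nested dict:
config = {"params": {"eta": 0.1, "max_depth": 3}}
_recursive_dict_update(config, const)
assert config == {"params": {"eta": 0.1, "max_depth": 3, "verbosity": 0}}
```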

test/automl/test_multiclass.py

Lines changed: 43 additions & 0 deletions
@@ -181,6 +181,49 @@ def test_ensemble(self):
         }
         automl.fit(X_train=X_train, y_train=y_train, **settings)

+    def test_ensemble_final_estimator_params_not_tuned(self):
+        """Test that final_estimator parameters in ensemble are not automatically tuned.
+
+        This test verifies that when a custom final_estimator is provided with specific
+        parameters, those parameters are used as-is without any hyperparameter tuning.
+        """
+        from sklearn.linear_model import LogisticRegression
+
+        automl = AutoML()
+        X_train, y_train = load_wine(return_X_y=True)
+
+        # Create a LogisticRegression with specific non-default parameters
+        custom_params = {
+            "C": 0.5,  # Non-default value
+            "max_iter": 50,  # Non-default value
+            "random_state": 42,
+        }
+        final_est = LogisticRegression(**custom_params)
+
+        settings = {
+            "time_budget": 5,
+            "estimator_list": ["rf", "lgbm"],
+            "task": "classification",
+            "ensemble": {
+                "final_estimator": final_est,
+                "passthrough": False,
+            },
+            "n_jobs": 1,
+        }
+        automl.fit(X_train=X_train, y_train=y_train, **settings)
+
+        # Verify that the final estimator in the stacker uses the exact parameters we specified
+        if hasattr(automl.model, "final_estimator_"):
+            # The model is a StackingClassifier
+            fitted_final_estimator = automl.model.final_estimator_
+            assert (
+                abs(fitted_final_estimator.C - custom_params["C"]) < 1e-9
+            ), f"Expected C={custom_params['C']}, but got {fitted_final_estimator.C}"
+            assert (
+                fitted_final_estimator.max_iter == custom_params["max_iter"]
+            ), f"Expected max_iter={custom_params['max_iter']}, but got {fitted_final_estimator.max_iter}"
+            print("✓ Final estimator parameters were preserved (not tuned)")
+
     def test_dataframe(self):
         self.test_classification(True)
