Task based refactor #736
base: main
This allows us to swap out the generic AutoML logic for the time-series-specific logic when the task is in TS_FORECASTREGRESSION, while maintaining backwards compatibility with the current flaml.automl public interface.
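A minimal sketch of the kind of dispatch described here, not the PR's actual code: the class names, the factory function, and the contents of the TS_FORECASTREGRESSION tuple are all assumptions made for illustration. The point is only that the entry point keeps its signature while the task object carries the task-specific behavior.

```python
from dataclasses import dataclass


@dataclass
class GenericTask:
    """Hypothetical holder for task-agnostic logic."""

    name: str

    def describe(self) -> str:
        return f"generic pipeline for {self.name}"


@dataclass
class TimeSeriesTask(GenericTask):
    """Hypothetical holder for time-series-specific logic."""

    def describe(self) -> str:
        return f"time series pipeline for {self.name}"


# Assumed stand-in for the TS_FORECASTREGRESSION collection named in the PR.
TS_FORECASTREGRESSION = ("ts_forecast", "ts_forecast_regression")


def task_factory(task_name: str) -> GenericTask:
    """Pick the task implementation from the task name."""
    if task_name in TS_FORECASTREGRESSION:
        return TimeSeriesTask(task_name)
    return GenericTask(task_name)


class AutoML:
    """Entry point whose public signature stays the same for callers."""

    def fit(self, task: str = "classification") -> str:
        # Delegate task-specific behavior instead of branching inline.
        return task_factory(task).describe()
```

Callers still pass a plain task string to fit, so existing code would not need to change under this layout.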
These will be used to separate task specific logic from the main AutoML entrypoint class
…/FLAML into time-series-extension
…/FLAML into time-series-extension
It isn't pretty, but it seems to get to the model now
…/FLAML into time-series-extension
…numpy forecast test fails
….py passes except for numpy and prophet
…rophet, apart from test_numpy
@@ -1,4 +1,4 @@
-# Task Oriented AutoML
+# GenericTask Oriented AutoML
Are this and the other Task->GenericTask changes in the documentation a mistake?
Hey! Thanks for the review. Yeah, this and a couple of other instances appear to be a mistaken replacement, likely a PyCharm refactor going over the top. I've fixed them up now.
The holidays package is only used in time series forecasting, which is optional. Make sure the import statement is invoked only under the time series environment. See https://github.com/microsoft/FLAML/blob/main/flaml/automl.py#L1096 as an example for nlp (another optional environment).
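One way to keep an optional dependency out of the default import path is sketched below. The helper name optional_import, the fit function, and the "ts_forecast" extra name are assumptions for illustration and are not taken from the FLAML code linked above.

```python
import importlib


def optional_import(module_name: str, extra: str):
    """Import a module only when it is needed, with a helpful error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this task; "
            f"install the optional '{extra}' dependencies to use it."
        ) from exc


def fit(task: str):
    """Hypothetical entry point that needs holidays only for forecasting."""
    if task == "ts_forecast":
        # Imported here, not at module top level, so other tasks never need it.
        return optional_import("holidays", extra="ts_forecast")
    return None
```

With this layout, users who never run a forecasting task are not affected if the package is absent.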
qingyun-wu
left a comment
Would it be more reasonable to put the "nlp" and "time_series" folders into the "automl" folder?
Co-authored-by: Xueqing Liu <[email protected]>
Hey! Thanks for the review @gmdiana-hershey. I've added dataclasses as a Python-version-dependent install requirement, thanks for spotting it! Holidays turned out not to be used outside of the tests at present, so I've removed the unnecessary test in test_forecast.
Hey! Thanks for the review @liususan091219. I've applied your relative import suggestions. On |
Hey! I think we ended up with this layout to avoid some circular imports, but I'll have another try at refactoring these into the automl subpackage 😄
Some checks failed. I wonder if it will be easier to break the PR down to smaller PRs. For example, the first PR to make is to just create the |
Thank you @markharley! We do need to reorganize the whole structure of the flaml folder, given the need to add an automl folder. I am attaching a proposal for the new structure (covering all content in flaml, not just the changes involved in this PR). In this PR, perhaps you can just put the automl-related .py files and the time_series folder in the right place. We can come up with a plan for the other changes (and perhaps also discuss this proposed structure in the maintainer meeting on 10/10).
def __init__(
    self,
    task_name: str,
    X_train: Union[np.ndarray, pd.DataFrame],
    y_train: Union[np.ndarray, pd.DataFrame, pd.Series],
):
The constructor here indicates that a Task object needs to know about X_train and y_train when it's constructed.
The implementation indicates that no reference to the dataset is stored inside the Task object.
Why? What's the relation between a Task object and a dataset exactly?
For example, at this stage we could infer whether it's a binary or multi-category classification, and whether it's a univariate or panel regression, so the user wouldn't have the hassle of specifying that.
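As a rough illustration of the classification half of that idea, here is a minimal sketch that guesses a task kind from the target values alone. The function name, the returned labels, and the max_classes threshold are hypothetical and are not part of the PR; the univariate versus panel distinction is not covered.

```python
from numbers import Integral, Real


def infer_task(y_train, max_classes: int = 20) -> str:
    """Guess a task kind from a one-dimensional iterable of targets."""
    values = list(y_train)
    distinct = set(values)
    # Booleans are treated as class labels, not as numbers.
    numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    if numeric:
        integral = all(isinstance(v, Integral) for v in values)
        # Assumed heuristic: fractional targets, or many distinct integers,
        # point to regression.
        if not integral or len(distinct) > max_classes:
            return "regression"
    return "binary" if len(distinct) == 2 else "multiclass"
```

A constructor that receives y_train could call a helper like this once and keep only the inferred label, which would explain why no reference to the dataset itself needs to be stored.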
def __init__(
    self,
    task_name: str,
Needs documentation on the allowed task_name values?
"scipy>=1.4.1",
"pandas>=1.1.4",
"scikit-learn>=0.24",
"dataclasses>=0.8 ; python_version=='3.6'",
-"dataclasses>=0.8 ; python_version=='3.6'",
+"dataclasses>=0.8 ; python_version>='3.6'",
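For context on the two markers, the dataclasses module joined the standard library in Python 3.7, so the `==` form limits the backport to 3.6, while the `>=` form would also match later versions. The sketch below expresses the `==` form as plain Python instead of an environment marker; it is an illustration only, not the PR's setup.py.

```python
import sys

install_requires = [
    "scipy>=1.4.1",
    "pandas>=1.1.4",
    "scikit-learn>=0.24",
]

# Same intent as the marker python_version=='3.6': only Python 3.6 lacks
# dataclasses in the standard library, so only it needs the backport.
if sys.version_info[:2] == (3, 6):
    install_requires.append("dataclasses>=0.8")
```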
Why are these changes needed?
Related issue number
N/A
Checks