Skip to content

Conversation

@markharley
Copy link
Collaborator

Why are these changes needed?

  • Refactor to facilitate improvement of time-series fitting.
  • Delegate task specific logic to a Task sub-class, simplifying the AutoML class.
  • Introduce a TimeSeriesDataset class for encapsulation of time-series relevant data operations.
  • Introduce a TimeSeriesEstimator parent class to standardise time-series models.

Related issue number

N/A

Checks

markharley and others added 30 commits August 11, 2022 09:41
This allows us to swap out the generic AutoML logic for the time series
specific when the task is in TS_FORECASTREGRESSION while maintaining
backwards compatibility with the current flaml.automl public interface.
These will be used to separate task specific logic from the main AutoML
entrypoint class
It isn't pretty, but it seems to get to the model now
@@ -1,4 +1,4 @@
# Task Oriented AutoML
# GenericTask Oriented AutoML
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this and other changes from Task->GenericTask in the documentation a mistake?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Thanks for the review. Yeah, this and a couple of other instances appear to be a mistaken replacement. Likely Pycharm refactor going over-the-top. I've fixed them up now

@liususan091219
Copy link
Contributor

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

The holidays are only used in time series forecasting which is optional. Make sure the import statement is invoked only under the time series environment. See https://github.com/microsoft/FLAML/blob/main/flaml/automl.py#L1096 as an example for nlp (another optional environment).

Copy link
Contributor

@qingyun-wu qingyun-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more reasonable to put the "nlp" and "time_series" folders into the "automl" folder?

@markharley
Copy link
Collaborator Author

markharley commented Oct 2, 2022

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

Hey! Thanks for the review @gmdiana-hershey. I've added dataclasses as a Python version dependent install requirement, thanks for spotting it! Holidays turned out not to be used outside of the tests at present so I've removed the unnecessary test in test_forecast

@markharley
Copy link
Collaborator Author

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

The holidays are only used in time series forecasting which is optional. Make sure the import statement is invoked only under the time series environment. See https://github.com/microsoft/FLAML/blob/main/flaml/automl.py#L1096 as an example for nlp (another optional environment).

Hey! Thanks for the review @liususan091219. I've applied your relative import suggestions. On holidays, it turns out that it wasn't called outside of tests at present and so I've removed the offending test and redundant code paths

@markharley
Copy link
Collaborator Author

Would it be more reasonable to put the "nlp" and "time_series" folders into the "automl" folder?

Hey! I think we ended up with this layout to avoid some circular imports, but I'll have another try at refactoring these into the automl subpackage 😄

@sonichi
Copy link
Contributor

sonichi commented Oct 2, 2022

Some checks failed. I wonder if it will be easier to break the PR down to smaller PRs. For example, the first PR to make is to just create the automl subpackage and move files to corresponding locations. Otherwise this PR would take a long time to merge and you will get conflicts often.

@qingyun-wu
Copy link
Contributor

Thank you @markharley! We indeed need to re-organize the whole structure of the flaml folder considering the need of adding an automl folder. I am attaching a proposal for the new structure (considering all content in flaml, not just the changes involved in this PR).

In this PR, perhaps you can just make .py files about automl and the time_series folder in the right place. We can come up with a plan with the other changes (and perhaps also discuss this proposed structure plan in the maintainer meeting on 10/10).

  • automl
    • nlp
    • time_series
      automl.py
      data.py
      ml.py
      model.py
      train_log.py
      config.py
      [and other files need to be added in the automl folder]
  • tune
    • searcher
    • scheduler
      [and existing files in the tune foler]
  • onlineml
  • default

Comment on lines +61 to +66
def __init__(
self,
task_name: str,
X_train: Union[np.ndarray, pd.DataFrame],
y_train: Union[np.ndarray, pd.DataFrame, pd.Series],
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor here indicates that a Task object needs to know about X_train and y_train when it's constructed.
The implementation indicates that no reference to the dataset is stored inside the Task object.
Why? What's the relation between a Task object and a dataset exactly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, at this stage we could infer it it's a binary or multi-category classification, and whether it's a univariate or panel regression, so the user wouldn't have the hassle of specifying that


def __init__(
self,
task_name: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs documentation on the allowed task_name name?

"scipy>=1.4.1",
"pandas>=1.1.4",
"scikit-learn>=0.24",
"dataclasses>=0.8 ; python_version=='3.6'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"dataclasses>=0.8 ; python_version=='3.6'",
"dataclasses>=0.8 ; python_version>='3.6'",

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants