Commit 611cf5c

Merge pull request #1181 from automl/master
Synchronize dev and master again
2 parents dbc7170 + 6f1e5c3 commit 611cf5c

46 files changed (+44176, -13006 lines)

.github/ISSUE_TEMPLATE/question.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+name: Question
+about: Ask a question!
+title: "[Question] My Question?"
+labels: ''
+assignees: ''
+
+---
+
+# Short Question Description
+A clear single sentence question we can try to help with?
+
+With some extra context to follow it up. This way the question is clear for both you and us without it being lost in the paragraph.
+Some useful information to help us with your question:
+* How did this question come about?
+* Would a small code snippet help?
+* What have you already looked at?
+
+Before you ask, please have a look at the
+* [Documentation](https://automl.github.io/auto-sklearn/master/manual.html)
+    * If it's related but not clear, please include it in your question with a link, we'll try to make it better!
+* [Examples](https://automl.github.io/auto-sklearn/master/examples/index.html)
+    * Likewise, an example can answer many questions! However we can't cover all questions with examples but if you think your question would benefit from an example, let us know!
+* [Issues](https://github.com/automl/auto-sklearn/issues?q=label%3Aquestion+)
+    * We try to label all questions with the label `Question`, maybe someone has already asked. If the question is about a feature, try searching more of the issues. If you find something related that doesn't directly answer your question, please link to it with #(issue number)!
+
+# System Details (if relevant)
+* Which version of `auto-sklearn` are you using?
+* Are you running this on Linux / Mac / ... ?

.github/workflows/stale.yaml

Lines changed: 4 additions & 2 deletions
@@ -9,7 +9,7 @@ jobs:
     steps:
       - uses: actions/stale@v3
         with:
-          days-before-stale: 60
+          days-before-stale: 30
           days-before-close: 7
           stale-issue-message: >
             This issue has been automatically marked as stale because it has not had
@@ -18,5 +18,7 @@ jobs:
           close-issue-message: >
             This issue has been automatically closed due to inactivity.
           stale-issue-label: 'stale'
-          only-issue-labels: 'Answered,Feedback-Required,invalid,wontfix'
+          # Only issues with ANY of these labels are checked.
+          # Separate multiple labels with commas (eg. "incomplete,waiting-feedback").
+          any-of-labels: 'Answered,Feedback-Required,invalid,wontfix'
           exempt-all-milestones: true
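As we read the actions/stale documentation, `only-issue-labels` checks only issues that carry *all* of the listed labels, whereas `any-of-labels` checks issues carrying *at least one* of them, which is what the new YAML comment says. The sketch below contrasts the two filters and the shorter 30 + 7 day timeline. It is a hypothetical model of these settings (the helper names are made up), not code from the action.

```python
from __future__ import annotations

from datetime import date, timedelta

# Values copied from the workflow above; the functions are hypothetical helpers.
LABELS = {"Answered", "Feedback-Required", "invalid", "wontfix"}
DAYS_BEFORE_STALE = 30  # was 60 before this commit
DAYS_BEFORE_CLOSE = 7


def checked_with_any_of(issue_labels: set[str]) -> bool:
    """New filter (any-of-labels): at least one configured label is present."""
    return not LABELS.isdisjoint(issue_labels)


def checked_with_all_of(issue_labels: set[str]) -> bool:
    """Previous filter (only-issue-labels): every configured label is present."""
    return LABELS <= issue_labels


def earliest_close(last_activity: date) -> date:
    """Earliest auto-close date if nobody touches the issue once it goes stale."""
    return last_activity + timedelta(days=DAYS_BEFORE_STALE + DAYS_BEFORE_CLOSE)


print(checked_with_any_of({"Answered"}))  # True
print(checked_with_all_of({"Answered"}))  # False
print(earliest_close(date(2021, 6, 1)))   # 2021-07-08
```

Under the old setting an issue labelled only `Answered` was never considered, since no issue realistically carries all four labels at once.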

COPYING

Lines changed: 0 additions & 31 deletions
This file was deleted.

LICENSE.txt

Lines changed: 24 additions & 19 deletions
@@ -1,24 +1,29 @@
-Copyright (c) 2014, Matthias Feurer
+BSD 3-Clause License
+
+Copyright (c) 2014-2021, AutoML Freiburg
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
-* Redistributions of source code must retain the above copyright
-notice, this list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above copyright
-notice, this list of conditions and the following disclaimer in the
-documentation and/or other materials provided with the distribution.
-* Neither the name of the <organization> nor the
-names of its contributors may be used to endorse or promote products
-derived from this software without specific prior written permission.
 
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
-DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
-(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
-ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+1. Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+this list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

MANIFEST.in

Lines changed: 2 additions & 1 deletion
@@ -4,5 +4,6 @@ recursive-include autosklearn/metalearning/files *.txt
 include autosklearn/util/logging.yaml
 include requirements.txt
 include autosklearn/requirements.txt
-recursive-include autosklearn/experimental/askl2_portfolios *.json
+recursive-include autosklearn/experimental/ *.json
 include autosklearn/experimental/askl2_training_data.json
+include LICENSE.txt

autosklearn/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 """Version information."""
 
 # The following line *must* be the last in the module, exactly as formatted:
-__version__ = "0.12.6"
+__version__ = "0.12.7"

autosklearn/automl.py

Lines changed: 17 additions & 1 deletion
@@ -885,9 +885,10 @@ def subsample_if_too_large(
         task: int,
     ):
         if memory_limit and isinstance(X, np.ndarray):
+
             if X.dtype == np.float32:
                 multiplier = 4
-            elif X.dtype in (np.float64, np.float):
+            elif X.dtype in (np.float64, float):
                 multiplier = 8
             elif (
                 # In spite of the names, np.float96 and np.float128
@@ -903,6 +904,21 @@ def subsample_if_too_large(
                 multiplier = 8
                 logger.warning('Unknown dtype for X: %s, assuming it takes 8 bit/number',
                                str(X.dtype))
+
+            megabytes = X.shape[0] * X.shape[1] * multiplier / 1024 / 1024
+            if memory_limit <= megabytes * 10 and X.dtype != np.float32:
+                cast_to = {
+                    8: np.float32,
+                    16: np.float64,
+                }.get(multiplier, np.float32)
+                logger.warning(
+                    'Dataset too large for memory limit %dMB, reducing the precision from %s to %s',
+                    memory_limit,
+                    X.dtype,
+                    cast_to,
+                )
+                X = X.astype(cast_to)
+
             megabytes = X.shape[0] * X.shape[1] * multiplier / 1024 / 1024
             if memory_limit <= megabytes * 10:
                 new_num_samples = int(
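The new block in `subsample_if_too_large` estimates the size of `X` in MB as rows × columns × bytes per value and, when ten times that estimate reaches `memory_limit`, casts `X` to a narrower float type before the existing subsampling check runs. (The `np.float` to `float` change in the first hunk replaces an alias that NumPy 1.20 deprecated.) A minimal sketch of the precision check, assuming hypothetical helper names and reading bytes per value from `ndarray.itemsize` instead of the commit's dtype-to-multiplier branches:

```python
import numpy as np


def estimate_megabytes(X: np.ndarray) -> float:
    # rows * columns * bytes per value, converted to MB
    return X.shape[0] * X.shape[1] * X.itemsize / 1024 / 1024


def maybe_reduce_precision(X: np.ndarray, memory_limit: int) -> np.ndarray:
    """Cast X one float size down when 10x its size reaches memory_limit (MB)."""
    if X.dtype == np.float32:
        return X  # already the narrowest target type
    if memory_limit <= estimate_megabytes(X) * 10:
        # 8-byte values -> float32, 16-byte values -> float64, anything else -> float32
        cast_to = {8: np.float32, 16: np.float64}.get(X.itemsize, np.float32)
        return X.astype(cast_to)
    return X


X = np.zeros((10_000, 100), dtype=np.float64)  # about 7.6 MB
print(maybe_reduce_precision(X, memory_limit=50).dtype)   # float32
print(maybe_reduce_precision(X, memory_limit=100).dtype)  # float64
```

The 10x headroom factor and the cast table are taken from the diff; unlike the committed code, the sketch does not log a warning before casting.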

autosklearn/experimental/askl2.py

Lines changed: 68 additions & 47 deletions
@@ -15,53 +15,60 @@
 import autosklearn
 from autosklearn.classification import AutoSklearnClassifier
 import autosklearn.experimental.selector
-from autosklearn.metrics import Scorer
+from autosklearn.metrics import Scorer, balanced_accuracy, roc_auc, log_loss, accuracy
 
+metrics = (balanced_accuracy, roc_auc, log_loss)
+selector_files = {}
 this_directory = pathlib.Path(__file__).resolve().parent
-training_data_file = this_directory / 'askl2_training_data.json'
-with open(training_data_file) as fh:
-    training_data = json.load(fh)
-    fh.seek(0)
-    m = hashlib.md5()
-    m.update(fh.read().encode('utf8'))
-training_data_hash = m.hexdigest()[:10]
-selector_filename = "askl2_selector_%s_%s_%s.pkl" % (
-    autosklearn.__version__,
-    sklearn.__version__,
-    training_data_hash
-)
-selector_directory = os.environ.get('XDG_CACHE_HOME')
-if selector_directory is None:
-    selector_directory = pathlib.Path.home()
-selector_directory = pathlib.Path(selector_directory).joinpath('auto-sklearn').expanduser()
-selector_file = selector_directory / selector_filename
-metafeatures = pd.DataFrame(training_data['metafeatures'])
-y_values = np.array(training_data['y_values'])
-strategies = training_data['strategies']
-minima_for_methods = training_data['minima_for_methods']
-maxima_for_methods = training_data['maxima_for_methods']
-if not selector_file.exists():
-    selector = autosklearn.experimental.selector.OneVSOneSelector(
-        configuration=training_data['configuration'],
-        default_strategy_idx=strategies.index('RF_SH-eta4-i_holdout_iterative_es_if'),
-        rng=1,
+for metric in metrics:
+    training_data_file = this_directory / metric.name / 'askl2_training_data.json'
+    with open(training_data_file) as fh:
+        training_data = json.load(fh)
+        fh.seek(0)
+        m = hashlib.md5()
+        m.update(fh.read().encode('utf8'))
+    training_data_hash = m.hexdigest()[:10]
+    selector_filename = "askl2_selector_%s_%s_%s_%s.pkl" % (
+        autosklearn.__version__,
+        sklearn.__version__,
+        metric.name,
+        training_data_hash
     )
-    selector.fit(
-        X=metafeatures,
-        y=y_values,
-        methods=strategies,
-        minima=minima_for_methods,
-        maxima=maxima_for_methods,
-    )
-    selector_file.parent.mkdir(exist_ok=True, parents=True)
-    try:
-        with open(selector_file, 'wb') as fh:
-            pickle.dump(selector, fh)
-    except Exception as e:
-        print("AutoSklearn2Classifier needs to create a selector file under "
-              "the user's home directory or XDG_CACHE_HOME. Nevertheless "
-              "the path {} is not writable.".format(selector_file))
-        raise e
+    selector_directory = os.environ.get('XDG_CACHE_HOME')
+    if selector_directory is None:
+        selector_directory = pathlib.Path.home()
+    selector_directory = pathlib.Path(selector_directory).joinpath('auto-sklearn').expanduser()
+    selector_files[metric.name] = selector_directory / selector_filename
+    metafeatures = pd.DataFrame(training_data['metafeatures'])
+    strategies = training_data['strategies']
+    y_values = pd.DataFrame(training_data['y_values'], columns=strategies, index=metafeatures.index)
+    minima_for_methods = training_data['minima_for_methods']
+    maxima_for_methods = training_data['maxima_for_methods']
+    default_strategies = training_data['tie_break_order']
+    if not selector_files[metric.name].exists():
+        selector = autosklearn.experimental.selector.OVORF(
+            configuration=training_data['configuration'],
+            random_state=np.random.RandomState(1),
+            n_estimators=500,
+            tie_break_order=default_strategies,
+        )
+        selector = autosklearn.experimental.selector.FallbackWrapper(selector, default_strategies)
+        selector.fit(
+            X=metafeatures,
+            y=y_values,
+            minima=minima_for_methods,
+            maxima=maxima_for_methods,
+        )
+        selector_files[metric.name].parent.mkdir(exist_ok=True, parents=True)
+
+        try:
+            with open(selector_files[metric.name], 'wb') as fh:
+                pickle.dump(selector, fh)
+        except Exception as e:
+            print("AutoSklearn2Classifier needs to create a selector file under "
+                  "the user's home directory or XDG_CACHE_HOME. Nevertheless "
+                  "the path {} is not writable.".format(selector_files[metric.name]))
+            raise e
 
 
 class SmacObjectCallback:
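After this change each metric in `metrics` has its own training data (`<metric>/askl2_training_data.json`) and its own pickled selector, and the metric name is part of the cache filename, so cached selectors for different metrics no longer overwrite each other; editing a training data file changes its hash and forces a new selector to be trained. A minimal stdlib sketch of the filename logic, where `selector_cache_path` is a hypothetical helper and the version strings are illustrative values only:

```python
import hashlib
import os
import pathlib


def selector_cache_path(training_data_text: str, askl_version: str,
                        sklearn_version: str, metric_name: str) -> pathlib.Path:
    # First ten hex characters of the MD5 of the training data file's contents
    training_data_hash = hashlib.md5(training_data_text.encode("utf8")).hexdigest()[:10]
    filename = "askl2_selector_%s_%s_%s_%s.pkl" % (
        askl_version, sklearn_version, metric_name, training_data_hash)
    # Same fallback as the diff: $XDG_CACHE_HOME if set, else the home directory
    cache_root = os.environ.get("XDG_CACHE_HOME")
    if cache_root is None:
        cache_root = pathlib.Path.home()
    return pathlib.Path(cache_root).joinpath("auto-sklearn").expanduser() / filename


os.environ["XDG_CACHE_HOME"] = "/tmp/xdg-demo"  # demo only: pin the cache root
for name in ("balanced_accuracy", "roc_auc", "log_loss"):
    print(selector_cache_path('{"strategies": []}', "0.12.7", "0.24.2", name))
```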
@@ -286,7 +293,7 @@ def __init__(
         Attributes
         ----------
 
-        cv_results\_ : dict of numpy (masked) ndarrays
+        cv_results_ : dict of numpy (masked) ndarrays
             A dict with keys as column headers and values as columns, that can be
             imported into a pandas ``DataFrame``.
 
@@ -334,10 +341,22 @@ def fit(self, X, y,
             feat_type=None,
             dataset_name=None):
 
+        if self.metric is None:
+            if len(y.shape) == 1 or y.shape[1] == 1:
+                self.metric = accuracy
+            else:
+                self.metric = log_loss
+
+        if self.metric in metrics:
+            metric_name = self.metric.name
+            selector_file = selector_files[metric_name]
+        else:
+            metric_name = 'balanced_accuracy'
+            selector_file = selector_files[metric_name]
         with open(selector_file, 'rb') as fh:
             selector = pickle.load(fh)
 
-        metafeatures = np.array([len(np.unique(y)), X.shape[1], X.shape[0]])
+        metafeatures = pd.DataFrame({dataset_name: [X.shape[1], X.shape[0]]}).transpose()
         selection = np.argmax(selector.predict(metafeatures))
         automl_policy = strategies[selection]
 
@@ -388,7 +407,9 @@ def fit(self, X, y,
         else:
             resampling_strategy_kwargs = None
 
-        portfolio_file = this_directory / 'askl2_portfolios' / ('%s.json' % automl_policy)
+        portfolio_file = (
+            this_directory / metric_name / 'askl2_portfolios' / ('%s.json' % automl_policy)
+        )
         with open(portfolio_file) as fh:
             portfolio_json = json.load(fh)
         portfolio = portfolio_json['portfolio']
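In `fit`, the metric now decides which selector and which portfolio directory are used. With no metric given, 1-D (or single-column) targets default to `accuracy` and other targets to `log_loss`; any metric outside `metrics` falls back to the `balanced_accuracy` selector and portfolios. The sketch below models this with plain strings standing in for auto-sklearn `Scorer` objects; `resolve_metric` and `portfolio_path` are hypothetical names:

```python
import pathlib
from typing import Optional, Tuple

# Names of the metrics that ship their own selector and portfolios (`metrics` in the diff)
SUPPORTED = ("balanced_accuracy", "roc_auc", "log_loss")


def resolve_metric(metric: Optional[str], y_shape: Tuple[int, ...]) -> Tuple[str, str]:
    """Return (metric to optimize, metric whose selector and portfolios are loaded)."""
    if metric is None:
        # Single-column targets default to accuracy, all others to log_loss
        metric = "accuracy" if len(y_shape) == 1 or y_shape[1] == 1 else "log_loss"
    selector_metric = metric if metric in SUPPORTED else "balanced_accuracy"
    return metric, selector_metric


def portfolio_path(base: pathlib.Path, selector_metric: str, policy: str) -> pathlib.Path:
    # <package dir>/<metric>/askl2_portfolios/<policy>.json
    return base / selector_metric / "askl2_portfolios" / ("%s.json" % policy)


print(resolve_metric(None, (500,)))    # ('accuracy', 'balanced_accuracy')
print(resolve_metric(None, (500, 3)))  # ('log_loss', 'log_loss')
print(resolve_metric("f1", (500,)))    # ('f1', 'balanced_accuracy')
```

Because `accuracy` is not in `metrics`, the single-label default ends up using the `balanced_accuracy` selector. The same hunk also reduces the meta-features passed to the selector to the number of features and samples, as a one-row DataFrame indexed by `dataset_name`.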
