Commit d306d88

MAINT rework documentation
1 parent 822d33a commit d306d88

3 files changed: +54 -43 lines changed

autosklearn/estimators.py

Lines changed: 41 additions & 24 deletions
@@ -113,32 +113,46 @@ def __init__(self,
 ----------
 time_left_for_this_task : int, optional (default=3600)
 Time limit in seconds for the search of appropriate
-models. By increasing this value, *auto-sklearn* will find better
-configurations.
+models. By increasing this value, *auto-sklearn* has a higher
+chance of finding better models.

 per_run_time_limit : int, optional (default=360)
-Time limit for a single call to machine learning model.
+Time limit for a single call to the machine learning model.
+Model fitting will be terminated if the machine learning
+algorithm runs over the time limit. Set this value high enough so
+that typical machine learning algorithms can be fit on the
+training data.

 initial_configurations_via_metalearning : int, optional (default=25)
+Initialize the hyperparameter optimization algorithm with this
+many configurations which worked well on previously seen
+datasets. Disable if the hyperparameter optimization algorithm
+should start from scratch.

 ensemble_size : int, optional (default=50)
+Number of models added to the ensemble built by `Ensemble
+selection from libraries of models`. Models are drawn with
+replacement.

 ensemble_nbest : int, optional (default=50)
+Only consider the ``ensemble_nbest`` models when building an
+ensemble. Implements `Model Library Pruning` from `Getting the
+most out of ensemble selection`.

 seed : int, optional (default=1)

 ml_memory_limit : int, optional (3000)
-Memory limit for the machine learning algorithm. If the machine
-learning algorithm allocates tries to allocate more memory,
-its evaluation will be stopped.
+Memory limit in MB for the machine learning algorithm.
+`auto-sklearn` will stop fitting the machine learning algorithm if
+it tries to allocate more than `ml_memory_limit` MB.

 include_estimators : dict, optional (None)
-If None all possible estimators are used. Otherwise specifies set of
-estimators to use
+If None, all possible estimators are used. Otherwise specifies
+set of estimators to use

 include_preprocessors : dict, optional (None)
-If None all possible preprocessors are used. Otherwise specifies set of
-preprocessors to use
+If None all possible preprocessors are used. Otherwise specifies set
+of preprocessors to use

 resampling_strategy : string, optional ('holdout')
 how to handle overfitting, might need 'resampling_strategy_arguments'
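
Reviewer note: the new ``ensemble_size`` and ``ensemble_nbest`` text refers to ensemble selection with model library pruning. The block below is a minimal, self-contained sketch of that general technique and is not auto-sklearn's implementation. The function name ``select_ensemble``, the use of accuracy as the score, and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(probas, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    hits = sum(1 for row, y in zip(probas, labels)
               if max(range(len(row)), key=row.__getitem__) == y)
    return hits / len(labels)


def average(members):
    """Element-wise mean of several models' class-probability tables."""
    n = len(members)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*members)]


def select_ensemble(predictions, labels, ensemble_size, ensemble_nbest):
    """Greedy ensemble selection with replacement (illustrative sketch).

    predictions maps a model name to its validation-set class
    probabilities. Returns a Counter of how often each model was picked.
    """
    # Library pruning: keep only the ensemble_nbest individually best models.
    ranked = sorted(predictions,
                    key=lambda name: accuracy(predictions[name], labels),
                    reverse=True)
    library = ranked[:ensemble_nbest]

    chosen = []
    for _ in range(ensemble_size):
        # Add the model (possibly one already chosen) that gives the best
        # score when averaged with the current ensemble members.
        best = max(library, key=lambda name: accuracy(
            average([predictions[m] for m in chosen + [name]]), labels))
        chosen.append(best)
    return Counter(chosen)


# Toy validation data (hypothetical): three models, four samples, two classes.
labels = [0, 1, 1, 0]
predictions = {
    "model_a": [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]],
    "model_b": [[0.6, 0.4], [0.55, 0.45], [0.1, 0.9], [0.7, 0.3]],
    "model_c": [[0.2, 0.8], [0.9, 0.1], [0.8, 0.2], [0.3, 0.7]],
}
weights = select_ensemble(predictions, labels, ensemble_size=5, ensemble_nbest=2)
print(weights)
```

Because models are drawn with replacement, the pick counts act as ensemble weights, and ``ensemble_size`` bounds the sum of those counts rather than the number of distinct models.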
@@ -148,24 +162,21 @@ def __init__(self,
 fit where possible
 * 'cv': crossvalidation, requires 'folds'
 * 'nested-cv': crossvalidation, requires 'outer-folds, 'inner-folds'
-* 'partial-cv': crossvalidation, requires 'folds' , calls
-iterative fit where possible

 resampling_strategy_arguments : dict, optional if 'holdout' (None)
 Additional arguments for resampling_strategy
 * 'holdout': None
 * 'holdout-iterative-fit': None
 * 'cv': {'folds': int}
 * 'nested-cv': {'outer_folds': int, 'inner_folds'
-* 'partial-cv': {'folds': int}

 tmp_folder : string, optional (None)
-folder to store configuration output, if None automatically use
-/tmp/autosklearn_tmp_$pid_$random_number
+folder to store configuration output and log files, if ``None``
+automatically use ``/tmp/autosklearn_tmp_$pid_$random_number``

 output_folder : string, optional (None)
-folder to store trained models, if None automatically use
-/tmp/autosklearn_output_$pid_$random_number
+folder to store predictions for optional test set, if ``None``
+automatically use ``/tmp/autosklearn_output_$pid_$random_number``

 delete_tmp_folder_after_terminate: string, optional (True)
 remove tmp_folder, when finished. If tmp_folder is None
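
Reviewer note: after this hunk, the docstring pairs each remaining ``resampling_strategy`` with the keys it expects in ``resampling_strategy_arguments``. The sketch below transcribes that pairing into a lookup table. ``check_resampling`` is a hypothetical helper, not part of auto-sklearn, and the underscore spellings of the ``nested-cv`` keys are taken from the arguments list (the strategy list spells them with hyphens).

```python
# Required keys per strategy, transcribed from the docstring in the diff.
REQUIRED_ARGUMENTS = {
    "holdout": (),
    "holdout-iterative-fit": (),
    "cv": ("folds",),
    "nested-cv": ("outer_folds", "inner_folds"),
}


def check_resampling(strategy, arguments=None):
    """Hypothetical helper: return the missing argument names, if any."""
    if strategy not in REQUIRED_ARGUMENTS:
        raise ValueError("unknown resampling_strategy: %r" % (strategy,))
    arguments = arguments or {}
    return [key for key in REQUIRED_ARGUMENTS[strategy] if key not in arguments]


print(check_resampling("holdout"))
print(check_resampling("cv", {"folds": 5}))
print(check_resampling("nested-cv", {"outer_folds": 3}))
```

In this sketch ``'partial-cv'`` is rejected as unknown, mirroring its removal from the documented options in this commit.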
@@ -176,10 +187,10 @@ def __init__(self,
 output_dir will always be deleted

 shared_mode: bool, optional (False)
-run smac in shared-model-node. This only works if arguments
-tmp_folder and output_folder are given and sets both
-delete_tmp_folder_after_terminate and
-delete_output_folder_after_terminate to False.
+Run smac in shared-model-node. This only works if arguments
+``tmp_folder`` and ``output_folder`` are given and both
+``delete_tmp_folder_after_terminate`` and
+``delete_output_folder_after_terminate`` are set to False.

 Attributes
 ----------
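
Reviewer note: the reworded ``shared_mode`` text changes the meaning from "sets both flags to False" to "both flags must be set to False". The sketch below spells out the four preconditions as the new wording states them. ``shared_mode_problems`` is a hypothetical helper, the folder paths are made up, and the ``True`` default for ``delete_output_folder_after_terminate`` is an assumption, since only the default for the tmp-folder flag is visible in this diff.

```python
def shared_mode_problems(tmp_folder=None, output_folder=None,
                         delete_tmp_folder_after_terminate=True,
                         delete_output_folder_after_terminate=True):
    """Hypothetical check: list reasons why shared_mode=True would not work."""
    problems = []
    if tmp_folder is None:
        problems.append("tmp_folder must be given")
    if output_folder is None:
        problems.append("output_folder must be given")
    if delete_tmp_folder_after_terminate:
        problems.append("delete_tmp_folder_after_terminate must be False")
    if delete_output_folder_after_terminate:
        problems.append("delete_output_folder_after_terminate must be False")
    return problems


# With all defaults, every precondition is violated.
print(shared_mode_problems())

# A configuration that satisfies the documented preconditions.
ok = shared_mode_problems(
    tmp_folder="/tmp/autosklearn_shared_tmp",        # hypothetical path
    output_folder="/tmp/autosklearn_shared_output",  # hypothetical path
    delete_tmp_folder_after_terminate=False,
    delete_output_folder_after_terminate=False,
)
print(ok)
```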
@@ -193,6 +204,14 @@ def __init__(self,
 cross-validation folds
 * ``cv_validation_scores``, the list of scores for each fold

+cv_results_ : dict of numpy (masked) ndarrays
+A dict with keys as column headers and values as columns, that can be
+imported into a pandas ``DataFrame``.
+
+This attribute is a backward port to already support the advanced
+output of scikit-learn 0.18. Not all keys returned by scikit-learn
+are supported yet.
+
 """
 self.time_left_for_this_task = time_left_for_this_task
 self.per_run_time_limit = per_run_time_limit
@@ -276,7 +295,7 @@ def fit(self, X, y,
 metric='acc_metric',
 feat_type=None,
 dataset_name=None):
-"""Fit *autosklearn* to given training set (X, y).
+"""Fit *auto-sklearn* to given training set (X, y).

 Parameters
 ----------
@@ -308,8 +327,6 @@ def fit(self, X, y,
 self

 """
-# Fit is supposed to be idempotent!
-# But not if we use share_mode.
 return super(AutoSklearnClassifier, self).fit(X, y, metric, feat_type, dataset_name)

 def predict(self, X):

doc/index.rst

Lines changed: 11 additions & 17 deletions
@@ -27,19 +27,13 @@ Example
 *******

 >>> import autosklearn.classification
+>>> import sklearn.cross_validation
 >>> import sklearn.datasets
 >>> digits = sklearn.datasets.load_digits()
 >>> X = digits.data
 >>> y = digits.target
->>> import numpy as np
->>> indices = np.arange(X.shape[0])
->>> np.random.shuffle(indices)
->>> X = X[indices]
->>> y = y[indices]
->>> X_train = X[:1000]
->>> y_train = y[:1000]
->>> X_test = X[1000:]
->>> y_test = y[1000:]
+>>> X_train, X_test, y_train, y_test = \
+sklearn.cross_validation.train_test_split(X, y, random_state=1)
 >>> automl = autosklearn.classification.AutoSklearnClassifier()
 >>> automl.fit(X_train, y_train)
 >>> print(automl.score(X_test,y_test))
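
Reviewer note: this hunk replaces nine lines of manual shuffle-and-slice with one ``train_test_split`` call. The sketch below uses only the standard library to show what both versions do: permute the row indices once, then cut them into a training and a test part. ``shuffle_split`` and the toy data are assumptions for illustration. Two differences from the removed code are worth noting: ``train_test_split`` holds out 25% of the rows by default instead of everything after the first 1000, and later scikit-learn releases moved it from ``sklearn.cross_validation`` to ``sklearn.model_selection``.

```python
import random


def shuffle_split(X, y, n_train, seed=1):
    """Shuffle row indices once, then slice into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    train, test = indices[:n_train], indices[n_train:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


# Toy data (hypothetical): ten rows, label is the parity of the first feature.
X = [[i, i * i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = shuffle_split(X, y, n_train=7)
print(len(X_train), len(X_test))
```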
@@ -69,8 +63,7 @@ Then install *auto-sklearn*
 pip install auto-sklearn

 We recommend installing *auto-sklearn* into a `virtual environment
-<http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_ or into an
-`anaconda environment <https://www.continuum.io/downloads>`_..
+<http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.

 Manual
 ******
@@ -83,14 +76,13 @@ Manual
 License
 *******
 *auto-sklearn* is licensed the same way as *scikit-learn*,
-namely the 3-clause BSD license. The subprojects it uses, most notably SMAC,
-can have different licenses.
+namely the 3-clause BSD license.

 Citing auto-sklearn
 *******************

 If you use auto-sklearn in a scientific publication, we would appreciate
-citations to the following paper:
+references to the following paper:


 `Efficient and Robust Automated Machine Learning
@@ -113,9 +105,11 @@ citations to the following paper:

 Contributing
 ************
-*auto-sklearn* is developed mainly by the `Machine Learning for Automated
-Algorithm Design <http://aad.informatik.uni-freiburg.de>`_ group at the
-University of Freiburg.
+
+We appreciate all contributions to auto-sklearn, from bug reports,
+documentation to new features. If you want to contribute to the code, you can
+pick an issue from the `issue tracker <https://github.com/automl/auto-sklearn/issues>`_
+which is marked with `Needs contributer`.

 .. note::

doc/manual.rst

Lines changed: 2 additions & 2 deletions
@@ -34,8 +34,8 @@ the model building procedure may use up to all cores. Such behaviour is
 unintended by auto-sklearn and is most likely due to numpy being installed
 from `pypi` as a binary wheel (`see here http://scikit-learn-general.narkive
 .com/44ywvAHA/binary-wheel-packages-for-linux-are-coming`_). Executing
-`export OPENBLAS_NUM_THREADS=1` should disable such behaviours and make numpy
-only use a single core at a time.
+``export OPENBLAS_NUM_THREADS=1`` should disable such behaviours and make numpy
+only use a single core at a time.

 Model persistence
 *****************
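
Reviewer note: the ``export OPENBLAS_NUM_THREADS=1`` advice in the hunk above applies to the shell. When the shell environment is not under your control, the same variable can be set from Python, as sketched below. This assumes, as is usual for OpenBLAS, that the variable is read when the library is loaded, so the assignment has to run before numpy is first imported.

```python
import os

# Must run before numpy (or anything that imports numpy) is first imported;
# setting it afterwards is assumed to have no effect on an already loaded BLAS.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

print(os.environ["OPENBLAS_NUM_THREADS"])
```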
