Commit 9128804
Merge pull request #34 from aced-differentiate/update-sl-interfaces
Update sl interfaces
2 parents c3f9813 + bdc4f8a commit 9128804

File tree

13 files changed (+2287, −1039 lines)

docs/README.md

Lines changed: 4 additions & 1 deletion

````diff
@@ -16,7 +16,7 @@ For additional details please see the User Guide, Tutorials, and API sections.
 One of the core philosophies of AutoCat is to provide modular and extensible tooling to
 facilitate closed-loop computational materials discovery workflows. Within this submodule
 are classes for defining a design space, featurization,
-regression, and defining a closed-loop sequential learning iterator. The
+regression, selecting candidate systems, and defining a closed-loop sequential learning iterator. The
 key classes intended for each of these purposes are:
 
 - [**`DesignSpace`**](User_Guide/Learning/sequential#designspace): define a design space to explore
@@ -25,6 +25,9 @@ key classes intended for each of these purposes are:
 
 - [**`Predictor`**](User_Guide/Learning/predictors): a regressor for predicting materials properties
 
+- [**`CandidateSelector`**](User_Guide/Learning/sequential.md#candidateselector): propose candidate system(s)
+for evaluation
+
 - [**`SequentialLearner`**](User_Guide/Learning/sequential#sequentiallearner): define a closed-loop iterator
 
 
````
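The four classes above cooperate in one closed loop: the design space holds systems and labels, the predictor is retrained, the selector proposes the next system, and the loop iterates. A rough, hedged sketch of that composition (stub classes with invented minimal APIs for illustration, not AutoCat's actual interfaces):

```python
# Illustrative stubs only; AutoCat's real classes have richer APIs.

class DesignSpace:
    """Holds candidate systems and their labels (None = unlabelled)."""
    def __init__(self, systems, labels):
        self.systems = list(systems)
        self.labels = list(labels)

class Predictor:
    """Fits on labelled systems, predicts (value, uncertainty) pairs."""
    def fit(self, systems, labels):
        self.mean = sum(labels) / len(labels)
    def predict(self, systems):
        # Toy model: predict the training mean; uncertainty grows with index.
        return [(self.mean, 0.1 * i) for i, _ in enumerate(systems)]

class CandidateSelector:
    """Picks the unlabelled system with maximum uncertainty ("MU")."""
    def choose(self, predictions, unlabelled_indices):
        return max(unlabelled_indices, key=lambda i: predictions[i][1])

def sequential_learning_loop(design_space, predictor, selector, n_loops, oracle):
    # Assumes at least one label is known initially (the seed training data).
    for _ in range(n_loops):
        labelled = [i for i, l in enumerate(design_space.labels) if l is not None]
        unlabelled = [i for i, l in enumerate(design_space.labels) if l is None]
        if not unlabelled:
            break
        predictor.fit([design_space.systems[i] for i in labelled],
                      [design_space.labels[i] for i in labelled])
        preds = predictor.predict(design_space.systems)
        pick = selector.choose(preds, unlabelled)
        design_space.labels[pick] = oracle(design_space.systems[pick])  # "evaluate"
    return design_space
```

In a real workflow the `oracle` step is the expensive evaluation (e.g. a DFT calculation), which is why the selector tries to make each pick count.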
docs/Tutorials/pred_h.md

Lines changed: 71 additions & 29 deletions
````diff
@@ -14,14 +14,14 @@ function.
 ```py
 >>> # Generate the clean single-atom alloy structures
 >>> from autocat.saa import generate_saa_structures
->>> from autocat.utils import extract_structures
+>>> from autocat.utils import flatten_structures_dict
 >>> saa_struct_dict = generate_saa_structures(
 ...     ["Fe", "Cu", "Au"],
 ...     ["Pt", "Pd", "Ni"],
 ...     facets={"Fe":["110"], "Cu":["111"], "Au":["111"]},
 ...     n_fixed_layers=2,
 ... )
->>> saa_structs = extract_structures(saa_struct_dict)
+>>> saa_structs = flatten_structures_dict(saa_struct_dict)
 ```
 
 Now that we have the clean structures, let's adsorb hydrogen on the surface.
````
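For orientation, the renamed helper's job is to turn the nested dict returned by the generation functions into a flat list of structures. A minimal sketch of that flattening idea (the real `flatten_structures_dict` in `autocat.utils` may differ in details, such as how leaf entries are detected):

```python
# Sketch of a "flatten nested structures dict -> list" helper.
# Illustrative only; AutoCat's implementation may detect leaves differently.

def flatten_structures_dict(d):
    """Recursively collect leaf values from a nested dict into a flat list."""
    structures = []
    for value in d.values():
        if isinstance(value, dict):
            structures.extend(flatten_structures_dict(value))
        else:
            structures.append(value)
    return structures

# e.g. {"Fe": {"Pt": {"110": slab}}} -> [slab]
```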
````diff
@@ -33,14 +33,10 @@ function.
 ```py
 >>> # Adsorb hydrogen onto each of the generated SAA surfaces
 >>> from autocat.adsorption import place_adsorbate
+>>> from ase import Atoms
 >>> ads_structs = []
 >>> for clean_struct in saa_structs:
-...     ads_dict = place_adsorbate(
-...         clean_struct,
-...         "H",
-...         (0.,0.)
-...     )
-...     ads_struct = extract_structures(ads_dict)[0]
+...     ads_struct = place_adsorbate(clean_struct, Atoms("H"))
 ...     ads_structs.append(ads_struct)
 ```
 
````
````diff
@@ -62,6 +58,16 @@ if any of the labels for a structure are unknown, it can be included as a `numpy
 ```py
 >>> from autocat.learning.sequential import DesignSpace
 >>> design_space = DesignSpace(ads_structs, labels)
+>>> design_space
++-------------------------+-------------------------------------------+
+|                         | DesignSpace                               |
++-------------------------+-------------------------------------------+
+| total # of systems      | 9                                         |
+| # of unlabelled systems | 0                                         |
+| unique species present  | ['Fe', 'H', 'Pt', 'Pd', 'Ni', 'Cu', 'Au'] |
+| maximum label           | 1.0173326963281424                        |
+| minimum label           | -1.4789390894451206                       |
++-------------------------+-------------------------------------------+
 ```
 
 ## Setting up a `Predictor`
````
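The repr above reports a count of unlabelled systems; with NaN used as the unknown-label marker (as the hunk's context suggests for unknown labels), that bookkeeping reduces to counting NaNs. A small illustrative sketch, not AutoCat code:

```python
import math

# Sketch: with NaN marking unknown labels, the DesignSpace summary fields
# ("# of unlabelled systems", max/min label) are simple reductions.
labels = [0.3, math.nan, -1.1, math.nan, 0.9]
num_unlabelled = sum(math.isnan(l) for l in labels)
known = [l for l in labels if not math.isnan(l)]
print(num_unlabelled, max(known), min(known))  # -> 2 0.9 -1.1
```

Note that `math.nan == math.nan` is `False`, which is why an explicit `isnan` check is needed rather than a membership test.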
````diff
@@ -71,43 +77,79 @@ When setting up our `Predictor` we now have two choices to make:
 1. The technique to be used for featurizing the systems
 2. The regression model to be used for training and predictions
 
-Internally, the `Predictor` will contain a `Featurizer` object which contains all of
-our choices for how to featurize the systems. Our choice of featurizer class and
-the associated kwargs are specified via the `featurizer_class` and
-`featurization_kwargs` arguments, respectively. By providing the design space structures
+Internally, the `Predictor` will contain a `Featurizer` object (that the user supplies)
+which stores all of our choices for how to featurize the systems. Our choice of
+featurizer class and the associated kwargs are specified via the `featurizer_class` and
+`kwargs` arguments, respectively. By providing the design space structures
 some of the kwargs related to the featurization (e.g. maximum structure size) can be
 automatically obtained.
 
-Similarly, we can specify the regressor to be used within the `model_class` and
-`model_kwargs` arguments. The class should be "`sklearn`-like" with `fit` and
-`predict` methods.
+Let's featurize the hydrogen environment via `dscribe`'s `SOAP` class
+```py
+>>> from autocat.learning.featurizers import Featurizer
+>>> from dscribe.descriptors.soap import SOAP
+>>> featurizer = Featurizer(
+...     featurizer_class=SOAP,
+...     kwargs={"rcut": 7.0, "nmax": 8, "lmax": 8},
+...     design_space_structures=design_space.design_space_structures
+... )
+>>> featurizer
++-----------------------------------+-------------------------------------------+
+|                                   | Featurizer                                |
++-----------------------------------+-------------------------------------------+
+| class                             | dscribe.descriptors.soap.SOAP             |
+| kwargs                            | {'rcut': 7.0, 'nmax': 8, 'lmax': 8}       |
+| species list                      | ['Fe', 'Ni', 'Pt', 'Pd', 'Au', 'Cu', 'H'] |
+| maximum structure size            | 37                                        |
+| preset                            | None                                      |
+| design space structures provided? | True                                      |
++-----------------------------------+-------------------------------------------+
+```
 
-Let's featurize the hydrogen environment via `dscribe`'s `SOAP` class with
-`sklearn`'s `GaussianProcessRegressor` for regression.
+Similarly, we can specify the regressor to be used. The class should
+be "`sklearn`-like" with `fit` and `predict` methods.
 
+Here we will use `sklearn`'s `GaussianProcessRegressor` for regression.
 ```py
 >>> from sklearn.gaussian_process import GaussianProcessRegressor
 >>> from sklearn.gaussian_process.kernels import RBF
->>> from dscribe import SOAP
->>> from autocat.learning.predictors import Predictor
 >>> kernel = RBF(1.5)
->>> model_kwargs={"kernel": kernel}
->>> featurization_kwargs={
-...     "design_space_structures": design_space.design_space_structures,
-...     "kwargs": {"rcut": 7.0, "nmax": 8, "lmax": 8}
-... }
+>>> regressor = GaussianProcessRegressor(kernel=kernel)
+```
+
+Now that we have both our `Featurizer` and regressor, we can construct
+a `Predictor` object.
+
+```py
+>>> from autocat.learning.predictors import Predictor
 >>> predictor = Predictor(
-...     model_class=GaussianProcessRegressor,
-...     model_kwargs=model_kwargs,
-...     featurizer_class=SOAP,
-...     featurization_kwargs=featurization_kwargs,
+...     regressor=regressor,
+...     featurizer=featurizer,
 ... )
+>>> predictor
++-----------+------------------------------------------------------------------+
+|           | Predictor                                                        |
++-----------+------------------------------------------------------------------+
+| regressor | <class 'sklearn.gaussian_process._gpr.GaussianProcessRegressor'> |
+| is fit?   | False                                                            |
++-----------+------------------------------------------------------------------+
++-----------------------------------+-------------------------------------------+
+|                                   | Featurizer                                |
++-----------------------------------+-------------------------------------------+
+| class                             | dscribe.descriptors.soap.SOAP             |
+| kwargs                            | {'rcut': 7.0, 'nmax': 8, 'lmax': 8}       |
+| species list                      | ['Fe', 'Ni', 'Pt', 'Pd', 'Au', 'Cu', 'H'] |
+| maximum structure size            | 37                                        |
+| preset                            | None                                      |
+| design space structures provided? | True                                      |
++-----------------------------------+-------------------------------------------+
 ```
 
 ## Training and making predictions
 
 With our newly defined `Predictor` we can train it using data from our
-`DesignSpace` and the `fit` method.
+`DesignSpace` and the `fit` method. Again, please note we are using random labels
+here, solely for demonstration purposes.
 
 ```py
 >>> train_structures = design_space.design_space_structures[:5]
````
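The requirement that the regressor be "`sklearn`-like" is duck typing: any object exposing `fit` and `predict` should slot in. A minimal hypothetical regressor satisfying that contract (illustrative only; not part of AutoCat or `sklearn`):

```python
class MeanRegressor:
    """A minimal "sklearn-like" regressor: anything with fit/predict works."""
    def fit(self, X, y):
        # Ignore the features entirely; just remember the mean target.
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean_ for _ in X]

reg = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(reg.predict([[5], [6]]))  # -> [2.0, 2.0]
```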

docs/Tutorials/sl.md

Lines changed: 115 additions & 22 deletions
````diff
@@ -10,13 +10,13 @@ the structures will be clean mono-elemental surfaces which we can generate via
 ```py
 >>> # Generate the clean surfaces
 >>> from autocat.surface import generate_surface_structures
->>> from autocat.utils import extract_structures
+>>> from autocat.utils import flatten_structures_dict
 >>> surfs_dict = generate_surface_structures(
 ...     ["Pt", "Cu", "Li", "Ti"],
 ...     n_fixed_layers=2,
 ...     default_lat_param_lib="pbe_fd"
 ... )
->>> surfs = extract_structures(surfs_dict)
+>>> surfs = flatten_structures_dict(surfs_dict)
 ```
 
 In this case we specified that the default lattice parameters
````
````diff
@@ -30,14 +30,24 @@ to your design space!
 ```py
 >>> # Generate the labels for each structure
 >>> import numpy as np
->>> labels = np.random.uniform(-1.5,1.5,size=len(ads_structs))
+>>> labels = np.random.uniform(-1.5,1.5,size=len(surfs))
 ```
 
 Taking the structures and labels we can define our `DesignSpace`.
 
 ```py
 >>> from autocat.learning.sequential import DesignSpace
 >>> design_space = DesignSpace(surfs, labels)
+>>> design_space
++-------------------------+--------------------------+
+|                         | DesignSpace              |
++-------------------------+--------------------------+
+| total # of systems      | 10                       |
+| # of unlabelled systems | 0                        |
+| unique species present  | ['Pt', 'Cu', 'Li', 'Ti'] |
+| maximum label           | 1.1205404366846423       |
+| minimum label           | -1.3259701029215702      |
++-------------------------+--------------------------+
 ```
 
 ## Doing a single simulated sequential learning run
````
````diff
@@ -51,35 +61,117 @@ returned at the end of the run.
 As before, we will need to make choices with regard to the `Predictor` settings.
 In this case we will use a `SineMatrix` featurizer alongside a `GaussianProcessRegressor`.
 
+```py
+>>> from sklearn.gaussian_process import GaussianProcessRegressor
+>>> from sklearn.gaussian_process.kernels import RBF
+>>> from dscribe.descriptors.sinematrix import SineMatrix
+>>> from autocat.learning.featurizers import Featurizer
+>>> from autocat.learning.predictors import Predictor
+>>> kernel = RBF(1.5)
+>>> regressor = GaussianProcessRegressor(kernel=kernel)
+>>> featurizer = Featurizer(
+...     featurizer_class=SineMatrix,
+...     design_space_structures=design_space.design_space_structures
+... )
+>>> predictor = Predictor(regressor=regressor, featurizer=featurizer)
+>>> predictor
++-----------+------------------------------------------------------------------+
+|           | Predictor                                                        |
++-----------+------------------------------------------------------------------+
+| regressor | <class 'sklearn.gaussian_process._gpr.GaussianProcessRegressor'> |
+| is fit?   | False                                                            |
++-----------+------------------------------------------------------------------+
++-----------------------------------+-------------------------------------------+
+|                                   | Featurizer                                |
++-----------------------------------+-------------------------------------------+
+| class                             | dscribe.descriptors.sinematrix.SineMatrix |
+| kwargs                            | None                                      |
+| species list                      | ['Li', 'Ti', 'Pt', 'Cu']                  |
+| maximum structure size            | 36                                        |
+| preset                            | None                                      |
+| design space structures provided? | True                                      |
++-----------------------------------+-------------------------------------------+
+```
+
 We also need to select parameters with regard to candidate selection.
 This includes the acquisition function to be used,
 target window (if applicable), and number of candidates to pick at each iteration.
+This can be done via the `CandidateSelector` object.
 Let's use a maximum uncertainty acquisition function to pick candidates based on their
-associated uncertainty values. We'll also restrict the run to conduct 5 iterations.
+associated uncertainty values.
+
+```py
+>>> from autocat.learning.sequential import CandidateSelector
+>>> candidate_selector = CandidateSelector(
+...     acquisition_function="MU",
+...     num_candidates_to_pick=1
+... )
+>>> candidate_selector
++-------------------------------+--------------------+
+|                               | Candidate Selector |
++-------------------------------+--------------------+
+| acquisition function          | MU                 |
+| # of candidates to pick       | 1                  |
+| target window                 | None               |
+| include hhi?                  | False              |
+| include segregation energies? | False              |
++-------------------------------+--------------------+
+```
+
+Now we have everything we need to conduct a simulated sequential learning loop.
+We'll restrict the run to conduct 5 iterations.
 
 ```py
->>> from sklearn.gaussian_process import GaussianProcessRegressor
->>> from dscribe import SineMatrix
 >>> from autocat.learning.sequential import simulated_sequential_learning
->>> kernel = RBF(1.5)
->>> model_kwargs = {"kernel": kernel}
->>> featurization_kwargs = {
-...     "design_space_structures": design_space.design_space_structures,
-... }
->>> predictor_kwargs = {
-...     "model_class": GaussianProcessRegressor,
-...     "model_kwargs": model_kwargs,
-...     "featurizer_class": SineMatrix,
-...     "featurization_kwargs": featurization_kwargs
-... }
->>> candidate_selection_kwargs = {"aq": "MU"}
 >>> sim_seq_learn = simulated_sequential_learning(
 ...     full_design_space=design_space,
+...     candidate_selector=candidate_selector,
+...     predictor=predictor,
 ...     init_training_size=1,
 ...     number_of_sl_loops=5,
-...     candidate_selection_kwargs=candidate_selection_kwargs,
-...     predictor_kwargs=predictor_kwargs,
 ... )
+>>> sim_seq_learn
++----------------------------------+--------------------+
+|                                  | Sequential Learner |
++----------------------------------+--------------------+
+| iteration count                  | 6                  |
+| next candidate system structures | ['Cu36']           |
+| next candidate system indices    | [5]                |
++----------------------------------+--------------------+
++-------------------------------+--------------------+
+|                               | Candidate Selector |
++-------------------------------+--------------------+
+| acquisition function          | MU                 |
+| # of candidates to pick       | 1                  |
+| target window                 | None               |
+| include hhi?                  | False              |
+| include segregation energies? | False              |
++-------------------------------+--------------------+
++-------------------------+--------------------------+
+|                         | DesignSpace              |
++-------------------------+--------------------------+
+| total # of systems      | 10                       |
+| # of unlabelled systems | 4                        |
+| unique species present  | ['Pt', 'Cu', 'Li', 'Ti'] |
+| maximum label           | 0.9712050050259604       |
+| minimum label           | -1.3259701029215702      |
++-------------------------+--------------------------+
++-----------+------------------------------------------------------------------+
+|           | Predictor                                                        |
++-----------+------------------------------------------------------------------+
+| regressor | <class 'sklearn.gaussian_process._gpr.GaussianProcessRegressor'> |
+| is fit?   | True                                                             |
++-----------+------------------------------------------------------------------+
++-----------------------------------+-------------------------------------------+
+|                                   | Featurizer                                |
++-----------------------------------+-------------------------------------------+
+| class                             | dscribe.descriptors.sinematrix.SineMatrix |
+| kwargs                            | None                                      |
+| species list                      | ['Li', 'Ti', 'Pt', 'Cu']                  |
+| maximum structure size            | 36                                        |
+| preset                            | None                                      |
+| design space structures provided? | True                                      |
++-----------------------------------+-------------------------------------------+
 ```
 
 Within the returned `SequentialLearner` object we now have information we can use
````
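For intuition, the "MU" acquisition function configured above ranks the unlabelled systems by predicted uncertainty and keeps the top `num_candidates_to_pick`. A hedged pure-Python sketch of that selection rule (illustrative only, not AutoCat's implementation):

```python
# Sketch of "MU" (maximum uncertainty) candidate selection.
# `uncertainties` plays the role of the predictor's per-system uncertainty.

def select_mu_candidates(uncertainties, unlabelled_indices, num_candidates_to_pick=1):
    """Return the unlabelled indices with the largest predicted uncertainty."""
    ranked = sorted(unlabelled_indices, key=lambda i: uncertainties[i], reverse=True)
    return ranked[:num_candidates_to_pick]

uncertainties = [0.05, 0.40, 0.10, 0.80, 0.20]
print(select_mu_candidates(uncertainties, [1, 2, 3, 4], num_candidates_to_pick=2))
# -> [3, 1]
```

Restricting the ranking to unlabelled indices is what keeps the loop from re-proposing systems it has already evaluated.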
````diff
@@ -97,12 +189,13 @@ of running in parallel (since this is an embarrassingly parallel operation). Her
 three independent runs in serial.
 
 ```py
+>>> from autocat.learning.sequential import multiple_simulated_sequential_learning_runs
 >>> runs_history = multiple_simulated_sequential_learning_runs(
 ...     full_design_space=design_space,
+...     candidate_selector=candidate_selector,
+...     predictor=predictor,
 ...     init_training_size=1,
 ...     number_of_sl_loops=5,
-...     candidate_selection_kwargs=candidate_selection_kwargs,
-...     predictor_kwargs=predictor_kwargs,
 ...     number_of_runs=3,
 ...     # number_of_parallel_jobs=N if you wanted to run in parallel
 ... )
````

0 commit comments