
Commit 1406cc1

Merge branch 'develop' of https://github.com/yngtodd/Benchmarks into develop
2 parents a5d44c9 + 9733e3c commit 1406cc1

File tree

9 files changed: +97 -132 lines changed


examples/darts/README.rst

Lines changed: 67 additions & 0 deletions
@@ -2,10 +2,77 @@
 DARTS Examples
 ==============
 
+Differentiable architecture search
+
+TLDR
+----
+
 Our recommended ordering of examples:
 
 1. **Uno**: learn how to use the neural network building blocks in DARTS to
    define a fully connected model using DARTS.
 
 2. **Advanced**: how to define our own neural network primitives to be optimized
    by DARTS.
+
+Setup
+-----
+
+DARTS makes use of PyTorch. You can find binaries for both PyTorch and Torchvision (used in the advanced
+example) at the `pytorch website`_.
+
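For reference, a minimal install that covers both examples might look like the following. This is a sketch only: it assumes the default CPU wheels are acceptable, and the `pytorch website`_ should be consulted for CUDA-specific commands and exact version pins.

.. code-block::

   pip install torch torchvision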
+The Algorithm
+-------------
+
+This is an adaptation of Hanxiao Liu et al.'s DARTS algorithm, extending
+the work to handle convolutional neural networks for NLP problems and more.
+Details of the original authors' approach can be found in their 2019 ICLR paper_.
+
+DARTS works by composing various neural net primitives, defined as PyTorch *nn.Modules*,
+to create a larger directed acyclic graph (DAG) that becomes your model. This
+composition is differentiable because we take the softmax over the choice of primitive types
+at each layer of the network. To make this clearer, let's first define a few abstractions
+used in the algorithm:
+
+1. **Primitive**: the fundamental block of computation, defined as an *nn.Module*.
+   At each layer of your network, one of these primitives is chosen by taking the
+   softmax over all possible primitives at that layer. Examples include a convolution block,
+   a linear layer, a skip connection, or anything else you can come up with (subject to a few
+   constraints).
+
+2. **Cell**: an abstraction that holds each of the primitive types for a given level of your
+   network. This is where we perform the softmax over the possible primitive types.
+
+3. **Node**: the level of abstraction that would normally be considered a layer in
+   your network. It can contain one or more *Cells*.
+
+4. **Architecture**: the abstraction that contains all nodes in the graph. This computes a
+   Hessian product with respect to the *alpha* parameters as defined in the paper.
+
+5. **Genotype**: a genotype is an instance of a particular configuration of the graph. As the
+   optimization runs and each cell computes the softmax over its primitive types, the final
+   configuration of all nodes with their resulting primitives is a genotype.
+
+In the DARTS algorithm, we define a number of primitives that we would like to compose together
+to form our neural network. The original paper started with 8 primitive types. These types
+were originally designed for a vision task, and largely consist of convolution-type operations.
+We have since adapted these types for the *P3B5* benchmark, creating 1D convolution types for
+our NLP tasks. If you would like to see how these primitives are defined, along with the
+constructors DARTS requires of them, you can find them in
+`darts.modules.operations.conv.py`_.
+
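To make the idea concrete, a primitive is just an *nn.Module* whose output feature size matches its input. The sketch below is illustrative only; the class name and constructor signature are hypothetical and not the ones used in `darts.modules.operations.conv.py`_:

.. code-block:: python

   import torch.nn as nn

   class Conv1dPrimitive(nn.Module):
       """Hypothetical 1D convolution primitive that preserves feature size."""

       def __init__(self, channels, kernel_size=3):
           super().__init__()
           padding = kernel_size // 2  # keep the sequence length unchanged
           self.op = nn.Sequential(
               nn.Conv1d(channels, channels, kernel_size, padding=padding),
               nn.BatchNorm1d(channels),
               nn.ReLU(inplace=True),
           )

       def forward(self, x):
           return self.op(x)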
+These primitives are then contained within a cell, and one or more cells are contained within a
+node in the graph. DARTS then works by composing these nodes together and taking the softmax over
+their primitives in each cell. Finally, the *Architecture* abstraction contains all nodes, and is
+responsible for differentiating the composition of the nodes with respect to the two *alpha* parameters
+defined in the paper. The end result is a differentiable model that composes its
+components as the model is training.
+
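The softmax mixing described above can be sketched in a few lines. This is a simplified illustration of the paper's continuous relaxation; the class and attribute names are illustrative rather than the ones used in this repository:

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.nn.functional as F

   class MixedOp(nn.Module):
       """Weight every candidate primitive by the softmax of its alpha and sum the results."""

       def __init__(self, primitives):
           super().__init__()
           self.ops = nn.ModuleList(primitives)
           self.alpha = nn.Parameter(1e-3 * torch.randn(len(primitives)))

       def forward(self, x):
           weights = F.softmax(self.alpha, dim=-1)
           return sum(w * op(x) for w, op in zip(weights, self.ops))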
+As the optimization runs, the model will print the resulting loss with respect to a given *Genotype*.
+The final model will be the *Genotype* corresponding to the lowest loss.
+
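Continuing the sketch above, a genotype can be read off at the end of the search by keeping only the strongest primitive in each mixed operation (the function below is a hypothetical helper, not part of the DARTS package here):

.. code-block:: python

   import torch

   def derive_genotype(mixed_ops, primitive_names):
       """Pick the primitive with the largest alpha weight in each mixed op."""
       return [primitive_names[int(torch.argmax(op.alpha))] for op in mixed_ops]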
+.. References
+.. ----------
+.. _paper: https://openreview.net/forum?id=S1eYHoC5FX
+.. _darts.modules.operations.conv.py: ../../../common/darts/modules/operations/conv.py
+.. _pytorch website: https://pytorch.org/

examples/darts/advanced/README.rst

Lines changed: 10 additions & 54 deletions
@@ -2,59 +2,6 @@
 DARTS Advanced
 ==============
 
-
-Differentiable architecture search
-
-This is an adaptation of Hanxiao Liu et al's DARTS algorithm, extending
-the work to handle convolutional neural networks for NLP problems and more.
-Details of the original authors' approach can be found in their 2019 ICLR paper_.
-
-DARTS works by composing various neural net primitives, defined as Pytorch *nn.Modules*,
-to create a larger directed acyclic graph (DAG) that is to be your model. This
-composition is differentiable as we take the softmax of the choice of primitive types
-at each layer of the network. To make this more clear, let's first define a few abstractions
-in the algorithm:
-
-1. **Primitve**: this is the fundamental block of computation, defined as an *nn.Module*.
-   At each layer of your network, one of these primitves will be chosen by taking the
-   softmax of all possible primitives at that layer. Examples could be a convolution block,
-   a linear layer, a skip connect, or anything that you can come up with (subject to a few
-   constraints).
-
-2. **Cell**: this is an abstraction that holds each of the primitive types for level of your
-   network. This is where we perform the softmax over the possible primitive types.
-
-3. **Nodes**: this is the level of abstraction that would normally be considered a layer in
-   your network. It can contain one or more *Cells*.
-
-4. **Architecture**: The abstraction that contains all nodes in the graph. This computes a
-   Hessian product with respect to the *alpha* parameters as defined in the paper.
-
-5. **Genotype**: genotypes are instances of a particular configuration of the graph. As the
-   optimization runs, and each cell computes the softmax over their primitive types, the final
-   configuration of all nodes with their resulting primitive is a genotype.
-
-In the DARTS algorithm, we define a number of primitives that we would like to compose together
-to form our neural network. The original paper started with 8 primitive types. These types
-were originally designed for a vision task, and largely consist of convolution type operations.
-We have since adapted these types for the *P3B5* benchmark, creating 1D convolution types for
-our NLP tasks. If you would like to see how these primitives are defined, along with their
-necessary constructors used by DARTS, you can find them in
-`darts.modules.operations.conv.py`_.
-
-These primitives are then contained within a cell, and one or more cells are contained within a
-node in the graph. DARTS then works by composing these nodes together and taking the softmax over
-their primitives in each cell. Finally, the *Architecture* abstraction contains all nodes, and is
-responsible for differentiating the composition of the nodes with respect to two *alpha* parameters
-as defined in the paper. The end result is that we have a differentiable model that composes its
-components as the model is training.
-
-As the optimization runs, the model will print the resulting loss with respect to a given *Genotype*.
-The final model will be the *Genotype* with corresponding to the lowest loss.
-
-Adnvanced Example
------------------
-
 In this example we will take a look at how to define our own primitives to be handled by DARTS. If
 you have not read the `Uno example`_, I would recommend taking a look at that first. There we showed
 how we can use the built in primitives to DARTS. As reference, you can also look to see how those
@@ -172,7 +119,16 @@ of the primitives must have the same number of input and output features, this w
 of features from any of your primitives. Since DARTS cannot know ahead of time what your primitives will be,
 we must specify how many features will go into our final fully connected layer of the network.
 
-Finally, to run this example:
+Run the Example
+---------------
+
+First, make sure that you can get the example data by installing `torchvision`:
+
+.. code-block::
+
+   pip install torchvision
+
+Then run the example with:
 
 .. code-block::
Lines changed: 1 addition & 4 deletions
@@ -1,18 +1,15 @@
 [Global_Params]
 model_name = 'darts_uno'
-unrolled = False
 data_url = 'ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/'
 savepath = './results'
 log_interval = 10
 train_data = 'top_21_auc_1fold.uno.h5'
-learning_rate = 0.01
+learning_rate = 0.025
 learning_rate_min = 0.001
 momentum = 0.9
 weight_decay = 3e-4
 grad_clip = 5
 batch_size = 100
 epochs = 10
 seed = 13
-lr = 0.025
-lr_min = 0.001

examples/darts/advanced/example.py

Lines changed: 8 additions & 6 deletions
@@ -77,23 +77,22 @@ def run(params):
 
     optimizer = optim.SGD(
         model.parameters(),
-        args.lr,
+        args.learning_rate,
         momentum=args.momentum,
         weight_decay=args.weight_decay
     )
 
     scheduler = optim.lr_scheduler.CosineAnnealingLR(
         optimizer,
         float(args.epochs),
-        eta_min=args.lr_min
+        eta_min=args.learning_rate_min
     )
 
     train_meter = darts.EpochMeter(tasks, 'train')
     valid_meter = darts.EpochMeter(tasks, 'valid')
 
     for epoch in range(args.epochs):
 
-        scheduler.step()
         lr = scheduler.get_lr()[0]
         logger.info(f'\nEpoch: {epoch} lr: {lr}')
 
@@ -106,7 +105,7 @@ def run(params):
             architecture,
             criterion,
             optimizer,
-            lr,
+            scheduler,
             args,
             tasks,
             train_meter,
@@ -121,7 +120,7 @@ def train(trainloader,
           architecture,
           criterion,
           optimizer,
-          lr,
+          scheduler,
           args,
           tasks,
           meter,
@@ -141,6 +140,8 @@ def train(trainloader,
         x_search = darts.to_device(x_search, device)
         target_search = darts.to_device(target_search, device)
 
+        lr = scheduler.get_lr()[0]
+
         # 1. update alpha
         architecture.step(
             data,
@@ -149,7 +150,7 @@ def train(trainloader,
             target_search,
             lr,
             optimizer,
-            unrolled=args.unrolled
+            unrolled=False
         )
 
         logits = model(data)
@@ -160,6 +161,7 @@ def train(trainloader,
         loss.backward()
         nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)
         optimizer.step()
+        scheduler.step()
 
         prec1 = darts.multitask_accuracy_topk(logits, target, topk=(1,))
         meter.update_batch_loss(loss.item(), batch_size)
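The change above removes the per-epoch `scheduler.step()` call and steps the scheduler only after `optimizer.step()`, which is the call order PyTorch expects from version 1.1 onward. A minimal sketch of that pattern, using a placeholder model and random data (here the scheduler is stepped once per epoch; whether it is stepped per batch or per epoch depends on how `T_max` is chosen):

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.optim as optim

   model = nn.Linear(10, 2)  # placeholder model
   optimizer = optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
   scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)

   for epoch in range(10):
       for _ in range(5):  # placeholder batches
           data = torch.randn(4, 10)
           target = torch.randint(0, 2, (4,))
           optimizer.zero_grad()
           loss = nn.functional.cross_entropy(model(data), target)
           loss.backward()
           optimizer.step()   # update the weights first...
       scheduler.step()       # ...then advance the learning rate schedule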

examples/darts/advanced/example_setup.py

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@
     'weight_decay',
     'grad_clip',
     'seed',
-    'unrolled',
     'batch_size',
     'epochs',
 ]

examples/darts/uno/README.rst

Lines changed: 2 additions & 54 deletions
@@ -2,59 +2,6 @@
 DARTS UNO
 =========
 
-
-Differentiable architecture search
-
-This is an adaptation of Hanxiao Liu et al's DARTS algorithm, extending
-the work to handle convolutional neural networks for NLP problems and more.
-Details of the original authors' approach can be found in their 2019 ICLR paper_.
-
-DARTS works by composing various neural net primitives, defined as Pytorch *nn.Modules*,
-to create a larger directed acyclic graph (DAG) that is to be your model. This
-composition is differentiable as we take the softmax of the choice of primitive types
-at each layer of the network. To make this more clear, let's first define a few abstractions
-in the algorithm:
-
-1. **Primitve**: this is the fundamental block of computation, defined as an *nn.Module*.
-   At each layer of your network, one of these primitves will be chosen by taking the
-   softmax of all possible primitives at that layer. Examples could be a convolution block,
-   a linear layer, a skip connect, or anything that you can come up with (subject to a few
-   constraints).
-
-2. **Cell**: this is an abstraction that holds each of the primitive types for level of your
-   network. This is where we perform the softmax over the possible primitive types.
-
-3. **Nodes**: this is the level of abstraction that would normally be considered a layer in
-   your network. It can contain one or more *Cells*.
-
-4. **Architecture**: The abstraction that contains all nodes in the graph. This computes a
-   Hessian product with respect to the *alpha* parameters as defined in the paper.
-
-5. **Genotype**: genotypes are instances of a particular configuration of the graph. As the
-   optimization runs, and each cell computes the softmax over their primitive types, the final
-   configuration of all nodes with their resulting primitive is a genotype.
-
-In the DARTS algorithm, we define a number of primitives that we would like to compose together
-to form our neural network. The original paper started with 8 primitive types. These types
-were originally designed for a vision task, and largely consist of convolution type operations.
-We have since adapted these types for the *P3B5* benchmark, creating 1D convolution types for
-our NLP tasks. If you would like to see how these primitives are defined, along with their
-necessary constructors used by DARTS, you can find them in
-`darts.modules.operations.conv.py`_.
-
-These primitives are then contained within a cell, and one or more cells are contained within a
-node in the graph. DARTS then works by composing these nodes together and taking the softmax over
-their primitives in each cell. Finally, the *Architecture* abstraction contains all nodes, and is
-responsible for differentiating the composition of the nodes with respect to two *alpha* parameters
-as defined in the paper. The end result is that we have a differentiable model that composes its
-components as the model is training.
-
-As the optimization runs, the model will print the resulting loss with respect to a given *Genotype*.
-The final model will be the *Genotype* with corresponding to the lowest loss.
-
-UNO Example
------------
-
 Let's take a look at using DARTS for the Pilot 1 Uno example. In the Uno
 problem the task is to classify tumor dose response with respect to a few different
 data sources. For simplicity, we will use one source, Uno's gene data, to be used
@@ -115,7 +62,8 @@ data and labels of the training set, but also the data and labels of our validat
 simplicity of this tutorial, *x_search* and *target_search* are from our training set, but these
 would normally use a separate validation set.
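If you do want a proper search split, one simple option is to carve a held-out portion from the training data and draw *x_search* and *target_search* batches from it. The sketch below uses placeholder tensors; the sizes and feature dimension are made up for illustration:

.. code-block:: python

   import torch
   from torch.utils.data import DataLoader, TensorDataset, random_split

   # Placeholder data standing in for the Uno gene features and labels.
   features = torch.randn(1000, 942)
   labels = torch.randint(0, 2, (1000,))
   dataset = TensorDataset(features, labels)

   # Hold out 20% of the samples for the architecture (alpha) updates.
   n_search = len(dataset) // 5
   train_set, search_set = random_split(dataset, [len(dataset) - n_search, n_search])

   trainloader = DataLoader(train_set, batch_size=100, shuffle=True)
   searchloader = DataLoader(search_set, batch_size=100, shuffle=True)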

-Finally, to run this example:
+Run the Example
+---------------
 
 .. code-block::
Lines changed: 1 addition & 4 deletions
@@ -1,18 +1,15 @@
 [Global_Params]
 model_name = 'darts_uno'
-unrolled = True
 data_url = 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/'
 savepath = '.'
 log_interval = 10
 train_data = 'top_21_auc_1fold.uno.h5'
-learning_rate = 0.01
+learning_rate = 0.025
 learning_rate_min = 0.001
 momentum = 0.9
 weight_decay = 3e-4
 grad_clip = 5
 batch_size = 100
 epochs = 10
 seed = 13
-lr = 0.025
-lr_min = 0.001

examples/darts/uno/example_setup.py

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@
     'weight_decay',
     'grad_clip',
     'seed',
-    'unrolled',
     'batch_size',
     'epochs',
 ]
