DARTS Advanced
==============

Differentiable architecture search

This is an adaptation of Hanxiao Liu et al.'s DARTS algorithm, extending
the work to handle convolutional neural networks for NLP problems and more.
Details of the original authors' approach can be found in their 2019 ICLR paper_.

DARTS works by composing various neural net primitives, defined as PyTorch *nn.Modules*,
to create a larger directed acyclic graph (DAG) that is to be your model. This
composition is differentiable, as we take the softmax over the choice of primitive types
at each layer of the network. To make this clearer, let's first define a few abstractions
in the algorithm:

1. **Primitive**: this is the fundamental block of computation, defined as an *nn.Module*.
   At each layer of your network, one of these primitives will be chosen by taking the
   softmax of all possible primitives at that layer. Examples could be a convolution block,
   a linear layer, a skip connection, or anything that you can come up with (subject to a few
   constraints).

2. **Cell**: this is an abstraction that holds each of the primitive types for a given level of your
   network. This is where we perform the softmax over the possible primitive types (a small code
   sketch of this mixing follows the list).

3. **Node**: this is the level of abstraction that would normally be considered a layer in
   your network. It can contain one or more *Cells*.

4. **Architecture**: the abstraction that contains all nodes in the graph. This computes a
   Hessian product with respect to the *alpha* parameters as defined in the paper.

5. **Genotype**: genotypes are instances of a particular configuration of the graph. As the
   optimization runs, and each cell computes the softmax over its primitive types, the final
   configuration of all nodes with their resulting primitives is a genotype.

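To make the softmax-over-primitives idea concrete, below is a minimal, self-contained sketch of a
mixed operation. It is illustrative only and does not use the benchmark's actual classes; the names
*MixedLayer* and *alpha*, and the particular candidate primitives, are invented for the example.

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.nn.functional as F

   class MixedLayer(nn.Module):
       """Softmax-weighted sum over candidate primitives (illustrative only)."""

       def __init__(self, primitives):
           super().__init__()
           self.primitives = nn.ModuleList(primitives)

       def forward(self, x, alpha):
           # One architecture weight per primitive; the softmax keeps the
           # choice of primitive differentiable.
           weights = F.softmax(alpha, dim=-1)
           return sum(w * op(x) for w, op in zip(weights, self.primitives))

   # Candidate primitives must share input/output sizes so they can be mixed.
   layer = MixedLayer([nn.Linear(16, 16), nn.Identity(), nn.ReLU()])
   alpha = torch.zeros(3, requires_grad=True)  # learned alongside the network weights
   out = layer(torch.randn(4, 16), alpha)
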
In the DARTS algorithm, we define a number of primitives that we would like to compose together
to form our neural network. The original paper started with 8 primitive types. These types
were originally designed for a vision task, and largely consist of convolution type operations.
We have since adapted these types for the *P3B5* benchmark, creating 1D convolution types for
our NLP tasks. If you would like to see how these primitives are defined, along with their
necessary constructors used by DARTS, you can find them in
`darts.modules.operations.conv.py`_.

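For orientation, a custom 1D primitive might look something like the block below. This is only a
sketch, not one of the benchmark's own operations; see `darts.modules.operations.conv.py`_ for the
real definitions and the constructors DARTS expects.

.. code-block:: python

   import torch.nn as nn

   class ConvBlock1d(nn.Module):
       """Hypothetical 1D convolution primitive for sequence (NLP) inputs."""

       def __init__(self, channels, kernel_size=3):
           super().__init__()
           # "Same" padding keeps the sequence length unchanged, so this block
           # stays interchangeable with the other primitives in a cell.
           self.op = nn.Sequential(
               nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
               nn.BatchNorm1d(channels),
               nn.ReLU(),
           )

       def forward(self, x):
           # x has shape (batch, channels, sequence_length)
           return self.op(x)

   # e.g. ConvBlock1d(channels=32) maps (batch, 32, length) -> (batch, 32, length)
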
These primitives are then contained within a cell, and one or more cells are contained within a
node in the graph. DARTS then works by composing these nodes together and taking the softmax over
their primitives in each cell. Finally, the *Architecture* abstraction contains all nodes, and is
responsible for differentiating the composition of the nodes with respect to two *alpha* parameters
as defined in the paper. The end result is that we have a differentiable model that composes its
components as the model is training.

As the optimization runs, the model will print the resulting loss with respect to a given *Genotype*.
The final model will be the *Genotype* corresponding to the lowest loss.

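As a rough illustration of how a *Genotype* relates to the *alpha* parameters, the discrete
configuration can be read off by keeping the highest-weighted primitive in each cell. The names
below are invented for the example; the benchmark has its own genotype handling.

.. code-block:: python

   import torch

   # Illustrative only: pick the highest-weighted primitive in each cell.
   primitive_names = ["conv_3", "conv_5", "linear", "skip_connect"]
   alphas = torch.randn(4, len(primitive_names))  # one row of weights per cell

   genotype = [primitive_names[i] for i in alphas.softmax(dim=-1).argmax(dim=-1).tolist()]
   print(genotype)  # e.g. ['conv_5', 'skip_connect', 'conv_3', 'linear']
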
Advanced Example
----------------

In this example we will take a look at how to define our own primitives to be handled by DARTS. If
you have not read the `Uno example`_, I would recommend taking a look at that first. There we showed
how we can use the primitives built in to DARTS. As a reference, you can also look to see how those

...

of the primitives must have the same number of input and output features; this lets us get the number
of features from any of your primitives. Since DARTS cannot know ahead of time what your primitives will be,
we must specify how many features will go into our final fully connected layer of the network.
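
For example (hypothetical sizes, not the benchmark's API), if every primitive keeps the feature
dimension fixed, the only size that has to be supplied by hand is the input to the final classifier:

.. code-block:: python

   import torch.nn as nn

   # Hypothetical values: every primitive maps num_features -> num_features,
   # so only the final fully connected layer needs its input size spelled out.
   num_features, num_classes = 64, 10
   classifier = nn.Linear(num_features, num_classes)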

Run the Example
---------------

First, make sure that you can get the example data by installing `torchvision`:

.. code-block::

   pip install torchvision

Then run the example with:

.. code-block::

|