WIP: Add BFGS & Catch Test Framework by rhl- · Pull Request #183 · elemental/Elemental

rhl- · 2016-10-18T19:44:01Z

This Pull Request adds support for basic BFGS within Elemental. I am using the Zoom line search algorithm found on pages 60-61 of Nocedal & Wright. The BFGS implementation itself maintains the updates to the inverse of the approximate Hessian and simply applies them at every iteration.

I have tested the optimizer with three functions: f(x) = x'x, f(x) = x'A'Ax + x'b, and the Rosenbrock function. As such, I have growing confidence in the correctness of the procedure.

The scheme uses random starting vectors, and on rare occasions I have seen some of the tests converge to a vector of NaNs. I am not yet sure why this is happening, as I am not able to reproduce it reliably. Any help here would be appreciated.

Nocedal & Wright provide no guidance on the setting of alphaMax, which I have set to 100. There are a few other parameter settings that I am not confident about. I would also appreciate commentary on the stopping criteria for the Zoom procedure.

In addition, I have edited the CMakeLists.txt to allow the interactive CLion Debugger to function with Elemental, and I have removed the logic which forces a recompile on a git commit.

With 30-60 minute long recompiles, this logic gets in the way of active development.


poulson · 2016-10-19T15:17:52Z

CMakeLists.txt

-  message(WARNING "Build mode not specified, defaulting to Release build.")
-  set(CMAKE_BUILD_TYPE Release)
-endif()
+#if(CMAKE_BUILD_TYPE STREQUAL "Release")


What do you do when no CMake build type was specified?

Not sure what CMake mode this is, but it appears that no optimization flags are passed on my Mac. Maybe that means -O0 by default?

I've gone ahead and added this check, which sets the build type to Release in the event that it's not set. When it is not set, there are numerous build errors within Elemental. This seems like a deficiency. I'll see if I can look more into this.

The production CMake explicitly forces either Debug or Release by design. It is strange to remove this logic and claim it is a deficiency in the original code. There needs to be a decision as to whether to enable the debug checks or not.

I am just trying to use the CLion IDE Interactive Debugger. With those lines it will not build in debug mode; without them, it will.

What I said was a deficiency is that if you manage to leave CMAKE_BUILD_TYPE unspecified it gives a number of compile errors. I don't think there is anything controversial about that.

If you would like I can leave this change out of the diff, and we can discuss it separately.

poulson · 2016-10-19T15:18:57Z

include/El/optimization/bfgs.hpp

+namespace El {
+
+template< typename T, typename Function, typename Gradient>
+T zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p,


Why the insistence on starting function names with lowercase letters (as opposed to the 10,000 other functions in the library)?

Sorry :) I'm trying my best to do camel case.

poulson · 2016-10-19T15:20:34Z

include/El/optimization/bfgs.hpp

+T zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p,
+        T alpha_low, T alpha_high, T c1, T c2){
+    DistMatrix<T> x_j(x0);
+    DistMatrix<T> g2(p.Height(), 1);


It is important to construct DistMatrix instances in subroutines using any implied process grid (e.g., DistMatrix<T> g2(p.Height(), 1, x0.Grid());). Otherwise, this routine will not work when x0 used a non-default process grid.

Yes, I figured I was messing this up somehow.

poulson · 2016-10-19T15:21:27Z

include/El/optimization/bfgs.hpp

+        T alpha_low, T alpha_high, T c1, T c2){
+    DistMatrix<T> x_j(x0);
+    DistMatrix<T> g2(p.Height(), 1);
+    while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){


Is T assumed to be a real variable? Otherwise one should use El::limits::Epsilon<Base<T>>()
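To illustrate the point about complex scalars, here is a self-contained sketch that does not use Elemental. `BaseHelper`, `BaseOf`, and `IntervalTolerance` are hypothetical stand-ins for the library's `Base<T>` machinery. `std::numeric_limits` is not specialized for `std::complex`, so asking it for the epsilon of a complex type yields a zero value, which is why the tolerance should be computed in the underlying real type.

```cpp
#include <complex>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a Base<T> trait: maps a scalar type to its
// underlying real type (T itself for real types).
template<typename T>
struct BaseHelper { typedef T type; };

template<typename Real>
struct BaseHelper<std::complex<Real>> { typedef Real type; };

template<typename T>
using BaseOf = typename BaseHelper<T>::type;

// A tolerance that is meaningful for both real and complex T, because it
// is computed entirely in the real base type.
template<typename T>
BaseOf<T> IntervalTolerance()
{
    typedef BaseOf<T> Real;
    return Real(10)*std::numeric_limits<Real>::epsilon();
}
```

With this pattern, `IntervalTolerance<std::complex<double>>()` and `IntervalTolerance<double>()` return the same real value.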

poulson · 2016-10-19T15:22:55Z

include/El/optimization/bfgs.hpp

+    DistMatrix<T> x_j(x0);
+    DistMatrix<T> g2(p.Height(), 1);
+    while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){
+        T alpha = (alpha_low + alpha_high)/2.0;


I would strongly recommend against ever using floating-point literals in templated routines, and would recommend T(2) rather than T(2.0). This happens to be safe for 2.0, but in general the conversion from a double-precision literal can lose precision.
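A small sketch of the difference, independent of Elemental (the function names are hypothetical). For a value such as one tenth, which has no exact binary representation, building it from integers keeps the arithmetic in T, whereas a literal is rounded to double before it is converted. The two results differ whenever T carries more precision than double.

```cpp
#include <limits>

// One tenth built from integers: the division is carried out in T.
template<typename T>
T TenthFromIntegers()
{
    return T(1)/T(10);
}

// One tenth built from a literal: 0.1 is rounded to double first and only
// then converted to T, so any extra precision in T is lost.
template<typename T>
T TenthFromLiteral()
{
    return T(0.1);
}

// The bisection midpoint written without floating-point literals.
template<typename T>
T Midpoint(T low, T high)
{
    return (low + high)/T(2);
}
```

On platforms where `long double` is wider than `double`, `TenthFromIntegers<long double>()` and `TenthFromLiteral<long double>()` compare unequal. The same reasoning applies to higher-precision scalar types.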

poulson · 2016-10-19T15:23:21Z

include/El/optimization/bfgs.hpp

+        }
+        alpha_low = alpha;
+    }
+    return (alpha_high+alpha_low/2.0);


Same comment as above here.

poulson · 2016-10-19T15:24:30Z

include/El/optimization/bfgs.hpp

+T lineSearch( const Function& f, const Gradient& gradient,
+              const DistMatrix<T>& g, Int D,
+              const DistMatrix<T>& x0, const DistMatrix<T>& p,
+              Int maxIter=100, T c1=1e-4, T c2=0.9){


Same comment as above here; I would recommend using a function of machine epsilon for c1 and setting T c2=T(9)/T(10).

Can you recommend the specific function?

Pow(limits::Epsilon<Real>(),Real(1)/Real(4)) would return the fourth root of machine epsilon, which is roughly 1e-4 for double precision. I'm not saying that this is necessarily the right generalization, but it's worth considering whether you want 1e-4 for single precision.
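As a rough check of the numbers, the same expression can be written with the standard library. This sketch assumes the library's epsilon matches `std::numeric_limits<Real>::epsilon()`, and `SufficientDecreaseDefault` is a hypothetical name.

```cpp
#include <cmath>
#include <limits>

// Fourth root of machine epsilon, computed in the requested precision.
template<typename Real>
Real SufficientDecreaseDefault()
{
    const Real eps = std::numeric_limits<Real>::epsilon();
    return std::pow(eps, Real(1)/Real(4));
}
```

For IEEE double precision this gives about 1.2e-4, and for single precision about 1.9e-2, so the precision-dependent default is considerably looser than 1e-4 in single precision.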

poulson · 2016-10-22T16:22:12Z

CMakeLists.txt

  message(FATAL_ERROR "In-source build attempted; please clean the CMake cache and then switch to an out-of-source build, e.g.,\nrm CMakeCache.txt && rm -Rf CMakeFiles/\nmkdir build/ && cd build/ && cmake ..")
 endif()

+if (NOT CMAKE_BUILD_TYPE)


It is also important that we handle mis-set build types (e.g., the old PureDebug choice should cause an error). I think that we need to enumerate the allowed values and complain otherwise.

Is that an old Elemental build type? What's the difference between it and Debug?

poulson · 2016-10-22T16:22:51Z

include/El/optimization/bfgs.hpp

@@ -47,18 +47,18 @@ template< typename T, typename Function, typename Gradient>
 T lineSearch( const Function& f, const Gradient& gradient,


Capital letter for this function name please?

poulson · 2016-10-22T16:23:54Z

include/El/optimization/bfgs.hpp

-        T fval = 0;
+        T  alpha(1);
+        T  alpha_prev(0);
+        T  alphaMax(1e3);


I am not completely sure, but my intuition is that 1e3 is a double-precision literal.

T x( __ ); will ensure the right type. What is the concern?

It is more so to call attention to the fact that a conversion is happening. This is fine in this case since double precision represents 1e3 exactly.

poulson · 2016-10-22T16:27:20Z

include/El/optimization/bfgs.hpp

-    while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){
-        T alpha = (alpha_low + alpha_high)/2.0;
+    DistMatrix<T> g2(p.Height(), 1, x0.Grid());
+    while (alpha_high - alpha_low > T(10)*El::limits::Epsilon<Base<T>>()){


There is no reason to need the El:: prefix in this case since this function is defined within the El namespace. The only exceptions that I'm aware of are due to conflicts caused by Argument Dependent Lookup where one of the arguments comes from a different namespace which has a function with the same name as the one from the El namespace.

Also, for the sake of consistency with the rest of the library, can we use symmetry with braces and put them on a line by themselves?
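The Argument Dependent Lookup caveat mentioned above can be reproduced in a few lines. The namespaces and functions below are hypothetical and unrelated to Elemental; the sketch only shows how an unqualified call inside a library namespace can be captured by a same-named function from an argument's namespace.

```cpp
namespace other {

struct Vec {};

// A non-template function that happens to share a name with the library's.
inline int Norm(const Vec&) { return 1; }

} // namespace other

namespace lib {

// The library's generic routine.
template<typename T>
int Norm(const T&) { return 2; }

// Unqualified call: ADL adds other::Norm to the overload set, and the
// non-template overload is preferred over the equally good template.
inline int CallUnqualified(const other::Vec& v) { return Norm(v); }

// Qualified call: only lib::Norm is considered.
inline int CallQualified(const other::Vec& v) { return lib::Norm(v); }

} // namespace lib
```

Here `lib::CallUnqualified` ends up calling `other::Norm`, while `lib::CallQualified` calls the library's own routine. When no such foreign overload exists, the explicit prefix is redundant inside the namespace.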

poulson · 2016-10-22T16:43:50Z

include/El/optimization/bfgs.hpp

-    for( std::size_t iter=0; (norm_g > 100*limits::Epsilon<T>()); ++iter){
+    for( std::size_t iter=0; (norm_g > T(100)*limits::Epsilon<Base<T>>()); ++iter){
        //std::cout << "iter: " << iter << std::endl;
        //El::Display(x, "Iterate");


Can we remove the debugging artifacts?

poulson · 2016-11-07T03:52:11Z

include/El/optimization/bfgs.hpp

@@ -0,0 +1,226 @@
+/*
+   Copyright (c) 2009-2016, Ryan H. Lewis


I wasn't aware that you had been working on Elemental since 2009! :-)

Neither was I. Fixed.

poulson · 2016-11-07T03:53:33Z

include/El/optimization/bfgs.hpp

+    DistMatrix<T> g2(p.Height(), 1, x0.Grid());
+    while (alpha_high - alpha_low > T(10)*limits::Epsilon<Base<T>>())
+    {
+        T alpha = (alpha_low + alpha_high)/T(2.0);


It makes more sense to use T(2) rather than T(2.0), as the former will be an int to T conversion while the latter will be a double to T conversion.

poulson · 2016-11-07T03:54:01Z

include/El/optimization/bfgs.hpp

+        }
+        alpha_low = alpha;
+    }
+    return (alpha_high+alpha_low)/T(2.0);


Same comment as before about T(2) being preferred.

poulson · 2016-11-07T03:56:13Z

include/El/optimization/bfgs.hpp

+namespace El {
+
+template< typename T, typename Function, typename Gradient>
+T Zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p,


As mentioned before, it would be useful to have descriptive names for many of these constants; f, f0, and x0 seem reasonable, but p, alpha_low, alpha_high, c1, and c2 are ambiguous without referring to Nocedal and Wright. For that matter, it would make sense to name the function something more descriptive than Zoom.

These parameters match the notation in Nocedal and Wright. I'd prefer to leave them and mention that explicitly. Is that acceptable?

poulson · 2016-11-07T03:57:43Z

include/El/optimization/bfgs.hpp

+ * This algorithm attempts to satsify the weak wolf conditions.
+ */
+template< typename T, typename Function, typename Gradient>
+T lineSearch( const Function& f, const Gradient& gradient,


As mentioned before, it would be more stylistically consistent to call the routine LineSearch. And, since Elemental already contains several one-off line searches of various flavors (within the Interior Point Methods), it would be a good idea to prefix the name LineSearch with the type of the line search.

I'm not sure what this line search is called. It's the one from Nocedal and Wright, more or less, but they don't name it. Feel free to propose a name and I'll adopt it.
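For context on what such a routine has to deliver, here is a one-dimensional sketch of a bracketing and bisection search for a step satisfying the weak Wolfe conditions. It is neither the Zoom procedure from Nocedal and Wright nor the code in this PR; `WeakWolfeBisection` is a hypothetical name, and phi(t) stands for f(x0 + t*p) with dphi(t) its derivative along p.

```cpp
#include <limits>

// Searches for t > 0 with
//   phi(t)  <= phi(0) + c1*t*dphi(0)   (sufficient decrease)
//   dphi(t) >= c2*dphi(0)              (weak curvature)
// assuming dphi(0) < 0 and 0 < c1 < c2 < 1.
// Returns -1 if no such step is found within maxIter trials.
template<typename Real, typename Phi, typename DPhi>
Real WeakWolfeBisection(
    const Phi& phi, const DPhi& dphi, Real c1, Real c2, int maxIter=100)
{
    const Real infinity = std::numeric_limits<Real>::infinity();
    const Real phi0 = phi(Real(0));
    const Real dphi0 = dphi(Real(0));
    Real alpha = Real(0);   // largest step known to be too short
    Real beta = infinity;   // smallest step known to be too long
    Real t = Real(1);
    for (int iter = 0; iter < maxIter; ++iter)
    {
        if (phi(t) > phi0 + c1*t*dphi0)
        {
            beta = t;       // sufficient decrease fails: shrink from above
        }
        else if (dphi(t) < c2*dphi0)
        {
            alpha = t;      // curvature fails: grow from below
        }
        else
        {
            return t;       // both conditions hold
        }
        // Bisect once an upper bound is known, otherwise keep doubling.
        t = (beta < infinity) ? (alpha + beta)/Real(2) : Real(2)*alpha;
    }
    return Real(-1);
}
```

If beta never becomes finite, the trial step keeps doubling, which is the situation in which the objective may be unbounded below along p.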

poulson · 2016-11-07T04:05:22Z

tests/optimization/bfgs.cpp

+#define CATCH_CONFIG_RUNNER
+#include <catch.hpp>
+
+using namespace El;


There is a rather strange mix of using and not using the El:: prefix for member variables and constants below.

I try to always use namespaces (renaming them locally when they are too long to type). It enhances readability, but I did copy this cpp file from another one in El.

poulson · 2016-11-07T04:06:08Z

tests/optimization/bfgs.cpp

+template< typename T>
+std::pair< DistMatrix<T>, T>
+SimpleQuadraticBFGSTest( const Int & N){
+  const std::function< T(const DistMatrix<T>&)>


Why the switch to two-space indentation here?

This is all CLion. I'm trying to get its rules correct. Apologies.

poulson · 2016-11-07T04:06:27Z

tests/optimization/bfgs.cpp

+  const std::function< T(const DistMatrix<T>&)>
+  quadratic_function = [&](const DistMatrix<T>& theta)
+  {
+        DistMatrix<T> y( theta);


Two-space followed by six-space?

poulson · 2016-11-07T04:07:11Z

tests/optimization/bfgs.cpp

+
+template< typename T>
+std::pair< DistMatrix<T>, T>
+RosenbrockTest(){


Thank you for adding a Rosenbrock test!
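For reference, a standalone sketch of the objective and its gradient, assuming the standard two-dimensional Rosenbrock function with coefficients 1 and 100 (the PR's exact form is not shown here). It uses `std::array` instead of `DistMatrix`, and the function names are hypothetical.

```cpp
#include <array>

// f(x1, x2) = (1 - x1)^2 + 100*(x2 - x1^2)^2, minimized at (1, 1).
template<typename T>
T Rosenbrock(const std::array<T,2>& x)
{
    const T u = T(1) - x[0];
    const T v = x[1] - x[0]*x[0];
    return u*u + T(100)*v*v;
}

// Analytic gradient of the function above.
template<typename T>
std::array<T,2> RosenbrockGradient(const std::array<T,2>& x)
{
    const T v = x[1] - x[0]*x[0];
    std::array<T,2> g;
    g[0] = -T(2)*(T(1) - x[0]) - T(400)*x[0]*v;
    g[1] = T(200)*v;
    return g;
}
```

A useful sanity check for any optimizer test is that both the function value and the gradient vanish at (1, 1).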

poulson · 2016-11-07T04:09:23Z

tests/optimization/bfgs.cpp

+    const std::function< T(const DistMatrix<T>&)>
+          rosenbrock = [&](const DistMatrix<T>& theta)
+    {
+        auto x1 = theta.Get(0,0);


Each Get call to a DistMatrix will involve an mpi::Broadcast under the hood; this is expensive for querying a vector of length two (when the vector could be duplicated across all processes), but I suppose this is okay for testing.

Yeah, that's bad. How do I fix this again?

If theta is distributed as a DistMatrix<T,STAR,STAR> then one could call GetLocal instead.


poulson · 2016-11-15T16:20:44Z

include/El/optimization/bfgs.hpp

+ *
+ * Note: it is _not_ required that alphaLow < alphaHigh.
+ *
+ * @param f


I would strongly recommend documenting what each routine is meant to accomplish before documenting the parameters.


poulson · 2016-11-21T01:33:20Z

include/El/optimization/bfgs.hpp

            }
        }while( !done);
        if( !limits::IsFinite(beta)){ RuntimeError("Line search failed to brack point satisfying weak wolfe conditions. Function may be unbounded below"); }
+        RuntimeError(std::string("[")+alpha+","+beta+"] brackets an interval containing a point satisfying WWC");


RuntimeError is smarter than you're giving it credit for, and you can simplify this line to RuntimeError("[",alpha,",",beta,"] brackets an interval containing a point satisfying WWC");
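The variadic style suggested above can be sketched without Elemental as follows. `AppendAll`, `BuildMessage`, and `ThrowRuntimeError` are hypothetical names, not the library's implementation. The idea is to stream every argument into one buffer, which avoids `std::string + number` expressions (std::string has no `operator+` for arithmetic types).

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: nothing left to append.
inline void AppendAll(std::ostringstream&) {}

// Streams the first argument, then recurses on the rest.
template<typename Head, typename... Tail>
void AppendAll(std::ostringstream& os, const Head& head, const Tail&... tail)
{
    os << head;
    AppendAll(os, tail...);
}

// Concatenates any streamable arguments into a single string.
template<typename... Args>
std::string BuildMessage(const Args&... args)
{
    std::ostringstream os;
    AppendAll(os, args...);
    return os.str();
}

// Throws std::runtime_error carrying the concatenated message.
template<typename... Args>
[[noreturn]] void ThrowRuntimeError(const Args&... args)
{
    throw std::runtime_error(BuildMessage(args...));
}
```

A call such as `ThrowRuntimeError("[", alpha, ",", beta, "] ...")` then works for any scalar type that can be streamed.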

rhl- and others added 8 commits August 30, 2016 13:32

initial bfgs sketch.

ecdbd59

Merge branch 'master' of github.com:elemental/Elemental into add_bfgs

440f415

bfgs compiles. need to debug and such.

3ae931b

more.

10d1471

code appears to work correctly for a simple univariate quadratic. nee…

8476dd0

…d to get more test cases implemented.

implemented nocedal and wright line search from page 60.

4037c21

added the rosenbrock function as a testc ase.

9d2d910

added unit test framework, got clion debugging functioning. fixed sma…

91b1ed8

…ll problem in line search.

rhl- added the bug label Oct 18, 2016

rhl- assigned andreasnoack and poulson Oct 18, 2016

rhl- added enhancement and removed bug labels Oct 18, 2016

forgot catch header.

e7e511c

rhl- unassigned andreasnoack Oct 18, 2016

rhl- added this to the Release 0.87 milestone Oct 18, 2016

poulson reviewed Oct 19, 2016


poulson reviewed Oct 19, 2016


Addresses review.

f8dc7fd

poulson reviewed Oct 22, 2016


capital F.

770baec

poulson reviewed Oct 22, 2016


rhl- added 3 commits October 22, 2016 11:20

addresses review items, formatting, and removes un-necessary warning.

6edb245

fixes to bfgs.

f3da919

WIP.

97b0c84

poulson requested changes Nov 7, 2016


rhl- added 8 commits November 13, 2016 14:54

trying to get the lineSearch routine with interpolation written corre…

53a0da2

…ctly.

Update not update.

bd70faf

fix bug.

aa99386

run each test 10 times just incase there are funny failures.

8b078b2

added more rosenbrock tests.

59ec7ce

write less code.

459db88

fix error code handling, perhaps makes test fail?

66f4769

borrowing More & Thuente Interpolation. still not always working.

9a59baa

poulson reviewed Nov 15, 2016


rhl- added 5 commits November 20, 2016 13:46

much simpler line search.

7aef47f

a number of tweaks to the lineSearch, explicit gaurds against some fa…

24557ae

…ilures.

accidently deleted ptg

859daee

fix debug statement

ea90dfb

add an exception just in case.

dc42de5

poulson reviewed Nov 21, 2016


rhl- added 7 commits November 20, 2016 17:39

Merge branch 'master' of github.com:elemental/Elemental into add_bfgs

7e496a6

fix string.

f5d5e96

adopting coefficients used by LBFGS

7e9bddd

introduced a bug earlier today.

d761213

const bug.

de631f2

using nocedals recursive hessian inverse method.

9a6c3cf

tuple access and variables names.

3fd404b

rhl- modified the milestones: Release 0.87, Elemental Release 0.88 Nov 29, 2016

rhl- changed the title ~~Add BFGS & Catch Test Framework~~ WIP: Add BFGS & Catch Test Framework Nov 30, 2016

rhl- closed this Feb 17, 2017

