Conversation
…d to get more test cases implemented.
…ll problem in line search.
CMakeLists.txt
Outdated
| message(WARNING "Build mode not specified, defaulting to Release build.") | ||
| set(CMAKE_BUILD_TYPE Release) | ||
| endif() | ||
| #if(CMAKE_BUILD_TYPE STREQUAL "Release") |
There was a problem hiding this comment.
What do you do when no CMake build type was specified?
There was a problem hiding this comment.
Not sure what "CMake" mode this is, but, it appears that the output is no optimization flags on my mac. Maybe that means -O0 by default?
There was a problem hiding this comment.
I've gone ahead and added this check, and setting to Release in the event that its not set. In the event that this is not set, there are numerous build errors within Elemental. This seems like a deficiency. I'll see if I can look more into this.
There was a problem hiding this comment.
The production CMake explicitly forces either Debug or Release by design. It is strange to remove this logic and claim it is a deficiency in the original code. There needs to be a decision as to whether to enable the debug checks or not.
There was a problem hiding this comment.
I am just trying to use the CLion IDE Interactive Debugger. With those lines it will not build in debug mode, without them, it will.
What I said was a deficiency is that if you manage to leave CMAKE_BUILD_TYPE unspecified it gives a number of compile errors. I don't think there is anything controversial about that.
There was a problem hiding this comment.
If you would like I can leave this change out of the diff, and we can discuss it separately.
include/El/optimization/bfgs.hpp
Outdated
| namespace El { | ||
|
|
||
| template< typename T, typename Function, typename Gradient> | ||
| T zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p, |
There was a problem hiding this comment.
Why the insistence on starting function names with lowercase letters (as opposed to the 10,000 other functions in the library)?
There was a problem hiding this comment.
Sorry :) I'm trying my best to do camel case.
include/El/optimization/bfgs.hpp
Outdated
| T zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p, | ||
| T alpha_low, T alpha_high, T c1, T c2){ | ||
| DistMatrix<T> x_j(x0); | ||
| DistMatrix<T> g2(p.Height(), 1); |
There was a problem hiding this comment.
It is important to construct DistMatrix instances in subroutines using any implied process grid (e.g., DistMatrix<T> g2(p.Height(), 1, x0.Grid());). Otherwise, this routine will not work when x0 used a non-default process grid.
There was a problem hiding this comment.
Yes, I figured I was messing this up somehow.
include/El/optimization/bfgs.hpp
Outdated
| T alpha_low, T alpha_high, T c1, T c2){ | ||
| DistMatrix<T> x_j(x0); | ||
| DistMatrix<T> g2(p.Height(), 1); | ||
| while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){ |
There was a problem hiding this comment.
Is T assumed to be a real variable? Otherwise one should use El::limits::Epsilon<Base<T>>()
include/El/optimization/bfgs.hpp
Outdated
| DistMatrix<T> x_j(x0); | ||
| DistMatrix<T> g2(p.Height(), 1); | ||
| while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){ | ||
| T alpha = (alpha_low + alpha_high)/2.0; |
There was a problem hiding this comment.
I would highly recommend against ever using floating-point literals in templated routines and would instead recommend T(2) instead of T(2.0). This happens to be safe for 2.0 but the conversion generally leads to a loss in precision.
include/El/optimization/bfgs.hpp
Outdated
| } | ||
| alpha_low = alpha; | ||
| } | ||
| return (alpha_high+alpha_low/2.0); |
include/El/optimization/bfgs.hpp
Outdated
| T lineSearch( const Function& f, const Gradient& gradient, | ||
| const DistMatrix<T>& g, Int D, | ||
| const DistMatrix<T>& x0, const DistMatrix<T>& p, | ||
| Int maxIter=100, T c1=1e-4, T c2=0.9){ |
There was a problem hiding this comment.
Same comment as above here; I would recommend using a function of machine epsilon for c1 and to set T c2=T(9)/T(10)
There was a problem hiding this comment.
Can you recommend the specific function?
There was a problem hiding this comment.
Pow(limits::Epsilon<Real>(),Real(1)/Real(4)) would return the fourth-root of machine epsilon, which is roughly 1-e4 for double-precision. I'm not saying that this is necessarily the right generalization, but it's worth considering whether you want 1e-4 for single precision.
| message(FATAL_ERROR "In-source build attempted; please clean the CMake cache and then switch to an out-of-source build, e.g.,\nrm CMakeCache.txt && rm -Rf CMakeFiles/\nmkdir build/ && cd build/ && cmake ..") | ||
| endif() | ||
|
|
||
| if (NOT CMAKE_BUILD_TYPE) |
There was a problem hiding this comment.
It is also important that we handle mis-set build types (e.g., the old PureDebug choice should cause an error). I think that we need to enumerate the allowed values and complain otherwise.
There was a problem hiding this comment.
Is that an old Elemental build type? whats the difference between it and Debug?
include/El/optimization/bfgs.hpp
Outdated
| @@ -47,18 +47,18 @@ template< typename T, typename Function, typename Gradient> | |||
| T lineSearch( const Function& f, const Gradient& gradient, | |||
There was a problem hiding this comment.
Capital letter for this function name please?
include/El/optimization/bfgs.hpp
Outdated
| T fval = 0; | ||
| T alpha(1); | ||
| T alpha_prev(0); | ||
| T alphaMax(1e3); |
There was a problem hiding this comment.
I am not completely sure, but my intuition is that 1e3 is a double-precision literal.
There was a problem hiding this comment.
T x( __ ); will ensure the right type. What is the concern?
There was a problem hiding this comment.
It is moreso to call attention to the fact that there is a conversion happening. This is fine in this case since double-precision represents 1e3 exactly.
include/El/optimization/bfgs.hpp
Outdated
| while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){ | ||
| T alpha = (alpha_low + alpha_high)/2.0; | ||
| DistMatrix<T> g2(p.Height(), 1, x0.Grid()); | ||
| while (alpha_high - alpha_low > T(10)*El::limits::Epsilon<Base<T>>()){ |
There was a problem hiding this comment.
There is no reason to need the El:: prefix in this case since this function is defined within the El namespace. The only exceptions that I'm aware of are due to conflicts caused by Argument Dependent Lookup where one of the arguments comes from a different namespace which has a function with the same name as the one from the El namespace.
There was a problem hiding this comment.
Also, for the sake of consistency with the rest of the library, can we use symmetry with braces and put them on a line by themselves?
include/El/optimization/bfgs.hpp
Outdated
| for( std::size_t iter=0; (norm_g > 100*limits::Epsilon<T>()); ++iter){ | ||
| for( std::size_t iter=0; (norm_g > T(100)*limits::Epsilon<Base<T>>()); ++iter){ | ||
| //std::cout << "iter: " << iter << std::endl; | ||
| //El::Display(x, "Iterate"); |
There was a problem hiding this comment.
Can we remove the debugging artifacts?
include/El/optimization/bfgs.hpp
Outdated
| @@ -0,0 +1,226 @@ | |||
| /* | |||
| Copyright (c) 2009-2016, Ryan H. Lewis | |||
There was a problem hiding this comment.
I wasn't aware that you had been working on Elemental since 2009! :-)
include/El/optimization/bfgs.hpp
Outdated
| DistMatrix<T> g2(p.Height(), 1, x0.Grid()); | ||
| while (alpha_high - alpha_low > T(10)*limits::Epsilon<Base<T>>()) | ||
| { | ||
| T alpha = (alpha_low + alpha_high)/T(2.0); |
There was a problem hiding this comment.
It makes more sense to use T(2) rather than T(2.0), as the former will be an int to T conversion while the latter will be a double to T conversion.
include/El/optimization/bfgs.hpp
Outdated
| } | ||
| alpha_low = alpha; | ||
| } | ||
| return (alpha_high+alpha_low)/T(2.0); |
There was a problem hiding this comment.
Same comment as before about T(2) being preferred.
include/El/optimization/bfgs.hpp
Outdated
| namespace El { | ||
|
|
||
| template< typename T, typename Function, typename Gradient> | ||
| T Zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p, |
There was a problem hiding this comment.
As mentioned before, it would be useful to have descriptive names for many of these constants; f, f0, and x0 seem reasonable, but p, alpha_low, alpha_high, c1, and c2 are ambiguous without referring to Nocedal and Wright. For that matter, it would make sense to name the function something more descriptive than Zoom.
There was a problem hiding this comment.
These parameters match the notation in nocedal and wright. I'd prefer to leave them and mention that explicitly. Is that acceptable?
include/El/optimization/bfgs.hpp
Outdated
| * This algorithm attempts to satsify the weak wolf conditions. | ||
| */ | ||
| template< typename T, typename Function, typename Gradient> | ||
| T lineSearch( const Function& f, const Gradient& gradient, |
There was a problem hiding this comment.
As mentioned before, it would be more stylistically consistent to call the routine LineSearch. And, since Elemental already contains several one-off line searches of various flavors (within the Interior Point Methods), it would be a good idea to prefix the name LineSearch with the type of the line search.
There was a problem hiding this comment.
I'm not sure what this line search is called. It's the one from nocedal and wright, more or less, but they don't name it. feel free to propose a name and i'll adopt it.
| #define CATCH_CONFIG_RUNNER | ||
| #include <catch.hpp> | ||
|
|
||
| using namespace El; |
There was a problem hiding this comment.
There is a rather strange mix of using and not using the El:: prefix for member variables and constants below.
There was a problem hiding this comment.
I try to always use namespaces (renaming them locally when they are too long to type). it enhances readability, but, I did copy this cpp file from another one in El.
| template< typename T> | ||
| std::pair< DistMatrix<T>, T> | ||
| SimpleQuadraticBFGSTest( const Int & N){ | ||
| const std::function< T(const DistMatrix<T>&)> |
There was a problem hiding this comment.
Why the switch to two-space indentation here?
There was a problem hiding this comment.
this is all CLion. I'm trying to get its rules correct. apologies.
| const std::function< T(const DistMatrix<T>&)> | ||
| quadratic_function = [&](const DistMatrix<T>& theta) | ||
| { | ||
| DistMatrix<T> y( theta); |
There was a problem hiding this comment.
Two-space followed by six-space?
|
|
||
| template< typename T> | ||
| std::pair< DistMatrix<T>, T> | ||
| RosenbrockTest(){ |
There was a problem hiding this comment.
Thank you for adding a Rosenbrock test!
tests/optimization/bfgs.cpp
Outdated
| const std::function< T(const DistMatrix<T>&)> | ||
| rosenbrock = [&](const DistMatrix<T>& theta) | ||
| { | ||
| auto x1 = theta.Get(0,0); |
There was a problem hiding this comment.
Each Get call to a DistMatrix will involve an mpi::Broadcast under the hood; this is expensive for querying a vector of length two (when the vector could be duplicated across all processes), but I suppose this is okay for testing.
There was a problem hiding this comment.
yeah, that's bad. How do I fix this again?
There was a problem hiding this comment.
If theta is distributed as a DistMatrix<T,STAR,STAR> then one could call GetLocal instead.
include/El/optimization/bfgs.hpp
Outdated
| * | ||
| * Note: it is _not_ required that alphaLow < alphaHigh. | ||
| * | ||
| * @param f |
There was a problem hiding this comment.
I would strongly recommend documenting what each routine is meant to accomplish before documenting the parameters.
include/El/optimization/bfgs.hpp
Outdated
| } | ||
| }while( !done); | ||
| if( !limits::IsFinite(beta)){ RuntimeError("Line search failed to brack point satisfying weak wolfe conditions. Function may be unbounded below"); } | ||
| RuntimeError(std::string("[")+alpha+","+beta+"] brackets an interval containing a point satisfying WWC"); |
There was a problem hiding this comment.
RuntimeError is smarter than you're giving it credit and you can simplify this line to RuntimeError("[",alpha,",",beta,"] brackets an interval containing a point satisfying WWC");
This Pull Request adds support for Basic BFGS within Elemental. I am using the Zoom line search algorithm found on Page 60-61 of Nocedal & Wright. The BFGS implementation itself maintains the updates to the inverse of the approximate hessian, and simply applies them at every iteration.
I have tested the optimizer with three functions, f(x) = x'x, and f(x) = x'A'Ax + x'b, and the rosenbrock function. As such I have growing confidence in the correctness of the procedure.
The scheme uses random starting vectors, and I have extremely infrequently seen some of the tests converge to a vector of NaN's. I am not yet sure why this is happening, as I am not able to reproduce it regularly. Any help here would be appreciated.
Nocedal & Wright provide no guidance on the setting of alphaMax, which I have set to 100. There are a few parameters set which I am not confident on. Also, I would appreciated commentary on the stopping criteria for the Zoom procedure.
In addition, I have edited the CMakeLists.txt to allow the interactive CLion Debugger to function with Elemental, and I have removed the logic which forces a recompile on a git commit.
With 30-60 minute long recompiles, this logic gets in the way of active development.