Skip to content

WIP: Add BFGS & Catch Test Framework#183

Closed
rhl- wants to merge 39 commits intomasterfrom
add_bfgs
Closed

WIP: Add BFGS & Catch Test Framework#183
rhl- wants to merge 39 commits intomasterfrom
add_bfgs

Conversation

@rhl-
Copy link
Member

@rhl- rhl- commented Oct 18, 2016

This Pull Request adds support for Basic BFGS within Elemental. I am using the Zoom line search algorithm found on Page 60-61 of Nocedal & Wright. The BFGS implementation itself maintains the updates to the inverse of the approximate hessian, and simply applies them at every iteration.

I have tested the optimizer with three functions, f(x) = x'x, and f(x) = x'A'Ax + x'b, and the rosenbrock function. As such I have growing confidence in the correctness of the procedure.

The scheme uses random starting vectors, and I have extremely infrequently seen some of the tests converge to a vector of NaN's. I am not yet sure why this is happening, as I am not able to reproduce it regularly. Any help here would be appreciated.

Nocedal & Wright provide no guidance on the setting of alphaMax, which I have set to 100. There are a few parameters set which I am not confident on. Also, I would appreciated commentary on the stopping criteria for the Zoom procedure.

In addition, I have edited the CMakeLists.txt to allow the interactive CLion Debugger to function with Elemental, and I have removed the logic which forces a recompile on a git commit.

With 30-60 minute long recompiles, this logic gets in the way of active development.

@rhl- rhl- added the bug label Oct 18, 2016
@rhl- rhl- added enhancement and removed bug labels Oct 18, 2016
@rhl- rhl- added this to the Release 0.87 milestone Oct 18, 2016
CMakeLists.txt Outdated
message(WARNING "Build mode not specified, defaulting to Release build.")
set(CMAKE_BUILD_TYPE Release)
endif()
#if(CMAKE_BUILD_TYPE STREQUAL "Release")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you do when no CMake build type was specified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what "CMake" mode this is, but, it appears that the output is no optimization flags on my mac. Maybe that means -O0 by default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone ahead and added this check, and setting to Release in the event that its not set. In the event that this is not set, there are numerous build errors within Elemental. This seems like a deficiency. I'll see if I can look more into this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The production CMake explicitly forces either Debug or Release by design. It is strange to remove this logic and claim it is a deficiency in the original code. There needs to be a decision as to whether to enable the debug checks or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just trying to use the CLion IDE Interactive Debugger. With those lines it will not build in debug mode, without them, it will.

What I said was a deficiency is that if you manage to leave CMAKE_BUILD_TYPE unspecified it gives a number of compile errors. I don't think there is anything controversial about that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you would like I can leave this change out of the diff, and we can discuss it separately.

namespace El {

template< typename T, typename Function, typename Gradient>
T zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p,
Copy link
Member

@poulson poulson Oct 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the insistence on starting function names with lowercase letters (as opposed to the 10,000 other functions in the library)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry :) I'm trying my best to do camel case.

T zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p,
T alpha_low, T alpha_high, T c1, T c2){
DistMatrix<T> x_j(x0);
DistMatrix<T> g2(p.Height(), 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important to construct DistMatrix instances in subroutines using any implied process grid (e.g., DistMatrix<T> g2(p.Height(), 1, x0.Grid());). Otherwise, this routine will not work when x0 used a non-default process grid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I figured I was messing this up somehow.

T alpha_low, T alpha_high, T c1, T c2){
DistMatrix<T> x_j(x0);
DistMatrix<T> g2(p.Height(), 1);
while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is T assumed to be a real variable? Otherwise one should use El::limits::Epsilon<Base<T>>()

DistMatrix<T> x_j(x0);
DistMatrix<T> g2(p.Height(), 1);
while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){
T alpha = (alpha_low + alpha_high)/2.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highly recommend against ever using floating-point literals in templated routines and would instead recommend T(2) instead of T(2.0). This happens to be safe for 2.0 but the conversion generally leads to a loss in precision.

}
alpha_low = alpha;
}
return (alpha_high+alpha_low/2.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above here.

T lineSearch( const Function& f, const Gradient& gradient,
const DistMatrix<T>& g, Int D,
const DistMatrix<T>& x0, const DistMatrix<T>& p,
Int maxIter=100, T c1=1e-4, T c2=0.9){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above here; I would recommend using a function of machine epsilon for c1 and to set T c2=T(9)/T(10)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you recommend the specific function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pow(limits::Epsilon<Real>(),Real(1)/Real(4)) would return the fourth-root of machine epsilon, which is roughly 1-e4 for double-precision. I'm not saying that this is necessarily the right generalization, but it's worth considering whether you want 1e-4 for single precision.

message(FATAL_ERROR "In-source build attempted; please clean the CMake cache and then switch to an out-of-source build, e.g.,\nrm CMakeCache.txt && rm -Rf CMakeFiles/\nmkdir build/ && cd build/ && cmake ..")
endif()

if (NOT CMAKE_BUILD_TYPE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also important that we handle mis-set build types (e.g., the old PureDebug choice should cause an error). I think that we need to enumerate the allowed values and complain otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that an old Elemental build type? whats the difference between it and Debug?

@@ -47,18 +47,18 @@ template< typename T, typename Function, typename Gradient>
T lineSearch( const Function& f, const Gradient& gradient,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capital letter for this function name please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

T fval = 0;
T alpha(1);
T alpha_prev(0);
T alphaMax(1e3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not completely sure, but my intuition is that 1e3 is a double-precision literal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T x( __ ); will ensure the right type. What is the concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is moreso to call attention to the fact that there is a conversion happening. This is fine in this case since double-precision represents 1e3 exactly.

while (alpha_high - alpha_low > 10*El::limits::Epsilon<T>()){
T alpha = (alpha_low + alpha_high)/2.0;
DistMatrix<T> g2(p.Height(), 1, x0.Grid());
while (alpha_high - alpha_low > T(10)*El::limits::Epsilon<Base<T>>()){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to need the El:: prefix in this case since this function is defined within the El namespace. The only exceptions that I'm aware of are due to conflicts caused by Argument Dependent Lookup where one of the arguments comes from a different namespace which has a function with the same name as the one from the El namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for the sake of consistency with the rest of the library, can we use symmetry with braces and put them on a line by themselves?

for( std::size_t iter=0; (norm_g > 100*limits::Epsilon<T>()); ++iter){
for( std::size_t iter=0; (norm_g > T(100)*limits::Epsilon<Base<T>>()); ++iter){
//std::cout << "iter: " << iter << std::endl;
//El::Display(x, "Iterate");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the debugging artifacts?

@@ -0,0 +1,226 @@
/*
Copyright (c) 2009-2016, Ryan H. Lewis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware that you had been working on Elemental since 2009! :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neither am I. Fixed.

DistMatrix<T> g2(p.Height(), 1, x0.Grid());
while (alpha_high - alpha_low > T(10)*limits::Epsilon<Base<T>>())
{
T alpha = (alpha_low + alpha_high)/T(2.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to use T(2) rather than T(2.0), as the former will be an int to T conversion while the latter will be a double to T conversion.

}
alpha_low = alpha;
}
return (alpha_high+alpha_low)/T(2.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before about T(2) being preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

namespace El {

template< typename T, typename Function, typename Gradient>
T Zoom( const Function& f, const Gradient& gradient, T f0, const DistMatrix<T>& x0, const DistMatrix<T>& p,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, it would be useful to have descriptive names for many of these constants; f, f0, and x0 seem reasonable, but p, alpha_low, alpha_high, c1, and c2 are ambiguous without referring to Nocedal and Wright. For that matter, it would make sense to name the function something more descriptive than Zoom.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameters match the notation in nocedal and wright. I'd prefer to leave them and mention that explicitly. Is that acceptable?

* This algorithm attempts to satsify the weak wolf conditions.
*/
template< typename T, typename Function, typename Gradient>
T lineSearch( const Function& f, const Gradient& gradient,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, it would be more stylistically consistent to call the routine LineSearch. And, since Elemental already contains several one-off line searches of various flavors (within the Interior Point Methods), it would be a good idea to prefix the name LineSearch with the type of the line search.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this line search is called. It's the one from nocedal and wright, more or less, but they don't name it. feel free to propose a name and i'll adopt it.

#define CATCH_CONFIG_RUNNER
#include <catch.hpp>

using namespace El;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a rather strange mix of using and not using the El:: prefix for member variables and constants below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to always use namespaces (renaming them locally when they are too long to type). it enhances readability, but, I did copy this cpp file from another one in El.

template< typename T>
std::pair< DistMatrix<T>, T>
SimpleQuadraticBFGSTest( const Int & N){
const std::function< T(const DistMatrix<T>&)>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the switch to two-space indentation here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all CLion. I'm trying to get its rules correct. apologies.

const std::function< T(const DistMatrix<T>&)>
quadratic_function = [&](const DistMatrix<T>& theta)
{
DistMatrix<T> y( theta);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two-space followed by six-space?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.


template< typename T>
std::pair< DistMatrix<T>, T>
RosenbrockTest(){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding a Rosenbrock test!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

const std::function< T(const DistMatrix<T>&)>
rosenbrock = [&](const DistMatrix<T>& theta)
{
auto x1 = theta.Get(0,0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each Get call to a DistMatrix will involve an mpi::Broadcast under the hood; this is expensive for querying a vector of length two (when the vector could be duplicated across all processes), but I suppose this is okay for testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's bad. How do I fix this again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If theta is distributed as a DistMatrix<T,STAR,STAR> then one could call GetLocal instead.

*
* Note: it is _not_ required that alphaLow < alphaHigh.
*
* @param f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly recommend documenting what each routine is meant to accomplish before documenting the parameters.

}
}while( !done);
if( !limits::IsFinite(beta)){ RuntimeError("Line search failed to brack point satisfying weak wolfe conditions. Function may be unbounded below"); }
RuntimeError(std::string("[")+alpha+","+beta+"] brackets an interval containing a point satisfying WWC");
Copy link
Member

@poulson poulson Nov 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeError is smarter than you're giving it credit and you can simplify this line to RuntimeError("[",alpha,",",beta,"] brackets an interval containing a point satisfying WWC");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool.

@rhl- rhl- modified the milestones: Release 0.87, Elemental Release 0.88 Nov 29, 2016
@rhl- rhl- changed the title Add BFGS & Catch Test Framework WIP: Add BFGS & Catch Test Framework Nov 30, 2016
@rhl- rhl- closed this Feb 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants