Optimization problem using exact Hessian

In summary: the thread asks how to maximize a multidimensional function when the exact gradient and Hessian are known analytically (the answer being Newton's method with a line search, or a solver such as fmincon that accepts a user-supplied Hessian), plus a follow-up question on whether the accumulated product matrix C of an iterative procedure can be recovered from the single equation CV = W via some "outer-product inverse" of V (it cannot: one column vector underdetermines C; you need as many independent vectors as its dimension).
  • #1
brydustin
Hi, I'm fairly new to optimization theory, and I have to maximize a multidimensional function for which I know the exact gradient and Hessian. In other words, techniques such as BFGS are not what I want, because I don't want to approximate the Hessian (starting from an initial guess such as H = I); I have the exact (analytical) Hessian and want to optimize my problem to second order,
i.e.
[itex]f(x + \delta) \approx f(x) + g^T \delta + \tfrac{1}{2}\,\delta^T H \delta[/itex] (second-order Taylor expansion of f around x, in the step δ).

Could someone please suggest a technique? Thanks in advance!
 
  • #2
Newton's Method?
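
(For the record, the standard link back to the expansion in post #1, which isn't spelled out in the thread: extremizing the quadratic model over δ gives the Newton step.)
[tex]\nabla_\delta\!\left(g^T\delta + \tfrac{1}{2}\,\delta^T H \delta\right) = g + H\delta = 0 \quad\Rightarrow\quad \delta = -H^{-1}g,[/tex]
assuming H is invertible (and negative definite at a maximum).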
 
  • #3
Well yes, I was thinking about this, but I was unsure how best to pick γ in
[itex]x_{n+1} = x_n - \gamma\,[H_f(x_n)]^{-1}\,\nabla f(x_n)[/itex]
such that γ satisfies the Wolfe conditions and the optimization converges as quickly as possible.

Also, I wasn't sure if Newton's method is actually the best method here or just the first one everyone learns (the way the bisection method is a first-day technique in single-variable numerical methods).
 
  • #4
Optimizing a second-order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a Google search. Are you solving this inside of MATLAB or something, or writing your own optimizer from scratch?
 
  • #5
Office_Shredder said:
Optimizing a second-order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a Google search. Are you solving this inside of MATLAB or something, or writing your own optimizer from scratch?


I believe that I'm going to be writing this from scratch ... I may not, though, if I am satisfied with what Mathematica or Fortran has to offer.
 
  • #6
MATLAB's interior-point algorithm, called by fmincon:
http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html

allows you to supply the Hessian of your objective function/constraints. Here's an example:
http://www.mathworks.com/help/toolbox/optim/ug/brn4nh7.html#bri8026

If the dimension of your problem is small (less than 100 maybe, depending on the complexity of your function/constraints), you can also try the active-set/sequential quadratic programming methods. You can't give them the Hessian, but they can still be effective optimizers. I think active-set requires fewer function evaluations on average, so if your function takes a while to evaluate it might be worth trying.
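
(If the from-scratch route ends up being Python rather than MATLAB or Fortran, SciPy has solvers that likewise accept an exact analytical Hessian. A minimal unconstrained sketch; the quadratic objective and starting point are only illustrations, not anything from this thread.)
[code]
# Supplying an exact gradient and Hessian to SciPy's minimize().
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2

def grad(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

def hess(x):
    return np.array([[2.0, 0.0], [0.0, 20.0]])

# 'Newton-CG' and 'trust-exact' use the exact Hessian you pass in;
# 'trust-constr' does too and also handles constraints, much like fmincon.
res = minimize(f, x0=np.zeros(2), jac=grad, hess=hess, method='trust-exact')
print(res.x)  # approximately [1, -2]
[/code]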
 
  • #7
brydustin said:
I believe that I'm going to be writing this from scratch ... I may not, though, if I am satisfied with what Mathematica or Fortran has to offer.
So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.
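
(A minimal Python/NumPy sketch of that "halve it / try again" advice, written as a backtracking Armijo-style check rather than a full Wolfe line search. The Rosenbrock test function, the tolerances, and the steepest-descent fallback are illustrative choices, not from the thread; for a maximization problem you'd apply this to -f.)
[code]
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-8, max_iter=100):
    """Minimize f by Newton's method with simple backtracking on gamma.

    f, grad, hess return the objective, exact gradient, and exact Hessian
    at a point x (1-D numpy array).  To maximize, pass -f, -grad, -hess.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton direction: solve H d = -g rather than forming H^{-1}.
        d = np.linalg.solve(hess(x), -g)
        if g.dot(d) >= 0:
            d = -g  # Hessian not positive definite here: fall back to steepest descent
        # Halve gamma until the step gives a sufficient decrease (Armijo condition).
        gamma = 1.0
        while f(x + gamma * d) > f(x) + 1e-4 * gamma * g.dot(d) and gamma > 1e-12:
            gamma *= 0.5
        x = x + gamma * d
    return x

# Illustrative test: the Rosenbrock function with its exact gradient/Hessian.
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad(x):
    return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                     200 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[2 - 400 * (x[1] - 3 * x[0]**2), -400 * x[0]],
                     [-400 * x[0], 200.0]])

print(damped_newton(f, grad, hess, [-1.2, 1.0]))  # converges to ~[1, 1]
[/code]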
 
  • #8
pmsrw3 said:
So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.

Thanks for the ideas. I do not have any inequality constraints... so yeah, I guess I'll just give this Newton method a go, and I'll try the NSolve/NMinimize functions as well.
 
  • #9
Hi again...
I have another question related to this post, after applying the method discussed above.
At each iteration I calculate a matrix C by some method that uses the current input vector V to generate it; V itself also changes from iteration to iteration... let its final value be W.
This means that (C_final)·...·(C_second)·(C_first)·V = W.
However, I would like to compute the product matrix above, and I DON'T want to do it the obvious, computationally intensive way (i.e. accumulating the product one factor at a time, C_(k+1)·C_(k)·C_(k-1)·..., or some similar equivalent).

What I would like to do is something along the lines of:
Solve for C in CV = W (where C is a matrix and V and W are column vectors),
via C·(V#q) = W#q = C,
where q is a row vector and # is the outer product, chosen such that V#q = I (the identity).
I know very little about the outer product (uniqueness, how to calculate it, etc.).
So yeah... how can I find q?
Thanks a lot for the help; it's very appreciated!
 
  • #10
In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors [itex]\delta_{ij}[/itex], then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.
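
(A small NumPy illustration of that point; the random chain of matrices below just stands in for the actual iterative procedure. Applying the map to each standard basis vector returns the columns of C, but that is n full passes, i.e. the same work as forming the product directly.)
[code]
import numpy as np

rng = np.random.default_rng(0)
n = 4
chain = [rng.standard_normal((n, n)) for _ in range(5)]

def apply_iteration(v):
    """Apply the whole iterative procedure to one input vector."""
    for Ck in chain:
        v = Ck @ v
    return v

# Column j of C is apply_iteration(e_j), with e_j the j-th standard basis vector.
C = np.column_stack([apply_iteration(np.eye(n)[:, j]) for j in range(n)])

# Same matrix as multiplying the factors explicitly -- and the same cost.
C_explicit = np.linalg.multi_dot(chain[::-1])
print(np.allclose(C, C_explicit))  # True
[/code]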
 
  • #11
pmsrw3 said:
In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors [itex]\delta_{ij}[/itex], then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.

Yeah, I was just realizing that the calculation is impossible... it would mean that the first entry of the "outer-product inverse" q would just be the reciprocal of the first element of V (so that entry {1,1} of V#q equals one when "solving" for I), but then the next entry of q would have to be zero so that the rest of that row of I is zero, and then the diagonal entry in the next row could never equal one. The problem appears immediately... thanks for the advice. Absurd ideas are the laundry work of scientists/mathematicians.
 

FAQ: Optimization problem using exact Hessian

1. What is an optimization problem?

An optimization problem is a type of mathematical problem where the goal is to find the best solution from all possible solutions. The best solution is typically defined as the one that maximizes or minimizes a specific objective function, while satisfying a set of constraints.

2. What is the Hessian matrix?

The Hessian matrix is a square matrix of second-order partial derivatives of a function. It is used to determine the curvature of a function at a specific point and is often used in optimization algorithms to find the minimum or maximum of a function.

3. How is the Hessian matrix used in optimization problems?

The Hessian matrix is used in optimization problems to determine the search direction (and, together with a line search or trust region, the step size) at each iteration. It is also used to analyze the convergence of the algorithm and to check whether a stationary point is a local minimum, a local maximum, or a saddle point.

4. What is the difference between exact Hessian and approximate Hessian?

The exact Hessian is calculated using the second-order partial derivatives of a function, while the approximate Hessian is estimated using numerical methods such as finite differences. The exact Hessian is more accurate but can be computationally expensive, while the approximate Hessian is less accurate but computationally efficient.
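
(A short illustration of the distinction, using an arbitrary two-variable function: the exact Hessian from the analytical second derivatives versus a central finite-difference approximation built from gradient evaluations.)
[code]
import numpy as np

def grad(x):
    # Gradient of f(x, y) = x**2 * y + y**3
    return np.array([2 * x[0] * x[1], x[0]**2 + 3 * x[1]**2])

def exact_hessian(x):
    # Analytical second derivatives of the same function.
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], 6 * x[1]]])

def fd_hessian(x, h=1e-5):
    """Approximate the Hessian by central differences of the gradient."""
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(x + e) - grad(x - e)) / (2 * h)
    return H

x = np.array([1.0, 2.0])
print(exact_hessian(x))
print(fd_hessian(x))  # agrees to many decimal places, at the cost of 2n extra gradient calls
[/code]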

5. What are the advantages of using the exact Hessian in optimization problems?

The exact Hessian can provide more accurate results compared to using an approximate Hessian. It can also help improve the convergence rate of optimization algorithms and provide more information about the curvature of a function. However, it may not be feasible to use the exact Hessian in large-scale optimization problems due to its computational cost.
