
Non-negative least squares

From Wikipedia, the free encyclopedia

In mathematical optimization, the problem of non-negative least squares (NNLS) is a type of constrained least squares problem where the coefficients are not allowed to become negative. That is, given a matrix A and a (column) vector of response variables y, the goal is to find[1]

    arg min_x ‖Ax − y‖_2,

subject to x ≥ 0.

Here x ≥ 0 means that each component of the vector x should be non-negative, and ‖·‖_2 denotes the Euclidean norm.
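As a quick illustration, the following Python sketch compares an ordinary least-squares fit with the non-negative fit computed by scipy.optimize.nnls, the SciPy routine mentioned later in this article; the matrix A and vector y are made-up illustrative data, not taken from this article.

    # Illustrative data only; any small overdetermined system will do.
    import numpy as np
    from scipy.optimize import nnls

    A = np.array([[1.0, 0.5],
                  [0.5, 1.0],
                  [1.0, 1.0]])
    y = np.array([1.0, -0.2, 0.7])

    # Unconstrained least squares may return negative coefficients...
    x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

    # ...while NNLS restricts the solution to the feasible set x >= 0.
    x_nnls, residual_norm = nnls(A, y)

    print("least squares:", x_ls)      # second coefficient is negative here
    print("NNLS:         ", x_nnls)    # all coefficients are >= 0
    print("residual norm:", residual_norm)

On this data the unconstrained solution has a negative second coefficient, while the NNLS solution sets that coefficient to zero.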

Non-negative least squares problems turn up as subproblems in matrix decomposition, e.g. in algorithms for PARAFAC[2] and non-negative matrix/tensor factorization.[3][4] The latter can be considered a generalization of NNLS.[1]

Another generalization of NNLS is bounded-variable least squares (BVLS), with simultaneous upper and lower bounds α_i ≤ x_i ≤ β_i.[5]: 291 [6]
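For the BVLS generalization, one readily available route is SciPy's lsq_linear, which accepts simultaneous lower and upper bounds; the sketch below reuses the illustrative data from the previous example, and the bounds α = (0, 0), β = (0.5, 0.5) are arbitrary choices for demonstration.

    import numpy as np
    from scipy.optimize import lsq_linear

    A = np.array([[1.0, 0.5],
                  [0.5, 1.0],
                  [1.0, 1.0]])
    y = np.array([1.0, -0.2, 0.7])

    lower = np.array([0.0, 0.0])   # the alpha_i
    upper = np.array([0.5, 0.5])   # the beta_i
    result = lsq_linear(A, y, bounds=(lower, upper), method="bvls")
    print(result.x)                # each component lies in [alpha_i, beta_i]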


Quadratic programming version

The NNLS problem is equivalent to a quadratic programming problem

    arg min_{x ≥ 0} (1/2 x^T Q x − c^T x),

where Q = A^T A and c = A^T y. This problem is convex, as Q is positive semidefinite and the non-negativity constraints form a convex feasible set.[7]
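To make the equivalence concrete, the sketch below minimizes 1/2 x^T Q x − c^T x over x ≥ 0 with a generic bound-constrained solver and checks the result against a dedicated NNLS routine; the random data and the choice of L-BFGS-B as the solver are assumptions for illustration, not something prescribed by the article or by reference [7].

    import numpy as np
    from scipy.optimize import minimize, nnls

    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 6))
    y = rng.standard_normal(30)

    Q = A.T @ A          # positive semidefinite by construction
    c = A.T @ y

    def f(x):            # 1/2 x^T Q x - c^T x
        return 0.5 * x @ Q @ x - c @ x

    def grad(x):
        return Q @ x - c

    res = minimize(f, np.zeros(6), jac=grad, method="L-BFGS-B",
                   bounds=[(0.0, None)] * 6)

    x_nnls, _ = nnls(A, y)
    print(np.abs(res.x - x_nnls).max())   # small, up to solver tolerance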

Algorithms

The first widely used algorithm for solving this problem is an active-set method published by Lawson and Hanson in their 1974 book Solving Least Squares Problems.[5]: 291  In pseudocode, this algorithm looks as follows:[1][2]

  • Inputs:
    • a real-valued matrix A of dimension m × n,
    • a real-valued vector y of dimension m,
    • a real value ε, the tolerance for the stopping criterion.
  • Initialize:
    • Set P = ∅.
    • Set R = {1, ..., n}.
    • Set x to an all-zero vector of dimension n.
    • Set w = A^T(y − Ax).
    • Let w_R denote the sub-vector of w with indexes from R.
  • Main loop: while R ≠ ∅ and max(w_R) > ε:
    • Let j in R be the index of max(w_R) in w.
    • Add j to P.
    • Remove j from R.
    • Let A_P be A restricted to the variables included in P.
    • Let s be a vector of the same length as x. Let s_P denote the sub-vector with indexes from P, and let s_R denote the sub-vector with indexes from R.
    • Set s_P = ((A_P)^T A_P)^−1 (A_P)^T y.
    • Set s_R to zero.
    • While min(s_P) ≤ 0:
      • Let α = min(x_i / (x_i − s_i)) for i in P where s_i ≤ 0.
      • Set x to x + α(s − x).
      • Move to R all indices j in P such that x_j ≤ 0.
      • Set s_P = ((A_P)^T A_P)^−1 (A_P)^T y.
      • Set s_R to zero.
    • Set x to s.
    • Set w to A^T(y − Ax).
  • Output: x

This algorithm takes a finite number of steps to reach a solution and smoothly improves its candidate solution as it goes (so it can find good approximate solutions when cut off at a reasonable number of iterations), but is very slow in practice, owing largely to the computation of the pseudoinverse ((A_P)^T A_P)^−1.[1] Variants of this algorithm are available in MATLAB as the routine lsqnonneg[8][1] and in SciPy as optimize.nnls.[9]
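The pseudocode above translates fairly directly into NumPy. The following sketch is an illustrative reimplementation, not the code behind lsqnonneg or optimize.nnls; the tolerance handling and the iteration cap are assumptions added to keep the loop finite in floating-point arithmetic.

    import numpy as np

    def active_set_nnls(A, y, eps=1e-10, max_iter=None):
        # Sketch of the active-set iteration described in the pseudocode above.
        m, n = A.shape
        max_iter = 3 * n if max_iter is None else max_iter
        P = np.zeros(n, dtype=bool)        # passive set; R is its complement
        x = np.zeros(n)
        w = A.T @ (y - A @ x)              # optimality measure (negative gradient)

        for _ in range(max_iter):
            R = ~P
            if not R.any() or w[R].max() <= eps:
                break                      # stopping criterion of the main loop
            j = int(np.argmax(np.where(R, w, -np.inf)))
            P[j] = True                    # move the most promising index to P

            # Unconstrained least squares on the columns in P (s is zero on R).
            s = np.zeros(n)
            s[P] = np.linalg.lstsq(A[:, P], y, rcond=None)[0]

            # Inner loop: step back toward x until the candidate is feasible.
            while P.any() and s[P].min() <= 0:
                mask = P & (s <= 0)
                alpha = np.min(x[mask] / (x[mask] - s[mask]))
                x = x + alpha * (s - x)
                P &= x > eps               # indices driven to zero return to R
                s = np.zeros(n)
                if P.any():
                    s[P] = np.linalg.lstsq(A[:, P], y, rcond=None)[0]

            x = s
            w = A.T @ (y - A @ x)
        return x

Like the original method, this sketch re-solves a dense least-squares subproblem each time P changes, which is exactly the per-iteration cost noted above.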

Many improved algorithms have been suggested since 1974.[1] Fast NNLS (FNNLS) is an optimized version of the Lawson–Hanson algorithm.[2] Other algorithms include variants of Landweber's gradient descent method[10] and coordinate-wise optimization based on the quadratic programming problem above.[7]
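As one concrete instance of the coordinate-wise idea, the generic projected coordinate-descent sketch below (not the exact sequential algorithm of reference [7]) minimizes the quadratic form from the previous section one coordinate at a time, clipping each update at zero; it assumes every column of A is nonzero so that the diagonal of Q is positive.

    import numpy as np

    def nnls_coordinate_descent(A, y, n_sweeps=200):
        # Minimize 1/2 x^T Q x - c^T x over x >= 0, one coordinate at a time.
        Q = A.T @ A
        c = A.T @ y
        n = Q.shape[0]
        x = np.zeros(n)
        grad = -c                              # gradient Qx - c at x = 0
        for _ in range(n_sweeps):
            for k in range(n):
                # Exact minimizer along coordinate k, projected onto x_k >= 0.
                new_xk = max(0.0, x[k] - grad[k] / Q[k, k])
                if new_xk != x[k]:
                    grad += Q[:, k] * (new_xk - x[k])   # keep the gradient current
                    x[k] = new_xk
        return x

Each inner update is the exact minimizer of the quadratic along one coordinate, projected onto the constraint, so the objective value never increases from sweep to sweep.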

See also

References

  1. ^ a b c d e f Chen, Donghui; Plemmons, Robert J. (2009). Nonnegativity constraints in numerical analysis. Symposium on the Birth of Numerical Analysis. CiteSeerX 10.1.1.157.9203.
  2. ^ a b c Bro, Rasmus; De Jong, Sijmen (1997). "A fast non-negativity-constrained least squares algorithm". Journal of Chemometrics. 11 (5): 393. doi:10.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L.
  3. ^ Lin, Chih-Jen (2007). "Projected Gradient Methods for Nonnegative Matrix Factorization" (PDF). Neural Computation. 19 (10): 2756–2779. CiteSeerX 10.1.1.308.9135. doi:10.1162/neco.2007.19.10.2756. PMID 17716011.
  4. ^ Boutsidis, Christos; Drineas, Petros (2009). "Random projections for the nonnegative least-squares problem". Linear Algebra and Its Applications. 431 (5–7): 760–771. arXiv:0812.4547. doi:10.1016/j.laa.2009.03.026.
  5. ^ a b Lawson, Charles L.; Hanson, Richard J. (1995). "23. Linear Least Squares with Linear Inequality Constraints". Solving Least Squares Problems. SIAM. p. 161. doi:10.1137/1.9781611971217.ch23. ISBN 978-0-89871-356-5.
  6. ^ Stark, Philip B.; Parker, Robert L. (1995). "Bounded-variable least-squares: an algorithm and applications" (PDF). Computational Statistics. 10: 129.
  7. ^ a b Franc, Vojtěch; Hlaváč, Václav; Navara, Mirko (2005). "Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem". Computer Analysis of Images and Patterns. Lecture Notes in Computer Science. Vol. 3691. pp. 407–414. doi:10.1007/11556121_50. ISBN 978-3-540-28969-2.
  8. ^ "lsqnonneg". MATLAB Documentation. Retrieved October 28, 2022.
  9. ^ "scipy.optimize.nnls". SciPy v0.13.0 Reference Guide. Retrieved 25 January 2014.
  10. ^ Johansson, B. R.; Elfving, T.; Kozlov, V.; Censor, Y.; Forssén, P. E.; Granlund, G. S. (2006). "The application of an oblique-projected Landweber method to a model of supervised learning". Mathematical and Computer Modelling. 43 (7–8): 892. doi:10.1016/j.mcm.2005.12.010.
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.