Comparing LU or QR decompositions for solving least squares

Clash Royale CLAN TAG#URR8PPP
up vote
4
down vote
favorite
Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.
I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.
Using $LU$ we have:
$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.
I didn't count the cost of computing $L^-1$ and $U^-1$.
Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.
Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.
$SVD$ decomposition:
Is there any advantage to use SVD decomposition?
linear-algebra matrices statistics matrix-decomposition least-squares
 |Â
show 15 more comments
up vote
4
down vote
favorite
Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.
I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.
Using $LU$ we have:
$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.
I didn't count the cost of computing $L^-1$ and $U^-1$.
Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.
Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.
$SVD$ decomposition:
Is there any advantage to use SVD decomposition?
linear-algebra matrices statistics matrix-decomposition least-squares
why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
â Paul
Dec 2 '16 at 13:48
The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
â GRS
Dec 2 '16 at 13:51
@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
â GRS
Dec 2 '16 at 14:00
Which size is $Q$?
â user251257
Dec 2 '16 at 14:02
Also the way you come up with $R^-1 Q^T y$ isn't sound.
â user251257
Dec 2 '16 at 14:03
 |Â
show 15 more comments
up vote
4
down vote
favorite
up vote
4
down vote
favorite
Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.
I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.
Using $LU$ we have:
$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.
I didn't count the cost of computing $L^-1$ and $U^-1$.
Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.
Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.
$SVD$ decomposition:
Is there any advantage to use SVD decomposition?
linear-algebra matrices statistics matrix-decomposition least-squares
Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.
I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.
Using $LU$ we have:
$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.
I didn't count the cost of computing $L^-1$ and $U^-1$.
Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.
Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.
$SVD$ decomposition:
Is there any advantage to use SVD decomposition?
linear-algebra matrices statistics matrix-decomposition least-squares
asked Dec 2 '16 at 13:39
GRS
1,130721
1,130721
why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
â Paul
Dec 2 '16 at 13:48
The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
â GRS
Dec 2 '16 at 13:51
@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
â GRS
Dec 2 '16 at 14:00
Which size is $Q$?
â user251257
Dec 2 '16 at 14:02
Also the way you come up with $R^-1 Q^T y$ isn't sound.
â user251257
Dec 2 '16 at 14:03
 |Â
show 15 more comments
why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
â Paul
Dec 2 '16 at 13:48
The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
â GRS
Dec 2 '16 at 13:51
@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
â GRS
Dec 2 '16 at 14:00
Which size is $Q$?
â user251257
Dec 2 '16 at 14:02
Also the way you come up with $R^-1 Q^T y$ isn't sound.
â user251257
Dec 2 '16 at 14:03
why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
â Paul
Dec 2 '16 at 13:48
why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
â Paul
Dec 2 '16 at 13:48
The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
â GRS
Dec 2 '16 at 13:51
The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
â GRS
Dec 2 '16 at 13:51
@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
â GRS
Dec 2 '16 at 14:00
@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
â GRS
Dec 2 '16 at 14:00
Which size is $Q$?
â user251257
Dec 2 '16 at 14:02
Which size is $Q$?
â user251257
Dec 2 '16 at 14:02
Also the way you come up with $R^-1 Q^T y$ isn't sound.
â user251257
Dec 2 '16 at 14:03
Also the way you come up with $R^-1 Q^T y$ isn't sound.
â user251257
Dec 2 '16 at 14:03
 |Â
show 15 more comments
1 Answer
1
active
oldest
votes
up vote
0
down vote
Matrix Computations
Golub and Van Loan, 3e
$S$ 5.7.1, p. 270
Comparing flop counts for operations on $ntimes n$ matrices:
$frac23n^3 $ $qquad$ Gaussian elimination
$frac43n^3 $ $qquad$ Householder orthogonalization
$2n^3$ $qquad $ Modified Gram-Schmidt
$frac83n^3 $ $qquad$ Bidiagonalization
$12n^3$ $qquad$ Singular Value Decomposition
Three reasons to choose orthogonalization to solve square systems:
Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.
Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.
In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Matrix Computations
Golub and Van Loan, 3e
$S$ 5.7.1, p. 270
Comparing flop counts for operations on $ntimes n$ matrices:
$frac23n^3 $ $qquad$ Gaussian elimination
$frac43n^3 $ $qquad$ Householder orthogonalization
$2n^3$ $qquad $ Modified Gram-Schmidt
$frac83n^3 $ $qquad$ Bidiagonalization
$12n^3$ $qquad$ Singular Value Decomposition
Three reasons to choose orthogonalization to solve square systems:
Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.
Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.
In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.
add a comment |Â
up vote
0
down vote
Matrix Computations
Golub and Van Loan, 3e
$S$ 5.7.1, p. 270
Comparing flop counts for operations on $ntimes n$ matrices:
$frac23n^3 $ $qquad$ Gaussian elimination
$frac43n^3 $ $qquad$ Householder orthogonalization
$2n^3$ $qquad $ Modified Gram-Schmidt
$frac83n^3 $ $qquad$ Bidiagonalization
$12n^3$ $qquad$ Singular Value Decomposition
Three reasons to choose orthogonalization to solve square systems:
Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.
Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.
In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Matrix Computations
Golub and Van Loan, 3e
$S$ 5.7.1, p. 270
Comparing flop counts for operations on $ntimes n$ matrices:
$frac23n^3 $ $qquad$ Gaussian elimination
$frac43n^3 $ $qquad$ Householder orthogonalization
$2n^3$ $qquad $ Modified Gram-Schmidt
$frac83n^3 $ $qquad$ Bidiagonalization
$12n^3$ $qquad$ Singular Value Decomposition
Three reasons to choose orthogonalization to solve square systems:
Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.
Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.
In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.
Matrix Computations
Golub and Van Loan, 3e
$S$ 5.7.1, p. 270
Comparing flop counts for operations on $ntimes n$ matrices:
$frac23n^3 $ $qquad$ Gaussian elimination
$frac43n^3 $ $qquad$ Householder orthogonalization
$2n^3$ $qquad $ Modified Gram-Schmidt
$frac83n^3 $ $qquad$ Bidiagonalization
$12n^3$ $qquad$ Singular Value Decomposition
Three reasons to choose orthogonalization to solve square systems:
Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.
Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.
In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.
answered Mar 10 '17 at 22:19
dantopa
6,01831640
6,01831640
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2040363%2fcomparing-lu-or-qr-decompositions-for-solving-least-squares%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
â Paul
Dec 2 '16 at 13:48
The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
â GRS
Dec 2 '16 at 13:51
@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
â GRS
Dec 2 '16 at 14:00
Which size is $Q$?
â user251257
Dec 2 '16 at 14:02
Also the way you come up with $R^-1 Q^T y$ isn't sound.
â user251257
Dec 2 '16 at 14:03