Comparing LU or QR decompositions for solving least squares

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
4
down vote

favorite
2













Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.




I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.



Using $LU$ we have:



$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.



I didn't count the cost of computing $L^-1$ and $U^-1$.



Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.



Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.



$SVD$ decomposition:
Is there any advantage to use SVD decomposition?







share|cite|improve this question




















  • why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
    – Paul
    Dec 2 '16 at 13:48











  • The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
    – GRS
    Dec 2 '16 at 13:51










  • @user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
    – GRS
    Dec 2 '16 at 14:00










  • Which size is $Q$?
    – user251257
    Dec 2 '16 at 14:02










  • Also the way you come up with $R^-1 Q^T y$ isn't sound.
    – user251257
    Dec 2 '16 at 14:03














up vote
4
down vote

favorite
2













Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.




I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.



Using $LU$ we have:



$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.



I didn't count the cost of computing $L^-1$ and $U^-1$.



Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.



Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.



$SVD$ decomposition:
Is there any advantage to use SVD decomposition?







share|cite|improve this question




















  • why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
    – Paul
    Dec 2 '16 at 13:48











  • The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
    – GRS
    Dec 2 '16 at 13:51










  • @user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
    – GRS
    Dec 2 '16 at 14:00










  • Which size is $Q$?
    – user251257
    Dec 2 '16 at 14:02










  • Also the way you come up with $R^-1 Q^T y$ isn't sound.
    – user251257
    Dec 2 '16 at 14:03












up vote
4
down vote

favorite
2









up vote
4
down vote

favorite
2






2






Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.




I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.



Using $LU$ we have:



$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.



I didn't count the cost of computing $L^-1$ and $U^-1$.



Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.



Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.



$SVD$ decomposition:
Is there any advantage to use SVD decomposition?







share|cite|improve this question













Let $X in R^mtimes n$ with $m>n$. We aim to solve $y=Xbeta$ where $hatbeta$ is the least square estimator. The least squares solution for
$hatbeta = (X^TX)^-1X^Ty$ can be obtained using QR decomposition
on $X$ and $LU$ decomposition on $X^TX$. The aim to compare these.




I noticed that we can use Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite.



Using $LU$ we have:



$hatbeta = (X^TX)^-1X^Ty=(LU)^-1X^Ty$, solve $a=X^Ty$ which is order $O(2nm)$, then $L^-1b=a$ at cost $sum_1^k=n (2k-1)$ and finally $U^-1a$ at the same cost of $sum_1^k=n (2k-1)$.



I didn't count the cost of computing $L^-1$ and $U^-1$.



Using $QR$ we have:
$hatbeta = (X^TX)^-1X^Ty=((QR)^TQR)^-1R^TQ^Ty=R^-1Q^Ty$, where we solve $Q^Ty=a$ at cost $O(n^2)$ and $R^-1a$ with cost $sum_1^k=n (2k-1)$.



Comparing the decompositions:
It seems that QR decomposition is much better than LU. I think the cost of computing QR is higher than LU, which is why we could prefer to use LU. On the other hand if we are given the decompositions, we should use QR.



$SVD$ decomposition:
Is there any advantage to use SVD decomposition?









share|cite|improve this question











share|cite|improve this question




share|cite|improve this question










asked Dec 2 '16 at 13:39









GRS

1,130721




1,130721











  • why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
    – Paul
    Dec 2 '16 at 13:48











  • The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
    – GRS
    Dec 2 '16 at 13:51










  • @user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
    – GRS
    Dec 2 '16 at 14:00










  • Which size is $Q$?
    – user251257
    Dec 2 '16 at 14:02










  • Also the way you come up with $R^-1 Q^T y$ isn't sound.
    – user251257
    Dec 2 '16 at 14:03
















  • why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
    – Paul
    Dec 2 '16 at 13:48











  • The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
    – GRS
    Dec 2 '16 at 13:51










  • @user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
    – GRS
    Dec 2 '16 at 14:00










  • Which size is $Q$?
    – user251257
    Dec 2 '16 at 14:02










  • Also the way you come up with $R^-1 Q^T y$ isn't sound.
    – user251257
    Dec 2 '16 at 14:03















why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
– Paul
Dec 2 '16 at 13:48





why does it seem that QR is much better? $sum _1 ^n (2k-1) = mathcalO (n^2)$. So both are $mathcalO (n^2)$
– Paul
Dec 2 '16 at 13:48













The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
– GRS
Dec 2 '16 at 13:51




The exact number must be bigger, since $2nm>n^2$ and we are adding 2 summs for $LU$ instead of 1
– GRS
Dec 2 '16 at 13:51












@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
– GRS
Dec 2 '16 at 14:00




@user251257 I think it's the opposite. LU is twice as expensive: $O(2nm)+2O(n^2)$ as opposed to QR: $O(n^2)+O(n^2)$
– GRS
Dec 2 '16 at 14:00












Which size is $Q$?
– user251257
Dec 2 '16 at 14:02




Which size is $Q$?
– user251257
Dec 2 '16 at 14:02












Also the way you come up with $R^-1 Q^T y$ isn't sound.
– user251257
Dec 2 '16 at 14:03




Also the way you come up with $R^-1 Q^T y$ isn't sound.
– user251257
Dec 2 '16 at 14:03










1 Answer
1






active

oldest

votes

















up vote
0
down vote













Matrix Computations



Golub and Van Loan, 3e



$S$ 5.7.1, p. 270



Comparing flop counts for operations on $ntimes n$ matrices:



$frac23n^3 $ $qquad$ Gaussian elimination



$frac43n^3 $ $qquad$ Householder orthogonalization



$2n^3$ $qquad $ Modified Gram-Schmidt



$frac83n^3 $ $qquad$ Bidiagonalization



$12n^3$ $qquad$ Singular Value Decomposition



Three reasons to choose orthogonalization to solve square systems:



  1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.


  2. Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.


  3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.






share|cite|improve this answer




















    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "69"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2040363%2fcomparing-lu-or-qr-decompositions-for-solving-least-squares%23new-answer', 'question_page');

    );

    Post as a guest






























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote













    Matrix Computations



    Golub and Van Loan, 3e



    $S$ 5.7.1, p. 270



    Comparing flop counts for operations on $ntimes n$ matrices:



    $frac23n^3 $ $qquad$ Gaussian elimination



    $frac43n^3 $ $qquad$ Householder orthogonalization



    $2n^3$ $qquad $ Modified Gram-Schmidt



    $frac83n^3 $ $qquad$ Bidiagonalization



    $12n^3$ $qquad$ Singular Value Decomposition



    Three reasons to choose orthogonalization to solve square systems:



    1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.


    2. Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.


    3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.






    share|cite|improve this answer
























      up vote
      0
      down vote













      Matrix Computations



      Golub and Van Loan, 3e



      $S$ 5.7.1, p. 270



      Comparing flop counts for operations on $ntimes n$ matrices:



      $frac23n^3 $ $qquad$ Gaussian elimination



      $frac43n^3 $ $qquad$ Householder orthogonalization



      $2n^3$ $qquad $ Modified Gram-Schmidt



      $frac83n^3 $ $qquad$ Bidiagonalization



      $12n^3$ $qquad$ Singular Value Decomposition



      Three reasons to choose orthogonalization to solve square systems:



      1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.


      2. Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.


      3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.






      share|cite|improve this answer






















        up vote
        0
        down vote










        up vote
        0
        down vote









        Matrix Computations



        Golub and Van Loan, 3e



        $S$ 5.7.1, p. 270



        Comparing flop counts for operations on $ntimes n$ matrices:



        $frac23n^3 $ $qquad$ Gaussian elimination



        $frac43n^3 $ $qquad$ Householder orthogonalization



        $2n^3$ $qquad $ Modified Gram-Schmidt



        $frac83n^3 $ $qquad$ Bidiagonalization



        $12n^3$ $qquad$ Singular Value Decomposition



        Three reasons to choose orthogonalization to solve square systems:



        1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.


        2. Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.


        3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.






        share|cite|improve this answer












        Matrix Computations



        Golub and Van Loan, 3e



        $S$ 5.7.1, p. 270



        Comparing flop counts for operations on $ntimes n$ matrices:



        $frac23n^3 $ $qquad$ Gaussian elimination



        $frac43n^3 $ $qquad$ Householder orthogonalization



        $2n^3$ $qquad $ Modified Gram-Schmidt



        $frac83n^3 $ $qquad$ Bidiagonalization



        $12n^3$ $qquad$ Singular Value Decomposition



        Three reasons to choose orthogonalization to solve square systems:



        1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered the $mathbfQmathbfR$ factorization is comparable in efficiency.


        2. Orthogonalization methods have guaranteed stability, there is no "growth factor" to worry about as in Gaussian elimination.


        3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $mathbfQmathbfR$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered Mar 10 '17 at 22:19









        dantopa

        6,01831640




        6,01831640



























             

            draft saved


            draft discarded















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2040363%2fcomparing-lu-or-qr-decompositions-for-solving-least-squares%23new-answer', 'question_page');

            );

            Post as a guest