Gradient descent vs. system of equations

Given the matrices $\mathbf{t}_{M\times 1}$ and $\mathbf{Q}_{M\times N}$, we want to find $\mathbf{p}_{N\times 1}$ that minimizes $\epsilon = \|\mathbf{t} - \mathbf{Q}\mathbf{p}\|_2$.



To do so, we could use gradient descent. My question is: assuming gradient descent reaches the best possible result, will that result be equivalent to solving for $\mathbf{p}$ in the equation $\mathbf{t} = \mathbf{Q}\mathbf{p}$ as



$\mathbf{Q}^T \mathbf{t} = \mathbf{Q}^T \mathbf{Q}\mathbf{p}$,



$(\mathbf{Q}^T \mathbf{Q})^{-1} \mathbf{Q}^T \mathbf{t} = (\mathbf{Q}^T \mathbf{Q})^{-1} (\mathbf{Q}^T \mathbf{Q}) \mathbf{p}$,



and so
$\mathbf{p} = (\mathbf{Q}^T \mathbf{Q})^{-1} \mathbf{Q}^T \mathbf{t}$?



I see that by premultiplying by $\mathbf{Q}^T$ we are 'collapsing' our system of equations ($M$ equations, $N$ unknowns) to $N$ equations with $N$ unknowns, but I can't see how this will minimize $\epsilon$ (if it does).
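
One way to see why the collapsed system picks out a minimizer is the following sketch. It assumes $\mathbf{Q}$ has full column rank, so that $\mathbf{Q}^T\mathbf{Q}$ is invertible; the symbol $\hat{\mathbf{p}}$ is introduced here only for illustration and denotes the solution of $\mathbf{Q}^T\mathbf{Q}\,\hat{\mathbf{p}} = \mathbf{Q}^T\mathbf{t}$. That equation says the residual is orthogonal to every column of $\mathbf{Q}$:

$$\mathbf{Q}^T(\mathbf{t} - \mathbf{Q}\hat{\mathbf{p}}) = \mathbf{0}.$$

For any other candidate $\mathbf{p}$, split the residual as $\mathbf{t} - \mathbf{Q}\mathbf{p} = (\mathbf{t} - \mathbf{Q}\hat{\mathbf{p}}) + \mathbf{Q}(\hat{\mathbf{p}} - \mathbf{p})$. The cross term $2(\hat{\mathbf{p}} - \mathbf{p})^T\mathbf{Q}^T(\mathbf{t} - \mathbf{Q}\hat{\mathbf{p}})$ vanishes by the orthogonality above, so

$$\|\mathbf{t} - \mathbf{Q}\mathbf{p}\|_2^2 = \|\mathbf{t} - \mathbf{Q}\hat{\mathbf{p}}\|_2^2 + \|\mathbf{Q}(\hat{\mathbf{p}} - \mathbf{p})\|_2^2 \ \ge\ \|\mathbf{t} - \mathbf{Q}\hat{\mathbf{p}}\|_2^2.$$

Under that assumption, no $\mathbf{p}$ gives a smaller $\epsilon$ than $\hat{\mathbf{p}}$.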










  • Minimizing the error is equivalent to minimizing the squared error, which is $\epsilon^2 = \|t-Qp\|^2 = t^Tt - 2t^TQp + p^TQ^TQp$. The minimum is attained when its gradient, $2Q^TQp - 2Q^Tt$, equals zero, which occurs when $p = (Q^TQ)^{-1}Q^Tt$ (provided $Q^TQ$ is invertible). Thus the two approaches are equivalent.
    – Rahul
    Sep 7 at 9:20
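
The equivalence can also be checked numerically. The snippet below is only a minimal sketch on made-up data (a $4\times 2$ matrix `Q` and a vector `t` chosen for illustration, not taken from the question). The helper names `normal_equations` and `gradient_descent`, the step size, and the iteration count are all assumptions; the step size was picked small enough for this particular `Q` and would need retuning for other data.

```python
# Illustrative data (an assumption, not from the question): M = 4, N = 2.
Q = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
t = [1.0, 2.0, 2.0, 4.0]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def normal_equations(Q, t):
    """Solve (Q^T Q) p = Q^T t for N = 2 using Cramer's rule."""
    c0, c1 = (list(c) for c in zip(*Q))      # columns of Q
    a, b, d = dot(c0, c0), dot(c0, c1), dot(c1, c1)  # entries of Q^T Q
    r0, r1 = dot(c0, t), dot(c1, t)          # entries of Q^T t
    det = a * d - b * b                      # nonzero if Q has full column rank
    return [(r0 * d - b * r1) / det, (a * r1 - b * r0) / det]


def gradient_descent(Q, t, step=0.02, iters=5000):
    """Minimize ||t - Q p||^2 with fixed-step gradient descent from p = 0."""
    cols = [list(c) for c in zip(*Q)]
    p = [0.0] * len(cols)
    for _ in range(iters):
        residual = [ti - dot(row, p) for row, ti in zip(Q, t)]
        # Gradient of the squared error: -2 Q^T (t - Q p) = 2 Q^T Q p - 2 Q^T t
        grad = [-2.0 * dot(col, residual) for col in cols]
        p = [pi - step * gi for pi, gi in zip(p, grad)]
    return p


p_closed = normal_equations(Q, t)
p_gd = gradient_descent(Q, t)
print("normal equations:", p_closed)
print("gradient descent:", p_gd)
```

On this example the two results should agree to within floating-point tolerance. The closed form needs $Q^TQ$ to be invertible, while gradient descent only approaches the minimizer and depends on the step size.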















linear-algebra gradient-descent






asked Sep 7 at 9:12









Carlos Navarro Astiasarán
