Where am I going wrong in solving $\frac{\partial}{\partial \mathbf w}(\mathbf y - \mathbf X\mathbf w)^T(\mathbf y - \mathbf X \mathbf w) = 0$?
I have the following equation which I wish to solve:
$$\frac{\partial}{\partial \mathbf w}(\mathbf y - \mathbf X\mathbf w)^T(\mathbf y - \mathbf X \mathbf w) = 0$$
Here $\mathbf y_{n \times 1}, \mathbf X_{n \times 2}, \mathbf w_{2 \times 1}$.
My solution (done on paper because MathJax is a bit difficult for me to use):
Also, is my reasoning for step 4 correct?
multivariable-calculus vector-analysis matrix-calculus
edited Aug 16 at 8:02
asked Aug 16 at 7:45
rjmessibarca
3 Answers
accepted
Going from line $3$ to line $4$, note that
$$
\frac{\partial}{\partial w} (y^TXw) = X^Ty,
$$
and then you'll get the right answer
$$
\hat{w} = (X^TX)^{-1}X^Ty.
$$
Explicit derivation:
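Note that, with the first column of $X$ taken to be all ones (an intercept column) and $x_{ji}$ denoting the $i$th observation of the $j$th regressor,
$$
y^TXw = w_1\sum_{i=1}^n y_i + w_2\sum_{i=1}^n y_i x_{1i} + \cdots + w_p\sum_{i=1}^n y_i x_{p-1,i}.
$$
Taking the derivative w.r.t. the vector $w \in \mathbb{R}^p$ results in a gradient, i.e., a vector with $p$ rows and $1$ column, namely
$$
\begin{pmatrix}
\sum y_i \\
\sum y_i x_{1i}\\
\vdots \\
\sum y_i x_{p-1,i}
\end{pmatrix},
$$
where the $j$th row is the derivative of $y^TXw$ w.r.t. $w_j$. This vector is exactly $X^Ty$.
Since $X^T$ is $p \times n$ and $y$ is $n \times 1$, the product $X^Ty$ is $p \times 1$, as required.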
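A quick numerical sanity check can make the shape argument concrete. The sketch below is only an illustration, not part of the derivation: the data `X`, `y`, the point `w0`, and all helper names are made up for the example. It compares a central-difference gradient of $g(w) = y^TXw$ with the column vector $X^Ty$.

```python
# Hypothetical small example: n = 3 observations, p = 2 coefficients.
# The first column of X is all ones, matching the intercept convention above.
X = [[1.0, 2.0],
     [1.0, -1.0],
     [1.0, 0.5]]
y = [3.0, -2.0, 4.0]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    """Return the inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def g(w):
    """Scalar function g(w) = y^T X w."""
    return dot(y, matvec(X, w))


def numerical_gradient(func, w, h=1e-6):
    """Central-difference approximation of the gradient of func at w."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += h
        w_minus[j] -= h
        grad.append((func(w_plus) - func(w_minus)) / (2 * h))
    return grad


w0 = [0.7, -1.3]                      # arbitrary point; g is linear in w
analytic = matvec(transpose(X), y)    # X^T y, one entry per coefficient
numeric = numerical_gradient(g, w0)

print("X^T y            :", analytic)
print("finite difference:", numeric)
```

Both vectors have $p = 2$ entries, one per component of $w$. The row vector $y^TX$ contains the same numbers, but laid out as $1 \times p$, which is why it cannot be combined with the $p \times 1$ term $X^TXw$ in the next step.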
edited Aug 19 at 21:19
answered Aug 18 at 14:48
V. Vancak
Can you please explain why? I thought it would be y'X. I can see from the order of the matrices that my solution is wrong and that yours is correct. But in general, how do I solve such problems involving matrix calculus? My current method is to write out each element of the matrix and then find the partial derivatives.
– rjmessibarca
Aug 19 at 16:08
@rjmessibarca Please see the edited answer.
– V. Vancak
Aug 19 at 21:20
No, your reasoning in step 4 is wrong. For example, if $X$ is a square matrix, $\mathbf{X}^T \mathbf{X}$ will not be a scalar. Therefore your result is wrong. Do note that $$\frac{\partial}{\partial \mathbf{w}} \left(\mathbf{w}^T \mathbf{X}^T \mathbf{X} \mathbf{w} \right) = 2 \mathbf{X}^T \mathbf{X} \mathbf{w}$$
I am sure that you can get to the right answer from here.
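As a sketch of the remaining steps: combine the identity above with $\frac{\partial}{\partial \mathbf{w}}(\mathbf{y}^T\mathbf{X}\mathbf{w}) = \mathbf{X}^T\mathbf{y}$, and use the fact that the scalar $\mathbf{w}^T\mathbf{X}^T\mathbf{y}$ equals its own transpose $\mathbf{y}^T\mathbf{X}\mathbf{w}$. The last line additionally assumes that $\mathbf{X}^T\mathbf{X}$ is invertible, which the question does not state.

\begin{align}
(\mathbf{y} - \mathbf{X}\mathbf{w})^T(\mathbf{y} - \mathbf{X}\mathbf{w})
&= \mathbf{y}^T\mathbf{y} - 2\,\mathbf{y}^T\mathbf{X}\mathbf{w} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w}, \\
\frac{\partial}{\partial \mathbf{w}}\left[(\mathbf{y} - \mathbf{X}\mathbf{w})^T(\mathbf{y} - \mathbf{X}\mathbf{w})\right]
&= -2\,\mathbf{X}^T\mathbf{y} + 2\,\mathbf{X}^T\mathbf{X}\mathbf{w} = 0, \\
\mathbf{X}^T\mathbf{X}\mathbf{w} &= \mathbf{X}^T\mathbf{y}, \\
\mathbf{w} &= \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}.
\end{align}

Every term in the second line is a $2 \times 1$ vector for the $n \times 2$ matrix $\mathbf{X}$ of the question, so no step requires treating $\mathbf{X}^T\mathbf{X}$ as a scalar.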
edited Aug 16 at 8:27
answered Aug 16 at 8:17
Jan
$X$ is an $n \times 2$ matrix, so $X^TX$ is $2\times2$.
– Jaap Scherphuis
Aug 16 at 8:19
Yes, so not a scalar.
– Jan
Aug 16 at 8:20
Yes, I'm just disagreeing with the "If X is a square matrix" bit in your answer. It does not have to be square, though $X^TX$ will be.
– Jaap Scherphuis
Aug 16 at 8:22
Ooh yeah I know about that, I was just giving an example to show why this was not true but I'll clarify.
– Jan
Aug 16 at 8:26
@Jan How did you get that result? BTW, I used the expression that you mentioned in step 5. But the answer is still wrong.
– rjmessibarca
Aug 16 at 8:31
Let $M = \mathbf{y} - \mathbf{X} \mathbf{w}$
and $f = M^T M = M : M$.
We will utilize the following identities:
- Trace and Frobenius product relation $$A:B={\rm tr}(A^TB)$$ or $$A^T:B={\rm tr}(AB)$$
- Cyclic property of Trace/Frobenius product $$\eqalign{
A:BC
&= AC^T:B \cr
&= B^TA:C \cr
&= \text{etc.} \cr
}$$
Now, we obtain the differential first and thereafter we obtain the gradient.
So,
\begin{align}
df &= \left( dM : M \right) + \left( M : dM \right)\\
&= 2M : dM \\
&= 2M : \left( - \mathbf{X}\, d\mathbf{w} \right) \\
&= - 2\mathbf{X}^T M : d\mathbf{w} \hspace{8mm} \text{note: utilized cyclic property of Frobenius product} \\
&= - 2\mathbf{X}^T \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right) : d\mathbf{w} .
\end{align}
Thus, the gradient reads
\begin{align}
\frac{\partial}{\partial \mathbf{w}} f
= - 2\mathbf{X}^T \left( \mathbf{y} - \mathbf{X} \mathbf{w} \right) .
\end{align}
Then you can set the gradient to $0$ and obtain $$\mathbf{w} = \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{y}$$
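The closed form can be checked numerically. The following is a minimal sketch under stated assumptions: the data are invented, $\mathbf{X}$ has two columns as in the question, $\mathbf{X}^T\mathbf{X}$ is assumed invertible, and the helper names (`solve_2x2`, `gradient`, `loss`) are illustrative rather than taken from any library. It solves $\mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{y}$ and evaluates the gradient $-2\mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w})$ at the solution.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    """Return the matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve_2x2(A, b):
    """Solve the 2x2 linear system A w = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("X^T X is singular; no unique solution")
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def loss(X, y, w):
    """f(w) = (y - X w)^T (y - X w)."""
    residual = [yi - pi for yi, pi in zip(y, matvec(X, w))]
    return sum(r * r for r in residual)


def gradient(X, y, w):
    """Gradient of f: -2 X^T (y - X w)."""
    residual = [yi - pi for yi, pi in zip(y, matvec(X, w))]
    return [-2.0 * gj for gj in matvec(transpose(X), residual)]


# Made-up data: n = 4 observations, X is n x 2 as in the question.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.1, 4.9, 7.2]

Xt = transpose(X)
w_hat = solve_2x2(matmul(Xt, X), matvec(Xt, y))   # solves X^T X w = X^T y
grad_at_w_hat = gradient(X, y, w_hat)

print("w_hat             :", w_hat)
print("gradient at w_hat :", grad_at_w_hat)
```

The gradient entries come out at the level of floating-point rounding, and moving away from `w_hat` in any coordinate increases `loss`, consistent with `w_hat` being the minimizer.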
edited Aug 16 at 9:01
answered Aug 16 at 8:29
user550103
I did not ask how to solve it. I asked what is wrong with my solution.
– rjmessibarca
Aug 16 at 10:12