Least squares approximation (matrices)
I'm quite stuck on the question below. It keeps coming up in my exam and I don't know how to do it! Please help me! Thanks!
Show that the matrix $P = A (A^tA)^{-1} A^t$ represents an orthogonal projection onto $\mathcal{R}(A)$. Hence or otherwise explain why $x^\star = (A^t A)^{-1} A^t b$ represents the least squares solution of the matrix equation $Ax=b$.
linear-algebra matrices
So is the difficulty in showing it's an orthogonal projection, or in explaining why it's a least squares solution, or both?
– Gerry Myerson
May 20 '11 at 12:34
Did you look, e.g., here?
– cardinal
May 20 '11 at 12:40
Yep, I had a look on wiki. The problem is that I don't really understand the concept behind it. It just looks like a bunch of notation to me. I'm not really sure anymore what an orthogonal projection means - isn't it idempotent and symmetric? Do you need to show the ranges and nullspaces are equal etc., and for the $x^\star$ bit do you just work out $Ax^\star$ (using the $x^\star$ given) and then show it's $=b$? Maybe it's the way it's worded that's confusing me... What exactly does a projection even do? All I know is $P(v)=v$!
– user4645
May 20 '11 at 13:41
edited Aug 24 at 6:04
Michael Hardy
205k23187463
asked May 20 '11 at 10:56
user4645
14317
1 Answer
Well, okay, assuming $A$ is overdetermined ($m \times n$ where $m > n$). The column vectors of $A$ span $\mathcal{R}(A)$; assuming $A$ has full rank, the column vectors form a basis.
The matrix $A^T A$ is symmetric ($(A^TA)^T = A^T A$) and, if $A$ has full rank, invertible. If $A$ does not have full rank, it is not invertible.
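As a quick numerical sketch of these two facts (the matrix values below are arbitrary illustrations, not taken from the question), one can check symmetry and invertibility of $A^TA$ for a full-column-rank tall matrix:

```python
import numpy as np

# A 4x2 matrix with full column rank (arbitrary example values)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

AtA = A.T @ A

# Symmetry: (A^T A)^T equals A^T A
print(np.allclose(AtA, AtA.T))           # True

# Full column rank implies A^T A is invertible (nonzero determinant)
print(abs(np.linalg.det(AtA)) > 1e-12)   # True
```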
Now, naming the column vectors of $A$ $a_1, a_2, \dots, a_n$ and $x = (x_1, x_2, \dots, x_n)^T$, we can write $Ax$ as:
$$Ax = x_1 a_1 + x_2 a_2 + \dots + x_n a_n = \sum_{i=1}^n a_i x_i$$
and we want this to be equal to $b$. Assuming there is no such solution, we want to minimize
$$\| b - Ax \| = \left\| b - \sum_{i=1}^n a_i x_i \right\|$$
To minimize this, we want to find the projection of $b$ onto $\mathcal{R}(A)$. This projection can be expressed as $\sum_{i=1}^n a_i x_i$, since the $a_i$ span $\mathcal{R}(A)$. Why does this minimize the above expression? Simply because this is the closest we can get - we can't express elements not in $\mathcal{R}(A)$ with elements in $\mathcal{R}(A)$.
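This also lets one check part (a) numerically: the matrix $P = A(A^TA)^{-1}A^T$ should be idempotent and symmetric, and should fix anything already in $\mathcal{R}(A)$. A sketch (again with an arbitrary example matrix):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# The candidate projection matrix P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))   # idempotent: P^2 = P -> True
print(np.allclose(P, P.T))     # symmetric: P^T = P  -> True

# P leaves vectors already in R(A) unchanged: P(Ax) = Ax
x = np.array([2.0, -1.0])
print(np.allclose(P @ (A @ x), A @ x))  # True
```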
Now, assume $x_1, x_2, \dots, x_n$ are the coefficients of $a_1, a_2, \dots, a_n$ such that $\sum a_i x_i$ is the orthogonal projection of $b$ onto $\mathcal{R}(A)$. Then $b$ can be written as
$$b = Ax + y$$
for some $y$ which is orthogonal to $\mathcal{R}(A)$. Thus $y = b - Ax$. Since $y$ is orthogonal to $\mathcal{R}(A)$, it is orthogonal to every $a_i$:
$$
\left\langle a_i , \underbrace{b - Ax}_{=y} \right\rangle = 0
\Longleftrightarrow
a_i^T(b-Ax)=0 \quad \forall i = 1, \dots, n
$$
Taking the equations above for all $i$ at once, this can be written in matrix form as $A^T(b-Ax)=0$, which can be rewritten as $A^Tb = A^TAx$, or, if $A^TA$ is invertible, $x = (A^TA)^{-1}A^Tb$.
So, basically, to derive $x = (A^TA)^{-1}A^T b$, consider the condition that $b-Ax$ should be orthogonal to the column vectors $a_i$ one at a time, then stack these equations "on top of each other" in a matrix equation.
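The whole derivation can be exercised numerically. The sketch below (with arbitrary example data for $A$ and $b$) solves the normal equations $A^TAx = A^Tb$, checks the result against NumPy's built-in least-squares solver, and verifies that the residual $b - Ax^\star$ is orthogonal to $\mathcal{R}(A)$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])  # chosen so Ax = b has no exact solution

# Normal equations: solve A^T A x = A^T b rather than forming the inverse
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_lstsq))  # True

# The residual b - Ax* is orthogonal to R(A): A^T (b - Ax*) = 0
print(np.allclose(A.T @ (b - A @ x_star), 0))  # True
```

In practice one solves the normal equations (or uses a QR factorization) instead of explicitly inverting $A^TA$, which is cheaper and numerically more stable.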
A very detailed answer... thanks! I'm going to work through this and get back to you if I'm still stuck! :)
– user4645
May 22 '11 at 13:13
Okay, I've gone through it and understood the steps. I was wondering though what part answered part (a) and what answered part (b), because this proof seems to answer both! :)
– user4645
May 23 '11 at 10:20
Yeah, the answers are kind of intertwined. Part b is mostly answered by the part after $\| b - Ax \| = \left\| b - \sum_{i=1}^n a_i x_i \right\|$, which of course can be made more rigorous (decompose $b$ into parts parallel and orthogonal, respectively, to $\mathcal{R}(A)$). The full answer of part b follows after part a is shown to be true, which is done by what comes after.
– Calle
May 23 '11 at 10:59
Okay - understood! :)
– user4645
May 25 '11 at 10:36
edited May 20 '11 at 14:40
answered May 20 '11 at 14:33
Calle
6,22212242