Expected squared error of ensemble (bagging)
I am working through the Deep Learning book chapter on regularization (https://www.deeplearningbook.org/contents/regularization.html#pf5).
On page 253 there is a derivation of the expected squared error of the ensemble predictor, where the predictor is composed of $k$ regression models.
We are given that each model makes an error $\epsilon_i$ on each example, with the errors drawn from a zero-mean multivariate normal distribution. We are also given that the variance is $E(\epsilon_i^2)=v$ and the covariance is $E(\epsilon_i \epsilon_j)=c$.
Hence the average prediction of all of the ensemble models is $\frac{1}{k}\sum_i \epsilon_i$.
My question regards the following claim, that the expected squared prediction error of the ensemble is:
$$\begin{align}
E\left(\left(\frac{1}{k}\sum_i \epsilon_i\right)^2\right) &= \frac{1}{k^2} E\left(\sum_i\left(\epsilon_i^2 + \sum_{j\neq i} \epsilon_i \epsilon_j\right)\right) \\
&= \frac{1}{k}v + \frac{k-1}{k}c.
\end{align}$$
I am fine with the first line, but how do we get from $\frac{1}{k^2} E\left(\sum_i\left(\epsilon_i^2 + \sum_{j\neq i} \epsilon_i \epsilon_j\right)\right)$ to $\frac{1}{k}v + \frac{k-1}{k}c$?
It seems like (assuming we have $N$ examples) the correct equality should be
$$\frac{1}{k^2} E\left(\sum_i\left(\epsilon_i^2 + \sum_{j\neq i} \epsilon_i \epsilon_j\right)\right) = \frac{N}{k^2}v + \frac{N(N-1)}{k^2}c.$$
I know that to make this match the equality given in the book my $N$s should be $k$s... what am I missing here? We are summing over examples, not models.
covariance variance expected-value
asked Aug 31 at 4:25
ClownInTheMoon
Where you write "the average prediction of all of the ensemble models", I think you meant "the average prediction error of all of the ensemble models"?
– joriki
Aug 31 at 6:22
You can get properly sized parentheses that adjust to their content by preceding them with \left and \right.
– joriki
Aug 31 at 6:23
1 Answer
I think the problem stems from the very confusing formulation in the text that "each model makes an error $\epsilon_i$ on each example", which leaves it unclear whether the index $i$ refers to models or examples. As I interpret the text, this index does indeed refer to models, and the expectation is taken with respect to examples, which are not indexed but considered to be statistically distributed. So the sums over constants lead to factors of $k$ and $k(k-1)$.
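To spell out the counting under this interpretation, using only the quantities defined in the question: both $i$ and $j$ range over the $k$ models, so the diagonal sum contributes $k$ terms each with expectation $v$, and the off-diagonal double sum contributes $k(k-1)$ terms each with expectation $c$:
$$\frac{1}{k^2} E\left(\sum_i \epsilon_i^2 + \sum_i\sum_{j\neq i} \epsilon_i \epsilon_j\right) = \frac{1}{k^2}\left(k\,v + k(k-1)\,c\right) = \frac{1}{k}v + \frac{k-1}{k}c.$$
In particular, for perfectly correlated errors ($c = v$) this reduces to $v$, and for uncorrelated errors ($c = 0$) it reduces to $v/k$.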
answered Aug 31 at 6:33
joriki
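For anyone who wants to check the $k$-counting numerically, here is a minimal Monte Carlo sketch. The specific values of $k$, $v$, and $c$ below are arbitrary illustrative choices, not values from the book or this thread.

```python
import numpy as np

# Hypothetical parameters chosen only for illustration.
k, v, c = 5, 1.0, 0.3        # number of models, error variance, error covariance
n_samples = 200_000          # Monte Carlo sample size

# Covariance matrix of the k model errors: v on the diagonal, c off the diagonal.
cov = np.full((k, k), c)
np.fill_diagonal(cov, v)

rng = np.random.default_rng(0)
eps = rng.multivariate_normal(mean=np.zeros(k), cov=cov, size=n_samples)

# Empirical estimate of E[(1/k * sum_i eps_i)^2]: square the ensemble's
# average error on each sample, then average over samples.
ensemble_error = eps.mean(axis=1)
empirical = np.mean(ensemble_error ** 2)

# Closed form from the book: v/k + (k-1)/k * c.
theoretical = v / k + (k - 1) / k * c

print(f"empirical:   {empirical:.4f}")
print(f"theoretical: {theoretical:.4f}")
```

With these parameters both numbers come out near $0.44$; the result is governed by $k$, the number of models, with no $N$ appearing anywhere, which matches the resolution above.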