Expected squared error of ensemble (bagging)

I am working through the Deep Learning book chapter on regularization (https://www.deeplearningbook.org/contents/regularization.html#pf5).



On page 253 there is a derivation of the expected squared error of an ensemble predictor composed of $k$ regression models.



We are given that each model makes an error $\epsilon_i$ on each example, with errors drawn from a zero-mean multivariate normal distribution. We are also given that the variance is $E(\epsilon_i^2)=v$ and the covariance is $E(\epsilon_i \epsilon_j)=c$ for $i \neq j$.



Hence the average prediction of all of the ensemble models is $\frac{1}{k}\sum_i \epsilon_i$.



My question concerns the following claim, that the expected squared prediction error of the ensemble is therefore:



$$\begin{align}
E\left(\left(\frac{1}{k}\sum_i \epsilon_i\right)^2\right)&=\frac{1}{k^2}E\left(\sum_i\left(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j\right)\right)\\
&=\frac{1}{k}v + \frac{k-1}{k}c.
\end{align}$$



I am fine with the first line, but how do we get from $\frac{1}{k^2}E\left(\sum_i\left(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j\right)\right)$ to $\frac{1}{k}v + \frac{k-1}{k}c$?



It seems like (assuming we have $N$ examples) the correct equality should be



$$\frac{1}{k^2}E\left(\sum_i\left(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j\right)\right)=\frac{N}{k^2}v +\frac{N(N-1)}{k^2}c.$$



I know that to make this match the equality given in the book my $N$s should be $k$s... what am I missing here? We are summing over examples, not models.
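
As a sanity check, here is a minimal numpy sketch (the values of $k$, $v$ and $c$ are arbitrary) that treats $i$ as indexing the $k$ models, draws the error vector jointly for each of many examples, and estimates $E\left(\left(\frac{1}{k}\sum_i \epsilon_i\right)^2\right)$ by averaging over the examples:

    import numpy as np

    k, v, c = 10, 1.0, 0.3      # number of models, error variance, error covariance (arbitrary)
    n_examples = 200_000        # Monte Carlo sample size

    # Covariance matrix of the k errors: v on the diagonal, c off the diagonal
    cov = np.full((k, k), c) + (v - c) * np.eye(k)

    rng = np.random.default_rng(0)
    eps = rng.multivariate_normal(np.zeros(k), cov, size=n_examples)  # one row per example

    estimate = np.mean(eps.mean(axis=1) ** 2)  # sample average of (ensemble error)^2
    print(estimate, v / k + (k - 1) / k * c)   # estimate vs. the book's formula

If the book is right, the estimate should converge to $\frac{1}{k}v + \frac{k-1}{k}c$ ($0.37$ with these values) as the number of examples grows, with no $N$ appearing.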










covariance variance expected-value

asked Aug 31 at 4:25
ClownInTheMoon

  • Where you write "the average prediction of all of the ensemble models", I think you meant "the average prediction error of all of the ensemble models"?
    – joriki
    Aug 31 at 6:22










  • You can get properly sized parentheses that adjust to their content by preceding them with \left and \right.
    – joriki
    Aug 31 at 6:23
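    (For example, E\left(\left(\frac{1}{k}\sum_i \epsilon_i\right)^2\right) makes both pairs of parentheses grow to fit the sum, unlike plain ( and ).)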

















1 Answer
I think the problem stems from the very confusing formulation in the text that “each model makes an error $\epsilon_i$ on each example”, which leaves it unclear whether the index $i$ refers to models or examples. As I interpret the text, this index does indeed refer to models, and the expectation is taken with respect to examples, which are not indexed but considered to be statistically distributed. So the sums run over constants and lead to factors of $k$ and $k(k-1)$.
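
Spelled out, with both indices ranging over the $k$ models, linearity of expectation gives

$$\begin{align}
\frac{1}{k^2}E\left(\sum_i\left(\epsilon_i^2+\sum_{j\neq i} \epsilon_i \epsilon_j\right)\right)
&=\frac{1}{k^2}\left(\sum_i E(\epsilon_i^2)+\sum_i\sum_{j\neq i} E(\epsilon_i \epsilon_j)\right)\\
&=\frac{1}{k^2}\left(kv+k(k-1)c\right)\\
&=\frac{1}{k}v+\frac{k-1}{k}c,
\end{align}$$

since there are $k$ variance terms and $k(k-1)$ ordered pairs $(i,j)$ with $i\neq j$, each contributing the constant $v$ or $c$. In particular, with perfectly correlated errors ($c=v$) this is just $v$ and averaging does not help, while with uncorrelated errors ($c=0$) the expected squared error of the ensemble falls to $v/k$.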






answered Aug 31 at 6:33
joriki