What is Expected Prediction Error (EPE) a function of?

Context



I am self-studying Elements of Statistical Learning (2nd ed), by Hastie, Tibshirani & Friedman. I have a question regarding what the EPE, as defined in this text, is a function of. Namely, on page 10 (equation 2.9), EPE is defined as:



$\text{EPE}(f) = \text{E}_{X, Y}(Y - f(X))^2$



This implies that EPE is a function of the learner, $f$, and that it is an expectation over the joint distribution of $X$ and $Y$.



I accept and understand this definition, since it distinguishes EPE from mean squared error (MSE), which is defined for some $x_0 \in \Omega_X$ as:



$\text{MSE}(x_0) = \text{E}_{\mathcal{T}}(y_0 - \hat{f}(x_0))^2$



Where $\mathcal{T}$ represents the training set, and $\hat{f}$ represents the corresponding learner. Note that the above equation assumes the relationship between $X$ and $Y$ is deterministic -- $y_0$ is assumed to be a constant.



Question



Where my confusion arises is in the use of EPE on page 18 (equation 2.27). The context of its use is this: the relationship between $Y$ (the dependent variable) and $X$ (the independent variable) is assumed to be linear in $X$:



$Y = X^T\beta + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ independently of $X$.



Then, for some arbitrary test point $x_0$, equation 2.27 is stated in the following manner:



$\text{EPE}(x_0) = \text{E}_{y_0}\text{E}_{\mathcal{T}}(y_0 - \hat{y}_0)^2$, where $\mathcal{T}$ is the training set.



My issue with the above is that I cannot see how the use of EPE in equation 2.27 is equivalent to that of equation 2.9; is there a way to show that they are? Or is the latter notation incorrect, and EPE is only a function of the learner?
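To make the quantity in equation 2.27 concrete, here is a minimal simulation sketch of the double expectation at a fixed test point. It is not from the text: the standard normal design for $X$, the least-squares learner, the sample sizes, and the function name `epe_at_x0` are all assumptions chosen for illustration.

```python
import numpy as np


def epe_at_x0(x0, beta, sigma, n_train=50, n_rep=5000, seed=0):
    """Monte Carlo estimate of E_{y0} E_T (y0 - yhat0)^2 at a fixed x0.

    Assumes Y = X^T beta + eps with eps ~ N(0, sigma^2), and (as an
    illustrative choice, not from the text) standard normal inputs X.
    """
    rng = np.random.default_rng(seed)
    p = beta.shape[0]
    sq_err = np.empty(n_rep)
    for r in range(n_rep):
        # Draw a fresh training set T from the linear model.
        X = rng.normal(size=(n_train, p))
        y = X @ beta + sigma * rng.normal(size=n_train)
        # Fit by least squares and predict at x0.
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        y0_hat = x0 @ beta_hat
        # Draw a fresh response y0 at x0, independent of T.
        y0 = x0 @ beta + sigma * rng.normal()
        sq_err[r] = (y0 - y0_hat) ** 2
    # Averaging over replications approximates both expectations at once.
    return sq_err.mean()


# Hypothetical parameter values, for illustration only.
estimate = epe_at_x0(np.array([1.0, 0.5]), np.array([2.0, -1.0]), sigma=1.0)
print(f"Estimated EPE(x0): {estimate:.3f}")
```

Note that $x_0$ stays fixed across replications while both the training set and $y_0$ are redrawn, which is what makes the result a function of $x_0$ rather than of a single fitted $f$. Under these assumptions the estimate should land somewhat above $\sigma^2$, up to Monte Carlo error, since the estimator's variance is added to the irreducible noise.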



Related question



I have just taken a look at the "related questions" after posting this one, and have found the following question. It appears to express a similar sentiment to mine. The accepted answer, as I understand it, seemingly implies that the usage of EPE in equation 2.27 is inconsistent. If this is the case, and the two usages cannot be reconciled, I suppose this question can be taken down as a duplicate (? New here.).







    From what I remember of this topic, it is correct to say that the two uses of $\text{EPE}$ are not the same. For the record, I've given up on trying to understand the notation in Elements of Statistical Learning. Inconsistencies galore.
    – Clarinetist
    Aug 20 at 14:00














edited 22 hours ago

asked Aug 20 at 13:25

Nurmister

1 Answer
accepted










The two usages of EPE are not the same; the latter is closer to an "enhanced MSE". Namely, in equation 2.27 the expectation is of the squared difference between the true and predicted values of the dependent variable, taken conditional on $X = x_0$ and also over the distribution of $\mathcal{T}$. The "conditional on $X = x_0$" part distinguishes it from MSE as I have understood it to be defined. It is needed here because the relationship between $X$ and $Y$ is not deterministic.
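One way to relate the two usages, offered as a sketch rather than something taken from the text: assume the test pair $(x_0, y_0)$ is drawn independently of $\mathcal{T}$, write $\hat{y}_0 = \hat{f}(x_0)$, and let $\epsilon_0$ denote the noise in $y_0 = x_0^T\beta + \epsilon_0$ (the symbol $\epsilon_0$ is introduced here for illustration). Then:

$$
\begin{aligned}
\text{EPE}(x_0) &= \text{E}_{y_0}\text{E}_{\mathcal{T}}(x_0^T\beta + \epsilon_0 - \hat{y}_0)^2 \\
&= \text{E}(\epsilon_0^2) + 2\,\text{E}(\epsilon_0)\,\text{E}_{\mathcal{T}}(x_0^T\beta - \hat{y}_0) + \text{E}_{\mathcal{T}}(x_0^T\beta - \hat{y}_0)^2 \\
&= \sigma^2 + \text{E}_{\mathcal{T}}(x_0^T\beta - \hat{y}_0)^2 .
\end{aligned}
$$

The cross term vanishes because $\epsilon_0$ has mean zero and is independent of $\mathcal{T}$. The last term is the MSE at $x_0$ with the constant $y_0$ replaced by $x_0^T\beta$, which is the sense in which 2.27 behaves like an "enhanced MSE". Averaging over test points connects it back to 2.9 applied to the fitted learner:

$$
\text{E}_{x_0}\,\text{EPE}(x_0) = \text{E}_{\mathcal{T}}\,\text{E}_{X, Y}(Y - \hat{f}(X))^2 = \text{E}_{\mathcal{T}}\,\text{EPE}(\hat{f}).
$$

So under these assumptions the two are not equal, but the pointwise quantity integrates to the training-set average of the quantity in 2.9.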






answered 22 hours ago

Nurmister
