Questions concerning the power of the standard deviation

The formula for standard deviation is

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2}$$

I learned that $68\%$ of the values fall within $S_x$ of the mean, $95\%$ within $2S_x$, and $99.7\%$ within $3S_x$.

My question is: why the second power? Could it also be $(x_i-\bar{x})^4$, or any other even power?

What is the reason behind the second power? Is it just easy to use, or is there some other meaning to it?










      statistics






asked Sep 10 at 20:58 by Larry




















          3 Answers
Some reasons to define the variance and standard deviation the way they're defined:

With this definition, the mean minimizes the variance, meaning: if we compute the mean square deviation from some value $\mu$, it's minimal if $\mu$ is the mean:

\begin{eqnarray*}
f(\mu)&=&\sum_i(x_i-\mu)^2\;,\\
f'(\mu)&=&-2\sum_i(x_i-\mu)\;,\\
f'(\mu)=0&\Leftrightarrow&\mu=\frac1n\sum_ix_i\;.
\end{eqnarray*}

This doesn't work the same way with higher even powers, e.g.:

\begin{eqnarray*}
f(\mu)&=&\sum_i(x_i-\mu)^4\;,\\
f'(\mu)&=&-4\sum_i(x_i-\mu)^3\;,\\
f'(\mu)=0&\Leftrightarrow&\sum_i(x_i-\mu)^3=0\;,
\end{eqnarray*}

a cubic equation for $\mu$ without a natural interpretation. Thus, the median minimizes the mean absolute deviation, and the mean minimizes the mean square deviation, whereas the number minimizing the mean quartic deviation isn't known to have any nice properties.
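The three minimizers can be compared numerically. What follows is only a rough sketch and not part of the original answer: the sample `data` and the helper `argmin_power` are made up for illustration, and the minimizer is located by a crude grid search rather than analytically.

```python
from statistics import mean, median

def argmin_power(data, p, steps=18000):
    """Grid-search the mu in [min(data), max(data)] that minimizes sum(|x - mu|**p).

    Hypothetical helper, for illustration only.
    """
    lo, hi = min(data), max(data)
    best_mu, best_cost = lo, float("inf")
    for k in range(steps + 1):
        mu = lo + (hi - lo) * k / steps
        cost = sum(abs(x - mu) ** p for x in data)
        if cost < best_cost:
            best_mu, best_cost = mu, cost
    return best_mu

data = [1.0, 2.0, 3.0, 4.0, 10.0]  # made-up sample with one large value

print("median =", median(data), "| minimizer for p=1:", argmin_power(data, 1))
print("mean   =", mean(data), "| minimizer for p=2:", argmin_power(data, 2))
print("minimizer for p=4:", argmin_power(data, 4))
```

On this sample the first power lands on the median and the second power on the mean, while the fourth power gives a value between 5.0 and 5.2 that matches neither.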



The variance of independent random variables is additive:

\begin{eqnarray*}
\mathsf{Var}(X+Y)&=&\mathsf E\left[(x+y-\bar x-\bar y)^2\right]\\
&=&
\mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]+2\mathsf E\left[xy-\bar xy-x\bar y+\bar x\bar y\right]
\\
&=&
\mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]+2(\bar x\bar y-\bar x\bar y-\bar x\bar y+\bar x\bar y)
\\
&=&
\mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]
\\
&=&
\mathsf{Var}(X)+\mathsf{Var}(Y)\;.
\end{eqnarray*}

The third line uses independence, which gives $\mathsf E[xy]=\bar x\bar y$.

This, too, wouldn't work with higher even powers. This sort of additivity is at the heart of important theorems like the central limit theorem.
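As a hedged illustration of the additivity claim, the sketch below enumerates two small independent distributions exactly, using fractions instead of simulation. The distributions `die` and `coin` and the helpers `central_moment` and `sum_distribution` are invented for this example and do not come from the original answer.

```python
from fractions import Fraction
from itertools import product

def central_moment(dist, p):
    """p-th central moment of a finite distribution given as {value: probability}."""
    mu = sum(v * pr for v, pr in dist.items())
    return sum((v - mu) ** p * pr for v, pr in dist.items())

def sum_distribution(dx, dy):
    """Distribution of X + Y for independent X ~ dx and Y ~ dy."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

# Two made-up independent variables: a fair die and a biased coin.
die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {0: Fraction(3, 4), 1: Fraction(1, 4)}
total = sum_distribution(die, coin)

for p in (2, 4):
    lhs = central_moment(total, p)
    rhs = central_moment(die, p) + central_moment(coin, p)
    print(f"p={p}: moment of sum = {lhs}, sum of moments = {rhs}, equal: {lhs == rhs}")
```

For $p=2$ the two sides agree. For $p=4$ they differ by the cross term $6\,\mathsf{Var}(X)\,\mathsf{Var}(Y)$, which comes from expanding the fourth power of the sum of the two centered variables.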







answered Sep 10 at 21:29 by joriki (accepted answer)




















Those statements about $68\%$, $95\%$, and $99.7\%$ apply to the normal distribution, but certainly do not apply to all distributions.
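To make the contrast concrete, here is a small sketch comparing the normal distribution with the uniform distribution on $[0, 1]$. The choice of the uniform distribution and the function names are assumptions made for illustration; the normal probabilities come from the standard identity $P(|Z|<k)=\operatorname{erf}(k/\sqrt2)$.

```python
import math

def normal_within(k):
    """P(|X - mu| < k*sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

def uniform_within(k):
    """P(|X - mu| < k*sigma) for the uniform distribution on [0, 1]."""
    sigma = 1 / math.sqrt(12)       # standard deviation of Uniform(0, 1)
    return min(1.0, 2 * k * sigma)  # interval length, capped at the full range

for k in (1, 2, 3):
    print(f"k={k}: normal {normal_within(k):.4f}, uniform {uniform_within(k):.4f}")
```

For the uniform distribution only about $58\%$ of the mass lies within one standard deviation of the mean, and all of it lies within two.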



Defining the variance by using $n-1$ in the denominator, where $n$ is the sample size, is done only when using the sample variance to estimate the population variance or otherwise drawing inferences about the population by using a random sample. The population variance is $\operatorname E((X-\mu)^2)$ where $\mu=\operatorname E(X)$, and if the population consists of $n$ equally probable outcomes, then the standard deviation is given by a formula that looks like what you wrote except that it has $n$ where you have $n-1$.
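The two denominators can be seen side by side with Python's standard `statistics` module, where `pvariance` divides by $n$ and `variance` divides by $n-1$. This is only a sketch; the list `data` is made up for illustration.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

n = len(data)
m = sum(data) / n
ss = sum((x - m) ** 2 for x in data)  # sum of squared deviations from the mean

print("divide by n:    ", ss / n, pvariance(data))
print("divide by n - 1:", ss / (n - 1), variance(data))
```

Here the squared deviations sum to 32, so the population formula gives 4 while the sample formula gives $32/7 \approx 4.57$.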



The reason the second power is used in measuring dispersion is that if $X_1,\ldots,X_n$ are independent, then
$$
\operatorname{var}(X_1+\cdots+X_n) = \operatorname{var}(X_1)+\cdots + \operatorname{var}(X_n).
$$
You need that whenever you apply the central limit theorem.







answered Sep 11 at 0:17 by Michael Hardy




















Standard deviation is one way to measure the spread of some data. You could certainly introduce another measure of spread that used 4th powers and took the fourth root. It would have different properties, and might not be useful.

For example, with data that is normally distributed, the property you cite about 68% and 95% would not hold with such a different measure of spread.
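A quick sketch of how the percentages would shift, assuming the alternative measure is the fourth root of the fourth central moment. For a normal distribution that moment is $3\sigma^4$, so the measure equals $3^{1/4}\sigma$. The function name `normal_within` is made up for this illustration.

```python
import math

def normal_within(k):
    """P(|X - mu| < k*sigma) for a normal distribution with standard deviation sigma."""
    return math.erf(k / math.sqrt(2))

# For a normal distribution the fourth central moment is 3*sigma**4,
# so the fourth-root measure equals 3**0.25 * sigma.
ratio = 3 ** 0.25

for k in (1, 2):
    print(f"within {k} x SD: {normal_within(k):.4f}   "
          f"within {k} x fourth-root measure: {normal_within(k * ratio):.4f}")
```

Under this assumption, roughly $81\%$ of normally distributed values fall within one unit of the fourth-root measure, instead of $68\%$.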



There are genuine reasons to work with a measure of spread that involves squaring the residuals, as standard deviation/error does. I don't know that I could be successful at explaining them in a short SE post. Maybe someone else will, though.







answered Sep 10 at 21:04 by alex.jordan



























                                         
