Questions concerning the power of the standard deviation

The formula for standard deviation is

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2}$$

I learned that $68\%$ of the values fall within $S_x$ of the mean, $95\%$ within $2S_x$, and $99.7\%$ within $3S_x$.

My question is: why the second power? Could it also be $(x_i-\bar{x})^4$, or any other even power?

What is the reason behind the second power? Is it just easy to use? Or is there some other meaning to it?

asked Sep 10 at 20:58

Larry

370117

statistics

3 Answers

accepted

Some reasons to define the variance and standard deviation the way they're defined:

With this definition, the mean minimizes the variance, meaning: If we compute the mean square deviation from some value $\mu$, it's minimal if $\mu$ is the mean:

\begin{eqnarray*}
f(\mu)&=&\sum_i(x_i-\mu)^2\;,\\
f'(\mu)&=&-2\sum_i(x_i-\mu)\;,\\
f'(\mu)=0&\Leftrightarrow&\mu=\frac1n\sum_ix_i\;.
\end{eqnarray*}

This doesn't work the same way with higher even powers, e.g.:

\begin{eqnarray*}
f(\mu)&=&\sum_i(x_i-\mu)^4\;,\\
f'(\mu)&=&-4\sum_i(x_i-\mu)^3\;,\\
f'(\mu)=0&\Leftrightarrow&\sum_i(x_i-\mu)^3=0\;,
\end{eqnarray*}

a cubic equation for $\mu$ without a natural interpretation. Thus, the median minimizes the mean absolute deviation, and the mean minimizes the mean square deviation, whereas the number minimizing the mean quartic deviation isn't known to have any nice properties.
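The minimization claims above can be checked numerically. What follows is only a rough sketch: the sample `data` is made up, and `power_deviation` and `grid_minimizer` are hypothetical helper names that simply brute-force the best center on a grid rather than solving anything analytically.

```python
from statistics import mean, median

def power_deviation(data, center, p):
    """Sum of |x - center|**p over the data."""
    return sum(abs(x - center) ** p for x in data)

def grid_minimizer(data, p, steps=9_000):
    """Brute-force the center minimizing the p-th power deviation on a grid."""
    lo, hi = min(data), max(data)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda c: power_deviation(data, c, p))

data = [1, 2, 3, 4, 10]  # made-up, deliberately skewed sample

best_abs = grid_minimizer(data, 1)      # should land near the median
best_sq = grid_minimizer(data, 2)       # should land near the mean
best_quartic = grid_minimizer(data, 4)  # root of sum (x_i - mu)^3 = 0

print("median:", median(data), "| minimizer for p=1:", best_abs)
print("mean:", mean(data), "| minimizer for p=2:", best_sq)
print("minimizer for p=4:", best_quartic)
```

For this sample the median is 3 and the mean is 4, while the quartic minimizer comes out at about 5.15, a value that is neither and that moves further toward the outlier.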

The variance of independent random variables is additive:

\begin{eqnarray*}
\mathsf{Var}(X+Y)&=&\mathsf E\left[(x+y-\bar x-\bar y)^2\right]\\
&=&
\mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]+2\mathsf E\left[xy-\bar x y-x\bar y+\bar x\bar y\right]
\\
&=&
\mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]+2(\bar x\bar y-\bar x\bar y-\bar x\bar y+\bar x\bar y)
\\
&=&
\mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]
\\
&=&
\mathsf{Var}(X)+\mathsf{Var}(Y)\;.
\end{eqnarray*}

This, too, wouldn't work with higher even powers. This sort of additivity is at the heart of important theorems like the central limit theorem.
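To see the additivity concretely, here is a small exact check using rational arithmetic. It is a sketch under assumptions: the fair die and fair coin are just illustrative distributions, and `central_moment` and `sum_of_independent` are hypothetical helper names. Expanding $(a+b)^4$ for independent centered variables leaves a cross term $6\,\mathsf{Var}(X)\,\mathsf{Var}(Y)$, which is why the fourth central moment fails to add.

```python
from fractions import Fraction
from itertools import product

def central_moment(dist, p):
    """p-th central moment of a finite distribution {value: probability}."""
    mu = sum(v * pr for v, pr in dist.items())
    return sum((v - mu) ** p * pr for v, pr in dist.items())

def sum_of_independent(dist_x, dist_y):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}         # fair coin
total = sum_of_independent(die, coin)

var_x, var_y = central_moment(die, 2), central_moment(coin, 2)
m4_x, m4_y = central_moment(die, 4), central_moment(coin, 4)

# Second power: additive.
print(central_moment(total, 2) == var_x + var_y)
# Fourth power: not additive ...
print(central_moment(total, 4) == m4_x + m4_y)
# ... because of the cross term 6 * Var(X) * Var(Y).
print(central_moment(total, 4) == m4_x + m4_y + 6 * var_x * var_y)
```

The three comparisons print `True`, `False`, `True`.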

answered Sep 10 at 21:29

joriki

169k10181337

Those statements about $68\%$, $95\%$, and $99.7\%$ apply to the normal distribution, but certainly do not apply to all distributions.
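As a quick illustration of that point, the sketch below compares the normal distribution with a uniform one. The choice of the uniform distribution as the counterexample, and the function names, are assumptions made for this example only. For a normal distribution the probability of falling within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt2)$; for a uniform distribution on $[0,1]$ the standard deviation is $1/\sqrt{12}$.

```python
import math

def normal_within(k):
    """P(|X - mu| < k*sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

def uniform_within(k):
    """P(|X - mu| < k*sigma) for a uniform distribution on [0, 1]."""
    # Here mu = 1/2 and sigma = 1/sqrt(12); the interval has total length 1.
    half_width = k / math.sqrt(12)
    return min(1.0, 2 * half_width)

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), round(uniform_within(k), 4))
```

The normal column reproduces the familiar 68, 95, 99.7 figures, whereas the uniform distribution has only about 58% of its mass within one standard deviation and all of it within two.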

Using $n-1$ in the denominator, where $n$ is the sample size, is done only when the sample variance is used to estimate the population variance, or otherwise to draw inferences about the population from a random sample. The population variance is $\operatorname E\left((X-\mu)^2\right)$ where $\mu=\operatorname E(X)$, and if the population consists of $n$ equally probable outcomes, then the standard deviation is given by a formula that looks like what you wrote except that it has $n$ where you have $n-1$.
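The two denominators can be compared side by side. This is a minimal sketch with a made-up data set; Python's standard `statistics` module happens to expose both conventions as `pstdev` (divide by $n$) and `stdev` (divide by $n-1$).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5

n = len(data)
mean = statistics.mean(data)
squared = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

population_sd = (squared / n) ** 0.5    # whole population: divide by n
sample_sd = (squared / (n - 1)) ** 0.5  # sample estimate: divide by n - 1

print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

The sample version is always slightly larger, and the gap shrinks as $n$ grows.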

The reason the second power is used in measuring dispersion is that if $X_1,\ldots,X_n$ are independent, then
$$
\operatorname{var}(X_1+\cdots+X_n) = \operatorname{var}(X_1)+\cdots + \operatorname{var}(X_n).
$$
You need that whenever you apply the central limit theorem.

answered Sep 11 at 0:17

Michael Hardy

206k23187466

Standard deviation is one way to measure the spread of some data. You could certainly introduce another measure of spread that used 4th powers and took the fourth root. It would have different properties, and might not be useful.

For example, with normally distributed data, the property you cite about 68% and 95% would not hold with such a different measure of spread.
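To put a number on this, here is a sketch assuming the alternative measure is the fourth root of the fourth central moment. For a normal distribution the fourth central moment is $3\sigma^4$, so that measure equals $3^{1/4}\sigma$; the variable name `quartic_ratio` is just a label for this example.

```python
import math

def normal_within(k):
    """P(|X - mu| < k*sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

# Fourth root of the normal fourth central moment, in units of sigma.
quartic_ratio = 3 ** 0.25

print(round(quartic_ratio, 4))                 # spread measure / sigma
print(round(normal_within(1), 4))              # within one standard deviation
print(round(normal_within(quartic_ratio), 4))  # within one "quartic" unit
```

One unit of the quartic measure is about $1.32\sigma$ and covers roughly 81% of a normal distribution rather than 68%.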

There are genuine reasons to work with a measure of spread that involves squaring the residuals like standard deviation/error does. I don't know that I could be successful at explaining them in a short SE post. Maybe someone else will though.

answered Sep 10 at 21:04

alex.jordan

37.2k559117
