Standardization of non-normal features
Suppose we have several features (e.g. $\geq 20$) that do not follow a Gaussian distribution. Do we have to worry about the features not following a Gaussian distribution if we apply standardization to the data?
Namely, even if the features do not follow a normal distribution initially, aren't they made to follow a Gaussian distribution after standardization, with mean $0$ and variance $1$?
normal-distribution variance standardization
Your last statement is incorrect: standardization does not transform a dataset's distribution from non-normal to normal.
– Emil, Sep 3 at 9:51
@Emil After standardization, the mean and variance become 0 and 1 respectively, and I also know that a random variable with mean 0 and var 1 follows standard normal distribution. Correct me if I am wrong.
– Akash Dubey, Sep 3 at 9:57
Akash, think about what happens to the distribution: Subtracting the mean sets the location of the mean to $0$. Dividing by the standard deviation either compresses or stretches the distribution such that it becomes as narrow or wide as necessary for it to have a standard deviation of $1$. Where in this process did we change the shape? Why would a non-normal distribution suddenly become normal? See here for example for non-normal distributions that meet the criteria: stats.stackexchange.com/a/314003/176202
– Frans Rodenburg, Sep 3 at 10:34
The standard normal is a normal distribution with $\mu=0$ and $\sigma=1$, so to say that it is not normal makes no sense. Note that an arbitrary distribution with mean $0$ and standard deviation $1$ is not called a standard normal distribution.
– Frans Rodenburg, Sep 3 at 10:41
You do not have to thank people on CV, but you can show your appreciation by upvoting and accepting @Emil's answer. On a different note, if you comment on a thread, only the OP is notified. You can ping others by using @ followed by their username.
– Frans Rodenburg, Sep 3 at 10:53
edited Sep 3 at 10:37 by Frans Rodenburg
asked Sep 3 at 9:13 by Akash Dubey
1 Answer (accepted)
The short answer: yes, you do need to worry about your data's distribution not being normal, because standardization does not transform the underlying distribution structure of the data. If $X \sim \mathcal{N}(\mu, \sigma^2)$ then you can transform this to a standard normal by standardizing: $Y := (X-\mu)/\sigma \sim \mathcal{N}(0,1)$. However, this is possible because $X$ already follows a normal distribution in the first place. If $X$ has a distribution other than normal, standardizing it in the same way as above will generally not make the data normally distributed.
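One way to see why, as a sketch rather than a full proof: assume $X$ has a density $f_X$, mean $\mu$, standard deviation $\sigma > 0$ and a finite third moment. The symbols $f_X$, $f_Y$ and $\gamma_1$ (skewness) are introduced here only for illustration. Then for $Y = (X-\mu)/\sigma$,

$$
f_Y(y) = \sigma \, f_X(\mu + \sigma y),
\qquad
\gamma_1(Y) = \mathbb{E}\left[Y^3\right] = \mathbb{E}\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right] = \gamma_1(X).
$$

The density of $Y$ is the density of $X$ shifted and rescaled, so shape measures such as skewness are unchanged: a skewed $X$ gives an equally skewed $Y$, whereas a normal distribution has skewness $0$.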
A simple example of exponentially distributed data and its standardized version:
# Draw 5000 exponential observations and standardize them
x <- rexp(5000, rate = 0.5)
y <- (x - mean(x)) / sd(x)

# Plot both histograms, one above the other (upper 0.5% tail cut off for readability)
par(mfrow = c(2, 1))
hist(x, freq = FALSE, col = "blue", breaks = 100,
     xlim = c(min(x), quantile(x, 0.995)),
     main = "Histogram of exponentially distributed data X with rate = 0.5")
hist(y, freq = FALSE, col = "yellow", breaks = 100,
     xlim = c(min(y), quantile(y, 0.995)),
     main = "Histogram of standardized data Y = ( X-E(X) ) / StDev(X)")
Now if we check the mean and standard deviation of the original data $x$, we get
c(mean(x), sd(x))
[1] 2.044074 2.051816
whereas for the standardized data $y$, the corresponding results are
c(mean(y), sd(y))
[1] 7.136221e-17 1.000000
As you can see, the distribution of the data after standardization is decidedly not normal, even though the mean is (practically) 0 and the variance 1. In other words, if the features do not follow a normal distribution before standardization, they will not follow it after the standardization either.
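The same point can be checked numerically instead of by eye. The following is a minimal sketch, not part of the original example: it draws a fresh exponential sample with an arbitrary seed, and `skewness_hat` is a hypothetical helper defined here, not a function from base R or any package.

```r
# Sketch: standardization is a shift and a rescale, so the sample skewness
# is the same before and after. skewness_hat() is a hypothetical helper.
set.seed(1)  # arbitrary seed, only for reproducibility

skewness_hat <- function(v) {
  z <- (v - mean(v)) / sd(v)
  mean(z^3)
}

x <- rexp(5000, rate = 0.5)
y <- (x - mean(x)) / sd(x)

# The two values agree up to floating-point error. The theoretical skewness
# of an exponential distribution is 2; that of a normal distribution is 0.
skew_x <- skewness_hat(x)
skew_y <- skewness_hat(y)
print(c(skew_x, skew_y))

# Shapiro-Wilk test of normality on the standardized data
# (base R's shapiro.test accepts between 3 and 5000 observations).
sw <- shapiro.test(y)
print(sw$p.value)
```

A very small p-value from the test is evidence against normality of the standardized data; `qqnorm(y)` followed by `qqline(y)` would show the same thing graphically.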
I am a little confused here. Suppose our data initially follow any distribution, with any mean and variance. After standardization, the mean and variance of the data become 0 and 1 respectively, and I also know that a random variable with mean 0 and var 1 follows standard normal distribution. Does a standard normal distribution not follow a normal distribution?
– Akash Dubey, Sep 3 at 10:24
"I also know that a random variable with mean 0 and var 1 follows standard normal distribution". This sentence is wrong. There are many different examples of a random variable that has mean 0 and var 1 but with a distribution that is not normal.
– Emil, Sep 3 at 10:39
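Two such examples can be simulated in a few lines. This is an illustrative sketch only: the choice of distributions, the sample size `n` and the seed are assumptions made here, not taken from the thread.

```r
# Sketch: two distributions with mean 0 and variance 1 that are not normal.
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100000

# Uniform on [-sqrt(3), sqrt(3)]: mean 0, variance (2 * sqrt(3))^2 / 12 = 1.
u <- runif(n, min = -sqrt(3), max = sqrt(3))

# Random signs (-1 or +1 with equal probability): mean 0, variance 1.
s <- sample(c(-1, 1), size = n, replace = TRUE)

print(c(mean(u), var(u)))
print(c(mean(s), var(s)))

# Neither can be normal: a standard normal variable exceeds 2 in absolute
# value with probability 2 * pnorm(-2) (about 0.0455), but these never do.
print(c(mean(abs(u) > 2), mean(abs(s) > 2), 2 * pnorm(-2)))
```

Both samples have mean close to 0 and variance close to 1, yet every value lies within $[-\sqrt{3}, \sqrt{3}]$, which is impossible for a normal distribution.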
edited Sep 3 at 18:39
answered Sep 3 at 10:15 by Emil