Standardization of non-normal features

Suppose we have several features (e.g. $\geq 20$) that do not follow a Gaussian distribution. Do we have to worry about the features not following a Gaussian distribution if we apply standardization to the data?

Namely, even if the features do not follow a normal distribution initially, aren't they made to follow a Gaussian distribution with mean $0$ and variance $1$ after standardization?

  • 4
    Your last statement is incorrect: standardization does not transform a dataset's distribution from non-normal to normal.
    – Emil
    Sep 3 at 9:51

  • @Emil After standardization, the mean and variance become 0 and 1 respectively, and I also know that a random variable with mean 0 and var 1 follows standard normal distribution. Correct me if I am wrong.
    – Akash Dubey
    Sep 3 at 9:57

  • 4
    Akash, think about what happens to the distribution: subtracting the mean sets the location of the mean to $0$. Dividing by the standard deviation either compresses or stretches the distribution so that it becomes as narrow or wide as necessary to have a standard deviation of $1$. Where in this process did we change the shape? Why would a non-normal distribution suddenly become normal? See here, for example, for non-normal distributions that meet the criteria: stats.stackexchange.com/a/314003/176202
    – Frans Rodenburg
    Sep 3 at 10:34

  • 1
    The standard normal is a normal distribution with $\mu=0$ and $\sigma=1$, so to say that it is not normal makes no sense. Note that an arbitrary distribution with mean $0$ and standard deviation $1$ is not called a standard normal distribution.
    – Frans Rodenburg
    Sep 3 at 10:41

  • 2
    You do not have to thank people on CV, but you can show your appreciation by upvoting and accepting @Emil's answer. On a different note, if you comment on a thread, only the OP is notified. You can ping others by using @ followed by their username.
    – Frans Rodenburg
    Sep 3 at 10:53
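
The "where did we change the shape?" argument in the comments can be checked numerically. The following is a minimal sketch, not part of the original thread: `skewness` and `kurtosis` are small helper functions defined here for illustration (they are not base R functions), and the seed and sample size are arbitrary choices.

```r
# Hypothetical helpers (not in base R): moment-based sample skewness and kurtosis
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
kurtosis <- function(v) mean((v - mean(v))^4) / mean((v - mean(v))^2)^2

set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rexp(5000, rate = 0.5)      # skewed, non-normal data
y <- (x - mean(x)) / sd(x)       # standardized copy

# Location and scale change ...
c(mean = mean(y), sd = sd(y))

# ... but the shape measures do not (a normal distribution would give 0 and 3)
rbind(original     = c(skewness = skewness(x), kurtosis = kurtosis(x)),
      standardized = c(skewness = skewness(y), kurtosis = kurtosis(y)))
```

Both rows should agree up to floating-point error, because skewness and kurtosis are unaffected by shifting and rescaling.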


Tags: normal-distribution, variance, standardization

edited Sep 3 at 10:37 by Frans Rodenburg
asked Sep 3 at 9:13 by Akash Dubey

1 Answer
Accepted answer, 13 votes

The short answer: yes, you do need to worry about your data's distribution not being normal, because standardization does not change the shape of the distribution; it only shifts and rescales it. If $X\sim\mathcal{N}(\mu, \sigma^2)$, then you can transform it to a standard normal by standardizing: $Y:=(X-\mu)/\sigma \sim\mathcal{N}(0,1)$. However, this works only because $X$ already follows a normal distribution in the first place. If $X$ has a distribution other than normal, standardizing it in the same way will not make the data normally distributed.
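
To make the shift-and-rescale argument concrete, here is a short derivation sketch. It assumes $X$ has a density $f_X$, mean $\mu$ and standard deviation $\sigma > 0$; the symbols $f_X$, $f_Y$, $F_X$, $F_Y$ and the rate $\lambda$ are introduced here for illustration and do not appear in the original answer.

```latex
% Density of the standardized variable Y = (X - \mu)/\sigma, assuming X has density f_X
F_Y(y) = P\!\left(\frac{X-\mu}{\sigma} \le y\right) = F_X(\mu + \sigma y)
\quad\Longrightarrow\quad
f_Y(y) = \sigma\, f_X(\mu + \sigma y).

% Example: X ~ Exponential(rate \lambda), so \mu = \sigma = 1/\lambda
% and f_X(x) = \lambda e^{-\lambda x} for x \ge 0
f_Y(y) = \frac{1}{\lambda}\,\lambda\, e^{-\lambda\left(\frac{1}{\lambda} + \frac{y}{\lambda}\right)}
       = e^{-(1+y)}, \qquad y \ge -1.
```

So $f_Y$ is just $f_X$ shifted and rescaled: the standardized exponential is still an exponential curve, now starting at $-1$, rather than the bell curve $\frac{1}{\sqrt{2\pi}}e^{-y^2/2}$.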



A simple example of exponentially distributed data and its standardized version:



    # Draw 5000 samples from an exponential distribution with rate 0.5
    x <- rexp(5000, rate = 0.5)
    # Standardize: subtract the sample mean, divide by the sample standard deviation
    y <- (x - mean(x)) / sd(x)
    # Stack the two histograms vertically
    par(mfrow = c(2, 1))
    hist(x, freq = FALSE, col = "blue", breaks = 100,
         xlim = c(min(x), quantile(x, 0.995)),
         main = "Histogram of exponentially distributed data X with rate = 0.5")
    hist(y, freq = FALSE, col = "yellow", breaks = 100,
         xlim = c(min(y), quantile(y, 0.995)),
         main = "Histogram of standardized data Y = (X - E(X)) / StDev(X)")



Now if we check the mean and standard deviation of the original data $x$, we get



    c(mean(x), sd(x))
    [1] 2.044074 2.051816


whereas for the standardized data $y$, the corresponding results are



    c(mean(y), sd(y))
    [1] 7.136221e-17 1.000000


As you can see, the distribution of the data after standardization is decidedly not normal, even though the mean is (practically) 0 and the variance 1. In other words, if the features do not follow a normal distribution before standardization, they will not follow it after the standardization either.
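
If the histograms are not at hand, the same conclusion can be reached numerically. The sketch below is an illustration rather than part of the original answer: it regenerates data with the same setup under an arbitrary seed (so the numbers will differ from those printed above), compares the left tail of the standardized data with that of a standard normal, and runs base R's `shapiro.test`. The cut-off `-1.1` is an assumption chosen for illustration.

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
x <- rexp(5000, rate = 0.5)      # same setup as in the answer
y <- (x - mean(x)) / sd(x)       # standardized data

# Share of observations below -1.1: standardized data vs. a standard normal
c(observed = mean(y < -1.1), standard_normal = pnorm(-1.1))

# Shapiro-Wilk normality test (base R; accepts between 3 and 5000 values)
shapiro.test(y)
```

A standard normal puts roughly 14% of its mass below $-1.1$, whereas the standardized exponential data cannot go much below $-1$ (the image of $x = 0$), so the observed share should be zero. The Shapiro-Wilk test should likewise reject normality with a very small p-value.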


  • I am a little confused here. Let our data follow any distribution initially, with any mean and variance; after standardization, the mean and variance of the data become 0 and 1 respectively, and I also know that a random variable with mean 0 and var 1 follows standard normal distribution. Does a standard normal distribution not follow a normal distribution?
    – Akash Dubey
    Sep 3 at 10:24

  • 10
    "I also know that a random variable with mean 0 and var 1 follows standard normal distribution". This sentence is wrong. There are many different examples of a random variable that has mean 0 and var 1 but with a distribution that is not normal.
    – Emil
    Sep 3 at 10:39
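
A few such examples can be simulated directly. This is a minimal sketch with distributions picked for illustration (they are not named in the thread); the seed, the sample size and the list names are arbitrary assumptions.

```r
set.seed(7)                      # arbitrary seed, only for reproducibility
n <- 1e5

# Three non-normal distributions constructed to have mean 0 and variance 1
samples <- list(
  uniform     = runif(n, min = -sqrt(3), max = sqrt(3)),  # Uniform(-sqrt(3), sqrt(3))
  coin_flip   = sample(c(-1, 1), n, replace = TRUE),      # -1 or +1 with equal probability
  shifted_exp = rexp(n, rate = 1) - 1                     # Exponential(1) shifted left by 1
)

# Sample mean and variance are close to 0 and 1 for all three
round(sapply(samples, function(s) c(mean = mean(s), var = var(s))), 2)
```

All three share the first two moments of the standard normal, yet one is flat and bounded, one takes only two values, and one is strongly skewed. Mean 0 and variance 1 alone do not pin down the distribution.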

edited Sep 3 at 18:39
answered Sep 3 at 10:15 by Emil