What are some situations when normalizing input data to zero mean, unit variance is not appropriate or not beneficial?

I have often seen input data normalized to zero mean and unit variance in machine learning. Is this good practice in every case, or are there times when it is not appropriate or not beneficial?










      machine-learning feature-scaling normalization






      asked Sep 3 at 6:02









      user781486





















          1 Answer










          A detailed answer to the question can be found here.




          [...]are there times when it is not appropriate or not beneficial?




          Short answer: yes and no. Yes, in the sense that it can significantly change the output of, e.g., clustering algorithms. No, on the other hand, if those changes are what you want to achieve. Or, to put it in the words of the author of the mentioned source:




          Scaling features for clustering algorithms can substantially change the outcome. Imagine four clusters around the origin, each one in a different quadrant, all nicely scaled. Now, imagine the y-axis being stretched to ten times the length of the x-axis. Instead of four little quadrant-clusters, you're going to get the long squashed baguette of data chopped into four pieces along its length! (And, the important part is, you might prefer either of these!)
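The effect described in the quote can be seen at the level of a single distance computation. The following is a minimal sketch, not taken from the linked source: it assumes Euclidean distance (which k-means and many other clustering methods rely on) and uses hypothetical toy points in which one feature spans single digits and the other spans thousands. The helper names `standardize` and `nearest_neighbor` are made up for this illustration.

```python
import math
import statistics

# Hypothetical toy data: feature 0 spans single digits, feature 1 spans thousands.
points = [(1.0, 1000.0), (9.0, 1010.0), (1.5, 1200.0), (8.0, 1900.0)]


def standardize(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


def nearest_neighbor(rows, query_index):
    """Index of the row closest to rows[query_index] in Euclidean distance."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], rows[query_index]))


raw_nn = nearest_neighbor(points, 0)
scaled_nn = nearest_neighbor(standardize(points), 0)
print("nearest to point 0, raw features:", raw_nn)
print("nearest to point 0, standardized features:", scaled_nn)
```

On this toy data, point 0's nearest neighbor is point 1 when the raw features are used (the large-valued feature dominates the distance) and point 2 after standardization. Neither answer is wrong in itself; which one is preferable depends on whether the original units carry meaning for the task.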




          The take-home message is: always think carefully about what you want to achieve and what kind of data your algorithms prefer. It does matter!




























          • PCA, by the way, is one of the algorithms that should not be run without normalization, just to highlight the other side of the story.
            – André
            Sep 3 at 8:42
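To illustrate the PCA remark, here is a rough sketch for the two-feature case only. It assumes hypothetical toy data and uses the closed-form angle of the leading eigenvector of a 2x2 covariance matrix instead of a PCA library; the helper names `standardize` and `first_component_angle` are made up for this example.

```python
import math
import statistics

# Hypothetical toy data: feature 0 spans single digits, feature 1 spans thousands.
xs = [1.0, 9.0, 1.5, 8.0]
ys = [1000.0, 1010.0, 1200.0, 1900.0]


def standardize(values):
    """Rescale one feature to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def first_component_angle(a, b):
    """Angle in degrees between the first principal axis and feature a's axis."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a = statistics.fmean([(v - mean_a) ** 2 for v in a])
    var_b = statistics.fmean([(v - mean_b) ** 2 for v in b])
    cov_ab = statistics.fmean([(u - mean_a) * (v - mean_b) for u, v in zip(a, b)])
    # Closed form for the leading eigenvector of [[var_a, cov_ab], [cov_ab, var_b]].
    return math.degrees(0.5 * math.atan2(2.0 * cov_ab, var_a - var_b))


raw_angle = first_component_angle(xs, ys)
scaled_angle = first_component_angle(standardize(xs), standardize(ys))
print(f"first component angle, raw features: {raw_angle:.2f} degrees")
print(f"first component angle, standardized features: {scaled_angle:.2f} degrees")
```

With the raw features, the first component points almost exactly along the large-valued feature (an angle close to 90 degrees), so it mostly reflects the choice of units. After standardization, both features contribute equally and the angle on this data is about 45 degrees.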










          edited Sep 5 at 17:37

























          answered Sep 3 at 8:40









          André
