Why does Random Forest variable importance not sum to 100%?
The randomForest package in R has the importance() function to get both node impurity and mean permutation importance for variables. Why, when calculating mean permutation importance, do the results not sum to 100%?
Here's a simple reproducible example:
library(randomForest)
data(iris)
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 1000)
imp <- importance(iris.rf, type = 1)  # type = 1: mean decrease in accuracy
sum_imp <- sum(imp)
sum_imp # != 100
Thanks
r random-forest importance
asked Sep 8 at 14:02
Micha
Why do you assume it should sum to 1? I see no reason for that belief.
– Firebug
Sep 8 at 18:31
See Measures of variable importance in random forests
– Firebug
Sep 8 at 21:22
1 Answer
As far as I can tell, variable importance measures one of two things: a) how much the out-of-bag prediction error increases when the variable's values are randomly permuted (by default this increase is divided by its standard deviation across trees), or b) the total decrease in node impurity from splits on the variable. Both are averaged over all trees in the forest. Neither of these is a probability or a share of a fixed total, so there's no reason they should add up to 100%.
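A rough sketch of the idea behind the first measure: shuffle one predictor at a time and see how much the error rate rises. This assumes a simple hold-out split rather than the per-tree out-of-bag rows the package works with, so the numbers will not match importance(); the names err_rate and perm_increase are made up for illustration.

```r
library(randomForest)

set.seed(42)
data(iris)

# Hold out a third of the rows. The package itself uses each tree's
# out-of-bag rows instead, so this only approximates the idea.
test_idx <- sample(nrow(iris), 50)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500)

# Misclassification rate of a fitted model on a data frame
err_rate <- function(model, newdata) {
  predicted <- as.character(predict(model, newdata))
  mean(predicted != as.character(newdata$Species))
}

baseline <- err_rate(rf, test)

# For each predictor, shuffle its column and record the rise in error
predictors <- setdiff(names(iris), "Species")
perm_increase <- sapply(predictors, function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])
  err_rate(rf, shuffled) - baseline
})

perm_increase       # one error-rate difference per predictor
sum(perm_increase)  # nothing constrains this to any particular total
```

Each entry is a difference between two error rates, computed independently for each predictor, which is why the values have no built-in total.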
You can, of course, divide by the sum of all importances to get a percentage, but I think that would create confusion: you now have a percentage of what exactly?
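If you do want relative shares, a minimal sketch of that rescaling might look like the following. It assumes the iris model from the question; imp_raw, imp_gini, and imp_pct are just illustrative names. Note that permutation importance can come out negative for uninformative variables, in which case the "percentages" become even harder to interpret.

```r
library(randomForest)

set.seed(42)
data(iris)
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 1000)

# type = 1: mean decrease in accuracy, scaled by its standard deviation by default
imp_scaled <- importance(iris.rf, type = 1)
# scale = FALSE returns the unscaled mean decrease in accuracy
imp_raw <- importance(iris.rf, type = 1, scale = FALSE)
# type = 2: mean decrease in node impurity (Gini index for classification)
imp_gini <- importance(iris.rf, type = 2)

# Rescale the unscaled values so the column sums to 100
imp_pct <- 100 * imp_raw / sum(imp_raw)
imp_pct
sum(imp_pct)
```

The rescaled column sums to 100 by construction, but it only says how the measured importances compare with each other, not what fraction of the model's accuracy each variable accounts for.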
(Welcome to the site!)
answered Sep 8 at 14:43
Wayne
Thanks for the welcome. I expect I'll be back :-)
– Micha
Sep 9 at 6:50