Interpretation of the given box and whisker plot

I wish to understand whether I have interpreted the box & whisker plot below correctly; this will also confirm my understanding of it. (I am learning basic statistics & measures of dispersion.)
Box & Whisker Plot:
[Figure: box-and-whisker plot drawn above a number line]
Let's say the number line represents the age of students; then the following is my interpretation.
- Students age group is 2-9
- There are more students with age 6-7 & 7-8.5
- The average student age is 7
- Since each group (Least-Q1, Q1-Q2, Q2-Q3 & Q3-Greatest) in a box and whisker plot is roughly equally divided, the smallest-looking group would be denser or less variable. So does that mean that in the above example the (Q3-Greatest) group, which contains students aged 8.5-9, is the densest of all and the least variable?
Is my understanding above correct? Also, what other interpretations can I make?
statistics data-analysis
asked Sep 8 at 4:51 by linux user; edited Sep 8 at 7:29 by StubbornAtom
2 Answers
Accepted answer
"Students age group is 2-9"
Yes. 2 is the minimum age observed in the sample and 9 is the maximum age.
"There are more students with age 6-7 & 7-8.5"
Not exactly. Half of the children in the sample have ages within the 'box', that is, between 6 and 8.5. Roughly speaking, a quarter of the students are under 6 years old, a quarter are from 6 to 7 years old, a quarter are between 7 and 8.5 years old, and a quarter are older than 8.5 years.
"The average student age is 7"
More precisely, the median age is 7. (No more than half are below 7 and no more than half are above 7.)
"Since each group (Least-Q1, Q1-Q2, Q2-Q3 & Q3-Greatest) in a box and whisker plot is roughly equally divided, the smallest-looking group would be denser or less variable. So does that mean that in the above example the (Q3-Greatest) group, which contains students aged 8.5-9, is the densest of all and the least variable?"
I don't think it is useful to use a boxplot to talk about 'density' with any precision. Certainly, it is true that about 3/4 of the students are concentrated between 6 and 9 years of age (a span of 3-4 years, depending on how you view age), while only 1/4 are in the longer span from 2 to 6. But a histogram is a better graphical device for showing 'densities'.
Note: A boxplot gives no information about how many students are in the sample.
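To put rough numbers on the quarter-by-quarter reading above, here is a minimal sketch in R. It assumes the five values read off the plot (2, 6, 7, 8.5, 9) and treats each interval as holding exactly a quarter of the students, which is only approximately true; the variable names are illustrative.

```r
# Five-number summary read off the boxplot (assumed values).
five <- c(min = 2, q1 = 6, median = 7, q3 = 8.5, max = 9)

# Width of each of the four intervals, in years.
width <- diff(five)

# Each interval holds about a quarter of the students, so the average
# share of the sample per year of age is 0.25 / width.
share_per_year <- 0.25 / width
names(share_per_year) <- c("min-Q1", "Q1-median", "median-Q3", "Q3-max")
round(share_per_year, 2)
```

Under these assumptions the narrow Q3-max interval comes out densest (about half of the sample per year of age) and the long lower whisker sparsest (about 6% per year). This is only an average over each interval; the boxplot says nothing about how ages are spread inside an interval.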
It is best to use boxplots only for samples larger than a dozen or so. Making a boxplot depends on finding three numbers that cut the sorted observations into four approximately equal parts. [They are the lower quartile $(Q_1),$ at the left end of the box; the median, the heavy line within the box; and the upper quartile $(Q_3),$ at the right end of the box.] If you have a sample of only seven observations, it
is difficult to know how to divide them into four approximately equal 'chunks'.
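The difficulty with small samples shows up directly in software. As a sketch, the seven ages below are made up for illustration; `fivenum()` returns Tukey's hinges (which `boxplot()` uses), and `quantile()` offers several conventions through its `type` argument, two of which are compared here.

```r
# Seven hypothetical ages (invented for this example).
ages <- c(2, 5, 6, 7, 8, 8.5, 9)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
hinges <- fivenum(ages)

# Lower and upper quartiles under two of the conventions quantile() supports.
q_type6 <- quantile(ages, c(0.25, 0.75), type = 6)
q_type7 <- quantile(ages, c(0.25, 0.75), type = 7)  # R's default

hinges
rbind(type6 = q_type6, type7 = q_type7)
```

With only seven observations the two conventions place the quartiles at different values, so the box would be drawn differently depending on the rule used. With larger samples the differences usually become negligible.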
Here is a histogram of a (fake) dataset of 40 ages that might have produced your boxplot. A histogram is based on area: notice that each student
is represented by one 'brick' of area within his or her bar of the histogram.
The tick marks beneath the histogram show 'exact' ages of the
students (e.g., to the nearest number of weeks). At the resolution of this graph, tick marks for two or more students of
very nearly the same age may appear as one mark.
[Figure: histogram of the 40 fake ages, with tick marks beneath]
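For readers who want to reproduce something similar, here is one way to build a dataset of 40 ages whose five-number summary matches the boxplot. It is a hypothetical construction (10 evenly spaced ages per quarter), not the data behind the figure described above.

```r
# Hypothetical data: 10 evenly spaced ages in each quarter of the boxplot.
# (An assumption for illustration; not the dataset used for the original figure.)
ages <- c(seq(2, 6, length.out = 10), seq(6, 7, length.out = 10),
          seq(7, 8.5, length.out = 10), seq(8.5, 9, length.out = 10))

fivenum(ages)  # minimum, lower hinge, median, upper hinge, maximum

par(mfrow = c(2, 1))
# range = 0 makes the whiskers run to the extremes instead of flagging outliers.
boxplot(ages, horizontal = TRUE, range = 0, col = "skyblue2")
hist(ages, breaks = 2:9, col = "skyblue2", main = "Histogram of ages")
rug(ages)      # one tick per student
par(mfrow = c(1, 1))
```

Note that with R's default whisker rule (1.5 times the box length) the youngest age in this made-up dataset would be drawn as a separate point, so `range = 0` is used to mimic a plot whose whisker reaches the minimum.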

Addendum: A comment expressed interest in means, medians, and modes of skewed distributions.
Here are samples from two distributions. The first
is $\mathsf{Gamma}(\text{shape}=2, \text{rate}=1/20).$ It is a right-skewed distribution with mode 20, median 33.57,
and mean 40. A sample of size $n = 100$ has the following summary
statistics:
    summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      2.441  19.121  33.433  40.629  49.972 203.525
The sample mean and median are similar to the population mean and
median. There is no formal mode because no two observations are
exactly the same, but one might say that the modal interval of the histogram
(lower-left in the figure below) is $(20, 40].$
The second distribution is $\mathsf{Beta}(2, 1).$ It is a left-skewed
distribution with mode 1, median 0.7071, and mean 2/3. A sample of size $n = 100$ has the following summary
statistics:
    summary(y)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.08611 0.49792 0.71515 0.67491 0.87883 0.99579
Again, the sample mean and median closely match the population
mean and median, respectively. The modal histogram interval is $(0.9, 1.0].$
The figure below shows the gamma distribution at left and the
beta distribution at right. The tick marks below each histogram show
the locations of individual points. The curves are the density
functions of the respective distributions.
[Figure: boxplots (top) and histograms with density curves (bottom) for the gamma sample (left) and the beta sample (right)]

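    set.seed(1234)                        # make the samples reproducible
    par(mfcol=c(2,2))                     # 2 x 2 panels, filled column by column
    x <- rgamma(100, 2, .05)              # sample from Gamma(shape = 2, rate = 1/20)
    boxplot(x, horizontal=TRUE, col="skyblue2")
    hist(x, prob=TRUE, col="skyblue2"); rug(x)
    curve(dgamma(x, 2, .05), add=TRUE, lwd=2)   # population density
    y <- rbeta(100, 2, 1)                 # sample from Beta(2, 1)
    boxplot(y, horizontal=TRUE, col="skyblue2")
    hist(y, prob=TRUE, col="skyblue2"); rug(y)
    curve(dbeta(x, 2, 1), add=TRUE, lwd=2)      # population density
    par(mfrow=c(1,1))                     # restore the single-panel layout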
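The population values quoted in the addendum can be checked numerically. This is a small sketch using base R's quantile functions and the standard closed-form expressions for the mean and mode; the variable names are illustrative.

```r
# Gamma(shape = 2, rate = 1/20)
shape <- 2
rate  <- 1/20
gamma_mode   <- (shape - 1) / rate         # mode formula, valid for shape >= 1
gamma_median <- qgamma(0.5, shape, rate)   # no closed form; computed numerically
gamma_mean   <- shape / rate

# Beta(2, 1), with density 2x on (0, 1)
a <- 2
b <- 1
beta_median <- qbeta(0.5, a, b)            # equals sqrt(1/2)
beta_mean   <- a / (a + b)
# The density 2x is increasing on (0, 1), so the mode sits at the right end, 1.

c(gamma_mode = gamma_mode, gamma_median = gamma_median, gamma_mean = gamma_mean)
c(beta_median = beta_median, beta_mean = beta_mean)
```

For the gamma distribution this gives mode 20, a median of about 33.57, and mean 40, so mode < median < mean, as expected for right skew. For the beta distribution the ordering is reversed: mean 2/3 < median 0.7071 < mode 1.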
Note to @linuxuser: If your textbook does not discuss gamma and beta distributions, you can
read about them on Wikipedia. Both families of distributions are widely used in applied probability modeling. [Roughly speaking, the gamma function $\Gamma(\cdot),$ used to define the density functions, is
a continuous version of the factorial function, filling in values
for non-integers. For positive integer $k$, we have $\Gamma(k) = (k-1)!;$ for example, $\Gamma(5) = 4! = 24.$]
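The relation between the gamma function and factorials is easy to check in R with the built-in `gamma()` and `factorial()` functions; a brief sketch:

```r
k <- 1:6
gamma_k      <- gamma(k)          # Gamma(k) for k = 1, ..., 6
fact_k_minus <- factorial(k - 1)  # (k - 1)! for the same k

rbind(k = k, gamma = gamma_k, factorial = fact_k_minus)

# The gamma function is also defined between the integers.
gamma(2.5)
```

The two rows agree for every positive integer shown, and `gamma(2.5)` falls between `gamma(2)` and `gamma(3)`.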
I am a beginner in statistics, sir; please excuse me if I ask some silly questions. "I don't think it is useful to use a boxplot to talk about 'density' with any precision": IMO, with respect to density you will always wish to know approximately which part of the population is denser, and for that a box plot is enough. Why would you ever want to quantify the density?
– linux user
Sep 13 at 3:09
Both boxplots and histograms are useful, but it is helpful to understand their relative advantages and disadvantages.
– BruceET
Sep 13 at 5:11
Second answer
1. Students age group is 2-9
2. There are more students with age 6-7 & 7-8.5
3. The average student age is 7
4. Since each group (Least-Q1, Q1-Q2, Q2-Q3 & Q3-Greatest) in a box and whisker plot is roughly equally divided, the smallest-looking group would be denser or less variable. So does that mean that in the above example the (Q3-Greatest) group, which contains students aged 8.5-9, is the densest of all and the least variable?
Point $1$ is correct.
Note that point 2 contradicts point 4: each interval holds roughly $25\%$ of the data, so $Q_1$-$Q_3$ holds roughly $50\%$ of the data. Also, the statement is incomplete: "more students with age 6-7 & 7-8.5" than in which group? Do you mean more students compared with some other specific interval, or in general?
In point $3$, the word "average" is ambiguous, as there are three types of averages: mean, median and mode. Here $Q_2$ is the median. Depending on the shape of the distribution (there can be three types: positively-skewed, negatively-skewed, symmetric), you can have different relationships among the mean, median and mode (usually mode $<$ median $<$ mean, mean $<$ median $<$ mode, and mean $\approx$ median $\approx$ mode, respectively; however, for symmetric distributions not always). The data look negatively-skewed, because $75\%$ of the data are in the interval $6$-$9$ against $25\%$ in $2$-$6$, which implies the data (ages, basically the number of students) are more densely situated in the interval $6$-$9$. Consequently, you can say the data are less variable (more closely situated) in the interval $6$-$9$ compared with the interval $2$-$6$.
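The skewness argument above can be sketched as a rough check on the five numbers read off the plot. This is only a heuristic comparing how far the data stretch on either side of the median; the values and names are assumptions for illustration, and it is no substitute for looking at the raw data or a histogram.

```r
# Five-number summary read off the boxplot (assumed values).
five <- c(min = 2, q1 = 6, median = 7, q3 = 8.5, max = 9)

# How far the lower and upper halves of the data stretch from the median.
lower_span <- five[["median"]] - five[["min"]]
upper_span <- five[["max"]] - five[["median"]]

# A longer stretch below the median hints at negative (left) skew.
skew_hint <- if (lower_span > upper_span) {
  "negative (left) skew"
} else if (lower_span < upper_span) {
  "positive (right) skew"
} else {
  "roughly symmetric"
}
skew_hint
```

Here the lower half spans 5 years and the upper half only 2, which agrees with the negatively-skewed reading.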
I believe you mean "positively-skewed, negatively-skewed, and symmetrical." There are many non-normal but symmetrical distribution families (including Cauchy and Laplace).
– BruceET
Sep 11 at 0:35
@BruceET, thank you for the helpful comment; yes, also uniform, bimodal, etc. I wanted to show the relationship of the three averages, which may not hold for symmetric distributions. The comparison may not hold for others either, though. Sometimes median $<$ mean $<$ mode in negatively-skewed data.
– farruhota
Sep 11 at 3:22
For positive skewness it's usually mode < median < mean, and for negative skewness it's usually mean < median < mode. (Not always; you can rig distributions to get almost anything.) But boxplots show only the median, so it's not clear to me how they fit into the picture. Maybe histograms would work better.
– BruceET
Sep 11 at 3:35
See addendum to my Answer.
– BruceET
Sep 11 at 4:07
@BruceET, yes, by "respectively" I meant the cases for mean, median and mode of different distributions. The box plot also indicates the shape of the distribution, no? In this case, the mass is shifted to the right, so it is negatively-skewed, so mean $<$ median $<$ mode. Thank you, indeed.
– farruhota
Sep 11 at 6:07
Accepted answer: answered Sep 9 at 4:03 by BruceET; edited Sep 11 at 4:42
I am beginner in statistics, sir please excuse me if i ask some silly questions.I don't think it is useful to use a boxplot to talk about 'density' with any precision: IMO w.r.t density you will always wish to have an idea/approximately know which part of the population is more denser and for that box plot is enough to tell. Why would you want to quantify the density ever ?
â linux user
Sep 13 at 3:09
Both boxplots and histograms are useful, but it is helpful to unserstand their relative advantages and disadvantages.
â BruceET
Sep 13 at 5:11
- Students age group is 2-9
- There are more students with age 6-7 & 7-8.5
- The average student age is 7
- Since each group (Least-Q1, Q1-Q2, Q2-Q3 & Q3-Greatest) in box and whisker plot is roughly equally divided; thus the smallest looking group would be more denser or less variable. So does that mean in above example (Q3-Greatest) group contains most students of aged 8.5-9; so its densest of all and less variable ?
Point 1 is correct.
Note that point 2 contradicts point 4: each interval holds roughly $25\%$ of the data, so $Q_1$-$Q_3$ holds roughly $50\%$ of the data. Also, the statement is incomplete: "more students with age 6-7 & 7-8.5" than which group? Do you mean more students than in some other specific interval, or in general?
In point 3, the word "average" is ambiguous, as there are three types of averages: mean, median and mode. Here $Q_2$ is the median. Depending on the shape of the distribution (there can be three types: positively skewed, negatively skewed, symmetric), the mean, median and mode can be related differently (usually mode $<$ median $<$ mean, mean $<$ median $<$ mode, and mean $\approx$ median $\approx$ mode, respectively; for symmetric distributions, however, not always). The data look negatively skewed, because $75\%$ of the data lie in the interval $6$-$9$ against $25\%$ in $2$-$6$, which implies that the data (ages, basically the number of students) are more densely situated in the interval $6$-$9$. Consequently, you can say the data are less variable (more closely situated) in the interval $6$-$9$ than in the interval $2$-$6$.
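To make the density comparison concrete, here is a rough sketch in Python. It uses the five values read off the plot in the question (2, 6, 7, 8.5, 9) and assumes each quarter holds exactly $25\%$ of the students, spread evenly inside its interval; a boxplot cannot confirm that second assumption, so treat the numbers as an illustration only. The names `five_number`, `labels` and `quarter_densities` are made up for this example.

```python
# Five-number summary read off the plot in the question (ages):
# least, Q1, Q2 (median), Q3, greatest.
five_number = [2.0, 6.0, 7.0, 8.5, 9.0]
labels = ["Least-Q1", "Q1-Q2", "Q2-Q3", "Q3-Greatest"]


def quarter_densities(summary):
    """Share of the data per unit of age in each quarter.

    Assumes each quarter holds 25% of the data, spread evenly
    over its interval (an assumption, not something the plot shows).
    """
    return [0.25 / (hi - lo) for lo, hi in zip(summary, summary[1:])]


densities = quarter_densities(five_number)
for label, d in zip(labels, densities):
    print(f"{label}: {d:.4f} of the students per year of age")

# The narrowest quarter is the densest one.
densest = labels[densities.index(max(densities))]
print("Densest quarter:", densest)
```

Under these assumptions the Q3-Greatest quarter (8.5-9) comes out densest and the Least-Q1 quarter (2-6) sparsest, which matches the reasoning in point 4 of the question.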
edited Sep 11 at 3:19
answered Sep 9 at 7:38
farruhota
15.6k2734
I believe you mean "positively-skewed, negatively-skewed, and symmetrical." There are many non-normal, but symmetrical distribution families (including Cauchy and Laplace).
– BruceET
Sep 11 at 0:35
@BruceET, thank you for the helpful comment; yes, also uniform, bimodal, etc. I wanted to show the relationship of the three averages, which may not hold for symmetric distributions. Though the comparison may not hold for others either. Sometimes median $<$ mean $<$ mode for negatively skewed data.
– farruhota
Sep 11 at 3:22
For positive skewness it's usually mode < median < mean, and for negative skewness it's usually mean < median < mode. (Not always; you can rig distributions to get almost anything.) But boxplots show only the median, so it's not clear to me how they fit into the picture. Maybe histograms would work better.
– BruceET
Sep 11 at 3:35
See addendum to my Answer.
– BruceET
Sep 11 at 4:07
@BruceET, yes, by "respectively" I meant the cases for the mean, median and mode of different distributions. The box plot also indicates the shape of the distribution, no? In this case, the mass is shifted to the right, so it is negatively skewed, so mean $<$ median $<$ mode. Thank you, indeed.
– farruhota
Sep 11 at 6:07