Basics of Statistical Functions
Statistical functions take a set of data values and produce a single value that describes the set. For example, the mean, also called the arithmetic mean or the average, is the sum of the values divided by how many values there are. The median is the value in the middle of the set, and the mode is the value that appears most frequently. The range of a set of numeric values is the difference between the highest and the lowest values in the set. Finally, the standard deviation measures how widely the values are spread around the mean. There are other statistical functions, of course, but these are the ones emphasized on the GMAT.
The mean – also called the arithmetic mean and the average – is a statistical function which takes a group of values, in which there can be repetitions, and calculates their average, which is defined to be the sum of the values divided by the number of values in the group. Example: Suppose 80, 85, 85, 90, 91, 92, 97 are test scores. Then their mean is
(80 + 85 + 85 + 90 + 91 + 92 + 97)/7 = 620/7 ≈ 88.57
This statistic is valuable for identifying the center of a group of values, and it is the basis for calculating other statistical functions such as the variance and standard deviation. We can think of it as showing the “typical” value in the group. Thus, since the values in the example are test scores, the mean shows about where the middle of the scores lies.
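To make the definition concrete, here is a minimal Python sketch of the mean as "sum of the values divided by the number of values", applied to the test scores above. The function name `mean` and the variable `scores` are illustrative choices, not part of the original text.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


scores = [80, 85, 85, 90, 91, 92, 97]  # test scores from the example above
print(sum(scores))             # 620
print(round(mean(scores), 2))  # 88.57
```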
The median is the value that’s in the middle of a group of values in the sense that there are the same number of values larger than the median as there are less than the median. Taking again as an example the values
80, 85, 85, 90, 91, 92, 97
Their median is 90, since there are 7 values in the list, and 3 of them are less than 90 and 3 are larger. Notice that the median (90) is close to the mean (about 88.57) but not equal to it; in general the two need not coincide.
Suppose there are an even number of values in the list. The median is then the average of the two middle values. Therefore, if the list of values were
80, 85, 85, 90, 91, 92, 97, 100
then the median would be the average of 90 and 91, which is 90.5.
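The odd and even cases can be sketched in a few lines of Python. This is only an illustration of the rule described above (sort the values, then take the middle one or the average of the two middle ones); the function name `median` is our own choice.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


print(median([80, 85, 85, 90, 91, 92, 97]))       # 90
print(median([80, 85, 85, 90, 91, 92, 97, 100]))  # 90.5
```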
The median is particularly useful when there is no upper bound to the data values and a few very large values tend to skew the mean so as to make it less meaningful. For example, suppose the mean starting salary of a class of graduates from a particular college is $100,000, but on examination it turns out that one of the graduates was an outstanding athlete who signed a multi-million dollar contract to play football professionally. When this student’s “starting salary” is eliminated, the mean for the rest of the graduating class becomes $60,000. Obviously, the mean in this example is not representative of the midpoint value. On the other hand, looking at the median instead of the mean, we find that the median is about $60,000, which is much closer to what we would normally deem as the actual midpoint of the salaries.
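The effect of a single extreme value can be checked numerically. The sketch below uses hypothetical data invented to match the figures in the text: a class of 50 graduates, 49 of whom start at $60,000 and one of whom signs for $2,060,000. The class size and the individual salaries are assumptions, since the text gives only the resulting means and median.

```python
from statistics import median

# Hypothetical class: 49 graduates at $60,000 plus one athlete at $2,060,000.
salaries = [60_000] * 49 + [2_060_000]

mean_all = sum(salaries) / len(salaries)
mean_without_athlete = sum(salaries[:-1]) / len(salaries[:-1])

print(mean_all)              # 100000.0
print(mean_without_athlete)  # 60000.0
print(median(salaries))      # 60000.0
```

One outlier moves the mean by $40,000, while the median does not move at all.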
Another example based on the same idea is the practice of citing the median price of homes being sold in a community. If the median is, say, $250,000, that would suggest to prospective buyers what they can expect to pay if they’re looking to buy in the community. On the other hand, if several multi-million dollar homes have recently sold there, then the mean price would be much higher and would give a misleading impression to the normal home buyer.
Mode and Frequency Distribution
The mode is the value, or values, that appear most often in a data set.
Example: 1, 2, 2, 2, 3, 3, 4, 4, 4.
In this list, the numbers 2 and 4 appear the most frequently and so they are the modes. This set is bi-modal. In the example given above, 80, 85, 85, 90, 91, 92, 97, the number 85 is the one and only mode.
The mode need not be a number. It is simply the most frequently occurring value or values in a set. For example, in an election for class president, two candidates may tie for the most votes. Both candidates are then modes. Or perhaps a poll covering a list of 10 movies shows that one particular movie is the most popular. That movie is the mode.
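A short Python sketch shows how ties produce more than one mode and why the values do not have to be numbers. The function name `modes` and the candidate names in the last call are hypothetical, chosen only for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Keep all values tied for the top count, in order of first appearance.
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 2, 2, 3, 3, 4, 4, 4]))         # [2, 4]
print(modes([80, 85, 85, 90, 91, 92, 97]))        # [85]
print(modes(["Ann", "Bob", "Ann", "Bob", "Cy"]))  # ['Ann', 'Bob']
```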
Another statistical function, the frequency distribution, illustrates the use of the mode. For example, suppose Jane sells the following numbers of insurance policies each week over a 10-week period:
1, 3, 1, 4, 5, 3, 2, 1, 1, 2
Then the frequency distribution of these numbers is shown in this table:
| Number of Policies Sold | Frequency |
| 1 | 4 |
| 2 | 2 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
As you can see, the mode of this distribution is 1 policy sold in a week, which happens in 4 of the 10 weeks. This value, 1 policy sold in a week, is then the most probable value for this distribution. Its probability is 4/10, whereas the probability for 2 or 3 sold is 2/10 each, and for 4 or 5 sold is 1/10 each.
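The counts and probabilities for Jane's sales can be tallied with a short Python sketch. It uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

weekly_sales = [1, 3, 1, 4, 5, 3, 2, 1, 1, 2]  # policies sold in each of 10 weeks

frequency = Counter(weekly_sales)  # maps each value to how often it occurs
total = len(weekly_sales)

for policies in sorted(frequency):
    count = frequency[policies]
    # Relative frequency = count / total number of weeks
    print(f"{policies} sold: frequency {count}, probability {count}/{total}")

mode_value, mode_count = frequency.most_common(1)[0]
print(mode_value, mode_count)  # 1 4
```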
The range function measures how far apart the highest and lowest values are in a set of values. Suppose that in a certain city the highest temperature recorded during the year is 102, and the lowest is -17. Then the range of temperatures in the city for that year is 119. The range of policies sold by Jane, shown above, is 4. The range of test scores shown above is 17, since they range from 80 to 97.
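All three ranges quoted above can be verified with a one-line function. This is a minimal sketch; the name `value_range` is our own, chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)


print(value_range([102, -17]))                       # 119 (temperatures)
print(value_range([1, 3, 1, 4, 5, 3, 2, 1, 1, 2]))   # 4   (Jane's weekly sales)
print(value_range([80, 85, 85, 90, 91, 92, 97]))     # 17  (test scores)
```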
Range, variance, and standard deviation, used together with the mean, give a good picture of the distribution of data in a data set.
Standard deviation measures how widely the values in a data set are spread around the mean: the larger the standard deviation, the more the data is dispersed. It is calculated as follows:
1. Find the mean of the set of values.
2. Subtract the mean from each value to get its difference from the mean.
3. Square each of these differences.
4. Take the average of all the squared differences calculated in step 3.
5. Take the square root of the average calculated in step 4.
Example: Suppose the values are 1, 2, 4, 5. Then the mean is 3, and the numbers of steps 2 and 3 are shown in the following table:
| x | x – mean | (x – mean)² |
| 1 | -2 | 4 |
| 2 | -1 | 1 |
| 4 | 1 | 1 |
| 5 | 2 | 4 |
The variance function is used to calculate the standard deviation. In fact, to calculate the variance of a data set, simply follow steps 1-4 above. The standard deviation, then, is computed with step 5, and is the square root of the variance.
The average of the squared differences 4, 1, 1, and 4 is 10/4 = 2.5. This is the variance for this data set. Its square root is approximately 1.58, which is the standard deviation to two decimal places.
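The five steps can be followed literally in a short Python sketch. Note that it divides by the number of values, as the steps above describe (the population form). The function names are our own.

```python
import math


def population_variance(values):
    """Steps 1-4: mean, differences from the mean, squares, average of the squares."""
    mean = sum(values) / len(values)                   # step 1
    squared_diffs = [(x - mean) ** 2 for x in values]  # steps 2 and 3
    return sum(squared_diffs) / len(values)            # step 4


def population_std_dev(values):
    """Step 5: the square root of the variance."""
    return math.sqrt(population_variance(values))


data = [1, 2, 4, 5]
print(population_variance(data))           # 2.5
print(round(population_std_dev(data), 2))  # 1.58
```

Python's standard library offers the same calculation as `statistics.pvariance` and `statistics.pstdev`. By contrast, `statistics.variance` and `statistics.stdev` divide by one less than the number of values, so they give different results from the steps above.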