How to Find the Truncated Mean in SQL
- by Scott Whigham on February 6, 2009 5:34 AMThe truncated mean is an average over a set obtained after you actually ignore the highest and lowest numbers. For example: given a set of the three numbers 100, 200 and 300, the "average" would be 200. We would add 100, 200, and 300 then divide by the number of members in the set: (100+200+300)/3 = 200. A "truncated mean" however would look at the set and eliminate the lowest value and the highest value before calculating the average: (200)/1 = 200 {note that we eliminated 100 and 300 in the set}.
I got into a reporting situation for a Reporting Services report that I was writing this week in which I had some really, really off-the-wall numbers for the highs and lows and the truncated mean was the solution. I had a set of data that was something like 15 rows - 1 row was 0.00, another row was 100000.00 and the other thirteen rows were clustered around 1300-1400. Clearly a regular "average" would've generated an odd number: I had an "average" of something like 7800. Once I eliminated 0.00 and 100000.00 from the set, I was fine. Now sure, there are lots of ways to calculate averages and standard deviations; that's not what this post is about. This post has one goal: to show you how if you wish to to calculate the truncated mean in SQL Server 2005 or SQL Server 2008 (or SQL Server 2011 for that matter!). For more details about what a truncated mean is, check out the wikipedia entry here.
To start out, let's create ourselves a table to play with:
USE tempdb
GO
CREATE TABLE Sales(SalesPerson NVARCHAR(128), TotalSales MONEY);
INSERT Sales
SELECT 'Scott' , $1 UNION ALL
SELECT 'Scott' , $2 UNION ALL
SELECT 'Scott' , $3
(Yes, I know I could have used row initializers but that wouldn't work for SQL 2005, would it?)
Our set is simple: three rows of $1, $2, and $3. The "average" is ($1+$2+$3)/3 = $2 and the truncated mean is $2/1=$2. The key to calculating the truncated mean is use windowing and partitioning. For example, run this query:
SELECT ROW_NUMBER() OVER(ORDER BY TotalSales) AS RowNumber
, COUNT(*) OVER() AS NumberOfMembersInSet
, SalesPerson, TotalSales
FROM Sales
The ROW_NUMBER() function allows us to assign a unique number to each row in a set of rows. The OVER() windowing function allows us to define how we want the ROW_NUMBER() to be calculated. Because we said ROW_NUMBER() OVER(ORDER BY TotalSales), #1 in the ROW_NUMBER() generated will be for the row with the lowest amount of sales ($1 in our example). The COUNT(*) tells us how many total rows are in the set and the COUNT(*) OVER() says to include the total number of rows in the count.
Here's what we get back when we run our work-in-progress query:
Take a look at the "RowNumber" column is critical here; it's what we will use to "throw out" the highest and lowest value. What we really want to do is to say, "Eliminate RowNumber 1 and 3 and tell us the average of the rows in between." And that's simple enough in this example since there are only three rows; the truncated mean would be $2.00.
But how do you return just the truncated mean?
Unfortunately, we can't just write a WHERE clause that says WHERE RowNumber BETWEEN 2 AND COUNT(*)-1 (for various reasons - check out our transact-sql training videos for more). Windowing functions' results cannot be directly accessed in a query. There is a workaround though... Run this query:
SELECT * FROM (
SELECT ROW_NUMBER() OVER(ORDER BY TotalSales) AS RowNumber
, COUNT(*) OVER() AS NumberOfMembersInSet
, SalesPerson, TotalSales
FROM Sales
) AS x
WHERE x.RowNumber > 1
AND x.RowNumber < NumberOfMembersInSet
ORDER BY SalesPerson DESC
Couple of things:
- We've now placed the windowing functions in a subquery
- Now that we've done that, we can put a filter that says we want rows 2 through COUNT(*)-1
The "NumberOfMembersInSet" column is now very important - it effectively tells us how many rows are in the set. We can use that in our WHERE clause to filter out the last row in the set. The final result of the above query is to just return the rows that are RowNumber>1 AND RowNumber<(last row). The only final piece of our puzzle is to calculate the average:
SELECT SalesPerson, AVG(TotalSales) AS AvgSales FROM (
SELECT ROW_NUMBER() OVER(ORDER BY TotalSales) AS RowNumber
, COUNT(*) OVER() AS NumberOfMembersInSet
, SalesPerson, TotalSales
FROM Sales
) AS x
WHERE x.RowNumber > 1
AND x.RowNumber < NumberOfMembersInSet
GROUP BY SalesPerson
ORDER BY SalesPerson DESC
But wait - there's more you need to know...
The above was a simplified version of our query that showed you only how to calculate the truncated mean over one partition. We were only looking at one salesperson's average - what if we want the average for each salesperson? Each salesperson is a different partition and, to account for that, we must use PARTITION BY SalesPersonId in our windowing functions. To illustrate, let's add a few more salespeople and sales to the database:
INSERT Sales
SELECT 'Chad' , $1 UNION ALL
SELECT 'Chad' , $2 UNION ALL
SELECT 'Chad' , $4 UNION ALL
SELECT 'Chad' , $5 UNION ALL
SELECT 'Chrissy', $1 UNION ALL
SELECT 'Chrissy', $1 UNION ALL
SELECT 'Chrissy', $2 UNION ALL
SELECT 'Chrissy', $4 UNION ALL
SELECT 'Chrissy', $5
If you just eyeball it, you can see that Chad has four rows - if we eliminate $1 (the low) and $5 (the high), then the truncated mean is ($2+$4)/2=$3. Chrissy, however, shows us what happens when we have two values that are the same on either the hi or low (the "low" in this case): she has two $1 entries. A truncated mean only removes the first one!
The Final Query
SELECT SalesPerson, AVG(TotalSales) AS AvgSales FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY SalesPerson ORDER BY TotalSales) AS RowNumber
, COUNT(*) OVER(PARTITION BY SalesPerson) AS NumberOfMembersInSet
, SalesPerson, TotalSales
FROM Sales
) AS x
WHERE x.RowNumber > 1
AND x.RowNumber < NumberOfMembersInSet
GROUP BY SalesPerson
ORDER BY SalesPerson DESC
The final result is:
Scott's truncated mean eliminated the low ($1) and the high ($3) which left only a single value ($2)
Chad's truncated mean threw out a low of $1 and a high of $5 and the final calculation was ($2+$4)/2 = $3
Chrissy's truncated mean had to eliminate the first $1 (the low) and the $5 (the high) for a final calculation of ($1+$2+4)/3 = $2.3333.
I hope this helped you understand the truncated mean a bit more. It isn't a calculation that we, as developers, need all that often but it's nice to have it ready when and if you need it!




Hi,
Why can't you use following,
Select fld1, (Sum(amt) - Max(amt) - min(amt)) / (Count(amt) - 2)
From table group by fld1
Nice! I didn't even include it lol. It's funny how your mind automatically jumps to a more complex solution when new technology/techniques come out lol.
One "gotcha" with that approach is when you have someone with only two rows - you get a "Divide by zero" error. Just make sure to include a NULLIF(COUNT(amt)-2) around the whole expression.
I'm not sure which one performs better over a large set of rows but would love to hear someone running these two through several thousand iterations and posting the results.
Most of the time I don’t make comments on blogs, but I have to mention that this post really forced me to do so. Really nice post!
Thank you soooooooooooooooooo much for the post This was exactly the thing I needed to read :)
I really like keeping up with these stories! It completely brings something special the morning routine.
I am currently reading it on my Blackberry and will scan it once I get home. I love your site and marketing strategy. Check out my website if you get a chance. My Public Records Blog - http://downloadpublicrecords.blogspot.com/
You would be crazy to not use more Twitter marketing
Thanks for a good read. I will be returning.
Very interesting way to market on facebook. I also found a way to automate several very powerful methods of getting users. You can grab ID by groups or pages or wall posts with this program. Then once you have them you can do a friend request to your logon or sudo profile. This by passes the captcha codes too. There is also a cool chat program that you can setup scripts and keyword to work with. This program, when I use it I can see a spike in my site traffic. It is awesome...
Heya i got to your site by mistake when i was searching bing for something off topic here but i do have say your site is really helpful, like the theme and the content on here...so thanks for me procrastinating from my previous task, lol