Saturday, October 18, 2014

MDX: Top N by Group with All Others

When summarizing data using MDX, it’s a common requirement to find the top N by group (i.e., the top N tuples of one set for each member of another set). For example, you may need to find the top N brands by state or the top N products by calendar year based on sales count or amount. In MDX, this is accomplished using a combination of the Generate and TopCount functions. As an example, the following returns the top 5 products for each region based on sales amount:


WITH SET [Top N Products] AS
    Generate(
        [Sales Territory].[Sales Territory Region].Children
 , TopCount( 
            [Sales Territory].[Sales Territory Region].CurrentMember
  * [Product].[Product Categories].[Product].MEMBERS
            , 5
            , ( [Measures].[Internet Sales Amount] )
        ) 
    )
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
    NON EMPTY { [Top N Products] }  ON ROWS
FROM [Adventure Works]


This commonly available query uses the Generate function to evaluate the set produced by the TopCount of the cross product of regions and products, and applies it to each of the region member's children. This produces the following results:

Results of a Top N by Group MDX Query
Top N by Group Results

But this only provides part of the picture. Typically, I would want to include all other products within each region as well so that I could provide context to the top N items. For example, in a report (which could also be accomplished using MDX... but let's keep the MDX straightforward for this post) I could calculate each product's percent of sales within each region or each region's percent of sales versus all sales. In order to produce the all other products row for each region, we need to create a new calculated member (i.e., [All Other Products]) using the Aggregate and Except functions.

WITH SET [Top N Products] AS
    Generate(
        [Sales Territory].[Sales Territory Region].Children
 , TopCount( 
            [Sales Territory].[Sales Territory Region].CurrentMember
  * [Product].[Product Categories].[Product].MEMBERS
            , 5
            , ( [Measures].[Internet Sales Amount] )
        ) 
    )
    MEMBER [Product].[Product Categories].[All Other Products] AS 
 Aggregate( 
     Except( 
  NonEmpty( 
             { [Sales Territory].[Sales Territory Region].CurrentMember 
      * [Product].[Product Categories].[Product].MEMBERS } 
  )
  , [Top N Products] 
     )
 )
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
    NON EMPTY { 
 [Top N Products]
 , ( 
     [Sales Territory].[Sales Territory Region].[Sales Territory Region] 
     , [All Other Products] 
 ) 
    } ON ROWS
FROM [Adventure Works]

The Aggregate function takes a set (e.g., the cross product of region and product) and aggregates a measure (e.g., sales amount) within the current context. However, we additionally must use the Except function to exclude the top N products for each region from the aggregate. 

While the NonEmpty function within the Except doesn't make a difference in this example, it can provide a significant performance gain depending on the cross product that is being evaluated.

This query results in the following (as a note, I did use an Order function so that All Other Products would sort within each group, but I left it out of the MDX to keep the example clear):

Results of a Top N by Group with All Others MDX Query
Top N by Group with All Others Results

Although the MDX is a little more complicated, the results are much more useful, providing a basis for creating context for your top N values.

1 comment: