MDX vs. SQL

20/11/2011

תגיות: , , , ,
אין תגובות

A while ago I posted about SQL and MDX in separated posts.

This time I will walk you through MDX and SQL queries to illustrate the similarities and differences between the two query languages.

All the queries are against AdventureWorksDW.

I invite you to add comments to make this idea list, even bigger!

 

SQL MDX
To retrieve a single value from a table, you need to use the Sum() function to add up the values of the selected measure for all rows and optionally assign it an alias as shown in this example. Here, Channel Sales is the view that you created in an earlier step. SELECT
Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]

In MDX, measures are automatically pre-summarized, so there is no need for a Sum() function. However, you do need to identify the axis, and then the cube or perspective. The Channel Sales perspective has already been created within the standard Adventure Works database.
    Notice that the value returned is the same for both queries. In the MDX query, the All member of all other dimensions is not specified (unless a default member has been created in the cube). In SQL, when you do not specify a filter (using the WHERE clause), the query operates against the entire rowset.

SELECT
[Reseller Order Quantity] on Columns
FROM
[Channel Sales]

In SQL, you can Group By items. When you do, you must put the item in both the SELECT clause and the GROUP BY clause.  SELECT
[Country]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
GROUP BY
 [Country]

In MDX, the Group By is done in the cube because the cube already handles bucketing measures by dimension members. Instead of using a GROUP BY clause, you specify a set of members on an axis, such as rows. The query below has all members of the Country level of the Country dimension and places them on rows. Then the query retrieves values for each row and column intersection.
    The results returned are the same values, but they are returned in a different order because the cube automatically includes a sort defined in the UDM. Also, notice the SQL Group By is part of the data set and not a pre-defined row heading like you see in the MDX query results. There is no structure implied by the groupings in the SQL results; the values in the Country column are  just like values in the Reseller Order Quantity column in that they make up a rowset of data returned from the queried view. In MDX, the group is part of the report structure. Values are retrieved logically one at a time from the cube.
SELECT
[Reseller Order Quantity] on Columns
, [Country].[Country].Members on Rows
FROM
[Channel Sales]

To sort by country, you need to add an ORDER BY clause to the query. Then the result set will be more like the MDX query results. In SQL, if you don’t explicitly sort, you get whatever order the data happens to arrive in. An ORDER BY is a separate clause and can be used only to sort rows. SELECT
[Country]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
GROUP BY
[Country]
ORDER BY
[Country]

In the SQL query, you have to specify Country in three places in order to get the data returned in the rowset, to perform the grouping, and to get the ordering correct. In the MDX, you only reference Country once. This feature is particularly useful for time series, where the sort order is not the same as the alphabetical order of the displayed captions.   
Rather than sort by name, you can alternatively sort by the sum of the order quantity in descending order in SQL. In SQL, sorting by any arbitrary value is handled the same as sorting the row values by a string value (like Country) —by using an ORDER BY clause.  SELECT
[Country]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
GROUP BY
[Country]
ORDER BY
Sum([Reseller Order Quantity]) DESC

In MDX, Order is a Set function which sorts the members before placing them on the axis. All the information needed for the sort is included in one Order function.
In this example, the order quantity is retrieved twice. Once to get the result from the cube and store in cache, then again to retrieve the values from the cache and sort in descending order.
Note that in both SQL and MDX it is possible to sort the rows by something other than what is displayed. For example, you could sort by Sales Amount but display Order Quantity.
SELECT
[Reseller Order Quantity] on Columns
, Order(
[Country].[Country].Members
, [Reseller Order Quantity]
, DESC
) on Rows
FROM
[Channel Sales]
When filtering data values, WHERE clauses in both MDX and SQL have much the same effect. The way they get there is different.
In SQL, the WHERE clause filters the incoming data rows, which are later grouped according to the aggregation function used, such as Sum().

SELECT
[Country]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
WHERE
[Calendar Year] = 2002
GROUP BY
[Country]
ORDER BY
[Country]

In MDX, the WHERE clause is applied only when the data value for each cell is retrieved, which is after the axis sets were resolved and placed.
In this example, the non-2002 data is eliminated at the end of the process of gathering values for each intersection, whereas SQL filters out any non-2002 data before summarizing values. Notice that you lose some countries in SQL where there is no 2002 data – Germany and Australia. In the MDX query, all countries are still included even if there is no 2002 data because they are defined as a set independently of the values returned from the cube.
SELECT
[Reseller Order Quantity] on Columns
, [Country].[Country].Members on Rows
FROM
[Channel Sales]
WHERE
[Date].[Calendar Year].&[2002]
    In MDX, the axis keyword NON EMPTY eliminates rows (or columns) that are completely filled with empty cells. Client tools often allow this setting as a user option, because users sometimes want to hide and sometimes want to show the empty rows.
NON EMPTY is not a function, but is a keyword. Since the values in the column won’t be known until after you place the countries on the rows, the query must finish before any empty members can be removed.
SELECT
[Reseller Order Quantity] on Columns
, NON EMPTY [Country].[Country].Members on Rows
FROM
[Channel Sales]
WHERE
[Date].[Calendar Year].&[2002]

Rather than filter the rows of a query by values in the same column that is displayed, you sometimes want to filter by a related column. For example, when displaying states, you may want to filter by country.
In SQL, there is no relationship between different columns. There is nothing in the query that indicates how these states are related to the country of Germany.
SELECT
[State]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
WHERE
[Country] = 'Germany'
GROUP BY
[State]
ORDER BY
[State]

In MDX, if you have defined a user hierarchy, you can use family relationships to filter members. The Children function shows clearly the relationship you intend between the country and the associated states. SELECT
[Reseller Order Quantity] on Columns
, [Geography].[Germany].Children on Rows
FROM
[Channel Sales]

In SQL, the WHERE clause can affect what appears on the rows. If filtering away data from the source happens to filters away group-by values, the rows that would have contained those headings disappear.
In the SQL query, the WHERE clause prefilters the row set to return only those states where the first letter is A. Then the query groups the results.
SELECT
[State]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
WHERE
[State] Like 'A%'
GROUP BY
[State]
ORDER BY
[State]

In MDX, you create arbitrary filters by changing the set on the axis. One way to do that is to use the Filter function. The filter is applied to each member in this set.
There is no LIKE operator in MDX, but it may be included in the future. Meanwhile, you can get the same effect with the Filter function. In the MDX query, the full set of states is retrieved, then the members that aren’t part of the desired set are eliminated. Then the values associated with the remaining members are retrieved.
SELECT
[Reseller Order Quantity] on Columns
,
Filter(
   [Geography].[Geography].[State-
       Province].Members
   , Left(
            [Geography].[Geography].
            CurrentMember.Name
      , 1
    ) = "A"  — 'Like operator'
promised
)
on Rows
FROM
[Channel Sales]

In SQL, you can filter rows by explicit items—still in the WHERE clause. One way to do that is by listing items in an IN clause.
In SQL, the WHERE clause still prefilters the row set to return only the named countries, and then groups the results.
SELECT
[Country]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
WHERE
[Country] IN ('Australia','France')
GROUP BY
[Country]
ORDER BY
[Country]

In MDX, you can create sets of explicit members. To do that, enclose the list of members in braces {   }. The set of explicit members still goes on the axis. Use the ampersand (&) with a key value, but not the member name. In this example, the member key and member name are the same. SELECT
[Reseller Order Quantity] on Columns
, 
{ [Geography].&[Australia]
, [Geography].&[France] }

on Rows
FROM [Channel Sales]

In SQL, in order to select the top few groups, you have to add code in separate places – the TOP n portion goes in the SELECT clause, and the ORDER BY is a separate clause. This means that you can use it only on the rows axis. SELECT TOP 3
[Country]
, Sum([Reseller Order Quantity])
AS [Reseller Order Quantity]
FROM
[Channel Sales]
GROUP BY
[Country]
ORDER BY
Sum([Reseller Order Quantity]) DESC

In MDX, TopCount is a set function that encapsulates the entire operation. It is common in MDX to have a single function that represents multiple operations. The TopCount function is equivalent to using the MDX Order function to sort the members in descending order (ignoring the hierarchies), and then using the Head function to extract the first n members.
In the SQL query, you have to do the ORDER BY to define the sort order of rows that are to be returned by the query, then the GROUP BY to aggregate the rows, and then finally use TOP 3 to limit the result set to 3 rows.
In the MDX query, you do all of these actions with one set function – TopCount. As function arguments, you tell it which members to consider, how many to return, and how the results should be sorted when determining which members are at the top.
SELECT
[Reseller Order Quantity] on Columns
, TopCount(
Country.Country.Members
,
3
, [Reseller Order Quantity]
) on Rows
FROM
[Channel Sales]

הוסף תגובה
facebook linkedin twitter email

כתיבת תגובה

האימייל לא יוצג באתר. שדות החובה מסומנים *