The MDX Query Language
The basic MDX query has four parts. These are, in order, an
optional "with" statement, a "select" statement,
a "from" clause and a "where" clause. While this sounds superficially SQL-like,
it has been our experience that familiarity with SQL is not a help and at times
actively a hindrance to interpreting MDX, especially considering the at times
fragmentary state of MDX documentation.
Of these four parts, 'from' is the most trivial, as it merely
takes the name of the cube being queried. 'Where' is the next simplest at a
conceptual level. Where is the "slicing" clause. Anytime a query only uses data
indexed by a single member of a dimension, that dimension is addressed in the "where"
clause, thereby reducing the complexity of the query. One could imagine taking a
slice of a cube at that particular point, and only looking at the data within the
slice. For instance, if you are interested in the atom positions for the third picosecond,
"Where [time].[3]" restricts the set under consideration to that of the third picosecond.
A good rule of thumb is that all dimensions in the cube need to be referenced in any
query, either in the where clause or the select statement. (In fact, any time you know
that the dataset only contains a single member, such as when all simulations have variant
set of WT, or that you know for a particular structure only a single simulation was run at
298 degrees, you can omit the reference to that dimension. However, this should be done with
caution, as dimensions that are not sliced will by default return values that are the sums
of all members of the referenced dimension.)
'Select' defines which data will be presented in the result set and
their format. In the 'select' statement one defines data points or data sets and then assigns
one or more of these points or sets to each axis. One can choose to return multi-dimensional results,
however; as the default client can only display two-dimensional results we are limiting the sample
queries to display results in two dimensions, for simplicity and clarity. In the simplest case, one
can select the x, y, and z coordinate measures "on columns" and atoms "on rows". This would return
to you a two dimensional table of three columns, one each for x, y, and z, and rows for each atom.
For more complex cases, 'select' allows one a great deal of flexibility in
defining sets. For instance, sets can be filtered. If one is interested in performing calculations
only on the Cα atoms of a protein, one can filter the set of all atoms in the protein to include
only those whose atom name is equal to "CA". One can also navigate the levels of a hierarchical dimension
to perform set manipulation - for instance, one can specify a set that contains only the children of the
18th residue of a particular protein. As "atoms" is the next level below residue numbers, this would return
a set of all the atoms in the 18th residue.
Crossjoins are also useful in selection statements, especially if one is limiting the
result set to two dimensions. Crossjoins essentially compress two dimensions into one, transforming what
would be a two-dimensional array into a linear collection of pairwise matches.
'With' is an optional statement, but it is the part of the query that allows one to
define novel calculations and convert properties to measures for display. Given coordinate data, for
instance, one can define a measure 'distance' in the 'with' statement, that then provides a metric for
calculating distance between two points. One can also turn a property such as "atom name" into a measure
so that it can be displayed in the results. 'With' also can be used to define named sets, using much the
same syntax as is used when they are defined in the select statement. These sets can then be referred to
in the select statement by name.
Annotated Example of an MDX query (distance metric):
WITH SET res1 AS ' Filter(Descendants([ID].[Hierarchy].[structure].[1a2pa0], 2),
[ID].[Hierarchy].CurrentMember.Properties("Atom Name")="CA" )'
SET res2 AS ' Filter(Descendants([sekret id].[Hierarchy].[structure].[1a2pa0], 2),
[sekret id].[Hierarchy].CurrentMember.Properties("Atom Name")="CA" )'
This creates two sets, one drawn from ID, one drawn from ID's equivalent shadow
dimension Sekret ID. In each case the set is defined as the descendants two levels down of the structure
1a2pa0, which will return all the atoms in the structure. The sets are then further filtered to include
only those atoms whose atom name is equal to "CA", or Cα.
MEMBER [Measures].[distance] AS ' SQR( ((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) - 1),[Measures].[x Coord]) -([ID].[Hierarchy].CurrentMember,[Measures].[x Coord]))^2 + ((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) - 1),[Measures].[y Coord]) -([ID].[Hierarchy].CurrentMember,[Measures].[y Coord]))^2 + ((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) - 1),[Measures].[z Coord]) -([ID].[Hierarchy].CurrentMember,[Measures].[z Coord]))^2 ) '
This is a metric to calculate the distance between a pair of atoms, one drawn from the ID dimension,
the drawn from the ID dimension but assigned to the Sekret ID shadow dimension.
SELECT res1 * res2
ON ROWS,
MEASURES.[distance]
ON
COLUMNS
The first clause of the select statement puts the crossjoin of res1 and res2 on rows,
meaning that the rows of the result set will be all possible pairwise matches of the Cα atoms of the protein.
The second clause puts the single measure, distance, on columns, meaning that there will be a single result column
showing the distance between all possible pairs of Cα atoms.
FROM [Dynameomics] Dynameomics is the name of the cube.
WHERE ( [Time].[1], [Run].[1], [Temp].[298], [Variant].[WT], [Conditions].[cs=0,nbcycl=3,cor=10] )
These are the slicing dimensions. For this query, we are only considering data from the first
picosecond, of the first run, where the temperature is 298K, the variant set to wild type, and with the above specified conditions.