|
The basic MDX query has four parts. These are, in order, an optional "with"
statement, a "select" statement, a "from" clause and a "where"
clause. While this sounds superficially SQL-like, it has been our experience
that in most ways familiarity with SQL is not a help and at times actively a
hindrance to interpreting MDX, especially considering the at times fragmentary
state of MDX documentation.
Of these four parts, 'from' is the most trivial, as it merely takes the name
of the cube being queried. 'Where' is the next simplest at a conceptual
level. Where is the "slicing" clause. Anytime a query only uses data
indexed by single member of a dimension, that dimension is addressed in the
"where" clause, thereby reducing the complexity of the query. One could
imagine taking a slice of a cube at that particular point, and only looking at
the data within the slice. For instance, if you are interested in the atom
positions for the third picosecond, "Where [time].[3]" restricts the set
under consideration that of the third picosecond. A good rule of thumb is that
all dimensions in the cube need to be referenced in any query, either in the
where clause or the select statement. (In fact, any time you know that the
dataset only contains a single member, such as when all simulations have variant
set of WT, or that you know for a particular structure only a single simulation
was run at 298 degrees, you can omit the reference to that dimension. However,
this should be done with caution, as dimensions that are not sliced will by
default return values that are the sums of all members of the referenced
dimension.)
'Select' defines which data will be presented in the result set
and their format. In the 'select' statement one defines data points or data
sets and then assigns one or more of these points or sets to each axis. One can
choose return multi-dimensional results, however, as the default client can only
display two-dimensional results we are limiting the sample queries to display
results in two dimensions, for simplicity and clarity. In the simplest case,
one can select the x, y, and z coordinate measures "on columns" and atoms
"on rows". This would return to you a two dimensional table of three
columns, one each for x, y, and z, and rows for each atom.
For more complex
cases, 'select' allows one a great deal of flexibility in defining sets.
For instance, sets can be filtered. If one is interested in performing
calculations only on the c-alpha atoms of a protein, one can filter the set of
all atoms in the protein to include only those whose atom name is equal to
"CA". One can also navigate the levels of a hierarchical dimension to
perform set manipulation - for instance, one can specify a set that contains
only the children of the 18th residue of a particular protein. As "atoms"
is the next level below residue numbers, this would return a set of all the
atoms in the 18th residue.
Crossjoins are also useful in selection statements,
especially if one is limiting the result set to two dimensions. Crossjoins
essentially compress two dimensions into one, transforming what would be a
two-dimensional array into a linear collection of pairwise matches.
'With'
is an optional statement, but it is the part of the query that allows one to
define novel calculations and convert properties to measures for display. Given
coordinate data, for instance, one can define a measure 'distance' in the
'with' statement, that then provides a metric for calculating distance
between two points. One can also turn a property such as "atom name" into a
measure so that it can be displayed in the results. 'With' also can be used
to define named sets, using much the same syntax as is used when they are
defined in the select statement. These sets can then be referred to in the
select statement by name.
Annotated Example of an MDX query (distance metric):
with SET res1 as ' Filter(Descendants([ID].[Hierarchy].[structure].[1a2pa0], 2),
[ID].[Hierarchy].CurrentMember.Properties("Atom Name")="CA" )'
SET res2 as ' Filter(Descendants([sekret id].[Hierarchy].[structure].[1a2pa0],
2), [sekret id].[Hierarchy].CurrentMember.Properties("Atom Name")="CA" )'
This creates two sets, one drawn from ID, one drawn from ID's equivalent
shadow dimension Sekret ID. In each case the set is defined as the descendants
two levels down of the structure 1a2pa0, which will return all the atoms in the
structure. The sets are then further filtered to include only those atoms whose
atom name is equal to "CA", or c-alpha.
MEMBER [Measures].[distance] as ' sqr(
((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) -
1),[Measures].[x Coord])
-([ID].[Hierarchy].CurrentMember,[Measures].[x Coord]))^2
+
((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) -
1),[Measures].[y Coord])
-([ID].[Hierarchy].CurrentMember,[Measures].[y Coord]))^2
+
((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) -
1),[Measures].[z Coord])
-([ID].[Hierarchy].CurrentMember,[Measures].[z Coord]))^2
) '
This is a metric to calculate the distance between a pair of atoms, one drawn
from the ID dimension, the drawn from the ID dimensions but assigned to the
Sekret ID shadow dimension.
select
res1 * res2 on rows,
MEASURES.[distance] on columns
The first clause of the select statement puts the crossjoin of res1 and res2 on
rows, meaning that the rows of the result set will be all possible pairwise
matches of the c-alpha atoms of the protein. The second clause puts the single
measure, distance, on columns, meaning that there will be a single results
column showing the distance between all possible pairs of C-alpha atoms.
from [Dynameomics]
Dynameomics is the name of the cube.
WHERE ( [Time].[1],
[Run].[1],
[Temp].[298],
[Variant].[WT],
[Conditions].[cs=0,nbcycl=3,cor=10]
)
These are the slicing dimensions. For this query, we are only considering data
from the first picosecond, the first run, where the temperature is 298K, the
variant set to wild type, and with the above specified conditions.
|