Database Description > The MDX Query Language

The MDX Query Language

The basic MDX query has four parts. These are, in order, an optional "with" statement, a "select" statement, a "from" clause and a "where" clause. While this sounds superficially SQL-like, it has been our experience that in most ways familiarity with SQL is not a help and at times actively a hindrance to interpreting MDX, especially considering the at times fragmentary state of MDX documentation.

Of these four parts, 'from' is the most trivial, as it merely takes the name of the cube being queried. 'Where' is the next simplest at a conceptual level. Where is the "slicing" clause. Anytime a query only uses data indexed by single member of a dimension, that dimension is addressed in the "where" clause, thereby reducing the complexity of the query. One could imagine taking a slice of a cube at that particular point, and only looking at the data within the slice. For instance, if you are interested in the atom positions for the third picosecond, "Where [time].[3]" restricts the set under consideration that of the third picosecond. A good rule of thumb is that all dimensions in the cube need to be referenced in any query, either in the where clause or the select statement. (In fact, any time you know that the dataset only contains a single member, such as when all simulations have variant set of WT, or that you know for a particular structure only a single simulation was run at 298 degrees, you can omit the reference to that dimension. However, this should be done with caution, as dimensions that are not sliced will by default return values that are the sums of all members of the referenced dimension.)

'Select' defines which data will be presented in the result set and their format. In the 'select' statement one defines data points or data sets and then assigns one or more of these points or sets to each axis. One can choose return multi-dimensional results, however, as the default client can only display two-dimensional results we are limiting the sample queries to display results in two dimensions, for simplicity and clarity. In the simplest case, one can select the x, y, and z coordinate measures "on columns" and atoms "on rows". This would return to you a two dimensional table of three columns, one each for x, y, and z, and rows for each atom.

For more complex cases, 'select' allows one a great deal of flexibility in defining sets. For instance, sets can be filtered. If one is interested in performing calculations only on the c-alpha atoms of a protein, one can filter the set of all atoms in the protein to include only those whose atom name is equal to "CA". One can also navigate the levels of a hierarchical dimension to perform set manipulation - for instance, one can specify a set that contains only the children of the 18th residue of a particular protein. As "atoms" is the next level below residue numbers, this would return a set of all the atoms in the 18th residue.

Crossjoins are also useful in selection statements, especially if one is limiting the result set to two dimensions. Crossjoins essentially compress two dimensions into one, transforming what would be a two-dimensional array into a linear collection of pairwise matches.

'With' is an optional statement, but it is the part of the query that allows one to define novel calculations and convert properties to measures for display. Given coordinate data, for instance, one can define a measure 'distance' in the 'with' statement, that then provides a metric for calculating distance between two points. One can also turn a property such as "atom name" into a measure so that it can be displayed in the results. 'With' also can be used to define named sets, using much the same syntax as is used when they are defined in the select statement. These sets can then be referred to in the select statement by name.

Annotated Example of an MDX query (distance metric): with SET res1 as ' Filter(Descendants([ID].[Hierarchy].[structure].[1a2pa0], 2), [ID].[Hierarchy].CurrentMember.Properties("Atom Name")="CA" )'

SET res2 as ' Filter(Descendants([sekret id].[Hierarchy].[structure].[1a2pa0], 2), [sekret id].[Hierarchy].CurrentMember.Properties("Atom Name")="CA" )'

This creates two sets, one drawn from ID, one drawn from ID's equivalent shadow dimension Sekret ID. In each case the set is defined as the descendants two levels down of the structure 1a2pa0, which will return all the atoms in the structure. The sets are then further filtered to include only those atoms whose atom name is equal to "CA", or c-alpha.

MEMBER [Measures].[distance] as ' sqr( ((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) - 1),[Measures].[x Coord]) -([ID].[Hierarchy].CurrentMember,[Measures].[x Coord]))^2 + ((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) - 1),[Measures].[y Coord]) -([ID].[Hierarchy].CurrentMember,[Measures].[y Coord]))^2 + ((res1.Item( Rank([sekret id].[Hierarchy].CurrentMember,res2) - 1),[Measures].[z Coord]) -([ID].[Hierarchy].CurrentMember,[Measures].[z Coord]))^2 ) '

This is a metric to calculate the distance between a pair of atoms, one drawn from the ID dimension, the drawn from the ID dimensions but assigned to the Sekret ID shadow dimension.

select res1 * res2 on rows, MEASURES.[distance] on columns

The first clause of the select statement puts the crossjoin of res1 and res2 on rows, meaning that the rows of the result set will be all possible pairwise matches of the c-alpha atoms of the protein. The second clause puts the single measure, distance, on columns, meaning that there will be a single results column showing the distance between all possible pairs of C-alpha atoms.

from [Dynameomics] Dynameomics is the name of the cube. WHERE ( [Time].[1], [Run].[1], [Temp].[298], [Variant].[WT], [Conditions].[cs=0,nbcycl=3,cor=10] )

These are the slicing dimensions. For this query, we are only considering data from the first picosecond, the first run, where the temperature is 298K, the variant set to wild type, and with the above specified conditions.