Rationale
Structurally similar proteins need not share significant sequence identity. The early observation of structurally and functionally similar proteins (such as hemoglobin and myoglobin) led to the natural separation of these structures into discrete sets, or folds [1, 2]. However, as more structures were determined and more folds were discovered, it became clear that not all members of a fold were necessarily linked by a common function [3]. In addition, the determination of structures with conserved structural cores surrounded by variable structural regions complicated the classification of new structures into existing folds. This raises the question: to what degree can structural variations be tolerated between a domain and a potential cousin before the two no longer belong to the same fold? Different weightings of the factors determining structural similarity have led to domain dictionaries with different fold classifications. We can minimize the effect of these idiosyncrasies by deriving a consensus from publicly available domain dictionaries. We have previously demonstrated the application of this method to SCOP, CATH and the Dali Domain Dictionary to generate a consensus domain dictionary [4-6].
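As a rough illustration of what deriving a consensus from several domain dictionaries can mean, the sketch below groups domains that every dictionary classifies in the same way. It is a toy example under stated assumptions, not the published procedure [4]: the function name `consensus_groups`, the domain identifiers and the fold labels are all hypothetical, and the strict "identical in every dictionary" rule is only one possible consensus criterion.

```python
from collections import defaultdict


def consensus_groups(dictionaries):
    """Group domains that all dictionaries classify identically.

    dictionaries: list of dicts mapping a domain id to a fold label.
    Only domains present in every dictionary are considered.
    Returns a dict mapping a tuple of labels (one per dictionary)
    to the set of domains carrying exactly that combination.
    """
    # Keep only the domains that every dictionary has classified.
    shared = set(dictionaries[0])
    for d in dictionaries[1:]:
        shared &= set(d)

    # Domains with the same label in every dictionary share a key.
    groups = defaultdict(set)
    for domain in shared:
        key = tuple(d[domain] for d in dictionaries)
        groups[key].add(domain)
    return dict(groups)


# Hypothetical classifications; ids and labels are made up.
scop = {"d1": "S1", "d2": "S1", "d3": "S2", "d4": "S2"}
cath = {"d1": "C1", "d2": "C1", "d3": "C1", "d4": "C2"}
dali = {"d1": "D1", "d2": "D1", "d3": "D2", "d4": "D3", "d5": "D3"}

groups = consensus_groups([scop, cath, dali])
for labels, members in sorted(groups.items()):
    print(labels, sorted(members))
```

Here `d1` and `d2` end up together because all three toy dictionaries agree on them, whereas `d3` and `d4` are separated because the dictionaries disagree, and `d5` is ignored because only one dictionary classifies it.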
SCOP and CATH, gold standards among hierarchical domain dictionaries, have been the subject of detailed comparison. In general, both weigh potential functional and evolutionary relationships between fold members with different strengths at different levels of the hierarchy. In their early formulations, the two domain dictionaries represented different design methodologies. Whereas SCOP was hand curated by experts, CATH was maintained by a combination of automated processes and expert curation [5, 6]. However, SCOP has adopted more automated pre-classification of new structures in response to the increasing rate of structure determination, reducing this methodological distinction [7]. There also exist non-hierarchical methods of classifying domains into folds. The Dali Domain Dictionary is one such method; it relies on clustering a set of all-vs-all similarity scores generated between domains using the Dali structural similarity method. Since its inception, each domain dictionary has had to face a single overwhelming problem: the rate of structure determination has quickly outpaced the ability of any process to categorize new structures. Different responses were formulated. Both SCOP and CATH no longer guarantee that a release will cover all structures in the PDB up to the release date and instead focus on categorizing putative novel topologies.
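The idea of clustering all-vs-all similarity scores, mentioned above for the Dali Domain Dictionary, can be sketched with a simple single-linkage scheme: any two domains whose pairwise score reaches a cutoff are placed in the same cluster. This is only an illustration of the general technique and is not claimed to be the clustering that Dali itself uses; the function name `cluster_by_similarity`, the scores and the cutoff are hypothetical.

```python
def cluster_by_similarity(scores, threshold):
    """Single-linkage clustering of pairwise similarity scores.

    scores: dict mapping (domain_a, domain_b) to a similarity score.
    threshold: pairs scoring at or above this value are linked.
    Returns a list of sets, one per cluster.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        # Register both domains, then merge them if similar enough.
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())


# Hypothetical all-vs-all scores for five made-up domains.
scores = {
    ("a", "b"): 12.0,
    ("b", "c"): 9.5,
    ("a", "c"): 3.0,
    ("c", "d"): 1.2,
    ("d", "e"): 8.0,
}
clusters = cluster_by_similarity(scores, threshold=5.0)
print(sorted(sorted(c) for c in clusters))
```

With these toy scores, `a`, `b` and `c` form one cluster even though `a` and `c` score below the cutoff, because both are linked through `b`. This chaining is a known property of single linkage and is one reason that different clustering choices can yield different fold assignments.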
The consensus domain dictionary (CDD) is the backbone of our Dynameomics mass molecular dynamics initiative [9]. It is the basis for our selection of a topologically diverse sample of targets. Therefore, it is imperative that the CDD be kept up-to-date, so that we can identify novel topologies as they are classified and observe potential splits within and merges between our metafolds as classifications shift. Since we use the contents of the CDD as potential targets for simulation of the folding pathway, it is important that we identify domains that appear to be autonomous folding units. There exists a broad category of domains that cannot be understood as folding units, but rather as artifacts of multi-domain or complex structures. The details of the generation of this CDD have been published elsewhere [10].
References
- Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North AC. Structure of haemoglobin: a
three-dimensional Fourier synthesis at 5.5-Å. resolution, obtained by X-ray analysis.
Nature, 185: 416-22, 1960.
- Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. A three-dimensional model of the
myoglobin molecule obtained by x-ray analysis.
Nature, 181: 662-6, 1958.
- Nagano N, Orengo CA, Thornton JM. One fold with many functions: the evolutionary
relationships between TIM barrel families based on their sequences, structures and functions.
Journal of Molecular Biology, 321: 741-65, 2002.
- Day R, Beck DA, Armen RS, Daggett V. A consensus view of fold space: combining SCOP, CATH, and the Dali
Domain Dictionary. Protein Science, 12: 2150-60, 2003.
- Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation
of sequences and structures. Journal of Molecular Biology, 247: 536-40, 1995.
- Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F,
Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA. The CATH domain structure database: new protocols and classification levels give
a more comprehensive resource for exploring evolution. Nucleic Acids Research, 35:
D291-7, 2007.
- Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin
AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research,
36: D419-25, 2008.
- Holm L, Kääriäinen S, Rosenström P, Schenkel A. Searching protein structure databases with DaliLite v.3.
Bioinformatics, 24: 2780-1, 2008.
- Beck DA, Jonsson AL, Schaeffer RD, Scott KA, Day R, Toofanny RD, Alonso DO,
Daggett V. Dynameomics: mass annotation of protein dynamics and unfolding in water by high-throughput atomistic
molecular dynamics simulations. Protein Eng Des Sel, 21: 353-68, 2008.
- Schaeffer RD, Jonsson AL, Simms AM, Daggett V. Generation of a Consensus Protein Domain Dictionary. In Preparation.