Methods > Simulation Software

Simulation Software

The state of molecular mechanics simulations and their analysis has evolved considerably since its inception in the mid-1960s [1, 2]. This radical evolution has been driven by increased scientific knowledge in conjunction with faster and cheaper computing hardware. As our knowledge level improves – often by co-application of experiment and theory to individual problems [3] -  aging, monolithic software codes have been discarded in favor of those that are modern [1], extensible, and easier to use.     Experimentalists want an approachable molecular mechanics tool that allows them to ask and rapidly answer questions regarding their systems of interest. Similarly, theoreticians desire a reliable, validated base upon which they can quickly build new tools to probe aspects of molecular behavior. Extensibility and ease of use have been problems for molecular mechanics simulation and analysis environments.

The desire for easy to use and extensible molecular mechanics platforms is accompanied by the need for software environments that can take advantage of the inexpensive multi-processor hardware that has become available over the past decade. Further, there is increased multi-disciplinary dialog between computer science and structural biology such that new approaches to traditionally hard computational problems have been outlined – but not yet fully incorporated into the codes available to the scientific community. For these reasons, we have written a scalable, parallel software environment for performing and analyzing molecular mechanics simulations: in lucem Molecular Mechanics (ilmm, © Beck, D. A. C., Alonso, D. O. V., Daggett V, University of Washington, 2000-2008).

Several fundamental aspects of the methods employed by ilmm contribute significantly to the computational expense of molecular mechanics calculations with our methods. First and foremost is the choice to use a classical potential model that explicitly represents solvent [4]. Similarly, all solute atoms are modeled explicitly [5]. Second, there is a minimum spherical cutoff range distance that is required to accurately reproduce experimental observations – typically at least 8 Å [6], although longer cutoffs are always better.

Finally, in order to perform a biophysically correct simulation, the time-step for numerical integration must be small enough to assure against discontinuities in the trajectory, which will spoil conservation of fixed parameters of the underlying ensemble [2, 5, 7]. With the methods employed in ilmm, we commonly use a 2 femtosecond time step. As such, the final contribution to computational expense is the highly repetitive nature of the calculations: 1 picosecond of MD requires 500 iterations of the energy function calculation and integration cycle. A microsecond of simulation time requires 500,000,000 iterations.

These three contributors to computational expense conspire and complicate analysis of the MD from an algorithm optimization perspective. The inclusion of all atoms in the solute and solvent, in conjunction with the use of longer cutoff ranges must be addressed together in any attempt to reduce the exponential effect they have on the wall-clock time required to perform simulations.

Sampling in molecular simulations is almost always the most difficult aspect to address – that is, how much simulation time and / or how many simulations are required to accurately sample the event(s) of interest. Many simulations are attempting to document processes that occur on timescales of 100s or 1000s of nanoseconds. As such, we are constantly pushing the boundaries of the MD timescale to extend our simulations [8]. Mathematically, the standard deviation of means varies as 1/N2 [9] where N is the number of samples / timesteps, such that longer simulations can provide more precise results.

Given the desire for longer simulations and the importance in using realistic methodologies, we have developed and implemented several computing approaches to reduce the computational expense of molecular mechanics calculations and simulations without sacrificing the fidelity of the underlying algorithms. First: parallelization of computational tasks – i.e. splitting the work in a logical way over a number of computational elements (i.e. central processing units or CPUs). With the proliferation of inexpensive dual and quad CPU computing platforms, parallel computing has become an attractive solution to reducing the real-world time required for computationally intensive tasks. In addition, the governments of American and European nations have invested heavily in grantee accessible-supercomputing platforms with hundreds and even thousands of CPUs. Under perfect parallelization strategies, the time required for a computation is reduced to 1 / P where P CPUs are available. Perfect parallelization is impossible to achieve, but it can be approached with aggressive load balancing – the act of equalizing the amount of work participating CPUs perform.

The second significant advancement in ilmm is its algorithm for neighbor discovery. This action involves the identification of atoms to include in the non-bonded term of potential energy function discussed in [7]. The problem of neighbor discovery is not unique to MM – it is found in other physics simulation arenas including astrophysics. However, solutions from astrophysics are generally not appropriate for MM as galaxies and solar system have densities that are non-uniform whereas those in MM are quite uniform.

There are a number of algorithms presented for MM neighbor discovery in the literature [2, 10, 11], however, they all based on transformations of the same approach. In fact, hash3d, is different primarily in that we have formalized the treatment of the problem in terms of a hash. Hash functions are essentially rapid index and retrieval mechanisms – usually applied to one dimensional data [12]. Hash3d is a three-dimensional extension to this approach. As the factors of computational expense in molecular simulations are linked, so are the solutions: parallelization was applied within the hash3d algorithm to further accelerate neighbor discovery in the presence of multiple CPUs.

The computational techniques used to improve the speed of molecular simulations can also be employed in the subsequent analyses of the simulation trajectories. For example, in the calculation of the solvent accessible surface area (SASA) for a solute using the algorithm of Lee and Richards [13], it is necessary to examine the overlap of neighboring atoms’ Connelly surface with a given probe radius. Finding the neighboring atoms is an O(N2) problem where N is the number of atoms. As such, the aforementioned hash3d can be used to accelerate this process and reduce the time required for the SASA calculation.

References

  1. Levitt, M., The birth of computational structural biology. Nature Structural Biology, 2001. 8(5): p. 392-393.
  2. Allen, M.P. and D.J. Tildesley, Computer Simulation of Liquids. 1987, Oxford: Oxford University Press.
  3. Daggett, V., Molecular dynamics simulations of the protein unfolding/folding reaction. Accounts of Chemical Research, 2002. 35(6): p. 422-429.
  4. Levitt, M., M. Hirshberg, R. Sharon, K.E. Laidig, and V. Daggett, Calibration and testing of a water model for simulation of the molecular dynamics of proteins and nucleic acids in solution. Journal of Physical Chemistry B, 1997. 101(25): p. 5051-5061.
  5. Levitt, M., M. Hirshberg, R. Sharon, and V. Daggett, Potential-Energy Function and Parameters for Simulations of the Molecular-Dynamics of Proteins and Nucleic-Acids in Solution. Computer Physics Communications, 1995. 91(1-3): p. 215-231.
  6. Beck, D.A.C., R. Armen, and V. Daggett, Cutoff size need not strongly influence molecular dynamics results on solvated polypeptides. Biochemistry, 2005. 44(2): p. 609-616.
  7. Beck, D.A.C. and V. Daggett, Methods for Molecular Dynamics Simulations of Protein Folding / Unfolding in Solution. Methods, 2004. 34(1): p. 112-120.
  8. Daggett, V., Long timescale simulations. Current Opinion in Structural Biology, 2000. 10(2): p. 160-164.
  9. Taylor, D.L., Molecular-Dynamics of Actin Invitro and Invivo - a Fluorescence Approach. Biophysical Journal, 1982. 37(2): p. A89-A89.
  10. Petrella, R.J., I. Andricioaei, B.R. Brooks, and M. Karplus, An improved method for nonbonded list generation: Rapid determination of near-neighbor pairs. Journal of Computational Chemistry, 2003. 24(2): p. 222-231.
  11. Yip, V. and R. Elber, Calculations of a List of Neighbors in Molecular-Dynamics Simulations. Journal of Computational Chemistry, 1989. 10(7): p. 921-927.
  12. Kruse, R.L., C.L. Tondo, and B.P. Leung, Data structures and program design in C. 2nd ed. 1997, Upper Saddle River, N.J.: Prentice Hall. xvi, 671.
  13. Lee, B. and F.M. Richards, Interpretation of Protein Structures - Estimation of Static Accessibility. Journal of Molecular Biology, 1971. 55(3): p. 379-&.