Reaction Coordinates
One dimensional reaction coordinate
We use 15 physical properties of the protein to construct a multidimensional embedded,
one-dimensional (1D) reaction coordinate that faithfully captures the complex nature of unfolding.
For any particular fold, the distances in property space between structures in the native ensemble
and the unfolding trajectories are calculated; the resulting histogram of the mean distances
is the reaction coordinate. The unfolding reaction coordinates for 188 fold representatives
(1534 simulations and 22.9 μs in explicit water) were calculated. Native, transition, intermediate
and denatured states are readily identified using this reaction coordinate. Principal component analysis
(PCA) and the projections of the first 2 or 3 principal components show distinct clusters for unfolding species [1].
The following guide will describe how to calculate a one-dimensional reaction coordinate for a simulation, it will
also show you how to run PCA on a matrix of normalized properties to find clusters of structures with similar properties.
References
- Toofanny RD, Jonsson AL, Daggett V. A Comprehensive Multidimensional-Embedded One-Dimensional
Reaction Coordinate for Protein Unfolding/Folding. Biophysical Journal, Accepted for publication, 2009.
Guide to Reaction Coordinates
Requirements:
At least 1 native (298K) trajectory with all analyses loaded for the reference ensemble.
Analyses loaded for unfolding trajectory
NB. For the reaction coordinate calculation we ran over Dynameomics we used a generic set of 15 properties based on preexisting analyses. You can use these too, but to improve the differentiation between unfolding species you may want to use protein specific properties, such as core Cα-RMSD. You will need to have these tables loaded in the database, if you wish to do this. You can, of course, calculate these in the database if necessary.
There are a series of scripts that need to be run in order for you to calculate the one-dimensional reaction coordinate.
Step 1: Collates the raw properties (chosen for the multi property reaction coordinate) by simulation
Step 2:
Collates the relevant simulation table created in step 1 for a given protein e.g. the raw properties for the 298 K and the 498 K simulations.
Step 3:
Normalizes the properties by range across the 298 K and 498 K simulations
Step 4:
Runs the comparator program which takes the normalized table of properties from Step 3. The output from comparator will be the mean distance to the reference ensemble for each time point.
Step 5:
Runs the histogram maker program on the output from step 4.
Step 1
DECLARE @sim_id INT
DECLARE @table_prefix VARCHAR(64)
DECLARE @server_name VARCHAR(32)
DECLARE @database_name VARCHAR(32)
DECLARE @first BIT
DECLARE @view_cmd VARCHAR(MAX)
DECLARE @struct_id INT
DECLARE @residues INT
SET @first = 1
DECLARE sim_id_cursor CURSOR FOR select distinct a.sim_id, struct_id, server_name,database_name from [directory].dbo.[master_property_v] as a
join [directory].dbo.[master_simulation_simulationgroup] as b
on a.sim_id=b.sim_id
where a.property_abbrev='coord' and b.sim_grp_id=1000 and a.sim_id=1626 and a.sim_id not in (2650, 2651, 2652, 2653, 2654)
OPEN sim_id_cursor
FETCH NEXT FROM sim_id_cursor INTO @sim_id, @struct_id, @server_name, @database_name
SET @table_prefix = '[' + @server_name + '].[' + @database_name + '].'
SELECT @residues=MAX(residue_number) FROM [helix].[directory].[dbo].[Master_ID] WHERE struct_id = @struct_id
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @residues=MAX(residue_number) FROM [helix].[directory].[dbo].[Master_ID] WHERE struct_id = @struct_id
SET @table_prefix = '[' + @server_name + '].[' + @database_name + '].'
SET @view_cmd = '
SELECT a.sim_id, a.struct_id, a.struct_inst, f.temp, f.run, a.time_step, a.rmsd, b.dissimilarity_score, c.native,
c.non_native, e.radgyr, e.end2end, d.Main_chain, d.side_chain, d.polar, d.non_polar,d.mc_polar,d.mc_non_polar,
d.sc_polar, d.sc_non_polar, d.total, g.fraction_helix, h.fraction_beta INTO dbo.[PS_n_'+ CAST(@sim_id AS VARCHAR) +']
FROM ' + @table_prefix + 'dbo.[rmsd_'+ CAST(@sim_id AS VARCHAR) +'] AS a
Join
' + @table_prefix + 'dbo.[congen_'+ CAST(@sim_id AS VARCHAR)+'] as b on a.time_step=b.time_step
Join
' + @table_prefix + 'dbo.[contact_' + CAST(@sim_id AS VARCHAR)+ '] as c on a.time_step=c.time_step
Join
(SELECT time_step, SUM(main_chain)as Main_chain, SUM(side_chain)as Side_chain,
SUM(polar)as Polar, SUM(non_polar)as Non_polar, SUM(mc_polar)as MC_polar,
SUM(mc_non_polar)as MC_non_polar, SUM(sc_polar)as SC_polar,
SUM(sc_non_polar) as SC_non_polar, SUM(total)as Total
from ' + @table_prefix + 'dbo.[sasa_'+ CAST(@sim_id AS VARCHAR)+']
group by [time_step]) as d on a.time_step=d.time_step
Join
' + @table_prefix + 'dbo.[radgee_'+ CAST(@sim_id AS VARCHAR)+'] as e on a.time_step=e.time_step
Join
(select distinct Sim_id, struct_id, temp, run from [Helix].[directory].dbo.[master_property_v]) as f on a.struct_id = b.struct_id and a.sim_id=f.sim_id
LEFT OUTER Join
(SELECT time_step, CAST(COUNT(*) AS DECIMAL)/CAST(' + CAST(@residues AS VARCHAR) + ' AS DECIMAL) AS fraction_helix FROM ' + @table_prefix + 'dbo.[dssp_'+ CAST(@sim_id AS VARCHAR)+'] WHERE ss_id = 12 GROUP BY time_step) AS g on a.time_step=g.time_step
LEFT OUTER JOIN
(SELECT time_step, CAST(COUNT(*) AS DECIMAL)/CAST(' + CAST(@residues AS VARCHAR) + ' AS DECIMAL) AS fraction_beta FROM
' + @table_prefix + 'dbo.[dssp_' + CAST(@sim_id AS VARCHAR)+'] WHERE ss_id = 3 OR ss_id = 4 GROUP BY time_step)
as h on a.time_step=h.time_step
'
FETCH NEXT FROM sim_id_cursor INTO @sim_id, @struct_id, @server_name, @database_name
PRINT @view_cmd
END
CLOSE sim_id_cursor
DEALLOCATE sim_id_cursor
Step 2
SET NOCOUNT ON
DECLARE @pdb4 VARCHAR(4)
DECLARE @struct_id INT
DECLARE pdb4_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v] as a
JOIN (SELECT DISTINCT i.pdb4
FROM [helix].[Directory].[dbo].[Master_Property_v]as i
JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as j
on i.sim_id=j.sim_id
WHERE (j.sim_grp_id=1000) )as b
on a.pdb4=b.pdb4
WHERE a.pdb4='1gcp'
OPEN pdb4_cursor
FETCH NEXT FROM pdb4_cursor INTO @pdb4
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @sim_id INT
DECLARE @table_prefix VARCHAR(64)
DECLARE @server_name VARCHAR(32)
DECLARE @database_name VARCHAR(32)
DECLARE @first BIT
DECLARE @view_cmd VARCHAR(MAX)
SET @first = 1
DECLARE sim_id_cursor CURSOR FOR
SELECT DISTINCT sim_id, server_name, database_name FROM [helix].[Directory].[dbo].[Master_Property_v] WHERE pdb4 = pdb4 and sim_id<>2628 and sim_id<>2629 and sim_id<>2630 and sim_id<>2631
ORDER BY sim_id
OPEN sim_id_cursor
FETCH NEXT FROM sim_id_cursor INTO @sim_id, @server_name, @database_name
SET @table_prefix = '[' + @server_name + '].[' + @database_name + '].'
WHILE @@FETCH_STATUS = 0
BEGIN
IF (@first = 1)
SET @view_cmd = ' CREATE VIEW dbo.Property_space_n_' + @pdb4 + '_498_all
AS
SELECT sim_id, struct_id, struct_inst, temp, run, time_step, native,non_native, radgyr, end2end, main_chain, side_chain, polar, non_polar, mc_polar, mc_non_polar, sc_polar, sc_non_polar, total, fraction_helix, fraction_beta
FROM dbo.ps_n_' + CAST(@sim_id AS VARCHAR) + ''
ELSE
SET @view_cmd =
@view_cmd + '
UNION ALL
SELECT sim_id, struct_id, struct_inst, temp, run, time_step, native, non_native, radgyr, end2end, main_chain, side_chain, polar, non_polar, mc_polar, mc_non_polar, sc_polar, sc_non_polar, total, fraction_helix, fraction_beta
FROM dbo.ps_n_' + CAST(@sim_id AS VARCHAR) + ''
SET @first = 0
FETCH NEXT FROM sim_id_cursor INTO @sim_id, @server_name, @database_name
END
CLOSE sim_id_cursor
DEALLOCATE sim_id_cursor
EXEC (@view_cmd)
FETCH NEXT FROM pdb4_cursor INTO @pdb4
END
CLOSE pdb4_cursor
DEALLOCATE pdb4_cursor
Step 3
DECLARE @pdb4 VARCHAR(4)
DECLARE @table_prefix VARCHAR(64)
DECLARE @server_name VARCHAR(32)
DECLARE @database_name VARCHAR(32)
DECLARE @first BIT
DECLARE @view_cmd VARCHAR(MAX)
DECLARE @struct_id INT
DECLARE @residues INT
SET @first = 1
DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as a
JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as b
on a.sim_id=b.sim_id
WHERE
b.sim_grp_id=1000
and a.pdb4='1gcp' and
a.sim_id<>2628 and a.sim_id<>2629 and a.sim_id<>2630 and a.sim_id<>2631
ORDER BY a.pdb4
OPEN sim_id_cursor
FETCH NEXT FROM sim_id_cursor INTO @pdb4
WHILE @@FETCH_STATUS = 0
BEGIN
SET @view_cmd =
'SELECT b.sim_id,b.struct_id, b.struct_inst, b.temp, b.run, b.time_step,
(cast(b.native as float) - cast (a.min_native as float))/(cast (a.max_native as float) - cast(a.min_native as float)) as norm_native,
(cast(b.non_native as float) - cast (a.min_non_native as float))/(cast(a.max_non_native as float)- cast(a.min_non_native as float)) as norm_non_native,
(cast(b.radgyr as float) - cast(a.min_radgyr as float))/(cast(a.max_radgyr as float) - cast(a.min_radgyr as float)) as norm_radgyr,
(cast(b.end2end as float) - cast(a.min_end2end as float))/ (cast(a.max_end2end as float) - cast(a.min_end2end as float)) as norm_end2end,
(b.main_chain - a.min_main_chain)/(a.max_main_chain - a.min_main_chain) as norm_main_chain,
(b.side_chain - a.min_side_chain)/(a.max_side_chain - a.min_side_chain) as norm_side_chain,
(b.polar - a.min_polar)/(a.max_polar - a.min_polar) as norm_polar,
(b.non_polar - a.min_non_polar) / (a.max_non_polar - a.min_non_polar) as norm_non_polar,
(b.mc_polar - a.min_mc_polar) / (a.max_mc_polar - a.min_mc_polar) as norm_mc_polar,
(b.mc_non_polar - a.min_mc_non_polar) / (a.max_mc_non_polar - a.min_mc_non_polar) as norm_mc_non_polar,
(b.sc_polar - a.min_sc_polar) / (a.max_sc_polar - a.min_sc_polar) as norm_sc_polar,
(b.sc_non_polar - a.min_sc_non_polar) / (a.max_sc_non_polar - a.min_sc_non_polar) as norm_sc_non_polar,
(b.total - a.min_total_sasa)/(a.max_total_sasa - a.min_total_sasa) as norm_total_sasa,
(cast(b.fraction_helix as float) - cast(a.min_fraction_helix as float))/(cast(a.max_fraction_helix as float) - cast(a.min_fraction_helix as float)) as norm_fraction_helix,
(cast(b.fraction_beta as float) - cast (a.min_fraction_beta as float))/(cast(a.max_fraction_beta as float) - cast(a.min_fraction_beta as float)) as norm_fraction_beta
into dbo.[Property_space_n_norm_' + @pdb4 + '_498_all] FROM dbo.[property_space_n_' + @pdb4 + '_498_all] as b
JOIN (
select
max(native) as max_native, min(native) as min_native,
max(non_native) as max_non_native, min(non_native) as min_non_native,
max(radgyr)as max_radgyr, min(radgyr)as min_radgyr,
max(end2end) as max_end2end, min(end2end) as min_end2end,
max(main_chain) as max_main_chain, min(main_chain) as min_main_chain,
max(side_chain) as max_side_chain, min(side_chain) as min_side_chain,
max(polar) as max_polar, min(polar) as min_polar,
max(non_polar)as max_non_polar, min(non_polar)as min_non_polar,
max(mc_polar) as max_mc_polar, min(mc_polar) as min_mc_polar,
max(mc_non_polar) as max_mc_non_polar, min(mc_non_polar) as min_mc_non_polar,
max(sc_polar) as max_sc_polar, min(sc_polar) as min_sc_polar,
max(sc_non_polar) as max_sc_non_polar, min(sc_non_polar) as min_sc_non_polar,
max(total) as max_total_sasa, min(total) as min_total_sasa,
max(fraction_helix) as max_fraction_helix, min(fraction_helix) as min_fraction_helix,
max(fraction_beta) as max_fraction_beta, min(fraction_beta) as min_fraction_beta
from dbo.[property_space_n_' + @pdb4 + '_498_all]
)as a on 1=1
order by b.[time_step]'
FETCH NEXT FROM sim_id_cursor INTO @pdb4
PRINT @view_cmd
END
CLOSE sim_id_cursor
DEALLOCATE sim_id_cursor
Step 4
DECLARE @pdb4 VARCHAR(4)
DECLARE @table_prefix VARCHAR(64)
DECLARE @server_name VARCHAR(32)
DECLARE @database_name VARCHAR(32)
DECLARE @first BIT
DECLARE @view_cmd VARCHAR(MAX)
DECLARE @struct_id INT
DECLARE @residues INT
SET @first = 1
DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as a
JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as b
on a.sim_id=b.sim_id
WHERE
b.sim_grp_id=1000
and pdb4 in ('1a6n', '1cvz', '1hgu','1ifc','1ixa', '1tmc', '2afp', '2giw')
OPEN sim_id_cursor
FETCH NEXT FROM sim_id_cursor INTO @pdb4
WHILE @@FETCH_STATUS = 0
BEGIN
SET @view_cmd =
'CREATE TABLE dbo.[Property_space_rxn_n_' + @pdb4 + '] (
[sim_id] INT not null,
[struct_id] INT not null,
[struct_inst] INT not null,
[temp] INT not null,
[run] INT not null,
[time_step] INT not null,
[Mean_distance_to_reference] REAL not null)
INSERT into dbo.[Property_space_rxn_n_' + @pdb4 + '] EXECUTE comparator @input_name=''[Helix].[property_space].dbo.[property_space_n_norm_' + @pdb4 + '_498_all]'', @ignore_self = 0, @min_reference_time_step =0, @min_sample_time_step=0
'
FETCH NEXT FROM sim_id_cursor INTO @pdb4
PRINT @view_cmd
END
CLOSE sim_id_cursor
DEALLOCATE sim_id_cursor
Step 5
DECLARE @pdb4 VARCHAR(4)
DECLARE @table_prefix VARCHAR(64)
DECLARE @server_name VARCHAR(32)
DECLARE @database_name VARCHAR(32)
DECLARE @first BIT
DECLARE @view_cmd VARCHAR(MAX)
DECLARE @struct_id INT
DECLARE @residues INT
SET @first = 1
DECLARE sim_id_cursor CURSOR FOR SELECT DISTINCT a.pdb4 FROM [helix].[Directory].[dbo].[Master_Property_v]as a
JOIN [helix].[directory].dbo.[master_simulation_simulationgroup] as b
on a.sim_id=b.sim_id
WHERE
b.sim_grp_id=2 and a.pdb4<>'1iom' and a.pdb4<>'4wbc' and a.server_name='turn'
ORDER BY a.pdb4
OPEN sim_id_cursor
FETCH NEXT FROM sim_id_cursor INTO @pdb4
WHILE @@FETCH_STATUS = 0
BEGIN
SET @view_cmd =
'CREATE TABLE dbo.[Property_space_n_histogram_' + @pdb4 + '] (
[sim_id] INT not null,
[struct_id] INT not null,
[struct_inst] INT not null,
[temp] INT not null,
[run] INT not null,
[bin_midpoint] REAL not null,
[bin_population] INT not null,
[bin_fraction] REAL)
INSERT into dbo.[Property_space_n_histogram_' + @pdb4 + '] EXECUTE pistogramor @input_name = ''[Turn].[property_space].dbo.[Property_space_rxn_n_' + @pdb4 + '_498_all]'', @bins = 100
'
FETCH NEXT FROM sim_id_cursor INTO @pdb4
EXEC (@view_cmd)
END
CLOSE sim_id_cursor
DEALLOCATE sim_id_cursor
Running PCA
We run principal component analysis on the normalized properties. This reduces this multi-dimensional data
into principal components ordered by the amount of variance they capture. We can thus vizualize
15 properties in 2 or 3 dimensions. We can also determine which properties are contributing the
most to the variance in each principal component these are called Loadings.
PCA is run in Mathematica, it takes the normalised properties table created by step 3.
<<MultivariateStatistics`
dbTmp=OpenSQLConnection[JDBC["jtds_sqlserver","SERVER;database=DATABASE",Username"USER",Password"PASSWORD"];
mtx=Transpose[SQLExecute[dbDir,"SELECT * From [helix].[property_space].dbo.[Property_space_normalise_498_all_100ps] order by struct_id, temp, run, time_step "]];
mtx=(mtx//.Null0);
mtxa=Take[mtx, 6];
mtx= Take[mtx, -15];
mtx=Transpose[mtx];
orig=mtx;
mtx=PrincipalComponents[mtx];
ListPlot[mtx[[All,{1,2}]]]
pcs=mtx[[All,{1,2, 3}]];
cor=Correlation[mtx, orig];
cor//MatrixForm
{matU,matD, matV}=SingularValueDecomposition[Correlation[orig]];
varcap=MapThread[#1[[#2]]/Tr[matD]&,{matD,Range[1,Length[matD]]}]
ArrayPlot[cor]
mtxb=Join[mtxa,Transpose[pcs]];
mtxb=Transpose[mtxb];
Export["pc1vpc2vpc3_all_2.csv",mtxb]
Export ["PC_coefficents_all_2.csv", cor]
Export ["Variance-captured_PCA_all_2.csv", varcap]