Input files preparation

In the zip file available in the GitHub repository you will find all the initial files needed to run this tutorial.

Preliminary molecular dynamics run

As a first step for the VMetaD run, we have to choose the atoms which will constitute the reference frame on which we will calculate the position of the ligand with respect to the protein. To be sure in choosing residues that remain stable (i.e. do not belong to a moving loop), we run a 100 ns-long plain molecular dynamics (MD) simulation at 300 K (that is the temperature at which we will run and analyze all the simulations in this tutorial). This run took 2 h on 4 CPUs of a M1 Max MacBook Pro).

After the simulation, we calculate the per-residue root mean square fluctuations (RMSF), which highlights the more dynamic residues.

Alt text
Per-residue root mean square fluctuation (RMSF) computed from a plain MD simulation of lysozyme-benzene complex. The portion of the plot with light blue background represents the C-terminal domain, where the benzene binds.

We can see that almost all the C-terminal domain (residues 71-162) does not show large fluctuations, and only few residues are above the (arbitrary) RMSF threshold of 0.15 nm. We thus consider to define the reference frame considering all the residues 71-162 but the five ones with RMSF > 0.15 nm (residues 135, 136, 137, 140, and 141).

PDB file for reference frame refitting

To be sure to keep the structure stable during the simulation, we will define the reference frame of VMetaD on the residues selected for the fit. To have a reference structure to be used in PLUMED (see the next section for the input file), we convert the initial structure to the PDB format using gmx editconf (to ensure consistency with the numbering) and keeping only the C-alphas:

gmx editconf -f starting.gro -o ref_ca.pdb
grep "CA" ref_ca.pdb > tmpfile && mv tmpfile ref_ca.pdb

Remember to add the correct values to the b-factor columns (and/or removing unwanted atoms for the fitting procedure) telling PLUMED which are the atoms we want to use for the fitting procedure (see the ref_ca.pdb file in the folder for reference).

Choice of the restraining potential size

We now need to choose the size of the restraint potential. In the original paper we showed that the reliability of the estimates is not affected by the size of the potential. However, we have to keep in mind that we need a part of the box where the ligand can stay far away from the protein in order to represent the unbound state in a satisfactory way. An important point here is that the sphere constraint must be inside the box, otherwise the entropic correction will not accurately account for the loss of configurational space. To visually inspect how large the potential is, and to get a feel for the possible movements of the ligand, we can visualize both the system and the restraint with VMD (downloadable here).

We can open VMD and load the starting.gro structure file in the GitHub folder. After the structure is loaded and the visualization has been set up at your taste, you can open the Tk console and write

pbc box

which draws the cubic box in which the system is inserted.

Now we can generate the atom selection we defined after checking the RMSF:

set sel [atomselect top "resid 71 to 134 or resid 138 to 139 or resid 142 to 162"]

The console should answer

atomselectXX

Where XX is a number. The selection for the reference frame has been defined and named $sel. Now we can compute the position of the center of mass of this set of atoms:

measure center $sel weight mass

The console should answer with the position of the center of mass (in Ångstrom)

35.28876876831055 34.06196594238281 32.041622161865234

Knowing this information, we can draw the sphere with a radius of (for example) 2 nm (20 Å) with the following command

draw sphere {35.28876876831055 34.06196594238281 32.041622161865234} radius 20 resolution 100

Receiving a number as an answer from the console. Such number is the ID of the 3D object we just draw. The drawn sphere is opaque, not allowing us to see inside it; to make it transparent, we need to specify that we want a transparent material

draw material Transparent

We can see that the sphere contains the entire domain, but it is probably too small to represent the unbound state in a precise way. Let’s delete the sphere using the ID of the 3D object (let’s say that it is 14), and plot a new sphere of radius 2.8 nm

draw delete 14
draw sphere {35.28876876831055 34.06196594238281 32.041622161865234} radius 28 resolution 100

You can see the expected result below

Alt text
Cartoon representation of the lysozyme-benzene complex, including the restraining potential applied within a 2.8 nm radius of the reference frame center of mass. The boundaries of the simulation box are also highlighted to show that the sphere is entirely contained by the box.

The PLUMED input file

(You can read the following line-by-line description keeping an eye on the plumed.dat file in the GitHub folder as a reference)

Reference frame setup

We start with the WHOLEMOLECULES instruction, to be sure that lysozyme (ENTITY0) will not be broken by the periodic boundary condition, as well as the benzene molecule (ENTITY1):

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
WHOLEMOLECULESThis action is used to rebuild molecules that can become split by the periodic boundary conditions. More details ENTITY0the atoms that make up a molecule that you wish to align=1-1284 ENTITY1the atoms that make up a molecule that you wish to align=1285-1290

Now that we are sure of the integrity of the structures in PLUMED, we perform the rototranslational fit of the system to make sure that the protein will be in the fixed reference frame position:

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
FIT_TO_TEMPLATEThis action is used to align a molecule to a template. More details REFERENCEa file in pdb format containing the reference structure and the atoms involved in the CV=ref_ca.pdb TYPE the manner in which RMSD alignment is performed=OPTIMAL

We then start with the groups definition. We previously prepared a GROMACS index file (index.ndx) with all the groups named as intended. As an alternative, you can also define such groups with atom ids.

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
prot_noh: GROUPDefine a group of atoms so that a particular list of atoms can be referenced with a single label in definitions of CVs or virtual atoms. More details NDX_FILEthe name of index file (gromacs syntax)=index.ndx NDX_GROUPthe name of the group to be imported (gromacs syntax) - first group found is used by default=Protein-H
sph: GROUPDefine a group of atoms so that a particular list of atoms can be referenced with a single label in definitions of CVs or virtual atoms. More details NDX_FILEthe name of index file (gromacs syntax)=index.ndx NDX_GROUPthe name of the group to be imported (gromacs syntax) - first group found is used by default=sphere
lig: GROUPDefine a group of atoms so that a particular list of atoms can be referenced with a single label in definitions of CVs or virtual atoms. More details NDX_FILEthe name of index file (gromacs syntax)=index.ndx NDX_GROUPthe name of the group to be imported (gromacs syntax) - first group found is used by default=ligand

We have three groups:

After the definition of the groups, to avoid that the passage in a periodic boundary conditions causes a “jump” of the ligand with respect to the protein, we add a WRAPAROUND instruction:

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
WRAPAROUNDRebuild periodic boundary conditions around chosen atoms. More details ATOMSwrapped atoms=lig AROUNDreference atoms=sph

Ending the fitting part of the PLUMED input.

Spherical coordinates definition

We now compute the position of the center of mass of the atoms defining the reference frame ($(x,y,z)=(0,0,0)$ in our CV space), and the center of mass of the ligand:

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
sph_center: COMCalculate the center of mass for a group of atoms. More details ATOMSthe list of atoms which are involved the virtual atom's definition=sph
lig_center: COMCalculate the center of mass for a group of atoms. More details ATOMSthe list of atoms which are involved the virtual atom's definition=lig

sph_coord: POSITIONCalculate the components of the position of an atom or atoms. More details ATOMthe atom number=sph_center NOPBC ignore the periodic boundary conditions when calculating distances
lig_coord: POSITIONCalculate the components of the position of an atom or atoms. More details ATOMthe atom number=lig_center NOPBC ignore the periodic boundary conditions when calculating distances

From the position, we can obtain the cartesian coordinates of the ligand in this reference frame

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
abs_x: MATHEVALAn alias to the CUSTOM function that can also be used to calaculate combinations of variables using a custom expression. More details ARGthe values input to this function=lig_coord.x,sph_coord.x FUNCthe function you wish to evaluate=x-y PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO
abs_y: MATHEVALAn alias to the CUSTOM function that can also be used to calaculate combinations of variables using a custom expression. More details ARGthe values input to this function=lig_coord.y,sph_coord.y FUNCthe function you wish to evaluate=x-y PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO
abs_z: MATHEVALAn alias to the CUSTOM function that can also be used to calaculate combinations of variables using a custom expression. More details ARGthe values input to this function=lig_coord.z,sph_coord.z FUNCthe function you wish to evaluate=x-y PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO

and via the usual transformation, obtain the final spherical coordinates $(\rho,\theta,\varphi)$

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
rho: MATHEVALAn alias to the CUSTOM function that can also be used to calaculate combinations of variables using a custom expression. More details ARGthe values input to this function=abs_x,abs_y,abs_z FUNCthe function you wish to evaluate=sqrt(x*x+y*y+z*z) PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO
theta: MATHEVALAn alias to the CUSTOM function that can also be used to calaculate combinations of variables using a custom expression. More details ARGthe values input to this function=abs_z,rho FUNCthe function you wish to evaluate=acos(x/y) PERIODICif the output of your function is periodic then you should specify the periodicity of the function=0.,pi
phi: MATHEVALAn alias to the CUSTOM function that can also be used to calaculate combinations of variables using a custom expression. More details ARGthe values input to this function=abs_x,abs_y FUNCthe function you wish to evaluate=atan2(y,x) PERIODICif the output of your function is periodic then you should specify the periodicity of the function=-pi,pi

which will be our CVs.

Restraining

We now have to impose the spherical restraint. We put a UPPER_WALLS which impedes the ligand to go farther than 2.8 nm:

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
restr: UPPER_WALLSDefines a wall for the value of one or more collective variables, More details ARGthe arguments on which the bias is acting=rho ATthe positions of the wall=2.8 KAPPAthe force constant for the wall=10000

the $k$ value is 10,000 kJ/mol/nm^2, which means that if the ligand is out by 0.1 nm he will feel a potential of 100 kJ/mol.

One effect that we should take into account is the possibility that the ligand, in advanced phases of the simulation, will try to unfold the protein, being the place occupied by it the volume portion with less history-dependent potential deposited. To limit this phenomenon we will put in place a RMSD restraining that will be removed during the reweighting procedure. To compute the RMSD we will use the same atoms included in the ref_ca.pdb file instruction

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
rmsd: RMSDCalculate the RMSD with respect to a reference structure. This action has hidden defaults. More details REFERENCEa file in pdb format containing the reference structure and the atoms involved in the CV=ref_ca.pdb TYPE the manner in which RMSD alignment is performed=OPTIMAL
restr_rmsd: RESTRAINTAdds harmonic and/or linear restraints on one or more variables. This action has hidden defaults. More details ARGthe values the harmonic restraint acts upon=rmsd ATthe position of the restraint=0. KAPPA specifies that the restraint is harmonic and what the values of the force constants on each of the variables are=250.0

Reweighting CVs

As anticipated in the theory part, to compute binding free energy differences we will need to reweight our free energy landscape on new apt CVs which will allow us in discriminating efficiently the bound and unbound states. Like in the original work, here we will use again the distance from the origin of the reference frame $\rho$ and the number of contacts between the protein and the ligand. This will guarantee that if the guest can be considered not in contact with the host (and thus defining the unbound state), even if $\rho$ is large.

The total number of contact $c$ is defined with a switching function

\[c = \sum_{i,j} \frac{ 1 - \left(\frac{r_{ij}}{r_0}\right)^{6} } {1 - \left(\frac{r_{ij}}{r_0}\right)^{12} }\]

which runs on all the (heavy) atoms of the protein and all the atoms of the ligand. This is implemented with COOORDINATION:

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
c: COORDINATIONCalculate coordination numbers. This action has hidden defaults. More details GROUPAFirst list of atoms=lig GROUPBSecond list of atoms (if empty, N*(N-1)/2 pairs in GROUPA are counted)=prot_noh R_0The r_0 parameter of the switching function=0.45

where we set a $r_0$ parameter at 0.45 nm.

Volume-based Metadynamics

We now set up the VMetaD:

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
METADUsed to performed metadynamics on one or more collective variables. This action has hidden defaults. More details ...
  ARGthe labels of the scalars on which the bias will act=rho,theta,phi
  GRID_MINthe lower bounds for the grid=0,0.,-pi
  GRID_MAXthe upper bounds for the grid=3.5,pi,pi
  SIGMAthe widths of the Gaussian hills=0.1,pi/16.,pi/8
  HEIGHTthe heights of the Gaussian hills=0.5
  PACEthe frequency for hill addition=2000
  BIASFACTORuse well tempered metadynamics and use this bias factor=10
  TEMPthe system temperature - this is only needed if you are doing well-tempered metadynamics=300
  LABELa label for the action so that its output can be referenced in the input to other actions=metad
  CALC_RCT calculate the c(t) reweighting factor and use that to obtain the normalized bias [rbias=bias-rct]
... METAD

The most exotic option used is CALC_RCT, which allows the calculation on the fly of the Tiwary-Parrinello estimator that we will use for reweighting.

Printing

We finally print all the relevant files that we will use for post-processing and analysis.

Click on the labels of the actions for more information on what each action computes
tested onv2.9
tested onmaster
# --- Click here to reveal hidden parts of input file ---- 
PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=metad.* FILEthe name of the file on which to output these quantities=metad_data.dat STRIDE the frequency with which the quantities of interest should be output=200
PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=rmsd,restr_rmsd.bias FILEthe name of the file on which to output these quantities=rmsd_restraint.dat STRIDE the frequency with which the quantities of interest should be output=200
PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=restr.bias FILEthe name of the file on which to output these quantities=sphere_restraint.dat STRIDE the frequency with which the quantities of interest should be output=200
PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=abs_x,abs_y,abs_z FILEthe name of the file on which to output these quantities=xyz_coord.dat STRIDE the frequency with which the quantities of interest should be output=200
PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=rho,theta,phi FILEthe name of the file on which to output these quantities=rtp_coord.dat STRIDE the frequency with which the quantities of interest should be output=200
PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=c,rho FILEthe name of the file on which to output these quantities=coord_rho.dat STRIDE the frequency with which the quantities of interest should be output=200

FLUSHThis command instructs plumed to flush all the open files with a user specified frequency. More details STRIDEthe frequency with which all the open files should be flushed=200

We will have all the VMetaD quantities in metad_data.dat, the restraints data in {rmsd,sphere}_restaint.dat, the $(x,y,z)$ and $(\rho,\theta,\varphi)$ coordinates in xyz_coord.dat and rtp_coord.dat, respectively, and the reweighting CVs in coord_rho.dat.

PLEASE NOTE: all the printing frequencies are synchronized with the VMetaD PACE (every 10 print we deposit 1 gaussian), and it should be synchronized also with trajectory printing (I personally suggest the same frequency used for gaussian deposition). This allows us to restart safely in case of issues and re-run or re-analyze with new CVs the run, if needed.

Last advices before launching the simulation

One issue that can be observed when launching when running VMetaD is the sudden interruption of the simulation with a cryptic error. Please check the position of the ligand: in most cases, the system reached $(\rho,\theta,\varphi)=(0,0,0)$, where the derivatives of the potential cannot be defined, and thus PLUMED sends an error. Despite being annoying, this is perfectly normal, and does not invalidate the run. Please restart it from the previous checkpoint (save a checkpoint often!).

Another consideration regarding the singularity of the reference frame is its position with respect to the actual binding site. In close proximity of it (let’s say less than 2 sigmas in $\rho$) even an extremely small movement in any direction makes the ligand move of several sigmas in $\theta$ and $\varphi$, with the possibility of underestimation of the bias. This could imply the need of longer simulation times to fill up the basin. To limit this effect, we suggest to verify if the origin of the reference frame is farther that 2-3 sigmas in $\rho$ from the binding site (the perfect situation would be setting the origin in a point occupied by the host, at 4-5 Å from the binding site).

Launch!

You can now run the VMetaD tutorial. We advice you to run it for (at least) 500 ns. In this example, we run it for 1 µs.

Back to VMetad home