Input files
All files and data required to perform this tutorial are on the GitHub repository. They can either be accessed by cloning the repository to your machine;
git clone https://github.com/blake-armstrong/binding-tutorial-PLUMED
or by following the download instructions in the README.md
page on the repository to download the basic ZIP file.
Once downloaded, the input files required to run a simulation with OpenMM and PLUMED are provided in the simulation_files
directory. Included in the scripts
directory are bash scripts that will automate setting up the multiple-walker simulation files and will analyse the output for you. These scripts are not strictly necessary to perform this tutorial, as the actions they perform can be carried out manually by the user. All the data has already been produced and analysed, and is presented in the ‘data’ directory. We will discuss how that data was produced and analysed so you can do it all yourself.
OpennMM & PLUMED Installation
The first thing we need to do is get a working installation of OpenMM and openmmplumed. Comprehensive installation instructions for OpenMM. We recommend using conda
, as it is the most straight forward way to install the software without needing to worry about dependencies. See instructions for installing conda. After testing the installation with:
python -m openmm.testInstallation
All platforms available to you will be shown. In this tutorial we use the CPU platform, and the input files provided set up a simulation to use the CPU platform, which every user should have available by default. The instructions to install the OpenMM PLUMED plugin are also very simple when using conda. This will also install the command line plumed tool which will be used for analysis later on.
Running a simulation
Running the bash script setup_walkers.sh
bash scripts/setup_walkers.sh
or
bash scripts/setup_walkers.sh NUMWALKERS
will create a set of directory trees within the data
directory labelled with cylinder radii ranging from 0.0 nm to 0.5 nm. NUMWALKERS
is an integer for the number of walkers desired with the default being 4 (NB this can be changed but must be greater than 1 for multiple walker metadynamics). These directories are where the multiple walker simulations will be run for systems with various radii for the flat-bottom cylindrical restraining potential. This setup script has been included so that the user can play around with different numbers of walkers in an easy way, if they so desire. By default, it will create directories and setups for four walkers. The details on how to run a single walker are discussed further later.
If we navigate to the directory for a radius of 0.0 nm,
cd data/radius_0.0
here we will find a run.sh
script here that will handle submitting the run.py job for each walker. The run.py
file (which is located in the simulation_files
directory) will allow us to run our simulation using OpenMM through its Python API. The plumed.inp
file which contains the PLUMED specifics. There is an ‘analysis.sh’ script here as well, which will handle the analysis of our simulation output, which is discussed here.
There is extensive documentation for using the OpenMM Python API which we direct you to if you cannot understand some component of the run.py
file while we briefly discuss it here.
run.py
import openmm as mm
import openmm.app as app
import openmm.unit as unit
from openmmplumed import PlumedForce
These lines give us access to the OpenMM code, and the OpenMM PLUMED plugin that can parse a PLUMED input file, interpret it, and provide the instructions to OpenMM.
coordinates = "coord.pdb" # Name of our coordinates file.
forcefield = "argon.xml" # Name of our force field file.
plumed_file = "plumed.inp" # Name of our PLUMED input file.
temperature = 300 * unit.kelvin # Simulation temperature.
timestep = 1e-3 * unit.picosecond # Simulation timestep.
trel = 1 / unit.picosecond # Thermostat relaxation constant for the Langevin thermostat.
thermo_file = "output.out" # Name of the file to write our periodic simulation output to.
nthermo = 10000 # How often to periodicly write simulation output (in steps).
traj_file = "trajectory.dcd" # Name of the file to periodicly write our system coordinates to.
ntraj = 10000 # How often to periodicly write our system coordinates (in steps).
nsteps = 2000000 # Total number of steps to simulate for (this corresponds to 2 ns).
Simulation parameter definitions that will be used further in the file. When providing a unit with a dimension, we make use of the OpenMM unit object. By default, each walker will run for 2 ns which should take 5 - 10 minutes to complete. This should be enough to produce a PMF that is sufficiently converged to analyse for our purposes due to the simplicity of the model system.
pdb = app.PDBFile(coordinates)
forcefield = app.ForceField(forcefield)
platform = mm.Platform.getPlatformByName("CPU")
properties = {}
system = forcefield.createSystem(
pdb.topology,
nonbondedMethod=app.CutoffNonPeriodic, # Run a non-periodic simulation.
nonbondedCutoff=0.9 * unit.nanometer,
useDispersionCorrection=False,
constraints=None,
rigidWater=False,
)
The details of this section are better explained in the OpenMM docs.
for n, force in enumerate(system.getForces()):
if force.getName() == "CMMotionRemover":
system.removeForce(n)
break
This section is needed to prevent the thermal energy from being removed from the system as we have only one particle. It is not important to understand, as it would not generally be necessary for a real simulation.
with open(plumed_file, "r") as f:
script = f.read()
system.addForce(PlumedForce(script))
This is where a PLUMED input script is read and our forces will be created and added to the system.
integrator = mm.LangevinMiddleIntegrator(temperature, trel, timestep)
simulation = app.Simulation(pdb.topology, system, integrator, platform, properties)
simulation.context.setPositions(pdb.positions)
simulation.context.setVelocitiesToTemperature(temperature, random.randrange(99999))
simulation.reporters.append(
app.StateDataReporter(
thermo_file,
nthermo,
separator="\t",
step=False,
time=True,
potentialEnergy=True,
kineticEnergy=False,
totalEnergy=False,
temperature=True,
volume=True,
density=True,
progress=False,
remainingTime=False,
speed=True,
elapsedTime=False,
)
)
simulation.reporters.append(app.DCDReporter(traj_file, ntraj, enforcePeriodicBox=False))
simulation.reporters.append(
app.StateDataReporter(
stdout,
int(nsteps / 50),
separator="\t",
step=False,
time=False,
potentialEnergy=True,
kineticEnergy=False,
totalEnergy=False,
temperature=True,
volume=True,
density=False,
progress=True,
remainingTime=True,
speed=True,
elapsedTime=True,
totalSteps=nsteps,
)
)
simulation.step(nsteps)
The details of this section are better explained in the OpenMM docs.
plumed.inp
Here we will discuss all of the sections of the PLUMED input file (plumed.inp).
RESTARTActivate restart. More details
Required for multiple walkers. Ensures each walker reads HILLS from all walkers when initialised.
UNITSThis command sets the internal units for the code. More details ENERGYthe units of energy=kj/mol LENGTHthe units of lengths=nm TIMEthe units of time=ps
Simulation units. Same as OpenMM units for consistency. Not a necessity.
fixed1FIXEDATOMAdd a virtual atom in a fixed position. This action has hidden defaults. More details ATcoordinates of the virtual atom=2.50,2.50,2.50 : d1 : DISTANCECalculate the distance/s between pairs of atoms. More details ATOMSthe pair of atom that we are calculating the distance between=fixed1,1 NOPBC ignore the periodic boundary conditions when calculating distances COMPONENTS calculate the x, y and z components of the distance separately and store them as label
Here we define our z-distance that we will use for the collective variable. We define a fixed point at the middle of our cubic simulation box (50 Å in each direction) and calculate the distance between the fixed point and our one particle, and save it to the d1
variable.
CUSTOMCalculate a combination of variables using a custom expression. More details ... ARGthe values input to this function=d1.x,d1.y VARthe names to give each of the arguments in the function=x,y FUNCthe function you wish to evaluate=0.5*14473*(max(0,sqrt(x^2+y^2)-0.0))^2 PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO ...rd :
rdb : BIASVALUETakes the value of one variable and use it as a bias More details ARGthe labels of the scalar/vector arguments whose values will be used as a bias on the system=rd
Here we use the x and y components of the d1 variable to create rd
, the flat-bottomed cylindrical restraining potential. We then tell PLUMED to actually use the potential to apply forces throughout the simulation, and not just record its value.
UPPER_WALLSDefines a wall for the value of one or more collective variables, More details ARGthe arguments on which the bias is acting=d1.z ATthe positions of the wall=1.500 KAPPAthe force constant for the wall=14473 lwall : LOWER_WALLSDefines a wall for the value of one or more collective variables, More details ARGthe arguments on which the bias is acting=d1.z ATthe positions of the wall=-0.125 KAPPAthe force constant for the wall=14473uwall :
Here we define upper and lower harmonic walls in the z-dimension. The lower wall emulates the repulsion of a surface, and starts acting on the particle at -0.125 nm in the z direction. The upper wall prevents the binding atom from going too far from the surface
in the z-direction, and starts acting at 1.5 nm.
fixed2FIXEDATOMAdd a virtual atom in a fixed position. This action has hidden defaults. More details ATcoordinates of the virtual atom=2.50,2.50,2.80 : d2 : DISTANCECalculate the distance/s between pairs of atoms. More details ATOMSthe pair of atom that we are calculating the distance between=fixed2,1 NOPBC ignore the periodic boundary conditions when calculating distances COMPONENTS calculate the x, y and z components of the distance separately and store them as label
rd1 : CUSTOMCalculate a combination of variables using a custom expression. More details ... ARGthe values input to this function=d1.x,d1.y,d1.z VARthe names to give each of the arguments in the function=ax,ay,az FUNCthe function you wish to evaluate=sqrt(ax^2+ay^2+az^2) PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO ...
rd2 : CUSTOMCalculate a combination of variables using a custom expression. More details ... ARGthe values input to this function=d2.x,d2.y,d2.z VARthe names to give each of the arguments in the function=bx,by,bz FUNCthe function you wish to evaluate=sqrt(bx^2+by^2+bz^2) PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO ...
g : CUSTOMCalculate a combination of variables using a custom expression. More details ... ARGthe values input to this function=rd1,rd2 VARthe names to give each of the arguments in the function=x,y FUNCthe function you wish to evaluate=-30*exp(-((x)^2/(2*(0.1^2))))+15*exp(-((y)^2/(2*(0.1^2)))) PERIODICif the output of your function is periodic then you should specify the periodicity of the function=NO ...
pes : BIASVALUETakes the value of one variable and use it as a bias More details ARGthe labels of the scalar/vector arguments whose values will be used as a bias on the system=g
The following is used to create an arbitrary potential energy surface that mimics the binding profile of an atom normal to a surface. There will be a minimum at [2.5, 2.5, 2.5] nm to represent a stable surface site and a maximum at [2.5, 2.5, 2.8] nm to represent a barrier for leaving the surface.
PRINTPrint quantities to a file. More details FILEthe name of the file on which to output these quantities=colvar STRIDE the frequency with which the quantities of interest should be output=1000 ARGthe labels of the values that you would like to print to the file=*
This will output a collective variable file to monitor that everything in the simulation is proceeding as expected.
FLUSHThis command instructs plumed to flush all the open files with a user specified frequency. More details STRIDEthe frequency with which all the open files should be flushed=1000
This will flush the output every 1000 steps, allowing us to monitor that everything within PLUMED is working as it should while the simulation is running.
#SETTINGS NREPLICAS=3 METADUsed to performed metadynamics on one or more collective variables. More details ... ARGthe labels of the scalars on which the bias will act=d1.z # Active CV. HEIGHTthe heights of the Gaussian hills=2.5 # Gaussian height. Approximately kBT in this case. SIGMAthe widths of the Gaussian hills=0.01 # gaussian width. PACEthe frequency for hill addition=1000 # Gaussian deposition pace. BIASFACTORuse well tempered metadynamics and use this bias factor=18 TEMPthe system temperature - this is only needed if you are doing well-tempered metadynamics=300 # Well-tempered parameters. GRID_MINthe lower bounds for the grid=-0.5 # Grid parameters for d1.z CV. GRID_MAXthe upper bounds for the grid=1.8 # WALKERS_DIRshared directory with the hills files from all the walkers=../BIAS # Multiple walkers directory for reading/writing BIAS. WALKERS_RSTRIDEstride for reading hills files=1000 # Multiple walkers reading pace. WALKERS_IDwalker id=REPLACEWALKERID # ID of current walker. WALKERS_Nnumber of walkers=REPLACETOTALWALKERS # Total number of walkers. ... METAD
This tells PLUMED to run multiple walker metadynamics with a collective variable defined as the z-component of the d1
distance. WALKERS_ID and WALKERS_N are set to placeholder names which will get replaced when submitting the job through the run.sh
script. While the multiple walkers run, the HILLS files will be written to and read from the ‘BIAS’ directory. If you wish to run a single walker simulation, the RESTART
keyword should be removed, alongside the last four lines inside the METAD
section. The HILLS
file will then be written inside the current directory instead of a dedicated BIAS
directory. The analysis scripts will then need to be modified accordingly.
run.sh
The run.sh
file will handle submitting all of the walkers for a given simulation.
export OPENMM_CPU_THREADS=1
The OPENMM_CPU_THREADS
environmental variable is used for OpenMM to assign the number of threads per job on the CPU. If this isn’t set OpenMM will use all of your available CPU cores. In our case where our system only has one atom, it is the most efficient to run with only one thread per walker.
pids=""
for w in Walker_*; do
cd ${w}
pid=$(python3 ../../scripts/run.py > screen.out & echo $!)
echo "Running ${w} with PID ${pid}"
pids="${pids}${pid} "
cd ..
done
echo "To kill walkers call ' kill -9 ${pids} '"
This for loop iterates through every walker directory and runs python3 ../../simulation_files/run.py
to begin the metadynamics simulation, and submits the job as a background process with &
. The script saves the process ID and outputs it to the screen afterwards to make it easy to kill the background processes if something goes wrong. On UNIX-based operating systems you can run either the top
or htop
command to monitor the background processes.
Beginning the simulation
To begin your simulations for a given radius run the command bash run.sh
from the data/radius_0.x
directory. If you do not modify the run.py
file, each walker will run for 100 ns to give a combined time of 400 ns. This amount of time is enough for the PMFs to adequately converge. Feel free to run for longer to get an even smoother profile! Once your simulations have finished running, it is time to analyse.