Tutorial 1: Single-structure refinement with EMMIVox
These are the steps for refining a single structural model into a cryo-EM maps with EMMIVox.
Note: all the python scripts are contained in the scripts
folder. Please run each step of the tutorial in the dedicated subfolder of tutorials/1-refinement
referred to as Working directory.
0. System setup
-
Create the topology files and the initial conformation in GROMACS format starting from the deposited PDB (
7P6A.pdb
). We have done this for you using CHARMM-GUI. CHARMM-GUI can also complete your system if there are missing residues and create the force field for a small-molecule, if present.Note: The conformation in the
step3_input.gro
file (GROMACS format) produced by CHARMM-GUI is identical to one in the deposited PDB. However, CHARMM-GUI will probably translate and rotate your initial PDB, which will then not fit the input cryo-EM map anymore. Furthermore, atoms order might be different. -
CHARMM-GUI renumbers residues and chains in
step3_input.gro
, so we first need to make sure we have a PDB file consistent with thegro
atom order. We can create this PDB file using this command:bash renumber.sh step3_input.gro step3_input.pdb
-
Add to the index file created by CHARMM-GUI (
index.ndx
) two custom groups:-
System-MAP
, which contains all the atoms that will be used to generate the cryo-EM map. You can do this withmake_ndx.py
inscripts
usingMDAnalysis
selection syntax. Hydrogen atoms and the carboxylate oxygens of glutamic/aspartic acid will be automatically removed from this group, as they are not used in PLUMED to calculate the cryo-EM map. A second group, calledSystem-MAP-H
will also be created to include these missing atoms (mostly to write them in the trajectory file).python make_ndx.py step3_input.gro "protein" System-MAP --ndx index.ndx
-
System-XTC
, which contains all the atoms that will be written to the GROMACS trajectory file (xtc format). In this case, we want to write all the atoms used for the cryo-EM calculation plus hydrogen and carboxylate oxygens of glutamic/aspartic acid.python make_XTC_ndx.py System-MAP-H --ndx index.ndx
-
Working directory: 0-Building
1. Map preparation
-
At this stage we need to download the cryo-EM full map
emd_13223.map
. the two half-mapsemd_13223_half_map_1.map
andemd_13223_half_map_2.map
, and the PDB7P6A.pdb
, which will be needed to zone the density map close to the model (optional, but speeds up things a lot). -
To convert the input cryo-EM map to PLUMED format, calculate an error map from the two half maps, and optionally filter voxels by correlation, you need to execute this:
python cryo-EM_preprocess.py emd_13223.map 0.9 emd_plumed.dat --zone 3.5 --zone_PDB 7P6A.pdb --zone_sel "protein" --halfmaps emd_13223_half_map_1.map emd_13223_half_map_2.map > log.preprocess
Note 1: The value
0.9
is the cutoff to exclude correlated voxels (above this threshold). If you want to keep all the voxels of the input map, set this to1.0
.emd_plumed.dat
is the name of the output map in PLUMED format.--zone
can be used to keep only voxels within a certain distance (here 3.5 Ang) from the model specified by--zone_PDB
.Note 2: If you want to give a look at the map after the preparation, you can inspect
emd_plumed.mrc
. This is the map that will be used in PLUMED for modelling and corresponds toemd_plumed.dat
. -
Now we align the cryo-EM map in PLUMED format (
emd_plumed.dat
) to the GROMACS conformation (step3_input.pdb
). This is needed because CHARMM-GUI probably moved the initial structure during system setup.python align-VOXELS.py ../0-Building/step3_input.pdb 7P6A.pdb emd_plumed_aligned.dat emd_plumed.dat --ref_sel "backbone" --mobile_sel "backbone and not altLoc B and not (name O and resid 379)"
Note 3: Since there are some discrepancies between the number of atoms in the two PDBs, we have specified a selection of common atoms for alignment using
MDAnalysis
selection syntax.--ref_sel
refers tostep3_input.pdb
and--mobile_sel
to7P6A.pdb
. You can verify that the alignment is correct by checking the RMSD value in output: it should be zero as the two models are identical.
Working directory: 1-Map-Preparation
2. System equilibration
We need to prepare the system with an energy minimization and equilibration at room temperature. No cryo-EM restraints will be used at this stage.
-
Run energy minimization:
gmx_mpi grompp -f 0-em-steep.mdp -c ../0-Building/step3_input.gro -p ../0-Building/topol.top -o em.tpr
gmx_mpi mdrun -pin on -deffnm em
-
Do a 1-ns NPT equilibration with positional restraints on everything except water and ions:
gmx_mpi grompp -f 1_npt_posres.mdp -c em.gro -n ../0-Building/index.ndx -p ../0-Building/topol.top -r em.gro -o npt_posres.tpr -maxwarn 1
gmx_mpi mdrun -pin on -deffnm npt_posres -nsteps 500000 -update gpu
-
Do a 2-ns NVT equilibration with positional restraints on everything except water and ions:
gmx_mpi grompp -f 2_nvt_posres.mdp -c npt_posres.gro -n ../0-Building/index.ndx -p ../0-Building/topol.top -r em.gro -o nvt_posres.tpr -maxwarn 1
gmx_mpi mdrun -pin on -deffnm nvt_posres -nsteps 1000000 -update gpu
Working directory: 2-Equilibration
3. Optimizing the cryo-EM map scaling factor
We postprocess the NVT equilibration trajectory to obtain an optimal scaling factor between experimental and predicted map.
bash optimize_scale.sh 8
Here we are using 8 CPU cores (and the GPU) to postprocess the trajectory with PLUMED. The optimal value of the scale will be printed in BEST_SCALE
at the end of the optimization. This will be used for both single-structure refinement and ensemble modelling.
Working directory: 3-Map-Scaling
4. Production
To run the production simulation for single-structure refinement we need to:
-
Prepare the
plumed_EMMI.dat
file with production parameters using:bash prepare_PLUMED_input.sh
The generated PLUMED input file, called
plumed_EMMI.dat
, should look like this:
# include topology info MOLINFOThis command is used to provide information on the molecules that are present in your system. More details STRUCTUREa file in pdb format containing a reference structure=../3-Map-Scaling/step3_input_xtc.pdb WHOLE The reference structure is whole, i # define map atoms system-map: GROUPDefine a group of atoms so that a particular list of atoms can be referenced with a single label in definitions of CVs or virtual atoms. More details NDX_FILEthe name of index file (gromacs syntax)=../0-Building/index.ndx NDX_GROUPthe name of the group to be imported (gromacs syntax) - first group found is used by default=System-MAP # make map atoms whole WHOLEMOLECULESThis action is used to rebuild molecules that can become split by the periodic boundary conditions. More details ... ADDREFERENCE Define the reference position of the first atom of each entity using a PDB file EMST only for backward compatibility, as of PLUMED 2 ENTITY0the atoms that make up a molecule that you wish to align=system-map STRIDE the frequency with which molecules are reassembled=4 ... WHOLEMOLECULES # create EMMI score EMMIVOXBayesian single-structure and ensemble refinement with cryo-EM maps. More details ... # name of this action LABELa label for the action so that its output can be referenced in the input to other actions=emmi # general parameters TEMPtemperature=300.0 NL_STRIDEneighbor list update frequency=50 NL_DIST_CUTOFFneighbor list distance cutoff=1.0 NL_GAUSS_CUTOFFneighbor list Gaussian sigma cutoff=3.0 # define atoms for cryo-EM restraint and read experimental data ATOMSatoms used in the calculation of the density map, typically all heavy atoms=system-map DATA_FILEfile with cryo-EM map=../1-Map-Preparation/emd_plumed_aligned.dat # info about the experimental map NORM_DENSITYintegral of experimental density=684.480896 RESOLUTIONcryo-EM map resolution=0.19 # data likelihood (or noise model): Marginal SIGMA_MINminimum density error=0.2 GPU calculate EMMIVOX on GPU with Libtorch # output: in production write with the frequency at which XTC/TRR are written STATUS_FILEwrite a file with all the data useful for restart=EMMIStatus WRITE_STRIDEstride for writing status file=5000 # comment this if you have a hetero-complex BFACT_NOCHAIN Do not use chain ID for Bfactor MC # in production, you should sample Bfactors DBFACTBfactor MC step=0.05 MCBFACT_STRIDEBfactor MC stride=500 BFACT_SIGMABfactor sigma prior=0.1 # scale factor SCALEscale factor=1.3 # correlation CORRELATION calculate correlation coefficient ... # in production, apply bias to system emr: BIASVALUETakes the value of one variable and use it as a bias More details ARGthe labels of the scalar/vector arguments whose values will be used as a bias on the system=emmi.scoreb STRIDEthe frequency with which the forces due to the bias should be calculated=4 # print output to file PRINTPrint quantities to a file. More details ARGthe labels of the values that you would like to print to the file=emmi.* FILEthe name of the file on which to output these quantities=COLVAR STRIDE the frequency with which the quantities of interest should be output=5000
-
Run a 10-ns long production run following the instructions below, after setting the number of CPU cores to use (
$OMP_NUM_THREADS
).gmx_mpi grompp -f 4-nvt-production.mdp -c ../2-Equilibration/nvt_posres.gro -n ../0-Building/index.ndx -p ../0-Building/topol.top -o production.tpr
gmx_mpi mdrun -pin on -deffnm production -ntomp $OMP_NUM_THREADS -nsteps 5000000 -plumed plumed_EMMI.dat
Note: Do not add the option
-update gpu
when using GROMACS in combination with PLUMED.
Working directory: 4-Production
5. Postprocessing and validation
-
We first need to extract the conformation with best EMMIVox score from our production run and perform a short energy minimization. To setup all the files needed for minimization, you need to execute this script:
bash prepare_PLUMED_input_emin.sh
-
Now we run energy minimization using 8 CPU cores:
bash run_PLUMED_emin.sh 8
-
After minimization is complete, we need to align
conf_pbc.pdb
to the original cryo-EM map using thetransformation.dat
file created during the map preparation stage:python align-PDBs.py conf_pbc.pdb conf_pbc_aligned.pdb ../1-Map-Preparation/transformation.dat
-
Finally, we add the BFactors column to our aligned model:
python add-BFACT.py conf_pbc_aligned.pdb EMMIStatus conf_EMMIVOX.pdb
-
The output PDB file
conf_EMMIVOX.pdb
is ready to be validated with PHENIX:bash do_PHENIX conf_EMMIVOX.pdb ../1-Map-Preparation/emd_13223.map 1.9 > results.EMMIVOX
where
1.9
is the resolution of the input mapemd_13223.map
in Angstrom. Validation metrics are saved inresults.EMMIVOX
. -
Compare the EMMIVOX-refined model with the deposited PDB:
bash do_PHENIX ../1-Map-Preparation/7P6A.pdb ../1-Map-Preparation/emd_13223.map 1.9 > results.PDB
-
Model fit to the data (CC_mask) can be calculated using our internal script:
python cryo-EM_validate.py ../1-Map-Preparation/emd_13223.map --pdbA=conf_EMMIVOX.pdb --threshold=0.0 > CCmask.EMMIVOX
python cryo-EM_validate.py ../1-Map-Preparation/emd_13223.map --pdbA=../1-Map-Preparation/7P6A.pdb --threshold=0.0 > CCmask.PDB
Working directory: 5-Analysis