# Clustering

When studying nucleation it is often useful to use a clustering algorithm to determine how many atoms are in the largest crystalline nucleus. The implementation of this approach in PLUMED is detailed in this paper. A typical input from that paper for calculating the number of atoms in the largest cluster is shown below:

# Calculate the coordination numbers of the atoms
lq: COORDINATIONNUMBER SPECIES=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Calculate the contact matrix for the atoms for which we calculated the coordination numbers
cm: CONTACT_MATRIX GROUP=lq SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Do a clustering using the contact matrix above
dfs: DFSCLUSTERING MATRIX=cm
# Sum the coordination numbers for the atoms in the largest cluster
clust1: CLUSTER_PROPERTIES CLUSTERS=dfs ARG=lq CLUSTER=1 SUM

This input works, but it is somewhat unwieldy and a little confusing. The problem is that you have to calculate the coordination numbers of all the atoms in order to do the clustering and (unless you have a deep understanding of the way the code is implemented) it is not clear why. With the new syntax you can achieve the same result as follows:

# Calculate the contact matrix.  This action computes a 100x100 matrix
cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Do a clustering using the contact matrix that was computed above as input
# This action returns a 100 dimensional vector. If element i of this vector
# is equal to 5 this means that atom i in the input to the contact matrix above
# is part of the 5th largest cluster.
dfs: DFSCLUSTERING ARG=cm
# This next action returns a vector with 100 elements. If element i is equal to 1 then atom
# i is part of the largest cluster.  If it is equal to zero then it is part of some
# other cluster.
c1: CLUSTER_WEIGHTS CLUSTERS=dfs CLUSTER=1
# Now calculate the coordination numbers using the usual matrix multiplication trick
ones: ONES SIZE=100
coords: MATRIX_VECTOR_PRODUCT ARG=cm,ones
# Multiply the coordination numbers by c1.  We now have a vector where element i is equal to the
# coordination number of atom i if atom i is part of the largest cluster and zero otherwise.
fcoords: CUSTOM ARG=coords,c1 FUNC=x*y PERIODIC=NO
# And lastly sum the coordination numbers of the atoms in the largest cluster
coordsum: SUM ARG=fcoords PERIODIC=NO
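
Written out, the last four actions in the input above amount to the sums below. This is only a sketch of the arithmetic. The symbols are introduced here for illustration and do not appear in the input: $M_{ij}$ is assumed to be element $(i,j)$ of `cm`, $w_i$ is element $i$ of `c1`, $c_i$ is element $i$ of `coords`, $f_i$ is element $i$ of `fcoords`, $S$ is the value of `coordsum`, and $N=100$ is the number of atoms.

```latex
c_i = \sum_{j=1}^{N} M_{ij} \cdot 1, \qquad
f_i = w_i \, c_i, \qquad
S = \sum_{i=1}^{N} f_i = \sum_{i=1}^{N} w_i \sum_{j=1}^{N} M_{ij}
```

Because $w_i$ is one for atoms in the largest cluster and zero otherwise, only the coordination numbers of atoms in that cluster contribute to $S$.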

This new syntax is much clearer because the clustering operation is performed on the CONTACT_MATRIX directly. The vector returned by the DFSCLUSTERING object then tells you which cluster each atom belongs to. You can thus use simple logical operations on this vector to determine the properties of all your clusters. Furthermore, you don’t even need to use the coordination numbers. If you simply want to calculate the number of atoms in the largest cluster you can use the following input:

# Calculate the contact matrix.  This action computes a 100x100 matrix
cm: CONTACT_MATRIX GROUP=1-100 SWITCH={CUBIC D_0=0.45 D_MAX=0.55}
# Do the clustering
dfs: DFSCLUSTERING ARG=cm
# Get a 100 element vector that has ones for those atoms that are part of the largest cluster
c1: CLUSTER_WEIGHTS CLUSTERS=dfs CLUSTER=1
# Sum the vector above to get the number of atoms in the largest cluster
suml: SUM ARG=c1 PERIODIC=NO
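
To make the roles of the individual actions more concrete, the pure-Python sketch below mimics the pipeline on six hypothetical atoms. It is an illustration under stated assumptions and not PLUMED's implementation: the function names, the toy coordinates, the assumed form of the cubic switching function, and the rule that any pair with a nonzero matrix element counts as connected (the `threshold` parameter) are all choices made for this example, and periodic boundary conditions are ignored.

```python
import math
from itertools import combinations


def cubic_switch(r, d0=0.45, dmax=0.55):
    """Assumed cubic switching function: 1 below d0, 0 above dmax."""
    if r <= d0:
        return 1.0
    if r >= dmax:
        return 0.0
    y = (r - d0) / (dmax - d0)
    return (y - 1.0) ** 2 * (1.0 + 2.0 * y)


def contact_matrix(positions):
    """Symmetric matrix of switching-function values with a zero diagonal."""
    n = len(positions)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        value = cubic_switch(math.dist(positions[i], positions[j]))
        matrix[i][j] = matrix[j][i] = value
    return matrix


def dfs_cluster_labels(matrix, threshold=0.0):
    """Label every atom with the size rank of its cluster (1 = largest)."""
    n = len(matrix)
    component = [-1] * n   # component index per atom, -1 means unvisited
    members = []           # list of atoms for each component
    for start in range(n):
        if component[start] != -1:
            continue
        component[start] = len(members)
        stack, found = [start], []
        while stack:       # iterative depth first search
            i = stack.pop()
            found.append(i)
            for j in range(n):
                if component[j] == -1 and matrix[i][j] > threshold:
                    component[j] = len(members)
                    stack.append(j)
        members.append(found)
    # Rank components by decreasing size; ties keep discovery order.
    order = sorted(range(len(members)), key=lambda c: -len(members[c]))
    rank = {c: position + 1 for position, c in enumerate(order)}
    return [rank[c] for c in component]


def cluster_weights(labels, cluster=1):
    """One for atoms in the requested cluster, zero for all other atoms."""
    return [1.0 if label == cluster else 0.0 for label in labels]


# Hypothetical toy system: a chain of three atoms, a pair, and a lone atom.
positions = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.8, 0.0, 0.0),
             (3.0, 0.0, 0.0), (3.4, 0.0, 0.0), (6.0, 0.0, 0.0)]

cm = contact_matrix(positions)
labels = dfs_cluster_labels(cm)
c1 = cluster_weights(labels, cluster=1)
coords = [sum(row) for row in cm]                  # matrix times a vector of ones
largest_size = sum(c1)                             # analogue of suml
coordsum = sum(c * w for c, w in zip(coords, c1))  # analogue of coordsum

print(labels, largest_size, coordsum)
```

Calling `cluster_weights(labels, cluster=2)` in the same way selects the second largest cluster, which is the kind of simple operation on the label vector that the text above refers to.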