5.3. ALM: Theoretical background

Interatomic force constants (IFCs)

The starting point of the computational methodology is to approximate the potential energy of interacting atoms by a Taylor expansion with respect to atomic displacements by

(1)\[\begin{split}&U - U_{0} = \sum_{n=1}^{N} U_{n} = U_{1} + U_{2} + U_{3} + \cdots, \\ &U_{n} = \frac{1}{n!} \sum_{\substack{\ell_{1}\kappa_{1}, \dots, \ell_{n}\kappa_{n} \\ \mu_{1},\dots, \mu_{n}}} \Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n}) \; u_{\mu_{1}}(\ell_{1}\kappa_{1})\cdots u_{\mu_{n}}(\ell_{n}\kappa_{n}).\end{split}\]

Here, \(u_{\mu}(\ell\kappa)\) is the atomic displacement of \(\kappa\)th atom in the \(\ell\)th unit cell along \(\mu\)th direction, and \(\Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n})\) is the \(n\)th-order interatomic force constant (IFC).

Symmetry relationship between IFCs

The are several relationships between IFCs which may be used to reduce the number of independence IFCs.

  • Permutation

IFC should be invariant under the exchange of the triplet \((\ell,\kappa,\mu)\), e.g.,

\[\Phi_{\mu_{1}\mu_{2}\mu_{3}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\ell_{3}\kappa_{3})=\Phi_{\mu_{1}\mu_{3}\mu_{2}}(\ell_{1}\kappa_{1};\ell_{3}\kappa_{3};\ell_{2}\kappa_{2})=\dots.\]
  • Periodicity

Since IFCs should depend on interatomic distances, they are invariant under a translation in units of the lattice vector, namely

\[\Phi_{\mu_{1}\mu_{2}\dots\mu_{n}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\dots;\ell_{n}\kappa_{n})=\Phi_{\mu_{1}\mu_{2}\dots\mu_{n}}(0\kappa_{1};\ell_{2}-\ell_{1}\kappa_{2};\dots;\ell_{n}-\ell_{1}\kappa_{n}).\]
  • Crystal symmetry

A crystal symmetry operation maps an atom \(\vec{r}(\ell\kappa)\) to another equivalent atom \(\vec{r}(LK)\) by rotation and translation. Since the potential energy is invariant under any crystal symmetry operations, IFCs should transform under a symmetry operation as follows:

(2)\[\sum_{\nu_{1},\dots,\nu_{n}}\Phi_{\nu_{1}\dots\nu_{n}}(L_{1}K_{1};\dots;L_{n}K_{n}) O_{\nu_{1}\mu_{1}}\cdots O_{\nu_{n}\mu_{n}} = \Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n}),\]

where \(O\) is the rotational matrix of the symmetry operation. Let \(N_{s}\) be the number of symmetry operations, there are \(N_{s}\) relationships between IFCs which may be used to find independent IFCs.

Note

In the implementation of ALM, symmetricaly independent IFCs are searched in Cartesian coordinate where the matrix element \(O_{\mu\nu}\) is 0 or \(\pm1\) in all symmetry operations except for those of hexagonal (trigonal) lattice. Also, except for hexagonal (trigonal) systems, the product \(O_{\nu_{1}\mu_{1}}\cdots O_{\nu_{n}\mu_{n}}\) in the left hand side of equation (2) becomes non-zero only for a specific pair of \(\{\nu\}\) (and becomes 0 for all other \(\{\nu\}\)s). Therefore, let \(\{\nu^{\prime}\}\) be such a pair of \(\{\nu\}\), the equation (2) can be reduced to

(3)\[\Phi_{\nu_{1}^{\prime}\dots\nu_{n}^{\prime}}(L_{1}K_{1};\dots;L_{n}K_{n}) = s \Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n}),\]

where \(s=\pm1\). The code employs equation (3) instead of equation (2) to reduce the number of IFCs. If IFCs of the left-hand side and the right-hand side of equation (3) are equivalent and the coupling coefficient is \(s=-1\), the IFC is removed since it becomes zero. For hexagonal (trigonal) systems, there can be symmetry operations where multiple terms in the left-hand side of equation (2) become non-zero. For such cases, equation (2) is not used to reduce the number of IFCs. Alternatively, the corresponding symmetry relationships are imposed as constraints between IFCs in solving fitting problems.

Constraints between IFCs

Since the potential energy is invariant under rigid translation and rotation, it may be necessary for IFCs to satisfy corresponding constraints.

The constraints for translational invariance are given by

(4)\[\sum_{\ell_{1}\kappa_{1}}\Phi_{\mu_{1}\mu_{2}\dots\mu_{n}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\dots;\ell_{n}\kappa_{n}) = 0,\]

which should be satisfied for arbitrary pairs of \(\ell_{2}\kappa_{2},\dots,\ell_{n}\kappa_{n}\) and \(\mu_{1},\dots,\mu_{n}\). The code alm imposes equation (4) by default (ICONST = 1).

The constraints for rotational invariance are

\[\begin{split}&\sum_{\ell^{\prime}\kappa^{\prime}} (\Phi_{\mu_{1}\dots\mu_{n}\nu}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n};\ell^{\prime}\kappa^{\prime}) r_{\mu}(\ell^{\prime}\kappa^{\prime}) - \Phi_{\mu_{1}\dots\mu_{n}\mu}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n};\ell^{\prime}\kappa^{\prime}) r_{\nu}(\ell^{\prime}\kappa^{\prime})) \\ & \hspace{10mm} + \sum_{\lambda = 1}^{n}\sum_{\mu_{\lambda}^{\prime}} \Phi_{\mu_{1}\dots\mu_{\lambda}^{\prime}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{\lambda}\kappa_{\lambda};\dots;\ell_{n}\kappa_{n}) (\delta_{\mu,\mu_{\lambda}}\delta_{\nu,\mu_{\lambda}^{\prime}} - \delta_{\nu,\mu_{\lambda}}\delta_{\mu,\mu_{\lambda}^{\prime}}) = 0,\end{split}\]

which must be satisfied for arbitrary pairs of \((\ell_{1}\kappa_{1},\dots,\ell_{n}\kappa_{n};\mu_{1},\dots,\mu_{n};\mu,\nu)\). This is complicated since \((n+1)\)th-order IFCs (first line) are related to \(n\)th-order IFCs (second line).

For example, the constraints for rotational invariance related to harmonic terms can be found as

(5)\[\begin{split}&\sum_{\ell_{2}\kappa_{2}} (\Phi_{\mu_{1}\nu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})r_{\mu}(\ell_{2}\kappa_{2})-\Phi_{\mu_{1}\mu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})r_{\nu}(\ell_{2}\kappa_{2})) \notag \\ & \hspace{10mm} + \Phi_{\nu}(\ell_{1}\kappa_{1})\delta_{\mu,\mu_{1}} - \Phi_{\mu}(\ell_{1}\kappa_{1})\delta_{\nu,\mu_{1}} = 0,\end{split}\]

and

(6)\[\begin{split}& \sum_{\ell_{3}\kappa_{3}} (\Phi_{\mu_{1}\mu_{2}\nu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\ell_{3}\kappa_{3}) r_{\mu}(\ell_{3}\kappa_{3}) \notag - \Phi_{\mu_{1}\mu_{2}\mu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\ell_{3}\kappa_{3}) r_{\nu}(\ell_{3}\kappa_{3})) \\ & \hspace{10mm} + \Phi_{\nu\mu_{2}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\mu,\mu_{1}} - \Phi_{\mu\mu_{2}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\nu,\mu_{1}} \notag \\ & \hspace{10mm} + \Phi_{\mu_{1}\nu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\mu,\mu_{2}} - \Phi_{\mu_{1}\mu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\nu,\mu_{2}} = 0.\end{split}\]

When NORDER = 1, equation (5) will be considered if ICONST = 2, whereas equation (6) will be neglected. To further consider equation (6), please use ICONST = 3, though it may enforce a number of harmonic IFCs to be zero since cubic terms don’t exist in harmonic calculations (NORDER = 1).

Estimate IFCs by linear regression

Basic notations

From the symmetrically independent set of IFCs and the constraints between them for satisfying the translational and/or rotational invariance, we can construct an irreducible set of IFCs \(\{\Phi_{i}\}\). Let us denote a column vector comprising the \(N\) irreducible set of IFCs as \(\boldsymbol{\Phi}\). Then, the Taylor expansion potential (TEP) defined by equation (1) is written as

\[U_{\mathrm{TEP}} = \boldsymbol{b}^{T}\boldsymbol{\Phi}.\]

Here, \(\boldsymbol{b} \in \mathbb{R}^{1\times N}\) is a function of atomic displacements \(\{u_{i}\}\) defined as \(\boldsymbol{b} = \partial U / \partial \boldsymbol{\Phi}\). The atomic forces based on the TEP is then given as

(7)\[\boldsymbol{F}_{\mathrm{TEP}} = - \frac{\partial U_{\mathrm{TEP}}}{\partial \boldsymbol{u}} = - \frac{\partial \boldsymbol{b}^{T}}{\partial \boldsymbol{u}} \boldsymbol{\Phi} = A \boldsymbol{\Phi},\]

where \(A \in \mathbb{R}^{3N_{s} \times N}\) with \(N_{s}\) being the number of atoms in the supercell, and \(\boldsymbol{u}^{T} = (u_{1}^{x}, u_{1}^{y}, u_{1}^{z}, \dots, u_{N_{s}}^{x}, u_{N_{s}}^{y}, u_{N_{s}}^{z})\) is the vector comprising \(3N_{s}\) atomic displacements in the supercell. Note that the matrix \(A\) and force vector \(\boldsymbol{F}_{\mathrm{TEP}}\) depend on the atomic configuration of the supercell. To make this point clearer, let us denote them as \(A(\boldsymbol{u})\) and \(\boldsymbol{F}_{\mathrm{TEP}}(\boldsymbol{u})\).

To estimate the IFC vector \(\boldsymbol{\Phi}\) by linear regression, it is usually necessary to consider several different displacement patterns. Let us suppose we have \(N_d\) displacement patterns and atomic forces for each pattern obtained by DFT. Then, equation (7) defined for each displacement pattern can be combined to a single equation as

\[\boldsymbol{\mathscr{F}}_{\mathrm{TEP}} = \mathbb{A} \boldsymbol{\Phi},\]

where \(\boldsymbol{\mathscr{F}}^{T} = [\boldsymbol{F}^{T}(\boldsymbol{u}_{1}), \dots, \boldsymbol{F}^{T}(\boldsymbol{u}_{N_d})]\) and \(\mathbb{A}^{T} = [A^{T}(\boldsymbol{u}_{1}),\dots,A^{T}(\boldsymbol{u}_{N_d})]\).

Ordinary least-squares

In the ordinary least-squares (LMODEL = least-squares), IFCs are estimated by solving the following problem:

(8)\[\boldsymbol{\Phi}_{\mathrm{OLS}} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \boldsymbol{\mathscr{F}}_{\mathrm{TEP}} \|^{2}_{2} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2}.\]

Therefore, the IFCs are determined so that the residual sum of squares (RSS) is minimized. To determine all elements of \(\boldsymbol{\Phi}_{\mathrm{OLS}}\) uniquely, \(\mathbb{A}^{T}\mathbb{A}\) must be full rank. When the fitting is successful, alm reports the relative fitting error \(\sigma\) defined by

(9)\[\sigma = \sqrt{\frac{\|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2}}{\|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}}\|_{2}^{2}}},\]

where the denominator is the square sum of the DFT forces.

Elastic-net regression

In the elasitc-net optimization (LMODEL = elastic-net), IFCs are estimated by solving the following optimization problem:

(10)\[\boldsymbol{\Phi}_{\mathrm{enet}} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2} + \alpha \beta \| \boldsymbol{\Phi} \|_{1} + \frac{1}{2} \alpha (1-\beta) \| \boldsymbol{\Phi} \|_{2}^{2},\]

where \(\alpha\) is a hyperparameter that controls the trade-off between the sparsity and accuracy of the model, and \(\beta \; (0 < \beta \leq 1)\) is a hyperparameter that controls the ratio of the \(L_{1}\) and \(L_{2}\) regularization terms. \(\alpha\) and \(\beta\) must be given by input tags L1_ALPHA and L1_RATIO, respectively.

An optimal value of \(\alpha\) can be estimated, for example, by cross-validation (CV). A \(n\)-fold CV can be performed by setting the CV-tag properly.