5.3. ALM: Theoretical background

Interatomic force constants (IFCs)

The starting point of the computational methodology is to approximate the potential energy of interacting atoms by a Taylor expansion with respect to atomic displacements by

(1)\[\begin{split}&U - U_{0} = \sum_{n=1}^{N} U_{n} = U_{1} + U_{2} + U_{3} + \cdots, \\ &U_{n} = \frac{1}{n!} \sum_{\substack{\ell_{1}\kappa_{1}, \dots, \ell_{n}\kappa_{n} \\ \mu_{1},\dots, \mu_{n}}} \Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n}) \; u_{\mu_{1}}(\ell_{1}\kappa_{1})\cdots u_{\mu_{n}}(\ell_{n}\kappa_{n}).\end{split}\]

Here, \(u_{\mu}(\ell\kappa)\) is the atomic displacement of \(\kappa\)th atom in the \(\ell\)th unit cell along \(\mu\)th direction, and \(\Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n})\) is the \(n\)th-order interatomic force constant (IFC).

Symmetry relationship between IFCs

The are several relationships between IFCs which may be used to reduce the number of independence IFCs.

  • Permutation

IFC should be invariant under the exchange of the triplet \((\ell,\kappa,\mu)\), e.g.,

  • Periodicity

Since IFCs should depend on interatomic distances, they are invariant under a translation in units of the lattice vector, namely

  • Crystal symmetry

A crystal symmetry operation maps an atom \(\vec{r}(\ell\kappa)\) to another equivalent atom \(\vec{r}(LK)\) by rotation and translation. Since the potential energy is invariant under any crystal symmetry operations, IFCs should transform under a symmetry operation as follows:

(2)\[\sum_{\nu_{1},\dots,\nu_{n}}\Phi_{\nu_{1}\dots\nu_{n}}(L_{1}K_{1};\dots;L_{n}K_{n}) O_{\nu_{1}\mu_{1}}\cdots O_{\nu_{n}\mu_{n}} = \Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n}),\]

where \(O\) is the rotational matrix of the symmetry operation. Let \(N_{s}\) be the number of symmetry operations, there are \(N_{s}\) relationships between IFCs which may be used to find independent IFCs.


When FCSYM_BASIS = Cartesian, symmetricaly irreducible set of IFCs is searched in the Cartesian coordinate where the matrix element \(O_{\mu\nu}\) is 0 or \(\pm1\) for all space group operations except for those of the hexagonal (trigonal) systems. Also, except for the hexagonal (trigonal) systems, the product \(O_{\nu_{1}\mu_{1}}\cdots O_{\nu_{n}\mu_{n}}\) in the left hand side of equation (2) becomes non-zero only for a specific pair of \(\{\nu\}\) (and becomes 0 for all other \(\{\nu\}\)s). Therefore, let \(\{\nu^{\prime}\}\) be such a pair of \(\{\nu\}\), the equation (2) can be reduced to

(3)\[\Phi_{\nu_{1}^{\prime}\dots\nu_{n}^{\prime}}(L_{1}K_{1};\dots;L_{n}K_{n}) = s \Phi_{\mu_{1}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n}),\]

where \(s=\pm1\). The code employs equation (3) instead of equation (2) to reduce the number of IFCs. If IFCs of the left-hand side and the right-hand side of equation (3) are equivalent and the coupling coefficient is \(s=-1\), the IFC is removed since it becomes zero. For hexagonal (trigonal) systems, there can be symmetry operations where multiple terms in the left-hand side of equation (2) become non-zero. For such cases, equation (2) is not used to reduce the number of IFCs. Alternatively, the corresponding symmetry relationships are imposed as constraints between IFCs in solving fitting problems.

When FCSYM_BASIS = Lattice (default), the symmetry reduction of IFCs is performed in the lattice coordinate. In this case, all elements of the rotational matrix become either 0 or \(\pm1\) in the \(\boldsymbol{a}_1, \boldsymbol{a}_2, \boldsymbol{a}_3\) basis even for the hexagonal systems. Therefore, the numerical stability of the reduction process, particularly of the construction of rref (reduced row echelon form), is improved even when the number of numerical digits for irrational numbers given in an input file is less than sufficient (double precision).

Constraints between IFCs

Since the potential energy is invariant under rigid translation and rotation, it may be necessary for IFCs to satisfy corresponding constraints.

The constraints for translational invariance are given by

(4)\[\sum_{\ell_{1}\kappa_{1}}\Phi_{\mu_{1}\mu_{2}\dots\mu_{n}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\dots;\ell_{n}\kappa_{n}) = 0,\]

which should be satisfied for arbitrary pairs of \(\ell_{2}\kappa_{2},\dots,\ell_{n}\kappa_{n}\) and \(\mu_{1},\dots,\mu_{n}\). The code alm imposes equation (4) by default (ICONST = 1).

The constraints for rotational invariance are

\[\begin{split}&\sum_{\ell^{\prime}\kappa^{\prime}} (\Phi_{\mu_{1}\dots\mu_{n}\nu}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n};\ell^{\prime}\kappa^{\prime}) r_{\mu}(\ell^{\prime}\kappa^{\prime}) - \Phi_{\mu_{1}\dots\mu_{n}\mu}(\ell_{1}\kappa_{1};\dots;\ell_{n}\kappa_{n};\ell^{\prime}\kappa^{\prime}) r_{\nu}(\ell^{\prime}\kappa^{\prime})) \\ & \hspace{10mm} + \sum_{\lambda = 1}^{n}\sum_{\mu_{\lambda}^{\prime}} \Phi_{\mu_{1}\dots\mu_{\lambda}^{\prime}\dots\mu_{n}}(\ell_{1}\kappa_{1};\dots;\ell_{\lambda}\kappa_{\lambda};\dots;\ell_{n}\kappa_{n}) (\delta_{\mu,\mu_{\lambda}}\delta_{\nu,\mu_{\lambda}^{\prime}} - \delta_{\nu,\mu_{\lambda}}\delta_{\mu,\mu_{\lambda}^{\prime}}) = 0,\end{split}\]

which must be satisfied for arbitrary pairs of \((\ell_{1}\kappa_{1},\dots,\ell_{n}\kappa_{n};\mu_{1},\dots,\mu_{n};\mu,\nu)\). This is complicated since \((n+1)\)th-order IFCs (first line) are related to \(n\)th-order IFCs (second line).

For example, the constraints for rotational invariance related to harmonic terms can be found as

(5)\[\begin{split}&\sum_{\ell_{2}\kappa_{2}} (\Phi_{\mu_{1}\nu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})r_{\mu}(\ell_{2}\kappa_{2})-\Phi_{\mu_{1}\mu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})r_{\nu}(\ell_{2}\kappa_{2})) \notag \\ & \hspace{10mm} + \Phi_{\nu}(\ell_{1}\kappa_{1})\delta_{\mu,\mu_{1}} - \Phi_{\mu}(\ell_{1}\kappa_{1})\delta_{\nu,\mu_{1}} = 0,\end{split}\]


(6)\[\begin{split}& \sum_{\ell_{3}\kappa_{3}} (\Phi_{\mu_{1}\mu_{2}\nu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\ell_{3}\kappa_{3}) r_{\mu}(\ell_{3}\kappa_{3}) \notag - \Phi_{\mu_{1}\mu_{2}\mu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2};\ell_{3}\kappa_{3}) r_{\nu}(\ell_{3}\kappa_{3})) \\ & \hspace{10mm} + \Phi_{\nu\mu_{2}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\mu,\mu_{1}} - \Phi_{\mu\mu_{2}}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\nu,\mu_{1}} \notag \\ & \hspace{10mm} + \Phi_{\mu_{1}\nu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\mu,\mu_{2}} - \Phi_{\mu_{1}\mu}(\ell_{1}\kappa_{1};\ell_{2}\kappa_{2})\delta_{\nu,\mu_{2}} = 0.\end{split}\]

When NORDER = 1, equation (5) will be considered if ICONST = 2, whereas equation (6) will be neglected. To further consider equation (6), please use ICONST = 3, though it may enforce a number of harmonic IFCs to be zero since cubic terms don’t exist in harmonic calculations (NORDER = 1).

Estimate IFCs by linear regression

Basic notations

From the symmetrically independent set of IFCs and the constraints between them for satisfying the translational and/or rotational invariance, we can construct an irreducible set of IFCs \(\{\Phi_{i}\}\). Let us denote a column vector comprising the \(N\) irreducible set of IFCs as \(\boldsymbol{\Phi}\). Then, the Taylor expansion potential (TEP) defined by equation (1) is written as

\[U_{\mathrm{TEP}} = \boldsymbol{b}^{T}\boldsymbol{\Phi}.\]

Here, \(\boldsymbol{b} \in \mathbb{R}^{1\times N}\) is a function of atomic displacements \(\{u_{i}\}\) defined as \(\boldsymbol{b} = \partial U / \partial \boldsymbol{\Phi}\). The atomic forces based on the TEP is then given as

(7)\[\boldsymbol{F}_{\mathrm{TEP}} = - \frac{\partial U_{\mathrm{TEP}}}{\partial \boldsymbol{u}} = - \frac{\partial \boldsymbol{b}^{T}}{\partial \boldsymbol{u}} \boldsymbol{\Phi} = A \boldsymbol{\Phi},\]

where \(A \in \mathbb{R}^{3N_{s} \times N}\) with \(N_{s}\) being the number of atoms in the supercell, and \(\boldsymbol{u}^{T} = (u_{1}^{x}, u_{1}^{y}, u_{1}^{z}, \dots, u_{N_{s}}^{x}, u_{N_{s}}^{y}, u_{N_{s}}^{z})\) is the vector comprising \(3N_{s}\) atomic displacements in the supercell. Note that the matrix \(A\) and force vector \(\boldsymbol{F}_{\mathrm{TEP}}\) depend on the atomic configuration of the supercell. To make this point clearer, let us denote them as \(A(\boldsymbol{u})\) and \(\boldsymbol{F}_{\mathrm{TEP}}(\boldsymbol{u})\).

To estimate the IFC vector \(\boldsymbol{\Phi}\) by linear regression, it is usually necessary to consider several different displacement patterns. Let us suppose we have \(N_d\) displacement patterns and atomic forces for each pattern obtained by DFT. Then, equation (7) defined for each displacement pattern can be combined to a single equation as

\[\boldsymbol{\mathscr{F}}_{\mathrm{TEP}} = \mathbb{A} \boldsymbol{\Phi},\]

where \(\boldsymbol{\mathscr{F}}^{T} = [\boldsymbol{F}^{T}(\boldsymbol{u}_{1}), \dots, \boldsymbol{F}^{T}(\boldsymbol{u}_{N_d})]\) and \(\mathbb{A}^{T} = [A^{T}(\boldsymbol{u}_{1}),\dots,A^{T}(\boldsymbol{u}_{N_d})]\).

Ordinary least-squares

In the ordinary least-squares (LMODEL = least-squares), IFCs are estimated by solving the following problem:

(8)\[\boldsymbol{\Phi}_{\mathrm{OLS}} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \boldsymbol{\mathscr{F}}_{\mathrm{TEP}} \|^{2}_{2} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2}.\]

Therefore, the IFCs are determined so that the residual sum of squares (RSS) is minimized. To determine all elements of \(\boldsymbol{\Phi}_{\mathrm{OLS}}\) uniquely, \(\mathbb{A}^{T}\mathbb{A}\) must be full rank. When the fitting is successful, alm reports the relative fitting error \(\sigma\) defined by

(9)\[\sigma = \sqrt{\frac{\|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2}}{\|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}}\|_{2}^{2}}},\]

where the denominator is the square sum of the DFT forces.

Elastic-net regression

In the elasitc-net optimization (LMODEL = elastic-net), IFCs are estimated by solving the following optimization problem:

(10)\[\boldsymbol{\Phi}_{\mathrm{enet}} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2} + \alpha \beta \| \boldsymbol{\Phi} \|_{1} + \frac{1}{2} \alpha (1-\beta) \| \boldsymbol{\Phi} \|_{2}^{2},\]

where \(\alpha\) is a hyperparameter that controls the trade-off between the sparsity and accuracy of the model, and \(\beta \; (0 < \beta \leq 1)\) is a hyperparameter that controls the ratio of the \(L_{1}\) and \(L_{2}\) regularization terms. \(\alpha\) and \(\beta\) must be given by input tags L1_ALPHA and L1_RATIO, respectively.

An optimal value of \(\alpha\) can be estimated, for example, by cross-validation (CV). A \(n\)-fold CV can be performed by setting the CV-tag properly.

Adaptive LASSO [1]

In adaptive LASSO (LMODEL = adaptive-lasso), IFCs are estimated by solving the following optimization problem:

(11)\[\boldsymbol{\Phi}_{\mathrm{adalasso}} = \mathop{\rm argmin}\limits_{\boldsymbol{\Phi}} \frac{1}{2N_{d}} \|\boldsymbol{\mathscr{F}}_{\mathrm{DFT}} - \mathbb{A} \boldsymbol{\Phi} \|^{2}_{2} + \alpha \sum_i w_i |\Phi_i| ,\]

where \(\alpha\) is a hyperparameter given by L1_ALPHA, and \(w_i\) is the parameter-dependent weight. In ALM, we simply use \(w_i = 1/|\Phi_{\mathrm{OLS},i}|\) with \(\Phi_{\mathrm{OLS},i}\) being the coefficient estimator produced by an OLS fitting. Hence, in this option, the size of the training dataset must be large enough to make the matrix \(\mathbb{A}\) overdetermined. The code keeps running even when \(\mathbb{A}\) is underdetermined. So, please be careful.


The minimum size of the training dataset necessary for making \(\mathbb{A}\) overdetermined can be roughly estimated as follows:

We assume that there are \(N\) independent IFCs (after imposing constraints, if there is any). In this case, the number of columns of matrix \(\mathbb{A}\) becomes \(N\), and \(\mathbb{A}\) becomes overdetermined when the number of independent rows of \(\mathbb{A}\) is \(N\) or larger. If the training structures are generated randomly and all atoms are displaced from their original positions in each configuration, we can generate \(3\times N_{\mathrm{atom}}\) (\(N_{\mathrm{atom}}\) is the number of atoms in the supercell) linearly independent rows of \(\mathbb{A}\) from one displaced configuration , i.e., one static DFT calculation. Hence, we expect that \(\mathbb{A}\) becomes overdetermined when

\[N_d \geq \frac{N}{3N_{\mathrm{atom}}}\]

where \(N_d\) is the number of displacement patterns in the training dataset.

In cross validation, the entire training dataset is divided into smaller subsets. For each subset, the above condition should be satisfied.

[1]H. Zou, The Adaptive Lasso and Its Oracle Properties, J. Am. Stat. Assoc. 101, 1418 (2006).