5.1. ALM: Input files¶
Format of input files¶
Each input file should consist of entry fields. Available entry fields are
&general, &interaction, &cutoff, &cell, &position, and &optimize (&fitting).
Each entry field starts from the key label &field and ends at the terminate character “/”. (This is equivalent to Fortran namelist.)
For example, &general entry field of program alm should be given like
&general
# Comment line
PREFIX = prefix
MODE = fitting
/
Multiple entries can be put in a single line. Also, characters put on the right of sharp (“#”) are neglected. Therefore, the above example is equivalent to the following:
&general
PREFIX = prefix; MODE = fitting # Comment line
/
Each variable must be given inside the appropriate entry field.
List of supported input variables¶
&general | ||||
HESSIAN | KD | MODE | NAT | NKD |
PERIODIC | PREFIX | PRINTSYM | TOLERANCE | |
&interaction | ||||
NBODY | NORDER | |||
&optimize | ||||
CONV_TOL | CV | CV_MINALPHA | DEBIAS_OLS | DFILE |
DFSET | DFSET_CV | ENET_DNORM | FC2XML | FC3XML |
FFILE | ICONST | L1_ALPHA | L1_RATIO | LMODEL |
MAXITER | NDATA | NDATA_CV | NSTART NEND | NSTART_CV NEND_CV |
ROTAXIS | SKIP | SOLUTION_PATH | SPARSE | SPARSESOLVER |
STANDARDIZE |
Description of input variables¶
“&general”-field¶
- PREFIX-tag : Job prefix to be used for names of output files
Default: None Type: String
- MODE-tag = optimize | suggest | fitting
optimize (>= 1.1.0) Estimate harmonic and anharmonic IFCs.This mode requires an appropriate &optimize field.fitting (deprecated) An alias ofMODE = optimize
suggest Suggests the displacement patterns necessaryto estimate harmonic and anharmonic IFCS.
Default: None Type: String
- NAT-tag : Number of atoms in the supercell
Default: None Type: Integer
- NKD-tag : Number of atomic species
Default: None Type: Integer
- KD-tag = Name[1], … , Name[
NKD
]
Default: None Type: Array of strings Example: In the case of GaAs with NKD = 2
, it should beKD = Ga As
.
- TOLERANCE-tag : Tolerance for finding symmetry operations
Default: 1.0e-3 Type: Double
- PRINTSYM-tag = 0 | 1
0 Symmetry operations won’t be saved in “SYMM_INFO” 1 Symmetry operations will be saved in “SYMM_INFO”
Default: 0 type: Integer
- PERIODIC-tag = PERIODIC[1], PERIODIC[2], PERIODIC[3]
0 Do not consider periodic boundary conditions whensearching for interacting atoms.1 Consider periodic boundary conditions whensearching for interacting atoms.
Default: 1 1 1 type: Array of integers Description: This tag is useful for generating interacting atoms in low dimensional systems. When PERIODIC[i]
is zero, periodic boundary condition is turned off along the direction of the lattice vector \(\boldsymbol{a}_{i}\).
- HESSIAN-tag = 0 | 1
0 Do not save the Hessian matrix 1 Save the entire Hessian matrix of the supercell as PREFIX.hessian.
Default: 0 type: Integer
“&interaction”-field¶
- NORDER-tag : The order of force constants to be calculated. Anharmonic terms up to \((m+1)\)th order will be considered with
NORDER
= \(m\).
Default: None Type: Integer Example: NORDER = 1
for calculate harmonic terms only,NORDER = 2
to include cubic terms as well, and so on.
- NBODY-tag : Entry for excluding multiple-body interactions from anharmonic force constants
Default:
NBODY
= [2, 3, 4, …,NORDER
+ 1]Type: Array of integers
Description: This tag may be useful for excluding multi-body clusters which are supposedly less important. For example, a set of fourth-order IFCs \(\{\Phi_{ijkl}\}\), where \(i, j, k\), and \(l\) label atoms in the supercell, can be categorized into four different subsets; on-site, two-body, three-body, and four-body terms. Neglecting the Cartesian coordinates of IFCs for simplicity, each subset contains the IFC elements shown as follows:
on-site \(\{\Phi_{iiii}\}\)two-body \(\{\Phi_{iijj}\}\), \(\{\Phi_{iiij}\}\) (\(i\neq j\))three-body \(\{\Phi_{iijk}\}\) (\(i\neq j, i\neq k, j \neq k\))four-body \(\{\Phi_{ijkl}\}\) (all subscripts are different from each other)Since the four-body clusters are expected to be less important than the three-body and less-body clusters, you may want to exclude the four-body terms from the Taylor expansion potential because the number of such terms is huge. This can be done by setting the
NBODY
tag asNBODY = 2 3 3
together withNORDER = 3
.More examples:
NORDER = 2; NBODY = 2 2
includes harmonic and cubic IFCs but excludes three-body clusters from the cubic terms.
NORDER = 5; NBODY = 2 3 3 2 2
includes anharmonic terms up to the sixth-order, where the four-body clusters are excluded from the fourth-order IFCs, and the multi (\(\geq 3\))-body clusters are excluded from the fifth- and sixth-order IFCs.
“&cutoff”-field¶
In this entry field, one needs to specify cutoff radii of interaction for each order in units of Bohr.
In the current implementation, cutoff radii should be defined for every possible pair of atomic elements.
For example, the cutoff entry for a harmonic calculation (NORDER = 1
) of Si (NKD = 1
) should be like
&cutoff
Si-Si 10.0
/
This means that the cutoff radius of 10 \(a_{0}\) is used for harmonic Si-Si terms.
Please note that the first column should be two character strings, which are contained in the KD
-tag,
connected by a hyphen (’-’).
When one wants to consider cubic terms (NORDER = 2
), please specify the cutoff radius for cubic terms in the third column as the following:
&cutoff
Si-Si 10.0 5.6 # Pair r_{2} r_{3}
/
Instead of giving specific cutoff radii, one can write “None” as follows:
&cutoff
Si-Si None 5.6
/
which means that all possible harmonic terms between Si-Si atoms will be included.
Caution
Setting ‘None’ for anharmonic terms can greatly increase the number of parameters and thereby increase the computational cost.
When there are more than two atomic elements, please specify the cutoff radii between every possible pair of atomic elements. In the case of MgO (NKD = 2
), the cutoff entry should be like
&cutoff
Mg-Mg 8.0
O-O 8.0
Mg-O 10.0
/
which can equivalently be written by using the wild card (’*’) as
&cutoff
*-* 8.0
Mg-O 10.0 # Overwrite the cutoff radius for Mg-O harmonic interactions
/
Important
Cutoff radii specified by an earlier entry are overwritten by a new entry that comes later.
Once the cutoff radii are properly given, harmonic force constants \(\Phi_{i,j}^{\mu,\nu}\) satisfying \(r_{ij} \le r_{c}^{\mathrm{KD}[i]-\mathrm{KD}[j]}\) will be searched.
In the case of cubic terms, force constants \(\Phi_{ijk}^{\mu\nu\lambda}\) satisfying \(r_{ij} \le r_{c}^{\mathrm{KD}[i]-\mathrm{KD}[j]}\), \(r_{ik} \le r_{c}^{\mathrm{KD}[i]-\mathrm{KD}[k]}\), and \(r_{jk} \le r_{c}^{\mathrm{KD}[j]-\mathrm{KD}[k]}\) will be searched and determined by fitting.
“&cell”-field¶
Please give the cell parameters in this entry in units of Bohr as the following:
&cell
a
a11 a12 a13
a21 a22 a23
a31 a32 a33
/
The cell parameters are then given by \(\vec{a}_{1} = a \times (a_{11}, a_{12}, a_{13})\), \(\vec{a}_{2} = a \times (a_{21}, a_{22}, a_{23})\), and \(\vec{a}_{3} = a \times (a_{31}, a_{32}, a_{33})\).
“&position”-field¶
In this field, one needs to specify the atomic element and fractional coordinate of atoms in the supercell. Each line should be
ikd xf[1] xf[2] xf[3]
where ikd is an integer specifying the atomic element (ikd = 1, …, NKD
) and xf[i] is the
fractional coordinate of an atom. There should be NAT
such lines in the &position entry field.
“&optimize”-field (“&fitting”-field)¶
This field is necessary when MODE = optimize
(or a deprecated option MODE = fitting
).
- LMODEL-tag : Choice of the linear model used for estimating force constants
“least-squares”, “LS”, “OLS”, 1 Ordinary least square “elastic-net”, “enet”, 2 Elastic net
Default: least-squares
Type: String
Description: When
LMODEL = ols
, the force constants are estimated from the displacement-force datasets via the ordinary least-squares (OLS), which is usually sufficient to calculate harmonic and third-order force constants.The elastic net (
LMODEL = enet
) should be useful to calculate the fourth-order (and higher-order) force constants. When the elastic net is selected, the users have to set the following related tags:CV
,L1_RATIO
,L1_ALPHA
,CV_MAXALPHA
,CV_MINALPHA
,CV_NALPHA
,STANDARDIZE
,ENET_DNORM
,MAXITER
,CONV_TOL
,NWRITE
,SOLUTION_PATH
,DEBIAS_OLS
- DFSET-tag: File name containing displacement-force datasets for training
New in version 1.1.0.
Default: None Type: String Description: The format of DFSET
can be found here
- DFILE-tag: File name containing atomic displacements in Cartesian coordinate
Deprecated since version 1.1.0: Use
DFSET
instead.
Default: None Type: String Description: The format of DFILE
can be found here. This tag is deprecated and will be removed in a future major release. Please useDFSET
instead.
- FFILE-tag: File name containing atomic forces in Cartesian coordinate
Deprecated since version 1.1.0: Use
DFSET
instead.
Default: None Type: String Description: The format of FFILE
can be found here. This tag is deprecated and will be removed in a future major release. Please useDFSET
instead.
- NDATA-tag : Number of displacement-force data sets
Default: None Type: Integer Description: If NDATA
is not given, the code reads all lines ofDFSET
(excluding comment lines) and estimatesNDATA
by dividing the line number byNAT
. If the number of lines is not divisible byNAT
, an error is raised.DFSET
should contain at leastNDATA
\(\times\)NAT
lines.
- NSTART, NEND-tags : Specifies the range of data to be used for fitting
Default: NSTART = 1
,NEND = NDATA
Type: Integer Example: To use the data in the range of [20:30] out of 50 entries, the tags should be NSTART = 20
andNEND = 30
.
- SKIP-tag : Specifies the range of data to be skipped for training
Default: None Type: Two integers connected by a hyphen Description: SKIP
=\(i\)-\(j\) skips the data in the range of [\(i\):\(j\)]. The \(i\) and \(j\) must satisfy \(1\leq i \leq j \leq\)NDATA
. This option may be useful when doing cross-validation manually (CV=-1
).
- ICONST-tag = 0 | 1 | 2 | 3 | 11
0 No constraints 1 Constraint for translational invariance is imposed between IFCs.Available only whenLMODEL = ols
.11 Same asICONST = 1
but the constraint is imposed algebraically rather than numerically.Select this option whenLMODEL = enet
.2 In addition toICONST = 1
, constraints for rotational invariance will beimposed up to (NORDER
+ 1)th order. Available only whenLMODEL = ols
.3 In addition toICONST = 2
, constraints for rotational invariance between (NORDER
+ 1)th orderand (NORDER
+ 2)th order, which are zero, will be considered.Available only whenLMODEL = ols
.
Default: 11 Type: Integer Description: See this page for the numerical formulae.
- ROTAXIS-tag : Rotation axis used to estimate constraints for rotational invariance. This entry is necessary when
ICONST = 2, 3
.
Default: None Type: String Example: When one wants to consider the rotational invariance around the \(x\)-axis, one should give ROTAXIS = x
. If one needs additional constraints for the rotation around the \(y\)-axis,ROTAXIS
should beROTAXIS = xy
.
- FC2XML-tag : XML file to which the harmonic terms are fixed upon fitting
Default: None Type: String Description: When FC2XML
-tag is given, harmonic force constants are fixed to the values stored in theFC2XML
file. This may be useful for optimizing cubic and higher-order terms without changing the harmonic terms. Please make sure that the number of harmonic terms in the new computational condition is the same as that in theFC2XML
file.
- FC3XML-tag : XML file to which the cubic terms are fixed upon fitting
Default: None Type: String Description: Same as the FC2XML
-tag, butFC3XML
is to fix cubic force constants.
- SPARSE-tag = 0 | 1
0 Use a direct solver (SVD or QRD) to estimate force constants 1 Use a sparse solver to estimate force constants
Default: 0 Type: Integer Description: When you want to calculate force constants of a large system and generate training datasets by displacing only a few atoms from equilibrium positions, the resulting sensing matrix becomes large but sparse. For such matrices, a sparse solver is expected to be more efficient than SVD or QRD in terms of both memory usage and computational time. When SPARSE = 1
is set, the code uses a sparse solver implemented in Eigen3 library. You can change the solver type viaSPARSESOLVER
. Effective whenLMODEL = ols
.
- SPARSESOLVER-tag : Type of the sparse solver to use
Default: SimplicialLDLT Type: String Description: Currently, only the sparse solvers of Eigen3 library can be used. Available options are SimplicialLDLT, SparseQR, ConjugateGradient, LeastSquaresConjugateGradient, and BiCGSTAB. When an iterative algorithm (conjugate gradient) is selected, a stopping criterion can be specified by the CONV_TOL
andMAXITER
tags. Effective whenLMODEL = ols
andSPARSE = 1
.See also
Eigen documentation page: Solving Sparse Linear Systems
- MAXITER-tag : Number of maximum iterations in iterative algorithms
Default: 10,000 Type: Integer Description: Effective when an iterative solver is selected via SPARSESOLVER
(whenLMODEL = ols
) or whenLMODEL = enet
.
- CONV_TOL-tag : Convergence criterion of iterative algorithms
Default: 1.0e-8 Type: Double Description: When LMODEL = ols
and an iterative solver is selected viaSPARSESOLVER
,CONV_TOL
value is passed to the Eigen3 function via setTolerance(). WhenLMODEL = enet
, the coordinate descent iteration stops at \(i\)th iteration if \(\sqrt{\frac{1}{N}|\boldsymbol{\Phi}_{i} - \boldsymbol{\Phi}_{i-1}|_{2}^{2}} <\)CONV_TOL
is satisfied, where \(N\) is the length of the vector \(\boldsymbol{\Phi}\).See also
Eigen documentation page: IterativeSolverBase
- L1_RATIO-tag : The ratio of the L1 regularization term
Default: 1.0 (LASSO) Type: Double Description: The L1_RATIO
changes the regularization term asL1_ALPHA
\(\times\) [L1_RATIO
\(|\boldsymbol{\Phi}|_{1}\) + \(\frac{1}{2}\) (1-L1_RATIO
) \(|\boldsymbol{\Phi}|_{2}^{2}\)]. Therefore,L1_RATIO = 1
corresponds to LASSO.L1_RATIO
must be0 < L1_ratio <= 1
. Effective whenLMODEL = enet
. See also here.
- L1_ALPHA-tag : The coefficient of the L1 regularization term
Default: 0.0 Type: Double Description: This tag is used only when LMODEL = enet
andCV = 0
. See also here.
- CV-tag : Cross-validation mode for elastic net
0 Cross-validation mode is off.The elastic net optimization is solved with the givenL1_ALPHA
value.The force constants are written toPREFIX
.fcs andPREFIX
.xml.>= 2 CV
-fold cross-validation is performed automatically.NDATA
training datasets are divided intoCV
subsets, andCV
different combinations oftraining-validation datasets are created internally. For each combination, the elastic netoptimization is solved with the variousL1_ALPHA
values defined by theCV_MINALPHA
,CV_MAXALPHA
, andCV_NALPHA
tags. The result of each cross-validation is stored inPREFIX
.enet_cvset[1, …,CV
], and their average and deviation are stored inPREFIX
.cvscore.-1 The cross-validation is performed manually.The Taylor expansion potential is trained by using the training datasets inDFSET
, andthe validation score is calculated by using the data inDFSET_CV
for variousL1_ALPHA
valuesdefined theCV_MINALPHA
,CV_MAXALPHA
, andCV_NALPHA
tags.After the calculation, the fitting and validation errors are stored inPREFIX
.enet_cv.This option may be convenient for a large-scale problem since multiple optimization tasks withdifferent training-validation datasets can be done in parallel.
Default: 0 Type: Integer Description: This tag is used only when LMODEL = enet
.
- DFSET_CV-tag : File name containing displacement-force datasets used for manual cross-validation
Default: DFSET_CV = DFSET
Type: String Description: This tag is used only when LMODEL = enet
andCV = -1
.
- NDATA_CV-tag : Number of displacement-force validation datasets
Default: None Type: Integer Description: This tag is used only when LMODEL = enet
andCV = -1
.
- NSTART_CV, NEND_CV-tags : Specifies the range of data to be used for validation
Default: NSTART_CV = 1
,NEND_CV = NDATA_CV
Type: Integer Example: This tag is used only when LMODEL = enet
andCV = -1
.
- CV_MINALPHA, CV_MAXALPHA, CV_NALPHA-tags : Options to specify the
L1_ALPHA
values used in cross-validation
Default: CV_MINALPHA = 1.0e-4
,CV_MAXALPHA = 1.0
,CV_NALPHA = 1
Type: Double, Double, Integer Description: CV_NALPHA
values ofL1_ALPHA
are generated fromCV_MINALPHA
toCV_MAXALPHA
in logarithmic scale. A recommended value ofCV_MAXALPHA
is printed out to the log file. This tag is used only whenLMODEL = enet
and the cross-validation mode is on (CV > 0
orCV = -1
).
- STANDARDIZE-tag = 0 | 1
0 Do not standardize the sensing matrix 1 Each column of the sensing matrix is standardized in such a way that its mean valuebecomes 0 and standard deviation becomes 1.
Default: 1 Type: Integer Description: This option influences the optimal L1_ALPHA
value. So, if you change theSTANDARDIZE
option, you have to rerun the cross-validation. Effective only whenLMODEL = enet
.
- ENET_DNORM-tag : Normalization factor of atomic displacements
Default: 1.0 Type: Double Description: The normalization factor of atomic displacement \(u_{0}\) in units of Bohr. When \(u_{0} (\neq 1)\) is given, the displacement data are scaled as \(u_{i} \rightarrow u_{i}/u_{0}\) before constructing the sensing matrix. This option influences the optimal L1_ALPHA
value. So, if you change theENET_DNORM
value, you will have to rerun the cross-validation. Effective only whenLMODEL = enet
andSTANDARDIZE = 0
.
- SOLUTION_PATH-tag = 0 | 1
0 Do not save the solution path. 1 Save the solution path of each cross-validation combination in PREFIX
.solution_path.
Default: 0 Type: Integer Description: Effective when LMODEL = enet
and the cross-validation mode is on.
- DEBIAS_OLS-tag = 0 | 1
0 Save the solution of the elastic net problem to PREFIX
.fcs andPREFIX
.xml.1 After the solution of the elastic net optimization problem is obtained,only non-zero coefficients are collected, and the ordinary least-squares fitting issolved again with the non-zero coefficients before saving the results toPREFIX
.fcs andPREFIX
.xml. This might be useful to reduce the bias of the elastic net solution.
Default: 0 Type: Integer Description: Effective when LMODEL = enet
andCV = 0
.
How to make a DFSET file¶
Format of DFSET
(recommended as of ver. 1.1.0)¶
The displacement-force data sets obtained by first-principles (or classical force-field) calculations
have to be saved to a file, say DFSET. Then, the force constants are estimated by setting DFSET =
DFSET and with MODE = optimize
.
The DFSET file must contain the atomic displacements and corresponding forces in Cartesian coordinate for at least NDATA
structures (displacement patterns)
in the following format:
Here, NAT
is the number of atoms in the supercell.
The unit of displacements and forces must be Bohr and Ryd/Bohr, respectively.
Format of DFILE
and FFILE
(deprecated)¶
Deprecated since version 1.1.0: Use DFSET
instead.
The displacement-force data sets obtained by first-principles (or classical force-field) calculations
have to be saved to DFILE
and FFILE
to estimate IFCs with MODE = fitting
.
In DFILE
, please explicitly specify the atomic displacements \(u_{\alpha}(\ell\kappa)\) in units of Bohr as follows:
When there are NAT
atoms in the supercell and NDATA
data sets,
there should be NAT
\(\times\) NDATA
lines in the DFILE
without blank lines.
In FFILE
, please specify the corresponding atomic forces in units of Ryd/Bohr.