Scaling deep learning to find materials
Making Materials in the Lab with Autonomous Robotics: The A-Lab Project (Berkeley, California)
It’s one thing to predict the existence of a material, but quite another to actually make it in the lab. The A-Lab can help with that. Gerbrand Ceder, a materials scientist at Lawrence Berkeley National Laboratory (LBNL) and the University of California, Berkeley, leads the A-Lab team and says it now has the ability to make new materials quickly.
GNoME uses several tactics to predict many more materials than previous systems. Rather than replacing all of the calcium cations in a material with magnesium, for example, it might try a wide range of unusual atom swaps; it does not matter if many of them fail, because the system learns from its mistakes. In this respect it has been compared to ChatGPT for materials discovery.
In 17 days, the A-Lab produced 41 new materials, 9 of which were created only after active learning improved the synthesis. Some of the targets that proved difficult in the A-Lab were eventually synthesized after humans intervened, for example by regrinding a mixture, but others were not.
Active learning was performed in rounds of candidate generation followed by evaluation of filtered materials with DFT. In the first round, candidates were generated from snapshots of the Materials Project and the OQMD, using an initial model trained on Materials Project data with a mean absolute error of 21 meV atom−1 in formation energy. Filtering and subsequent evaluation with DFT led to discovery rates between 3% and 10%, depending on the threshold used for discovery. After every round of active learning, the structural GNNs are retrained to improve predictive performance. Furthermore, newly found stable crystal structures are added to the set of materials that can be substituted into, yielding a greater number of candidates to be filtered by the improved models. This procedure of retraining and evaluation was completed six times, yielding the total of 381,000 stable crystal discoveries. The number of stable crystals may continue to grow with continued exploration through active learning.
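As a rough illustration of this loop, the sketch below strings together candidate generation, model-based filtering and DFT evaluation over six rounds. All helper functions, thresholds and the random placeholder energies are ours, not the authors' pipeline; it only shows the shape of the retrain–generate–filter–verify cycle.

```python
# Minimal sketch of a GNoME-style active-learning loop (hypothetical helpers;
# the real pipeline uses graph networks, SAPS/AIRSS generation and DFT at scale).
import random

def train_energy_model(training_set):
    """Placeholder for retraining the GNN formation-energy model."""
    return lambda candidate: random.gauss(0.0, 0.05)   # predicted energy above hull (eV/atom)

def generate_candidates(known_structures):
    """Placeholder for substitutions/AIRSS over known stable structures."""
    return [f"candidate_{i}" for i in range(len(known_structures) * 10)]

def run_dft(candidate):
    """Placeholder for a VASP relaxation returning the DFT energy above hull."""
    return random.gauss(0.02, 0.05)

training_set = ["materials_project_snapshot"]
stable_structures = list(training_set)

for round_idx in range(6):                              # six rounds of retraining/evaluation
    model = train_energy_model(training_set)
    candidates = generate_candidates(stable_structures)
    # Filter: keep candidates the model predicts to be near the convex hull.
    filtered = [c for c in candidates if model(c) < 0.05]
    for c in filtered:
        e_hull = run_dft(c)
        training_set.append((c, e_hull))                # every DFT result improves the model
        if e_hull <= 0.0:                               # counted as a stable discovery
            stable_structures.append(c)
    print(f"round {round_idx}: {len(stable_structures)} stable structures so far")
```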
Using Artificial Intelligence to Make the A-Lab an Autonomous System for Mixing, Heating and Analysing Powdered Ingredients
The A-Lab uses state-of-the-art technology to mix and heat powdered ingredients and then analyses the result to find out whether the procedure worked. The US$2-million set-up took 18 months to build. The biggest challenge was making the system autonomous, with artificial intelligence responsible for planning experiments, interpreting data and making decisions. “The robots are great fun to watch, but the innovation is really under the hood,” Ceder says.
Together, these advances promise to dramatically accelerate the discovery of materials for clean-energy technologies, next-generation electronics and a host of other applications. “A lot of the technologies around us, including batteries and solar cells, could really improve with better materials,” says Ekin Dogus Cubuk, who leads the materials discovery team at Google DeepMind in London and was involved in both studies, which were published today in Nature1,2.
The co-director of Cornell University’s Artificial Intelligence for Science Institute in New York, who was not involved in the research, believes that scientific discovery is the next frontier for artificial intelligence: “That is why I like this so much.”
Symmetry-Aware Partial Substitutions (SAPS): Expanding Candidates from the Materials Project and OQMD
SAPS enable efficient discovery of such structures. The ‘Substitution patterns’ section outlines the probabilities used to propose candidate ion replacements. We obtain the Wyckoff positions of the structures using symmetry analysers and consider only symmetrically distinct groupings when enumerating partial replacements, from one up to all of the candidate ions. Early experiments limited the partial substitutions to materials that would charge-balance after substitution when considering common oxidation states; however, greater expansion of the candidate pool was achieved by removing this charge-balancing constraint in later experiments. As discussed in the main text, the partial-substitution framework makes greater use of common crystal structures while still allowing the discovery of new prototypical structures, so that many distinct candidates can be generated from the same reference dataset.
To quantify the impact of the SAPS, we traced each of the 381,000 novel stable structures generated by substitution back to a reference structure in the Materials Project or OQMD by means of a topological sort. This diverse candidate-generation procedure resulted in 22% of the 381,000 stable structures being attributable to a SAPS substitution.
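A minimal sketch of the symmetry-aware partial-substitution idea, assuming pymatgen for the symmetry analysis; the function name, the Mg→Ca example and the enumeration details are illustrative rather than the authors' implementation, which also uses learned substitution probabilities and the filters described above.

```python
# Enumerate partial substitutions over groups of symmetry-equivalent sites.
from itertools import combinations
from pymatgen.core import Lattice, Structure
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

def saps_candidates(structure: Structure, original: str, replacement: str):
    """Yield copies in which 1..k symmetry-equivalent groups of `original` ions
    are replaced by `replacement` (partial up to full substitution)."""
    sym = SpacegroupAnalyzer(structure).get_symmetrized_structure()
    base = Structure.from_sites(sym.sites)
    # Groups of symmetry-equivalent site indices occupied by the ion to replace.
    groups = [idx for idx in sym.equivalent_indices
              if base[idx[0]].species_string == original]
    for n in range(1, len(groups) + 1):
        for chosen in combinations(groups, n):
            candidate = base.copy()
            for group in chosen:
                for i in group:
                    candidate.replace(i, replacement)
            yield candidate

# Example on a rock-salt MgO cell, proposing Mg -> Ca replacements.
# (All Mg sites are equivalent here, so only the full swap is generated.)
mgo = Structure.from_spacegroup("Fm-3m", Lattice.cubic(4.21),
                                ["Mg", "O"], [[0, 0, 0], [0.5, 0.5, 0.5]])
candidates = list(saps_candidates(mgo, "Mg", "Ca"))
```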
Random structures are generated through AIRSS. They are initialized as random sensible structures and then relaxed with soft-sphere potentials. A substantial number of initializations and relaxations are needed to discover new materials, as different initial structures lead to different minima on the structure–energy landscape. We create 100 AIRSS structures for every composition predicted to be within 50 meV atom−1 of stability.
In Supplementary Note 5, we describe how not all DFT relaxations converge for the 100 initializations; for certain compositions, only a few initializations converge. One of the main difficulties is making a good initial volume guess for the composition. We try a range of initial volumes, from 0.4 to 1.2 times a volume estimated from the relevant atomic radii, and find that for some compositions the DFT relaxation fails or does not converge across the whole range. Further work is needed to uncover why most AIRSS initializations fail for certain compositions.
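The volume scan is easy to make concrete. The sketch below estimates a cell volume from atomic radii with pymatgen and steps through the 0.4×–1.2× range quoted above; the helper function, the Li3PS4 example and the number of grid points are illustrative assumptions.

```python
# Radius-based volume estimate and the range of initial volume guesses.
import numpy as np
from pymatgen.core import Composition

def estimated_volume(comp: Composition) -> float:
    """Crude cell-volume guess (cubic angstroms) from the atomic radii of the constituents."""
    return sum(4.0 / 3.0 * np.pi * float(el.atomic_radius) ** 3 * amt
               for el, amt in comp.items())

comp = Composition("Li3PS4")                    # hypothetical target composition
base = estimated_volume(comp)
for scale in np.linspace(0.4, 1.2, 5):          # range of initial volumes to try
    print(f"initial cell volume guess: {scale * base:.1f} A^3")
```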
Structural models operate on a graph representation of the crystal. Each atom is represented as a single node in the graph, and an edge is defined between two atoms whenever their interatomic distance does not exceed a threshold; the edges are embedded based on the interatomic distance using a Gaussian featurizer. We also include a global feature that is connected to all nodes in the graph representation. At each message-passing step, node, edge and global features are aggregated and used to update the representations of the other parts of the network. After 3–6 layers of message passing, an output layer projects the global vector to obtain an estimate of the energy. All training data are shifted and scaled to approximately standardize the datasets. The model trained on Materials Project data has a mean absolute error of 21 meV atom−1; a model with a mean absolute error of 11 meV atom−1 was obtained during the active-learning procedure. Structural models are trained for 1,000 epochs with a learning rate of 5.55 × 10−4 and a linear-decay learning-rate schedule. We typically train with a batch size of 512 and use MLPs for the node, edge and global update functions. Unless otherwise stated, we use three message-passing steps, with the same embedding dimension for node, edge and global features.
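The sketch below builds the graph ingredients described above (one node per atom, distance-threshold edges, Gaussian-featurized distances) using pymatgen; the 5 Å cutoff, the 32 Gaussian centres and the MgO demo input are placeholder values, not the paper's settings.

```python
# Construct node species, edge indices and Gaussian edge features for a crystal.
import numpy as np
from pymatgen.core import Lattice, Structure

def crystal_graph(structure: Structure, cutoff: float = 5.0, n_gaussians: int = 32):
    """Return node species, (sender, receiver) edge indices and Gaussian edge features."""
    species = np.array([site.specie.Z for site in structure])      # one node per atom
    centers = np.linspace(0.0, cutoff, n_gaussians)
    width = centers[1] - centers[0]
    senders, receivers, edge_feats = [], [], []
    for i, neighbors in enumerate(structure.get_all_neighbors(cutoff)):
        for nb in neighbors:                                        # edge if distance <= cutoff
            senders.append(i)
            receivers.append(nb.index)
            edge_feats.append(np.exp(-((nb.nn_distance - centers) ** 2) / width ** 2))
    return species, np.array([senders, receivers]), np.array(edge_feats)

# Example: graph of a rock-salt MgO cell (hypothetical demo input).
mgo = Structure.from_spacegroup("Fm-3m", Lattice.cubic(4.21), ["Mg", "O"],
                                [[0, 0, 0], [0.5, 0.5, 0.5]])
species, edges, edge_feats = crystal_graph(mgo)
```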
Following Roost (representation learning from stoichiometry)58, we find GNNs to be effective at predicting formation energies from composition as well as from structure.
Discovering new materials with the aid of neural networks requires a careful balance between exploiting the distribution on which the networks were trained and promoting new discoveries. New structures and prototypes are inherently out of distribution for the models; nevertheless, we hope that the models are still capable of extrapolating and yielding reasonable predictions. This out-of-distribution problem is further exacerbated by an implicit domain shift: models are trained on relaxed structures but evaluated on substituted structures before relaxation. To counteract these effects, we make several adjustments.
To train machine-learning models of the energy, a test set is usually created by a random split over crystal structures. However, because the GNoME dataset contains many crystal structures with the same composition, this metric is less trustworthy: when several structures of the same composition fall in both the training and the test sets, good test performance does not demonstrate that the model generalizes to new compositions. In this paper, we therefore assign examples to the training and test sets using a deterministic hash of the reduced formula of the composition, which ensures that no composition appears in both sets. Specifically, we take the standard MD5 hash of the reduced formula, convert it to an integer, take it modulo 100 and threshold at 85.
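A minimal sketch of that deterministic split; the exact string encoding and the direction of the cut (buckets below the threshold going to training) are our assumptions.

```python
# Composition-based train/test split: MD5 of the reduced formula, mod 100, threshold 85.
import hashlib

def in_training_set(reduced_formula: str, threshold: int = 85) -> bool:
    """Deterministically assign a composition to training (True) or test (False)."""
    digest = hashlib.md5(reduced_formula.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < threshold        # roughly 85% of compositions go to training

print(in_training_set("Li3PS4"), in_training_set("CaTiO3"))
```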
Although neural-network models offer flexibility that allows them to achieve state-of-the-art performance on a wide range of problems, they may not generalize to data outside the training distribution. Using an ensemble of models is a simple, popular choice for providing predictive uncertainty and improving the generalization of machine-learning predictions33, and it requires little machinery beyond training n models independently. The prediction corresponds to the mean over the outputs of the n models; the uncertainty can be measured by the spread of the n outputs. For stability prediction, we train ensembles of 10 graph networks. Owing to the instability of individual graph-network predictions, we find the interquartile range to be a more reliable measure of this spread.
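A toy sketch of the ensemble readout: the ten "model outputs" below are random stand-ins, but the mean-as-prediction and interquartile-range-as-uncertainty computation mirrors the procedure described above.

```python
# Ensemble prediction with an interquartile-range (IQR) uncertainty estimate.
import numpy as np

rng = np.random.default_rng(0)
ensemble_outputs = rng.normal(loc=-1.23, scale=0.02, size=10)   # stand-ins for 10 model outputs (eV/atom)

prediction = ensemble_outputs.mean()                            # ensemble prediction
q25, q75 = np.percentile(ensemble_outputs, [25, 75])
uncertainty = q75 - q25                                         # interquartile range as the spread
print(f"E = {prediction:.3f} eV/atom, IQR = {uncertainty:.3f} eV/atom")
```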
DFT Verification: Clustering-Based Reduction Strategies, Computational Settings and Automated Bandgaps
By default, only the structure predicted to have the minimum energy within a composition is used for DFT verification. For an in-depth evaluation of a specific composition family of interest, we designed clustering-based reduction strategies. We take the top 100 structures and compare them with pymatgen’s built-in structure matcher, cluster the connected components of the graph of pairwise similarities, and take the minimum-energy structure as the cluster representative. When applicable, this strategy is well suited to discovering polymorphs.
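A sketch of that reduction, assuming pymatgen's StructureMatcher for the pairwise comparisons and networkx for the connected components; default matcher tolerances are used here, whereas the actual pipeline may tune them.

```python
# Cluster similar structures and keep the lowest-energy representative per cluster.
import networkx as nx
from pymatgen.analysis.structure_matcher import StructureMatcher

def cluster_representatives(structures, energies):
    """Return indices of the minimum-energy member of each similarity cluster."""
    matcher = StructureMatcher()
    graph = nx.Graph()
    graph.add_nodes_from(range(len(structures)))
    for i in range(len(structures)):
        for j in range(i + 1, len(structures)):
            if matcher.fit(structures[i], structures[j]):    # edge on pairwise match
                graph.add_edge(i, j)
    return [min(cluster, key=lambda k: energies[k])          # cluster representative
            for cluster in nx.connected_components(graph)]
```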
We use VASP (refs. 34,59) with the PBE41 functional and PAW40,60 potentials in all DFT calculations. Our DFT settings are consistent with the Materials Project workflows as encoded in pymatgen23 and atomate61, including the Hubbard U parameter applied to a subset of transition metals in DFT+U, the 520-eV plane-wave-basis cutoff, the magnetization settings and the choice of PBE pseudopotentials, except for Li, Na, Mg, Ge and Ga, for which we use more recent versions of the respective potentials with the same number of valence electrons. For all structures, we use the standard protocol of two-stage relaxation of all geometric degrees of freedom, followed by a final static calculation, along with the custodian package23 to handle any VASP-related errors that arise and adjust the affected simulations. For hexagonal cells, we force Γ-centred k-point generation. We assume ferromagnetic spin initialization with finite magnetic moments, as preliminary attempts to incorporate different spin orderings showed computational costs that were prohibitive at the scale presented. In AIMD simulations, we turn off spin polarisation and use a 2-fs time step.
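For orientation, the snippet below writes Materials Project-compatible inputs for a relaxation stage and the final static calculation with pymatgen's input sets; the file name is hypothetical, and the second relaxation stage, custodian error handling and the k-point/potential overrides discussed above are omitted.

```python
# Materials Project-style VASP inputs for one relaxation stage and a static run.
from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPRelaxSet, MPStaticSet

structure = Structure.from_file("candidate.cif")   # hypothetical candidate structure

# Stage-1 relaxation with Materials Project defaults (520-eV cutoff, DFT+U, magnetization).
MPRelaxSet(structure).write_input("relax_stage_1")

# Final static calculation after the two relaxation stages.
MPStaticSet(structure).write_input("static")
```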
For validation purposes (such as the filtration of Li-ion conductors), bandgaps are calculated for most of the stable materials discovered. We automate bandgap jobs in our computation pipelines by copying all outputs from the static calculations and using the pymatgen-based MPNonSCFSet in line mode to compute the bandgap and density of states of each material. A broader analysis of the bandgaps of the novel discoveries is left to future work.
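A sketch of the non-self-consistent band-structure step, assuming a completed static calculation in a "static/" directory; MPNonSCFSet's line mode is the pymatgen feature named above, and the output directory name is ours.

```python
# Line-mode (band-structure) job built from the outputs of the static calculation.
from pymatgen.io.vasp.sets import MPNonSCFSet

nonscf = MPNonSCFSet.from_prev_calc("static", mode="line")
nonscf.write_input("bandstructure")
```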
r2SCAN is an accurate and numerically efficient functional that has seen increasing adoption in the community for improving the fidelity of DFT calculations. It is provided in the upgraded VASP6 and, for all corresponding calculations, we use the settings detailed in MPScanRelaxSet and MPScanStaticSet in pymatgen. Notably, r2SCAN calculations require the PBE52 or PBE54 potentials, which can differ slightly from the PBE equivalents used elsewhere in this paper. To speed up computation, we perform three jobs for every r2SCAN-based computation: a standard (PBE) relaxation using the updated PBE54 potentials as a preconditioning step, followed by a relaxation with the r2SCAN functional and a final static computation. The preconditioning step greatly speeds up the SCAN computations, which on average are five times slower and can otherwise crash on our infrastructure owing to elongated trajectories.
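The r2SCAN stages map onto pymatgen input sets as sketched below; the path to the preconditioned PBE geometry is a hypothetical placeholder.

```python
# r2SCAN relaxation and static calculation, starting from the preconditioned geometry.
from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPScanRelaxSet, MPScanStaticSet

structure = Structure.from_file("relax_pbe/CONTCAR")     # geometry from the PBE preconditioning step

MPScanRelaxSet(structure).write_input("relax_r2scan")    # r2SCAN relaxation
MPScanStaticSet(structure).write_input("static_r2scan")  # final static computation
```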
To compare the total number of stable crystals with previous work, we recompute decomposition energies for the Materials Project and the OQMD with updated DFT settings. Furthermore, to ensure a fair comparison and that our discoveries are not affected by optimization failures in these high-throughput recalculations, we use the minimum of the Materials Project energy and our recalculated energy when both are available.
To count the number of layered materials, we use the methodology developed in ref. 45, which is made available through the pymatgen.analysis.dimensionality package with a default tolerance of 0.45 Å.
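A sketch of that dimensionality check with pymatgen; the mapping of a "2D" result to "layered" and the input file name are our reading and assumptions.

```python
# Layered-material check using the ref. 45 methodology with the 0.45 A tolerance.
from pymatgen.analysis.dimensionality import get_dimensionality_cheon
from pymatgen.core import Structure

structure = Structure.from_file("candidate.cif")            # hypothetical structure file
dim = get_dimensionality_cheon(structure, tolerance=0.45)   # returns, e.g., "0D", "1D", "2D", "3D"
is_layered = dim == "2D"
```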
Downstream Evaluations and Machine-Learning Infrastructure: Li-Ion Conductors, Li/Mn Transition-Metal Oxides and Large-Scale Processing
The methodology used to estimate the number of viable Li-ion conductors is detailed in the main text and follows ref. 46 in a high-throughput fashion. It involves applying a sequence of filters, including stability against the Li-metal anode, to find the most viable Li-ion conductors.
The Li/Mn transition-metal oxide family discussed in ref. 25 is used to analyse the capabilities of machine-learning models for discovery; in the main text, our findings are compared against the results of the previous machine-learning methods reported in that work.
In the corresponding figure, we present the classification error for predicting the outcome of DFT-based molecular dynamics using GNNs. ‘GNoME: unique structures’ refers to the first step of the relaxation of crystals in the structural pipeline; we use the forces on each atom from that first relaxation step. The different training subsets are created by uniformly sampling compositions from the structural pipeline at random. The compositions in ‘GNoME: intermediate structures’ are similar to those in ‘GNoME: unique structures’, but the structures are drawn from different steps of the DFT relaxation. The red diamond refers to the GNN interatomic potential trained on the data from M3GNet, which includes three relaxation steps per composition.
For machine learning, GNoME models make use of JAX and its ability to just-in-time compile programs onto devices such as graphics processing units (GPUs) and tensor processing units (TPUs). Graph-network implementations are based on the framework developed in Jraph, which uses a fundamental GraphsTuple object (encoding nodes and edges, along with sender and receiver information for message-passing steps). We also make extensive use of functionality written in JAX MD for processing crystal structures63, as well as TensorFlow for parallelized data input64.
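To make the GraphsTuple concrete, here is a minimal two-atom example; the feature sizes are placeholders, not GNoME's.

```python
# A tiny Jraph GraphsTuple for a two-atom "crystal graph".
import jax.numpy as jnp
import jraph

graph = jraph.GraphsTuple(
    nodes=jnp.zeros((2, 64)),      # one embedding per atom
    edges=jnp.zeros((2, 32)),      # e.g. Gaussian-featurized distances
    senders=jnp.array([0, 1]),     # message-passing source indices
    receivers=jnp.array([1, 0]),   # message-passing target indices
    globals=jnp.zeros((1, 16)),    # global feature connected to all nodes
    n_node=jnp.array([2]),
    n_edge=jnp.array([2]),
)
```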
Large-scale generation, evaluation and summarization, as described in the main text, use Apache Beam to distribute processing across a large number of workers and to scale to the sizes described. For example, a billion proposed structures require an amount of storage that would overwhelm a single machine.
NequIP Potential: e3nn-jax-Based Training with SiLU Nonlinearities and Bessel Basis Functions, and Downstream Evaluation
We train a NequIP potential30, implemented in JAX using the e3nn-jax library66, with five layers; hidden features of 128 ℓ = 0 scalars, 64 ℓ = 1 vectors and 32 ℓ = 2 tensors (all even irreducible representations, 128x0e + 64x1e + 32x2e); and an edge irreducible representation of 0e + 1e + 2e. We use an inner cutoff of 4.5 Å and a radial cutoff of 5 Å, with a basis of eight Bessel functions. SiLU is used for the gated, equivariant nonlinearities. We embed the chemical species using a 94-element one-hot encoding and use a self-connection, as proposed in ref. 30. For internal normalization, we divide by 26 after each convolution. The models are trained with a learning rate of 2 × 10−3 and a batch size of 32. Low-energy structures near the end of a relaxation trajectory are much more similar to one another and often have small forces, whereas high-energy structures earlier in the trajectory are more diverse; oversampling first-step structures improved performance on downstream tasks. After about 23 million steps, the learning rate was lowered to 2 × 10−4 for a further 11 million steps, before training for a final 2.43 million steps. Training was performed on four chips.
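The irreducible-representation shorthand above can be checked directly with e3nn-jax, as in this small sketch (assuming the corrected "64x1e" reading of the hidden features).

```python
# Parse the irrep strings and report the resulting feature dimensions.
import e3nn_jax as e3nn

hidden_irreps = e3nn.Irreps("128x0e + 64x1e + 32x2e")   # scalars, vectors and rank-2 tensors (even parity)
edge_irreps = e3nn.Irreps("0e + 1e + 2e")               # edge irreducible representation
print(hidden_irreps.dim, edge_irreps.dim)               # 480 and 9
```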
The training objective is a weighted sum of energy, force and stress errors over the $B$ structures in a batch:

$$\mathcal{L}=\sum_{b=1}^{B}\left[\lambda_{E}\,\big|\hat{E}_{b}-E_{b}\big|+\frac{\lambda_{F}}{3N_{b}}\sum_{i=1}^{N_{b}}\sum_{\alpha=1}^{3}\big|\hat{F}_{i\alpha}^{(b)}-F_{i\alpha}^{(b)}\big|+\lambda_{\sigma}\,\big\lVert\hat{\sigma}^{(b)}-\sigma^{(b)}\big\rVert\right],$$

where $N_{b}$ is the number of atoms in structure $b$ and the weights $\lambda_{E}$, $\lambda_{F}$ and $\lambda_{\sigma}$ are given below.
The structural model used for downstream evaluation was trained using the Adam optimizer with a learning rate of 2 × 10−3 and a batch size of 16 for a total of 801 epochs; the learning rate was reduced to 2 × 10−4 after a number of epochs. The joint loss function is the same as the one used in pretraining, with weights of 1.0 on energies, 0.05 on forces and 0.1 on stress. The network hyperparameters are the same as those used in the NequIP model. To enable a comparison with ref. 62, we also subtract a linear compositional fit, based on the training energies, from the reference energies before training. Training was performed on a set of four V100 GPUs.
Classification of Superionic Materials from AIMD Conductivity
We classify a material as exhibiting superionic behaviour if its ionic conductivity computed from AIMD at 1,000 K exceeds 100 mS cm−1. For the details of the calculations, refer to the original paper; further information is given in the Supplementary Information.
Materials for AIMD simulation are chosen on the basis of their stability and of whether they contain one of the conducting species under consideration; a further criterion was applied so as not to include materials with notable electronic conductivity. Materials are simulated in their pristine form, without vacancies or stuffing. The AIMD simulations were performed using VASP. The temperature is initialized at 300 K and equilibrated over a time span of 5 ps using velocity rescaling; this is followed by a 45-ps simulation using a Nosé–Hoover thermostat. Simulations use a 2-fs time step.
The first 10 ps of each simulation are discarded as equilibration in the AIMD analysis. We use the DiffusionAnalyzer class of pymatgen to compute the diffusivity from the final 40 ps.
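A sketch of that analysis with pymatgen's DiffusionAnalyzer; the vasprun path, the Li species, the simulation temperature and the 2-fs step (so 5,000 discarded frames ≈ 10 ps) are illustrative. Depending on the pymatgen version, DiffusionAnalyzer lives in pymatgen.analysis.diffusion_analyzer or in the pymatgen-analysis-diffusion add-on.

```python
# Diffusivity/conductivity from the final part of an AIMD trajectory.
from pymatgen.analysis.diffusion_analyzer import DiffusionAnalyzer
from pymatgen.io.vasp.outputs import Vasprun

run = Vasprun("aimd/vasprun.xml")                  # hypothetical AIMD output
structures = run.structures[5000:]                 # drop the first 10 ps (2-fs time step)

analyzer = DiffusionAnalyzer.from_structures(
    structures, specie="Li", temperature=1000, time_step=2, step_skip=1)
print(analyzer.diffusivity, analyzer.conductivity)  # cm^2 s^-1 and mS cm^-1
```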