Coding in the (Potential) Great Leaps in Multidimensional Microscopy and Materials Discovery with Deep Learning
Abstract: A grand challenge in materials testing and characterization is the translation of raw data streams into interpreted information at the scale of living things. Deep learning and augmented analysis have begun to disrupt the materials, microscopy, and broader analysis communities, with advances in material-specific models that solve narrow and burdensome tasks. Developments in compressive imaging, where less is more, have begun to set the stage for high throughput and dose fractionation on our latest microscopes and characterization platforms. Crystallographic determination is crucial to many of these workflows. At its core, crystallography is deeply rooted in pattern recognition, and experts train for years to distinguish minute variations within the data. Determining a crystal’s space group often involves a lengthy process that requires fitting a series of non-linear equations and intimate knowledge of the sample, even with standardized approaches such as Rietveld refinement. This heavy dependence on complex matching and time-intensive processes makes it an ideal case for automation with deep learning, to the benefit of materials research as a whole.
Building on recent work to classify crystallography from diffraction data alone, this research aims to boost these models’ predictive capabilities and to disambiguate between high-level classifications by incorporating additional materials descriptors, such as chemistry and coordination number. When data sets collected simultaneously or separately are assembled into robust and modular training sets, it is possible to improve on the accuracy of models trained solely on crystallographic data.[2,3] The additional information provided by chemistry data, drawn from open materials data including the Open Crystallography Database and Materials Project Database, augments the model’s understanding of higher-level materials classifications.
In this presentation, we will present the benefits and challenges of learning on multi-modal datasets, including a real-time demonstration of classifying data on the fly in a matter of milliseconds per prediction. The work of overcoming the technological barrier to data access and extraction is by no means complete. To extract materials-relevant information from multimodal microscopy data, it is necessary to create complex neural networks that utilize multiple data streams through linear-based normalization and sub-networks that handle the growing complexity. Each sub-network learns meaningful data-specific features, which are then concatenated and normalized with the other modules’ outputs before classification. Diffraction and chemistry, ranked by composition and presence of elements, were specifically chosen as the inputs to learn on, as a demonstration with immediate impact. To prevent overfitting and account for variations within experimental data, augmentation functions are employed in conjunction with randomly dropping out different channels. The workflow in Figure 1 and the analysis strategies for creating neural networks that incorporate both chemistry and diffraction data will be presented. Results utilizing sub-networks to better classify materials with minimal background knowledge of structure or chemistry will be presented, demonstrated, and discussed.
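The multi-stream design described above can be illustrated with a minimal forward-pass sketch. It is not the presenters' implementation: the element vocabulary, layer sizes, random untrained weights, the choice of layer normalization for the fusion step, the seven output classes, and every function name here are assumptions made for illustration. It shows one sub-network per data stream, concatenation and normalization of their outputs, a classification head, and random dropping of whole input channels.

```python
import math
import random

ELEMENTS = ["O", "Si", "Fe", "Ti", "Al"]  # hypothetical, deliberately tiny vocabulary

def encode_chemistry(composition):
    """Encode a composition as presence flags plus rank-by-fraction features."""
    ranked = sorted(composition, key=composition.get, reverse=True)
    presence = [1.0 if el in composition else 0.0 for el in ELEMENTS]
    # 1.0 for the most abundant element, 1/2 for the next, ...; 0.0 if absent.
    rank = [1.0 / (ranked.index(el) + 1) if el in composition else 0.0
            for el in ELEMENTS]
    return presence + rank

def init_dense(rng, n_in, n_out):
    """Randomly initialised (untrained) fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return {"w": weights, "b": [0.0] * n_out}

def dense(layer, x, relu=True):
    """Affine map followed by an optional ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(layer["w"], layer["b"])]
    return [max(0.0, o) for o in out] if relu else out

def layer_norm(x, eps=1e-5):
    """Normalise a feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def drop_channels(inputs, rng, p_drop):
    """Zero out whole input streams at random, always keeping at least one."""
    keep = {name: rng.random() >= p_drop for name in inputs}
    if not any(keep.values()):
        keep[rng.choice(sorted(inputs))] = True
    return {name: (x if keep[name] else [0.0] * len(x))
            for name, x in inputs.items()}

def forward(model, inputs):
    """Sub-network per stream, then concatenate, normalise, and classify."""
    features = []
    for name in sorted(inputs):
        features.extend(dense(model["sub"][name], inputs[name]))
    fused = layer_norm(features)
    return softmax(dense(model["head"], fused, relu=False))

rng = random.Random(0)
inputs = {
    "diffraction": [0.0, 0.8, 0.1, 1.0, 0.3, 0.0, 0.5, 0.2],  # made-up intensity profile
    "chemistry": encode_chemistry({"Ti": 0.33, "O": 0.67}),    # made-up composition
}
model = {
    "sub": {"diffraction": init_dense(rng, 8, 6),
            "chemistry": init_dense(rng, 10, 4)},
    "head": init_dense(rng, 10, 7),  # 7 illustrative classes, e.g. crystal systems
}
probs = forward(model, drop_channels(inputs, rng, p_drop=0.3))
print(len(probs), round(sum(probs), 6))
```

Because a dropped stream is replaced by zeros rather than removed, the fused feature vector keeps a fixed length, so the same classification head can be used whether or not a given channel is present. In a real training setup the weights would be learned and the channel dropout applied only during training.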
 R.K. Vasudevan, A. Tselev, A.P. Baddorf, S.V. Kalinin, Big-data reflection high energy electron diffraction analysis for understanding epitaxial film growth processes, ACS Nano. 8 (2014). doi:10.1021/nn504730n.
 R.K. Vasudevan, A. Belianinov, A.G. Gianfrancesco, A.P. Baddorf, A. Tselev, S.V. Kalinin, S. Jesse, Big data in reciprocal space: sliding fast Fourier transforms for determining periodicity, Appl Phys Lett. 106 (2015). doi:10.1063/1.4914016.
 J. Dongarra, P. Beckman, T. Moore, P. Aerts, G. Aloisio, J.C. Andre, D. Barkai, J.Y. Berthou, T. Boku, B. Braunschweig, F. Cappello, B. Chapman, X. Chi, A. Choudhary, S. Dosanjh, T. Dunning, S. Fiore, A. Geist, B. Gropp, R. Harrison, M. Hereld, M. Heroux, A. Hoisie, K. Hotta, Z. Jin, Y. Ishikawa, F. Johnson, S. Kale, R. Kenway, D. Keyes, The international exascale software project roadmap, Int J High Perform Comput Appl. 25 (2011). doi:10.1177/1094342010391989.
 Work supported through the INL Laboratory Directed Research & Development (LDRD) Program under DOE Idaho Operations Office Contract DE-AC07-05ID14517. This work was performed, in part, at the Center for Integrated Nanotechnologies, an Office of Science User Facility operated for the U.S. Department of Energy (DOE) Office of Science. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. DOE’s National Nuclear Security Administration under contract DE-NA-0003525. The views expressed in the article do not necessarily represent the views of the U.S. DOE or the United States Government. In part, this research was conducted at the Center for Nanophase Materials Sciences, which is a DOE Office of Science User Facility.
All seminars are held on Wednesdays from 12:00 noon-1:00 p.m. in the Bowen Hall Auditorium Room 222. A light lunch is provided at 11:30 a.m. in the Bowen Hall Atrium immediately prior to the seminar.