Tuesday, May 19, 2026 - 10:00 am
Room 2267

  DISSERTATION DEFENSE

Author: Lai Wei
Advisor: Dr. Jianjun Hu
Date: May 19, 2026
Time: 10:00 am
Place: Room 2267
Virtual (Zoom)
Link: https://sc-edu.zoom.us/j/4997546955#success

Abstract


The discovery of novel inorganic materials is fundamental to technological progress, yet traditional experimental approaches are often slow and resource-intensive. While computational methods have emerged to accelerate this process, significant bottlenecks remain in both the generative design of new chemical compositions and the rapid, accurate prediction of their crystal structures. Furthermore, the field of Crystal Structure Prediction (CSP) has historically lacked a standardized, quantitative framework for evaluating and comparing the performance of diverse algorithms, hindering reproducible progress. Finally, the rapid proliferation of deep generative models for CSP has introduced a new challenge: the absence of a reproducible, leakage-controlled evaluation protocol tailored to this emerging paradigm.


This dissertation presents a comprehensive, end-to-end methodology that addresses these interconnected challenges. The research narrative begins with the development of the Blank-filling Language Model for Materials (BLMM), a deep learning language model that learns the "chemical grammar" of materials to generate novel, chemically plausible compositions. To bridge the gap between composition and structure, we then introduce the Template-Based Crystal Structure Prediction (TCSP) algorithm, a high-throughput method for rapidly predicting structures for these newly generated formulas.

In the process of developing these tools, we identified the critical need for a universal evaluation framework. To this end, we propose CSPMetrics, a systematic suite of quantitative metrics, and CSPBench, a benchmark platform for the fair and rigorous assessment of a broad range of CSP algorithms. Leveraging insights gained directly from CSPBench, which highlighted the efficacy of template-based approaches, we developed TCSP 2.0, a significantly improved algorithm incorporating superior oxidation state prediction and refined chemical heuristics.

Building on this evaluation framework, we further conduct a standardized, leakage-controlled assessment of twelve representative deep generative CSP algorithms, spanning latent-variable, diffusion-based, flow-based, and autoregressive architectures. We introduce a rigorously filtered CLEAN test subset to assess true generalization capability, and perform an ablation study revealing that current diffusion-based models remain substantially dependent on structural prototype coverage in their training data.

This work establishes a complete research cycle: we generate compositions, predict their structures, establish a standard for evaluation, use that standard to improve our own algorithm, and apply it to comprehensively assess the next generation of deep learning methods. Together, these contributions provide the materials science community with an integrated, data-driven framework to accelerate the discovery of next-generation functional materials.