GPU-Accelerated Solution of the Bethe–Salpeter Equation for Large and Heterogeneous Systems
At a Glance
| Metadata | Details |
|---|---|
| Publication Date | 2024-12-11 |
| Journal | Journal of Chemical Theory and Computation |
| Authors | Victor Yu, Yu Jin, Giulia Galli, Marco Govoni |
| Institutions | University of Chicago, University of Modena and Reggio Emilia |
| Citations | 5 |
| Analysis | Full AI Review Included |
Executive Summary
This research details the development and benchmarking of WEST-BSE, a massively parallel, GPU-accelerated implementation of the Bethe-Salpeter Equation (BSE) for large and complex material systems.
- Scalability Breakthrough: Achieves unprecedented scale for GW-BSE calculations, successfully simulating systems up to 1727 atoms and 6910 electrons, demonstrating efficient scaling up to 4096 NVIDIA A100 GPUs.
- Computational Efficiency: Circumvents traditional bottlenecks—the slowly converging sums over virtual states and the inversion of large dielectric matrices—using density matrix perturbation theory and the Projective Dielectric Eigenpotentials (PDEP) technique.
- Defect Convergence: Enables rigorous finite-size scaling studies, showing that supercells with up to 1000 atoms are necessary to obtain converged Vertical Excitation Energies (VEEs) for spin defects in wide-band-gap materials.
- Quantum Defect Analysis: Provides highly accurate VEEs and radiative lifetimes for key quantum technology defects, including the nitrogen-vacancy (NV-) center in diamond and the divacancy (VV0) in 3C silicon carbide (SiC).
- Extended Defect Modeling: First demonstration of GW-BSE capability to explore the interaction between a point defect (NV-) and an extended defect (dislocation core), revealing significant symmetry breaking and energy splitting (> 0.5 eV).
- Optimized Algorithms: Utilizes a GPU-accelerated Wannier localization technique (JADE algorithm) and a multilevel parallelization strategy to maximize performance and minimize MPI communication overhead.
Technical Specifications
The following table summarizes key numerical results and performance metrics derived from the GPU-accelerated GW-BSE calculations.
| Parameter | Value | Unit | Context |
|---|---|---|---|
| Maximum System Size | 1727 | Atoms | NV- at 90° dislocation core in diamond. |
| Maximum Electron Count | 6910 | Electrons | Largest system simulated. |
| Computational Scaling | O(N^4) | N/A | Overall complexity of the GW-BSE method implemented. |
| Maximum GPU Utilization | 4096 | NVIDIA A100 GPUs | Demonstrated scalability on the NERSC Perlmutter supercomputer. |
| NV- (Diamond) ³E VEE | 2.373 | eV | Calculated VEE using 5x5x5 supercell (999 atoms). |
| NV- (Dislocation Core) ³E Splitting | > 0.5 | eV | Splitting of degenerate triplet excited states (VEEs: 2.08 eV and 2.76 eV). |
| VV0 (3C SiC) Lowest VEE (Extrapolated) | 1.478 | eV | Extrapolated VEE to the dilute limit. |
| VV0 (3C SiC) Radiative Lifetime | 25 | ns | Lowest-energy excited state, close to experimental estimate (23 ns). |
| Si Supercell Size (Benchmark) | 8x8x8 | N/A | Used for optical absorption spectrum calculation (1024 atoms, 4096 electrons). |
| Wannier Overlap Threshold | 0.001 | N/A | Used to truncate screened Coulomb integrals (τ_vv'). |
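The overlap-threshold truncation in the last row of the table can be illustrated with a toy model. The sketch below is not taken from WEST: it assumes one possible overlap measure (the normalized overlap of the squared moduli of two localized functions), uses 1D Gaussians as stand-ins for Wannier functions, and the names `density_overlap` and `pairs_to_keep` are hypothetical. Only the threshold value 0.001 comes from the summary above.

```python
import numpy as np

def density_overlap(w_a, w_b):
    """Normalized overlap of |w_a|^2 and |w_b|^2 on a shared grid (assumed measure)."""
    rho_a = np.abs(w_a) ** 2
    rho_b = np.abs(w_b) ** 2
    return np.sum(rho_a * rho_b) / np.sqrt(np.sum(rho_a**2) * np.sum(rho_b**2))

def pairs_to_keep(orbitals, threshold=1e-3):
    """Return the index pairs (v, v') whose overlap reaches the threshold.

    Integrals for all other pairs would be treated as zero and never computed.
    """
    kept = []
    for v in range(len(orbitals)):
        for vp in range(v, len(orbitals)):
            if density_overlap(orbitals[v], orbitals[vp]) >= threshold:
                kept.append((v, vp))
    return kept

# Toy stand-ins for localized functions: Gaussians on evenly spaced 1D sites.
x = np.linspace(0.0, 40.0, 2001)
centers = np.arange(2.0, 40.0, 2.0)
orbitals = [np.exp(-((x - c) ** 2)) for c in centers]

kept = pairs_to_keep(orbitals)
total = len(orbitals) * (len(orbitals) + 1) // 2
print(f"kept {len(kept)} of {total} unique pairs")
```

In this toy model only each function's pairing with itself and with its nearest neighbors survives, so the number of retained pairs grows roughly linearly with the number of functions rather than quadratically.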
Key Methodologies
The WEST-BSE code employs several advanced algorithms and parallelization strategies to achieve high performance and scalability:
- BSE Formulation: Solved using the linearized Liouville equation and density matrix perturbation theory (DMPT), which implicitly includes all empty states, eliminating the need for explicit summation over virtual states.
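- Screened Coulomb Interaction (W): Calculated using the Projective Dielectric Eigenpotentials (PDEP) technique, which relies on a low-rank decomposition of the dielectric matrix and scales as O(N_occ × N_PDEP × N_pw), where N_occ is the number of occupied states, N_PDEP the number of eigenpotentials, and N_pw the number of plane waves.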
- Quasiparticle (QP) Energies: Computed using the full-frequency G0W0 method. QP corrections for higher virtual states are approximated using an average energy shift (scissor operator).
- Integral Reduction via Localization: The number of screened Coulomb integrals (τ_vv') is reduced by exploiting the nearsightedness of the density matrix. Integrals are set to zero if the overlap between the corresponding localized Wannier functions falls below a preset threshold (0.001).
- Wannier Localization Algorithm: The occupied Kohn-Sham wave functions are localized using the GPU-accelerated Joint Approximate Diagonalization of Eigen-matrices (JADE) algorithm.
- Multilevel Parallelization: A hierarchical strategy partitions processes into:
  - Images: Facilitate iterative diagonalization (Davidson method).
  - Spin Pools: Handle spin channels in polarized systems.
  - Band Groups: Compute subsets of single-particle states, minimizing communication.
- GPU Implementation: The code uses OpenACC directives for portability across GPU architectures (AMD, Intel, NVIDIA) and relies on optimized libraries for its core kernels: cuFFT for fast Fourier transforms and cuBLAS for dense linear algebra.
- Communication Mitigation: MPI_Allgather operations for transforming vectors between KS and Wannier spaces are initiated in a non-blocking fashion, allowing data transfer to proceed concurrently with computation of the D and K1e terms.
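The idea behind the low-rank treatment of the dielectric matrix in the PDEP bullet can be sketched numerically. If a symmetric matrix χ has the eigendecomposition χ = Σ_i λ_i |φ_i><φ_i| with only a few nonzero eigenvalues, then (1 - χ)^-1 = 1 + Σ_i [λ_i / (1 - λ_i)] |φ_i><φ_i|, so no large matrix needs to be inverted. The example below is a toy under stated assumptions, not the WEST implementation: it builds a small dense matrix and diagonalizes it directly, whereas a production code would obtain the leading eigenpairs iteratively without forming the matrix. The function name `low_rank_inverse` is hypothetical.

```python
import numpy as np

def low_rank_inverse(chi, rank):
    """Approximate (I - chi)^-1 from the `rank` most negative eigenpairs of chi."""
    lam, phi = np.linalg.eigh(chi)          # eigenvalues in ascending order
    lam, phi = lam[:rank], phi[:, :rank]    # keep the most negative ones
    weights = lam / (1.0 - lam)
    return np.eye(chi.shape[0]) + (phi * weights) @ phi.T

rng = np.random.default_rng(0)
n, rank = 200, 12

# Toy symmetric, negative semidefinite matrix of exact rank `rank`.
basis, _ = np.linalg.qr(rng.standard_normal((n, rank)))
eigenvalues = -rng.uniform(0.1, 5.0, size=rank)
chi = (basis * eigenvalues) @ basis.T

inv_low_rank = low_rank_inverse(chi, rank)
inv_dense = np.linalg.inv(np.eye(n) - chi)   # reference: explicit inversion
max_error = np.max(np.abs(inv_low_rank - inv_dense))
print(f"max deviation from dense inverse: {max_error:.2e}")
```

Because the toy matrix has exact rank 12, the 12-term expansion reproduces the dense inverse to rounding error. For a matrix whose eigenvalues only decay toward zero, the truncation rank becomes a convergence parameter.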
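The joint approximate diagonalization named in the Wannier localization bullet can be illustrated with a compact CPU version. The sketch below uses the classic Jacobi-rotation scheme for a set of real symmetric matrices: for each index pair it picks the single rotation angle that best reduces the off-diagonal weight of all matrices at once. It is an assumption-laden illustration, not the GPU code described in the paper. The inputs are random commuting matrices rather than the operators a real localization would use, and `joint_diagonalize` is a hypothetical name.

```python
import numpy as np

def joint_diagonalize(matrices, tol=1e-10, max_sweeps=50):
    """Jacobi-style joint approximate diagonalization of real symmetric matrices.

    Returns (v, rotated) with v orthogonal and rotated[k] = v.T @ matrices[k] @ v.
    """
    a = np.array(matrices, dtype=float)      # shape (K, n, n), copied
    n = a.shape[1]
    v = np.eye(n)
    for _ in range(max_sweeps):
        rotated_any = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Entries of every matrix that the (p, q) rotation mixes.
                g = np.array([a[:, p, p] - a[:, q, q], a[:, p, q] + a[:, q, p]])
                gg = g @ g.T
                ton = gg[0, 0] - gg[1, 1]
                toff = gg[0, 1] + gg[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) > tol:
                    rotated_any = True
                    r = np.eye(n)
                    r[p, p], r[p, q], r[q, p], r[q, q] = c, -s, s, c
                    a = r.T @ a @ r          # broadcasts over all K matrices
                    v = v @ r
        if not rotated_any:
            break
    return v, a

# Toy input: three symmetric matrices sharing one hidden orthogonal eigenbasis.
rng = np.random.default_rng(1)
hidden, _ = np.linalg.qr(rng.standard_normal((5, 5)))
mats = [hidden @ np.diag(rng.standard_normal(5)) @ hidden.T for _ in range(3)]

v, rotated = joint_diagonalize(mats)
off_weight = sum(np.sum(m**2) - np.sum(np.diag(m) ** 2) for m in rotated)
print(f"remaining off-diagonal weight: {off_weight:.2e}")
```

Each rotation is applied to all matrices together, so the work per sweep is dominated by dense matrix products, the kind of operation that maps well onto GPU linear algebra libraries.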
Commercial Applications
The ability to accurately and efficiently model excited-state properties in large, defective wide-bandgap materials is crucial for several high-value engineering sectors:
- Quantum Sensing and Computing:
  - Qubit Design: Essential for optimizing the electronic structure and optical interfaces of solid-state spin qubits (e.g., NV centers in diamond, divacancies in SiC) used in quantum processors and memory.
  - Defect Engineering: Enables the study of how manufacturing imperfections (dislocations, extended defects) affect qubit performance, coherence, and optical readout fidelity.
- Optoelectronics and Photonics:
  - UV/Vis Spectroscopy: Accurate prediction of optical absorption spectra for novel semiconductor materials and nanostructures.
  - High-Power Electronics: Modeling wide-bandgap semiconductors (SiC, Diamond) used in high-voltage switches and power converters, where defect states influence device reliability and efficiency.
- Materials Discovery and Design:
  - Heterogeneous Systems: Simulation of complex interfaces, surfaces, and large supercells containing multiple defects, which are inaccessible to conventional BSE methods.
- Photovoltaics and Energy Conversion:
  - Excited State Dynamics: Provides fundamental insight into exciton formation and dynamics in solar cell materials and photocatalysts.
Original Abstract
We present a massively parallel GPU-accelerated implementation of the Bethe-Salpeter equation (BSE) for the calculation of the vertical excitation energies (VEEs) and optical absorption spectra of condensed and molecular systems, starting from single-particle eigenvalues and eigenvectors obtained with density functional theory. The algorithms adopted here circumvent the slowly converging sums over empty and occupied states and the inversion of large dielectric matrices through a density matrix perturbation theory approach and a low-rank decomposition of the screened Coulomb interaction, respectively. Further computational savings are achieved by exploiting the nearsightedness of the density matrix of semiconductors and insulators to reduce the number of screened Coulomb integrals. We scale our calculations to thousands of GPUs with a hierarchical loop and data distribution strategy. The efficacy of our method is demonstrated by computing the VEEs of several spin defects in wide-band-gap materials, showing that supercells with up to 1000 atoms are necessary to obtain converged results. We discuss the validity of the common approximation that solves the BSE with truncated sums over empty and occupied states. We then apply our GW-BSE implementation to a diamond lattice with 1727 atoms to study the symmetry breaking of triplet states caused by the interaction of a point defect with an extended line defect.