Classifying global state preparation via deep reinforcement learning
At a Glance
| Metadata | Details |
|---|---|
| Publication Date | 2020-11-05 |
| Journal | Machine Learning: Science and Technology |
| Authors | Tobias Haug, Wai-Keong Mok, Jia-Bin You, Wenzu Zhang, Ching Eng Png |
| Citations | 35 |
| Analysis | Full AI Review Included |
Executive Summary
This research demonstrates a scalable Deep Reinforcement Learning (DRL) approach for generating and classifying optimal quantum control protocols across a continuous subspace of target states.
- Global Control Solution: The method learns all control protocols for a continuous two-dimensional state subspace (the Bloch sphere) simultaneously, overcoming the limitations of single-state numerical optimization.
- System and Speed: Applied to the complex multi-level electron spin in Nitrogen-Vacancy (NV) centers, achieving arbitrary superposition state preparation in approximately 0.5 ns. This is significantly faster than comparable adiabatic methods.
- Efficiency: Near-optimal protocols are achieved using only 9 piece-wise constant control steps, minimizing the impact of dissipation.
- Methodology: Deep reinforcement learning with the Proximal Policy Optimization (PPO) actor-critic algorithm, where neural networks (NN) parameterize the time-dependent driving protocols.
- Protocol Clustering: The DRL approach automatically groups near-optimal protocols into distinct clusters or "phases" (R1, R2, R3). These phases are characterized by similar driving sequences and specific preparation timescales (T), which can be used to extract physical insights and constraints.
- Scalability: The method is naturally suited for parallelization and can potentially scale to higher dimensional control problems where conventional optimization fails due to highly disordered landscapes.
Technical Specifications
The following specifications relate to the optimized control of the NV center electron spin triplet ground state manifold.
| Parameter | Value | Unit | Context |
|---|---|---|---|
| Target System | NV Center Electron Spin | N/A | Triplet ground state manifold (ms = 0, ±1 sublevels)
| Target State Subspace | Two-dimensional (Bloch Sphere) | N/A | Superposition of the ms = -1 and ms = +1 ground-state sublevels
| Mean Fidelity (F) | 0.972 | N/A | Average fidelity achieved for arbitrary state preparation (Closed System Approx.) |
| Protocol Time (T) Range | 0.2 < T < 0.8 | ns | Total duration of the 9-step protocol |
| Protocol Steps (NT) | 9 | N/A | Number of piece-wise constant control steps |
| Driving Strength (Omega1,2) Range | -20 to 20 | GHz | Applied laser Rabi frequencies |
| Detuning (Delta1) | 50 | GHz | Relative detuning of Laser 1 |
| Detuning (Delta2) | 0 | GHz | Relative detuning of Laser 2 |
| External Magnetic Field (Bext) | 0.15 | T | Used to lift the degeneracy between the ms = +1 and ms = -1 sublevels
| Ground State Splitting (Dgs) | 2.88 | GHz | Zero-field splitting of the ms = ±1 and ms = 0 sublevels |
| Dissipation Constraint | T << 13 | ns | Protocol time is much shorter than the excited-state decay times (decay rates of 1/24 ns^-1 to 1/666 ns^-1)
| Training Epochs (NE) | 800,000 | N/A | Total training runs for the neural network |
| Neural Network Neurons (NH) | 600 | N/A | Neurons per layer (two fully-connected layers) |
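For quick reference, the tabulated values can be collected into a single configuration block. This is an illustrative summary only; the variable names and layout below are not taken from the paper's code.

```python
# Parameters from the table above, gathered for use in later sketches.
NV_CONTROL_CONFIG = {
    "n_levels": 8,                    # effective closed-system model of the NV center
    "n_steps": 9,                     # piece-wise constant control steps (NT)
    "protocol_time_ns": (0.2, 0.8),   # range of the total protocol duration T
    "omega_range_ghz": (-20.0, 20.0), # driving strengths Omega1, Omega2
    "delta1_ghz": 50.0,               # relative detuning of laser 1
    "delta2_ghz": 0.0,                # relative detuning of laser 2
    "b_ext_tesla": 0.15,              # external magnetic field Bext
    "d_gs_ghz": 2.88,                 # zero-field ground-state splitting Dgs
    "n_epochs": 800_000,              # training epochs NE
    "n_hidden": 600,                  # neurons per fully-connected layer NH
}
```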
Key Methodologies
The control protocols were generated using a Deep Reinforcement Learning framework based on the actor-critic method.
- Quantum System Approximation: The NV center is modeled as an effective closed eight-level system, since the protocol time (T ≈ 0.5 ns) is much shorter than the timescale of the fastest dissipation channel (T << 13 ns).
- Protocol Definition: The control protocol Beta(t) is a piece-wise constant function defined by NT = 9 steps. Each step k is parameterized by the driving strengths (Omega1(k), Omega2(k)) and the timestep length (Delta t(k)); see the evolution sketch after this list.
- DRL Implementation: Proximal Policy Optimization (PPO) is used. The neural network (NN) consists of two parts: the Actor (generates the protocol parameters) and the Critic (estimates the expected future reward/fidelity).
- Input and Output: The NN input includes the current system wavefunction Psi(tk) and the randomly sampled target state Psitarget(theta, phi). The NN output determines the driving parameters for the next step (see the network sketch after this list).
- Reward Maximization: The NN is trained to maximize the final state fidelity F = |&lt;Psi(NT)|Psitarget&gt;|^2.
- Global Training Strategy: Target states are sampled randomly across the continuous Bloch sphere subspace, allowing the NN to learn a generalized function for all states simultaneously, rather than optimizing grid points sequentially.
- Low-Fidelity Biasing: To ensure convergence across the entire state space, target states are sampled with a probability distribution P(theta, phi) that is biased towards regions where previously achieved fidelity was low (see the sampling sketch after this list).
- Protocol Classification: The NN's effective interpolation results in the automatic clustering of similar near-optimal protocols into distinct "phases" (R1, R2, R3), separated by sharp lines of high protocol gradient.
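To make the protocol and reward structure concrete, the sketch below applies a piece-wise constant protocol of NT = 9 steps to a state and evaluates the fidelity reward. It is a minimal illustration under stated assumptions, not the authors' implementation: build_hamiltonian is a hypothetical placeholder for the driven eight-level NV Hamiltonian, and the level energies, couplings, and unit conventions inside it are made up for demonstration.

```python
import numpy as np
from scipy.linalg import expm

N_LEVELS = 8   # effective closed eight-level model of the NV center
N_STEPS = 9    # number of piece-wise constant control steps (NT)

def build_hamiltonian(omega1, omega2):
    """Hypothetical stand-in for the driven eight-level NV Hamiltonian (GHz)."""
    h = np.diag(np.linspace(0.0, 5.0, N_LEVELS))  # placeholder level energies
    h[0, 4] = h[4, 0] = omega1                    # laser-1 coupling (illustrative)
    h[1, 5] = h[5, 1] = omega2                    # laser-2 coupling (illustrative)
    return h

def run_protocol(psi0, protocol):
    """Evolve psi0 through constant-Hamiltonian segments.

    protocol: sequence of (omega1, omega2, dt) tuples, dt in ns.
    """
    psi = psi0.astype(complex)
    for omega1, omega2, dt in protocol:
        u = expm(-2j * np.pi * build_hamiltonian(omega1, omega2) * dt)
        psi = u @ psi
    return psi

def fidelity(psi, psi_target):
    """Reward used for training: F = |<Psi_target|Psi>|^2."""
    return abs(np.vdot(psi_target, psi)) ** 2

# Example: a random 9-step protocol scored against an illustrative target state.
rng = np.random.default_rng(0)
protocol = [(rng.uniform(-20, 20), rng.uniform(-20, 20), 0.5 / N_STEPS)
            for _ in range(N_STEPS)]
psi0 = np.zeros(N_LEVELS)
psi0[0] = 1.0                                     # start in one ground-state sublevel
psi_target = np.zeros(N_LEVELS, dtype=complex)
psi_target[[0, 1]] = 1 / np.sqrt(2)               # equal superposition (illustrative)
print(fidelity(run_protocol(psi0, protocol), psi_target))
```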
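The actor-critic architecture described in the Input and Output item can be sketched as follows, here assuming a PyTorch implementation. The two hidden layers of 600 neurons follow the specifications table; the activation functions, the output scaling, and the omission of a stochastic policy head and of the PPO update itself are simplifications for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_LEVELS = 8        # effective closed eight-level model
N_HIDDEN = 600      # neurons per hidden layer (NH)
OMEGA_MAX = 20.0    # driving-strength bound in GHz

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        in_dim = 2 * N_LEVELS + 2              # Re/Im parts of Psi(tk) plus (theta, phi)
        self.body = nn.Sequential(
            nn.Linear(in_dim, N_HIDDEN), nn.Tanh(),
            nn.Linear(N_HIDDEN, N_HIDDEN), nn.Tanh(),
        )
        self.actor = nn.Linear(N_HIDDEN, 3)    # (Omega1, Omega2, Delta t) for the next step
        self.critic = nn.Linear(N_HIDDEN, 1)   # value: estimate of the future fidelity

    def forward(self, psi, theta, phi):
        x = torch.cat([psi.real, psi.imag,
                       theta.unsqueeze(-1), phi.unsqueeze(-1)], dim=-1)
        h = self.body(x)
        raw = self.actor(h)
        omegas = OMEGA_MAX * torch.tanh(raw[..., :2])  # bounded driving strengths
        dt = F.softplus(raw[..., 2:])                  # positive timestep length
        return torch.cat([omegas, dt], dim=-1), self.critic(h)

# Example forward pass with the system in one ground-state sublevel.
net = ActorCritic()
psi = torch.zeros(N_LEVELS, dtype=torch.cfloat)
psi[0] = 1.0
action, value = net(psi, torch.tensor(0.3), torch.tensor(1.2))
```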
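Finally, a minimal sketch of the low-fidelity-biased target sampling, assuming a simple grid over (theta, phi) and an inverse-fidelity weighting rule; both choices are assumptions for illustration, since the summary only states that sampling is biased toward low-fidelity regions.

```python
import numpy as np

N_THETA, N_PHI = 20, 40                         # grid resolution (assumption)
theta_grid = np.linspace(0.0, np.pi, N_THETA)
phi_grid = np.linspace(0.0, 2.0 * np.pi, N_PHI, endpoint=False)
best_fidelity = np.zeros((N_THETA, N_PHI))      # best fidelity seen per target region

def sample_target(rng):
    """Draw a target (theta, phi), biased toward regions with low achieved fidelity."""
    weights = (1.0 - best_fidelity) + 1e-3      # small floor keeps every region reachable
    p = (weights / weights.sum()).ravel()
    idx = rng.choice(weights.size, p=p)
    i, j = np.unravel_index(idx, weights.shape)
    return theta_grid[i], phi_grid[j], (i, j)

rng = np.random.default_rng(0)
theta, phi, idx = sample_target(rng)
# ...run a training episode toward Psitarget(theta, phi), then record the result so
# that the sampling bias adapts (0.9 below is a placeholder for the achieved fidelity):
best_fidelity[idx] = max(best_fidelity[idx], 0.9)
```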
Commercial Applications
The ability to generate fast, high-fidelity, and generalized control protocols for complex quantum systems has direct implications for several emerging technologies.
- Quantum Computing:
- Enables the construction of high-fidelity quantum gates for solid-state qubits (like NV centers) by providing tailored, fast control pulses that circumvent complex indirect coupling mechanisms.
- Offers a method to retrain and correct all control protocols simultaneously, saving time when mitigating errors or drifts in quantum processors.
- Quantum Sensing:
- Improves the performance of quantum sensing devices (e.g., NV magnetometers) by allowing rapid, high-fidelity preparation of arbitrary superposition states necessary for sensitive measurements.
- Quantum Simulation and Control:
- Provides a generalized framework for finding optimal control unitaries for continuous variable quantum computation.
- Applicable to various quantum processing platforms and optimal control problems involving complex, multi-level dynamics.
- Physics Research and Insight:
- The automatic classification of protocols into distinct clusters can help researchers identify underlying physical principles, constraints, and "phase transitions" in quantum control landscapes.
Original Abstract
Quantum information processing often requires the preparation of arbitrary quantum states, such as all the states on the Bloch sphere for two-level systems. While numerical optimization can prepare individual target states, they lack the ability to find general control protocols that can generate many different target states. Here, we demonstrate global quantum control by preparing a continuous set of states with deep reinforcement learning. The protocols are represented using neural networks, which automatically groups the protocols into similar types, which could be useful for finding classes of protocols and extracting physical insights. As application, we generate arbitrary superposition states for the electron spin in complex multi-level nitrogen-vacancy centers, revealing classes of protocols characterized by specific preparation timescales. Our method could help improve control of near-term quantum computers, quantum sensing devices and quantum simulations.