Classifying global state preparation via deep reinforcement learning
At a Glance
| Metadata | Details |
|---|---|
| Publication Date | 2020-11-05 |
| Journal | Machine Learning: Science and Technology |
| Authors | Tobias Haug, Wai-Keong Mok, Jia-Bin You, Wenzu Zhang, Ching Eng Png |
| Citations | 35 |
| Analysis | Full AI Review Included |
Executive Summary
This research demonstrates a scalable Deep Reinforcement Learning (DRL) approach for generating and classifying optimal quantum control protocols across a continuous subspace of target states.
- Global Control Solution: The method learns all control protocols for a continuous two-dimensional state subspace (the Bloch sphere) simultaneously, overcoming the limitations of single-state numerical optimization.
- System and Speed: Applied to the complex multi-level electron spin in Nitrogen-Vacancy (NV) centers, achieving arbitrary superposition state preparation in approximately 0.5 ns. This is significantly faster than comparable adiabatic methods.
- Efficiency: Near-optimal protocols are achieved using only 9 piece-wise constant control steps, minimizing the impact of dissipation.
- Methodology: Deep reinforcement learning with the Proximal Policy Optimization (PPO) actor-critic algorithm, where neural networks (NN) parameterize the time-dependent driving protocols.
- Protocol Clustering: The DRL approach automatically groups near-optimal protocols into distinct clusters or "phases" (R1, R2, R3). These phases are characterized by similar driving sequences and specific preparation timescales (T), which can be used to extract physical insights and constraints.
- Scalability: The method is naturally suited for parallelization and can potentially scale to higher dimensional control problems where conventional optimization fails due to highly disordered landscapes.
Technical Specifications
The following specifications relate to the optimized control of the NV center electron spin triplet ground state manifold.
| Parameter | Value | Unit | Context |
|---|---|---|---|
| Target System | NV Center Electron Spin | N/A | Triplet ground state manifold (ms = 0, ±1 sublevels)
| Target State Subspace | Two-dimensional (Bloch Sphere) | N/A | Superposition of the ms = -1 and ms = +1 ground-state sublevels
| Mean Fidelity (F) | 0.972 | N/A | Average fidelity achieved for arbitrary state preparation (Closed System Approx.) |
| Protocol Time (T) Range | 0.2 < T < 0.8 | ns | Total duration of the 9-step protocol |
| Protocol Steps (NT) | 9 | N/A | Number of piece-wise constant control steps |
| Driving Strength (Omega1,2) Range | -20 to 20 | GHz | Applied laser Rabi frequencies |
| Detuning (Delta1) | 50 | GHz | Relative detuning of Laser 1 |
| Detuning (Delta2) | 0 | GHz | Relative detuning of Laser 2 |
| External Magnetic Field (Bext) | 0.15 | T | Used to lift the degeneracy between the ms = +1 and ms = -1 sublevels
| Ground State Splitting (Dgs) | 2.88 | GHz | Zero-field splitting of the ms = ±1 and ms = 0 sublevels |
| Dissipation Constraint | T << 13 | ns | Protocol time is much shorter than the excited-state decay times (decay rates of 1/24 ns^-1 to 1/666 ns^-1)
| Training Epochs (NE) | 800,000 | N/A | Total training runs for the neural network |
| Neural Network Neurons (NH) | 600 | N/A | Neurons per layer (two fully-connected layers) |
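For quick reference, the tabulated values can be collected into a single configuration block. This is an illustrative summary only; the variable names and layout below are not taken from the paper's code.

```python
# Parameters from the table above, gathered for use in later sketches.
NV_CONTROL_CONFIG = {
    "n_levels": 8,                    # effective closed-system model of the NV center
    "n_steps": 9,                     # piece-wise constant control steps (NT)
    "protocol_time_ns": (0.2, 0.8),   # range of the total protocol duration T
    "omega_range_ghz": (-20.0, 20.0), # driving strengths Omega1, Omega2
    "delta1_ghz": 50.0,               # relative detuning of laser 1
    "delta2_ghz": 0.0,                # relative detuning of laser 2
    "b_ext_tesla": 0.15,              # external magnetic field Bext
    "d_gs_ghz": 2.88,                 # zero-field ground-state splitting Dgs
    "n_epochs": 800_000,              # training epochs NE
    "n_hidden": 600,                  # neurons per fully-connected layer NH
}
```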
Key Methodologies
The control protocols were generated using a Deep Reinforcement Learning framework based on the actor-critic method.
- Quantum System Approximation: The NV center is modeled as an effective closed eight-level system, since the protocol time (T ≈ 0.5 ns) is much shorter than the timescale of the fastest dissipation channel (T << 13 ns).
- Protocol Definition: The control protocol Beta(t) is a piece-wise constant function defined by NT = 9 steps. Each step k is parameterized by the driving strengths (Omega1(k), Omega2(k)) and the timestep length (Delta t(k)); see the evolution sketch after this list.
- DRL Implementation: Proximal Policy Optimization (PPO) is used. The neural network (NN) consists of two parts: the Actor (generates the protocol parameters) and the Critic (estimates the expected future reward/fidelity).
- Input and Output: The NN input includes the current system wavefunction Psi(tk) and the randomly sampled target state Psitarget(theta, phi). The NN output determines the driving parameters for the next step (see the network sketch after this list).
- Reward Maximization: The NN is trained to maximize the final state fidelity F = |&lt;Psi(NT)|Psitarget&gt;|^2.
- Global Training Strategy: Target states are sampled randomly across the continuous Bloch sphere subspace, allowing the NN to learn a generalized function for all states simultaneously, rather than optimizing grid points sequentially.
- Low-Fidelity Biasing: To ensure convergence across the entire state space, target states are sampled with a probability distribution P(theta, phi) that is biased towards regions where previously achieved fidelity was low (see the sampling sketch after this list).
- Protocol Classification: The NN's effective interpolation results in the automatic clustering of similar near-optimal protocols into distinct "phases" (R1, R2, R3), separated by sharp lines of high protocol gradient.
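To make the protocol and reward structure concrete, the sketch below applies a piece-wise constant protocol of NT = 9 steps to a state and evaluates the fidelity reward. It is a minimal illustration under stated assumptions, not the authors' implementation: build_hamiltonian is a hypothetical placeholder for the driven eight-level NV Hamiltonian, and the level energies, couplings, and unit conventions inside it are made up for demonstration.

```python
import numpy as np
from scipy.linalg import expm

N_LEVELS = 8   # effective closed eight-level model of the NV center
N_STEPS = 9    # number of piece-wise constant control steps (NT)

def build_hamiltonian(omega1, omega2):
    """Hypothetical stand-in for the driven eight-level NV Hamiltonian (GHz)."""
    h = np.diag(np.linspace(0.0, 5.0, N_LEVELS))  # placeholder level energies
    h[0, 4] = h[4, 0] = omega1                    # laser-1 coupling (illustrative)
    h[1, 5] = h[5, 1] = omega2                    # laser-2 coupling (illustrative)
    return h

def run_protocol(psi0, protocol):
    """Evolve psi0 through constant-Hamiltonian segments.

    protocol: sequence of (omega1, omega2, dt) tuples, dt in ns.
    """
    psi = psi0.astype(complex)
    for omega1, omega2, dt in protocol:
        u = expm(-2j * np.pi * build_hamiltonian(omega1, omega2) * dt)
        psi = u @ psi
    return psi

def fidelity(psi, psi_target):
    """Reward used for training: F = |<Psi_target|Psi>|^2."""
    return abs(np.vdot(psi_target, psi)) ** 2

# Example: a random 9-step protocol scored against an illustrative target state.
rng = np.random.default_rng(0)
protocol = [(rng.uniform(-20, 20), rng.uniform(-20, 20), 0.5 / N_STEPS)
            for _ in range(N_STEPS)]
psi0 = np.zeros(N_LEVELS)
psi0[0] = 1.0                                     # start in one ground-state sublevel
psi_target = np.zeros(N_LEVELS, dtype=complex)
psi_target[[0, 1]] = 1 / np.sqrt(2)               # equal superposition (illustrative)
print(fidelity(run_protocol(psi0, protocol), psi_target))
```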
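The actor-critic architecture described in the Input and Output item can be sketched as follows, here assuming a PyTorch implementation. The two hidden layers of 600 neurons follow the specifications table; the activation functions, the output scaling, and the omission of a stochastic policy head and of the PPO update itself are simplifications for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_LEVELS = 8        # effective closed eight-level model
N_HIDDEN = 600      # neurons per hidden layer (NH)
OMEGA_MAX = 20.0    # driving-strength bound in GHz

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        in_dim = 2 * N_LEVELS + 2              # Re/Im parts of Psi(tk) plus (theta, phi)
        self.body = nn.Sequential(
            nn.Linear(in_dim, N_HIDDEN), nn.Tanh(),
            nn.Linear(N_HIDDEN, N_HIDDEN), nn.Tanh(),
        )
        self.actor = nn.Linear(N_HIDDEN, 3)    # (Omega1, Omega2, Delta t) for the next step
        self.critic = nn.Linear(N_HIDDEN, 1)   # value: estimate of the future fidelity

    def forward(self, psi, theta, phi):
        x = torch.cat([psi.real, psi.imag,
                       theta.unsqueeze(-1), phi.unsqueeze(-1)], dim=-1)
        h = self.body(x)
        raw = self.actor(h)
        omegas = OMEGA_MAX * torch.tanh(raw[..., :2])  # bounded driving strengths
        dt = F.softplus(raw[..., 2:])                  # positive timestep length
        return torch.cat([omegas, dt], dim=-1), self.critic(h)

# Example forward pass with the system in one ground-state sublevel.
net = ActorCritic()
psi = torch.zeros(N_LEVELS, dtype=torch.cfloat)
psi[0] = 1.0
action, value = net(psi, torch.tensor(0.3), torch.tensor(1.2))
```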
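Finally, a minimal sketch of the low-fidelity-biased target sampling, assuming a simple grid over (theta, phi) and an inverse-fidelity weighting rule; both choices are assumptions for illustration, since the summary only states that sampling is biased toward low-fidelity regions.

```python
import numpy as np

N_THETA, N_PHI = 20, 40                         # grid resolution (assumption)
theta_grid = np.linspace(0.0, np.pi, N_THETA)
phi_grid = np.linspace(0.0, 2.0 * np.pi, N_PHI, endpoint=False)
best_fidelity = np.zeros((N_THETA, N_PHI))      # best fidelity seen per target region

def sample_target(rng):
    """Draw a target (theta, phi), biased toward regions with low achieved fidelity."""
    weights = (1.0 - best_fidelity) + 1e-3      # small floor keeps every region reachable
    p = (weights / weights.sum()).ravel()
    idx = rng.choice(weights.size, p=p)
    i, j = np.unravel_index(idx, weights.shape)
    return theta_grid[i], phi_grid[j], (i, j)

rng = np.random.default_rng(0)
theta, phi, idx = sample_target(rng)
# ...run a training episode toward Psitarget(theta, phi), then record the result so
# that the sampling bias adapts (0.9 below is a placeholder for the achieved fidelity):
best_fidelity[idx] = max(best_fidelity[idx], 0.9)
```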
Commercial Applications
The ability to generate fast, high-fidelity, and generalized control protocols for complex quantum systems has direct implications for several emerging technologies.
- Quantum Computing:
- Enables the construction of high-fidelity quantum gates for solid-state qubits (like NV centers) by providing tailored, fast control pulses that circumvent complex indirect coupling mechanisms.
- Offers a method to retrain and correct all control protocols simultaneously, saving time when mitigating errors or drifts in quantum processors.
- Quantum Sensing:
- Improves the performance of quantum sensing devices (e.g., NV magnetometers) by allowing rapid, high-fidelity preparation of arbitrary superposition states necessary for sensitive measurements.
- Quantum Simulation and Control:
- Provides a generalized framework for finding optimal control unitaries for continuous variable quantum computation.
- Applicable to various quantum processing platforms and optimal control problems involving complex, multi-level dynamics.
- Physics Research and Insight:
- The automatic classification of protocols into distinct clusters can help researchers identify underlying physical principles, constraints, and "phase transitions" in quantum control landscapes.
Original Abstract
Quantum information processing often requires the preparation of arbitrary quantum states, such as all the states on the Bloch sphere for two-level systems. While numerical optimization can prepare individual target states, they lack the ability to find general control protocols that can generate many different target states. Here, we demonstrate global quantum control by preparing a continuous set of states with deep reinforcement learning. The protocols are represented using neural networks, which automatically groups the protocols into similar types, which could be useful for finding classes of protocols and extracting physical insights. As application, we generate arbitrary superposition states for the electron spin in complex multi-level nitrogen-vacancy centers, revealing classes of protocols characterized by specific preparation timescales. Our method could help improve control of near-term quantum computers, quantum sensing devices and quantum simulations.