**Quantum Computation & Learning**

**Projective Simulation Learning Agents**

The Projective Simulation (PS) model, introduced in [Briegel, De las Cuevas, Sci. Rep. 2, 400 (2012)], is a framework for describing autonomous embodied learning agents in artificial intelligence (its main components are illustrated in the figure on the right-hand side). The PS model has been shown to perform well in standard reinforcement learning (RL) problems [see, e.g., Mautner, Makmal, Manzano, Tiersch, Briegel, New Gener. Comput. 33, 69-114 (2015), arXiv:1305.1578; Melnikov, Makmal, Briegel, IEEE Access 6, 64639-64648 (2018), arXiv:1804.08607] and in advanced robotics applications [Hangl, Ugur, Szedmak, Piater, in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2016) pp. 2799-2804, arXiv:1603.00794]. Moreover, its memory structure provides a dynamic framework for generalization [Melnikov, Makmal, Dunjko, Briegel, Sci. Rep. 7, 14430 (2017), arXiv:1504.02247].

The central element of the PS agent is an episodic and compositional memory, which stores a representation of the agent's past experience and allows it to simulate future actions. The memory can be described as a stochastic network of so-called clips (blue and green dots in the figure), some of which represent percepts (blue) and others actions (green). The decision-making process of the agent is then realized by a random walk in the clip network. In the Reflecting Projective Simulation (RPS) variant of the PS model [Paparo, Dunjko, Makmal, Martín-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014)], the goal is to output actions according to a particular distribution, which can be updated and hence changes throughout the learning process. The RPS framework is amenable to quantization and provides a quadratic speed-up in decision-making with respect to its classical counterpart. This speed-up has been demonstrated in an ion-trap quantum processor [Sriarunothai, Wölk, Giri, Friis, Dunjko, Briegel, Wunderlich, Quant. Sci. Techn. 4, 015014 (2019), arXiv:1709.01366].
Scheme for a projective simulation agent that interacts with its environment via sensory input (percepts) and acts on the environment using a set of actuators.
The sensors and actuators are linked to the memory, which relates new perceptual input to the agent’s past experience. Figure from [Sriarunothai et al., Quant. Sci. Techn. 4, 015014 (2019)].
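
The classical decision-making step described above, a random walk from a percept clip to an action clip, can be illustrated with a minimal sketch. It assumes the simplest two-layer network (each percept clip connects directly to every action clip) and the commonly used rule in which each edge carries a weight, the h-value, that starts at 1, determines the hopping probability by normalization, and is increased by the reward when the edge was used. The class and function names (`PSAgent`, `train_on_matching_task`), the `damping` parameter, and the toy task are illustrative assumptions, not code from the cited papers.

```python
import random


class PSAgent:
    """Toy two-layer PS agent: every percept clip links directly to every action clip."""

    def __init__(self, n_percepts, n_actions, damping=0.0, seed=None):
        self.n_percepts = n_percepts
        self.n_actions = n_actions
        self.damping = damping  # assumed forgetting rate; 0 means no forgetting
        self.rng = random.Random(seed)
        # One h-value per edge percept -> action; all start at 1 (uniform hopping).
        self.h = [[1.0] * n_actions for _ in range(n_percepts)]

    def hopping_probabilities(self, percept):
        """Hopping probability of an edge is its h-value divided by the row sum."""
        row = self.h[percept]
        total = sum(row)
        return [h / total for h in row]

    def act(self, percept):
        """One step of the random walk: hop from the percept clip to an action clip."""
        probs = self.hopping_probabilities(percept)
        return self.rng.choices(range(self.n_actions), weights=probs)[0]

    def learn(self, percept, action, reward):
        """Damp all h-values toward 1, then reinforce the edge that was traversed."""
        for row in self.h:
            for a in range(self.n_actions):
                row[a] -= self.damping * (row[a] - 1.0)
        self.h[percept][action] += reward


def train_on_matching_task(agent, steps):
    """Hypothetical toy environment: reward 1 if the action index equals the percept index."""
    for _ in range(steps):
        percept = agent.rng.randrange(agent.n_percepts)
        action = agent.act(percept)
        reward = 1.0 if action == percept else 0.0
        agent.learn(percept, action, reward)
```

After calling `train_on_matching_task(PSAgent(2, 2, seed=0), steps=2000)`, the hopping probabilities concentrate on the rewarded percept-action edges. Deeper clip networks and the RPS variant change how the output distribution is sampled, which this sketch does not attempt to model.
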
The quantum-mechanically sped-up decision-making process of the RPS agent depends on the ability to implement operations conditionally on the state of a control register. At the same time, the memory update that accompanies learning makes it desirable for the agent to have fixed hardware that can add control to arbitrary unitary operations. Since there is no generic way of achieving this, we have investigated how this coherent controlization procedure can be implemented specifically in trapped ions [Dunjko, Friis, Briegel, New J. Phys. 17, 023006 (2015)] and superconducting qubits [Friis, Melnikov, Kirchmair, Briegel, Sci. Rep. 5, 18036 (2015)].

**Coherent controlization in trapped-ion qubits**

Trapped ions are one of the most promising platforms for implementing quantum computational tasks. Seminal theoretical proposals such as [Cirac, Zoller, Phys. Rev. Lett. 74, 4091 (1995)] and [Mølmer, Sørensen, Phys. Rev. Lett. 82, 1835 (1999)], paired with the precise control of modern ion-trap setups [see, e.g., Schmidt-Kaler et al., Nature 422, 408 (2003); Barreiro et al., Nature 470, 486 (2011)], allow universal quantum computation with several qubits, represented by the internal energy levels of the ions. In other words, discrete universal sets of quantum gates can be realized, and any quantum gate may be approximated by sequentially applying a combination of the universal gates.

However, an apparent issue arises when an unknown operation, a black box, is introduced into this picture. It was shown in [Araújo, Feix, Costa, Brukner, New J. Phys. 16, 093026 (2014), arXiv:1309.7976] and [Thompson, Gu, Modi, Vedral, New J. Phys. 20, 013004 (2018), arXiv:1310.2927] that inserting a single unknown subroutine U into a quantum circuit that is independent of U cannot realize ctrl-U, i.e., a single use of the operation U cannot generally be conditioned on the value of a control qubit.

Nonetheless, control can be added when the "single use" of the unknown operation is interpreted in practical terms, for instance, when a single physical device is provided that acts on internal degrees of freedom of a photon [see Zhou, Ralph, Kalasuwan, Zhang, Peruzzo, Lanyon, O'Brien, Nat. Commun. 2, 413 (2011)].
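
One way to see why controlling a black box is subtle is that U and e^{iφ}U are physically indistinguishable as black boxes, whereas their controlled versions are not: the global phase of U becomes a relative phase between the two control branches. The short numerical check below illustrates this standard observation; it is not the proof given in the cited papers. It uses plain nested lists for matrices in the basis |control, target>, and the helper names (`with_phase`, `controlled`, `same_up_to_global_phase`) are illustrative.

```python
import cmath


def with_phase(u, phi):
    """Return e^{i*phi} * u for a matrix u given as nested lists."""
    factor = cmath.exp(1j * phi)
    return [[factor * entry for entry in row] for row in u]


def controlled(u):
    """ctrl-U = |0><0| (x) I + |1><1| (x) U for a single-qubit unitary u."""
    return [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, u[0][0], u[0][1]],
        [0, 0, u[1][0], u[1][1]],
    ]


def same_up_to_global_phase(a, b, tol=1e-9):
    """True if b = e^{i*phi} * a for some real phi (a, b square, same size)."""
    n = len(a)
    # overlap = Tr(a^dagger b); for b = e^{i*phi} a with a unitary it equals n * e^{i*phi}.
    overlap = sum(
        complex(a[i][j]).conjugate() * b[i][j] for i in range(n) for j in range(n)
    )
    if abs(overlap) < tol:
        return False
    phase = overlap / abs(overlap)
    return all(
        abs(a[i][j] * phase - b[i][j]) < tol for i in range(n) for j in range(n)
    )


# Pauli X and i*X describe the same physical single-qubit operation ...
pauli_x = [[0, 1], [1, 0]]
phased_x = with_phase(pauli_x, cmath.pi / 2)
boxes_equivalent = same_up_to_global_phase(pauli_x, phased_x)

# ... but their controlled versions are genuinely different two-qubit gates.
controlled_equivalent = same_up_to_global_phase(
    controlled(pauli_x), controlled(phased_x)
)
```

Here `boxes_equivalent` evaluates to `True` and `controlled_equivalent` to `False`, so "ctrl-U" is not determined by the physical action of the black box alone.
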

In our paper [Friis, Dunjko, Dür, Briegel, Phys. Rev. A **89**, 030303(R) (2014), e-print arXiv:1401.8128] we show how the techniques for quantum computation with trapped ions can be harnessed to add quantum control to arbitrary, unknown single-qubit unitaries *U* that are realized by a single laser pulse on a single trapped ion. Using the well-known method of Cirac and Zoller [Phys. Rev. Lett. **74**, 4091 (1995)], the state of the control qubit, realized in one ion, is swapped to the axial centre-of-mass vibrational mode of two ions, see the Figure above. The ground state |0> and excited state |1> of the vibrational mode generate two submanifolds within the ion's electronic levels. Two red-detuned laser pulses, H1 and H2, are then used to "hide" one of these manifolds from the laser pulse realizing the unknown unitary *U*, which drives the transition between the qubit levels |g> and |e>.

**Coherent controlization using superconducting qubits**

In our recent work [Friis, Melnikov, Kirchmair, Briegel, Sci. Rep. 5, 18036 (2015)] we have found a method, scalable in principle, to implement coherent controlization in a system of superconducting transmon qubits. These systems are very resilient against charge noise [see Koch et al., Phys. Rev. A 76, 042319 (2007)] and have long coherence times (roughly up to 100 μs), which makes them a promising platform. Coupling transmon qubits to a microwave resonator then allows one to split and recombine resonator state components that correspond to different qubit states (coloured circles in the figure on the right). Using unconditional displacements D(α), waiting periods Δt, and single-qubit unitaries U(θ) that are conditioned on the ground state of the resonator, we can then add control to these single-qubit unitaries, realizing the circuits shown below, or construct more complicated conditioned operations.