{"id":44452,"date":"2022-03-02T13:12:14","date_gmt":"2022-03-02T21:12:14","guid":{"rendered":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer.nvidia.com\/blog\/?p=44452"},"modified":"2023-06-12T14:04:56","modified_gmt":"2023-06-12T21:04:56","slug":"scaling-quantum-circuit-simulation-with-cutensornet","status":"publish","type":"post","link":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer.nvidia.com\/blog\/scaling-quantum-circuit-simulation-with-cutensornet\/","title":{"rendered":"Scaling Quantum Circuit Simulation with NVIDIA cuTensorNet"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Quantum computing aspires to deliver more powerful computation, in less time, for problems that classical computing cannot currently address. NVIDIA recently announced the <a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/developer.nvidia.com\/cuquantum-sdk\">cuQuantum SDK<\/a>, a high-performance library for accelerating the development of quantum information science. cuQuantum was recently used to <a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/blogs.nvidia.com\/blog\/2021\/11\/09\/cuquantum-world-record\/\">break the world record for the MaxCut quantum algorithm simulation running on the DGX SuperPOD<\/a>, with 8x more qubits than prior work.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The initial target application for cuQuantum is the acceleration of <em>quantum circuit simulations<\/em>, and the SDK consists of two major libraries:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>cuStateVec<\/strong>: Accelerates state vector simulations.<\/li>\n\n\n\n<li><strong>cuTensorNet<\/strong>: Accelerates tensor network simulations.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">In this post, we provide an overview of both libraries, with a more detailed discussion of cuTensorNet.<\/p>\n\n\n\n<h2 id=\"why_use_custatevec\"  class=\"wp-block-heading\">Why use cuStateVec?<a href=\"#why_use_custatevec\" aria-label=\"Scroll to Why use 
cuStateVec? section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The cuStateVec library from the cuQuantum SDK provides a high-performance solution for state vector-based simulation through optimized GPU kernels for most use cases that arise in simulators. While the state vector method is great for running deep quantum circuits, its memory requirement grows exponentially with the number of qubits (the state of <em>N<\/em> qubits has 2<sup><em>N<\/em><\/sup> amplitudes), so simulating circuits with large numbers of qubits is impossible even on today&#8217;s largest supercomputers.&nbsp;<\/p>\n\n\n\n<h2 id=\"why-cutensornet\"  class=\"wp-block-heading\" >Why use cuTensorNet?<a href=\"#why-cutensornet\" aria-label=\"Scroll to Why use cuTensorNet? section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">As an alternative, the tensor network method represents the quantum state of <em>N<\/em> qubits as a series of tensor contractions. This enables quantum circuit simulators to handle circuits with many qubits by trading memory for computation. Depending on circuit topology and depth, however, this computation can also become prohibitively expensive. The main challenge, then, is to compute these tensor contractions efficiently. 
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh3.googleusercontent.com\/mg8BM0dwQTXTFU0NOtxS9Xola8koyQV3VetrrBLfleeheYplCsdxU2k6DoGlssVt_xqTVfS31rNFrbB5siIOW09dyfV9WPkGBgYro8yYrQJzRUZHE_dwSm0ww8ww7kC_1Rpl_ODP\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh3.googleusercontent.com\/mg8BM0dwQTXTFU0NOtxS9Xola8koyQV3VetrrBLfleeheYplCsdxU2k6DoGlssVt_xqTVfS31rNFrbB5siIOW09dyfV9WPkGBgYro8yYrQJzRUZHE_dwSm0ww8ww7kC_1Rpl_ODP\" alt=\"Diagram shows that cuTensorNet accepts a quantum circuit expressed as a tensor network, and offers both C and Python APIs with optimized performance for both pathfinding and contraction on NVIDIA GPU backends.\" width=\"615\" height=\"532\"\/><\/a><figcaption class=\"wp-element-caption\"><em><em>Figure 1. Schematic diagram of the software stack for quantum circuit simulation<\/em><\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">The cuTensorNet library from the cuQuantum SDK provides a high-performance solution for these types of tensor network computations.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The cuTensorNet library offers both C and Python APIs to provide access to high-performance tensor network computations for accelerating quantum circuit simulation. The APIs are flexible, enabling you to control, explore, and investigate each of the algorithmic techniques implemented.&nbsp;&nbsp;<\/p>\n\n\n\n<h2 id=\"cutensornet_algorithmic_description&nbsp;\"  class=\"wp-block-heading\">cuTensorNet algorithmic description&nbsp;<a href=\"#cutensornet_algorithmic_description&nbsp;\" aria-label=\"Scroll to cuTensorNet algorithmic description&nbsp; section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this section, we discuss the different algorithms and techniques used in cuTensorNet. 
cuTensorNet has two main components: the <em>pathfinder<\/em> and the <em>execution<\/em> component.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The pathfinder quickly finds a low-cost contraction path, and the execution component contracts the network along that path on the GPU using efficient kernels. These two components are independent of each other, and each is interoperable with any external library that provides similar functionality.&nbsp;&nbsp;<\/p>\n\n\n\n<h3 id=\"pathfinder\"  class=\"wp-block-heading\">Pathfinder<a href=\"#pathfinder\" aria-label=\"Scroll to Pathfinder section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At a high level, the approach taken in cuTensorNet is hyper-optimization around a graph partitioning-based pathfinder. For more information, see <a rel=\"noreferrer noopener\" href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/arxiv.org\/abs\/2002.01935\" target=\"_blank\">Hyper-optimized tensor network contraction<\/a>. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The role of a pathfinder is to find a contraction path that minimizes the cost of contracting the tensor network. Many algorithmic advancements and optimizations were developed to make this step fast, and work to make it even faster is ongoing.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The difficulty of finding an optimal contraction path depends strongly on the size of the network. 
The larger the network, the more techniques and computational effort are needed to find the optimal contraction path.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The cuTensorNet pathfinder consists of three algorithmic modules (Figure 2).&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a40931279a85&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a40931279a85\" class=\"aligncenter is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh3.googleusercontent.com\/2rbOmq9ywQBdDBQfeG_hSB7pVNf0dpSAhsgbqUYvFKKFO0EUmL8VwUpjqAaXKaoTGX_g7AA8gOzYxZGGfThyDV_JE9UUwdsR6MNs25W_t5EvFqexbmip597LFrefwjFmN1olUz4g\" alt=\"Flowchart shows how a tensor network is first simplified, then run through a hyperoptimization loop to find the optimal path. 
Then that path is sent to the execution module for planning and contraction.\" width=\"641\" height=\"449\"\/><figcaption class=\"wp-element-caption\"><em>Figure 2. cuTensorNet flowchart for the pathfinding and contraction execution submodules<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<ul class=\"wp-block-list\" type=\"1\">\n<li><strong>Simplification<\/strong>: Preprocesses the tensor network to find all sets of straightforward contractions, removes them from the network, and replaces each set with its resulting tensor. The result is a smaller network that is easier to process in the following modules.<\/li>\n\n\n\n<li><strong>Path computation<\/strong>: The heart of the pathfinder component. It is based on a graph-partitioning step, followed by a second step that applies reconfiguration and slicing. The graph partitioning is applied recursively to split the network and form a contraction path (a pairwise contraction tree).<\/li>\n\n\n\n<li><strong>Hyper-optimizer:<\/strong> A loop over the path computation module that forms a contraction path at each iteration. 
For each iteration, the hyper-optimizer creates a different configuration of parameters for the path computation while keeping track of the best path found. You can change or fix any of these configuration parameters as you like; all of them can be set with <code>cutensornetContractionOptimizerConfigSetAttribute<\/code>. For more information, see the <a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/docs.nvidia.com\/cuda\/cuquantum\/cutensornet\/index.html\">cuTensorNet documentation<\/a>.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The path generated by the graph-partitioning step might not be close to optimal, so a reconfiguration adjustment is usually performed. Reconfiguration selects several small subtrees within the overall contraction tree and attempts to reduce their contraction cost, decreasing the overall cost where possible.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Another feature of the path computation module is slicing. The primary goal of slicing is to fit the network contraction into the available device memory. Slicing accomplishes this by excluding certain tensor modes from the contraction and explicitly iterating over their index values. This generates many similar contraction trees, or <em>slices<\/em>, each of which corresponds to one fixed value (or, when several modes are sliced, one combination of values) of the excluded modes. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The contraction path, or tree, itself does not change. Only the sliced modes are excluded, each slice is computed independently of the others, and the slice results are then combined (summed, for contracted modes) into the final result. Consequently, slicing is one of the best techniques for creating independent work for different devices.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Practical experience indicates that the quality of the contraction path found can be sensitive to the choice of configuration parameters for each of these techniques. 
To increase the probability of finding the best contraction path, we encapsulate this module inside a hyper-optimizer.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Pathfinding performance <\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">There are two relevant metrics when considering the performance of a pathfinder: the quality of the path found, and the time taken to find that path. The former is plotted in Figure 3, measured as the cost of the resulting contraction in FLOPs (floating-point operations). The circuits used for benchmarking are random quantum circuits from Google Quantum AI\u2019s <a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/quantum-journal.org\/papers\/q-2021-03-15-410\/\">2019 quantum supremacy paper<\/a>, at depths 12, 14, and 20.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh3.googleusercontent.com\/BeCzj0qNpnWOLAeaecf9rXgsPmEL_Cp5674o_y-oWNxL1S7lO-g5KI5lfHUw0DHtu3nbiLunnGNOfEkkluUvfkduUMv-LyVuwxoQFzshpzpglzryR4uhX0ac3ZMdkfovvdGhmZzj\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh3.googleusercontent.com\/BeCzj0qNpnWOLAeaecf9rXgsPmEL_Cp5674o_y-oWNxL1S7lO-g5KI5lfHUw0DHtu3nbiLunnGNOfEkkluUvfkduUMv-LyVuwxoQFzshpzpglzryR4uhX0ac3ZMdkfovvdGhmZzj\" alt=\"Bar chart shows that test circuits are random quantum circuits of depth 12, 14, and 20.\" width=\"1027\" height=\"464\"\/><\/a><figcaption class=\"wp-element-caption\"><em>Figure 3. cuTensorNet pathfinding performance compared to similar packages, measured in FLOPs for the resulting contraction<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">In finding a low-cost path, cuTensorNet compares favorably with the <code><a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/pypi.org\/project\/opt-einsum\/\">opt_einsum<\/a><\/code> library and performs slightly better than Cotengra for these 
circuits.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">cuTensorNet also finds a high-quality path quickly. Figure 4 compares the time that cuTensorNet and Cotengra take to find a contraction path for Sycamore quantum circuits of different depths. For the most complex problem, with over 3,000 tensors in the network, cuTensorNet still finds its best path in just 40 seconds.  <\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh5.googleusercontent.com\/yoHzxuvxQ3rxwPhL5Pj2hF_Rw9PZpcA2cQpDK_uW48yY66fDKYFvP5TatQKg_RrbeIf3z2bdHM1mCQl2VTDSHpSjmuAl236_VRerpquEa7s1GXINNdjX6pWgWD3PsnEf10W79_RM\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh5.googleusercontent.com\/yoHzxuvxQ3rxwPhL5Pj2hF_Rw9PZpcA2cQpDK_uW48yY66fDKYFvP5TatQKg_RrbeIf3z2bdHM1mCQl2VTDSHpSjmuAl236_VRerpquEa7s1GXINNdjX6pWgWD3PsnEf10W79_RM\" alt=\"Bar chart shows that, for the largest instances of random quantum circuits, cuTensorNet finds an optimal path nearly 20x faster\" width=\"800\" height=\"496\"\/><\/a><figcaption class=\"wp-element-caption\"><em>Figure 4. Time to solution for cuTensorNet pathfinding, compared to Cotengra, for the Sycamore quantum circuits.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h3 id=\"execution\"  class=\"wp-block-heading\">Execution<a href=\"#execution\" aria-label=\"Scroll to Execution section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The execution component relies on the cuTENSOR library as the backend for efficient execution on the GPU. It consists of the following phases:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\" type=\"1\">\n<li><strong>Planning<\/strong>: The decision engine of the execution component. It analyzes the contraction path and decides how best to execute it on the GPU with minimal workspace. 
It also selects the best kernels for each pairwise contraction.&nbsp;<\/li>\n\n\n\n<li><strong>Computation<\/strong>: This phase computes all the pairwise contractions using the cuTENSOR library.&nbsp;&nbsp;<\/li>\n\n\n\n<li><strong>Autotuning<\/strong>: (Optional) Tries different kernels, based on different heuristics, for each pairwise contraction and chooses the fastest.&nbsp;<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Execution performance<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Figure 5 shows the contraction execution speedup of cuTensorNet over CuPy for several different circuits. Depending on the circuit, cuTensorNet executes the contraction roughly 8-20x faster.&nbsp;&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a4093127b024&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a4093127b024\" class=\"aligncenter is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/2.zoppoz.workers.dev:443\/https\/lh5.googleusercontent.com\/95AAywXX-Mqkv-HU_GQ4hd1m7WldXmNoK24pBejRSsixnZfPIvhgq5EAbE06lOAmNWDGF7toKbRH6Rp1nf-cZu_KlyMJcg6TPXKoYuQMdSy1nc0YPws7-3QPcg4_2gCxEZWig3G1\" alt=\"Bar chart shows that cuTensorNet offers roughly an 8\u201320x speedup in the contraction phase.\" width=\"852\" 
height=\"500\"\/><figcaption class=\"wp-element-caption\"><em>Figure 5. Contraction speedup of cuTensorNet vs. CuPy on a single A100 GPU, for several key quantum algorithms.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 id=\"cutensornet_example&nbsp;\"  class=\"wp-block-heading\">cuTensorNet example&nbsp;<a href=\"#cutensornet_example&nbsp;\" aria-label=\"Scroll to cuTensorNet example&nbsp; section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">cuTensorNet provides both C and Python APIs that let you compute tensor network contractions efficiently without requiring expertise in finding the best contraction path or executing it on GPUs.&nbsp;&nbsp;<\/p>\n\n\n\n<h3 id=\"high-level_python_apis&nbsp;\"  class=\"wp-block-heading\">High-level Python APIs&nbsp;<a href=\"#high-level_python_apis&nbsp;\" aria-label=\"Scroll to High-level Python APIs&nbsp; section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">cuTensorNet offers high-level Python APIs that are interoperable with NumPy and CuPy 
ndarrays and PyTorch tensors. For example, you can pass the <code>einsum<\/code> expression of a tensor network, along with its operands, to a single <code>contract<\/code> call. cuTensorNet performs all the required steps and returns the result of the contraction.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import cupy as cp\nimport cuquantum \n \n# Compute D_{m,x,n,y} = A_{m,h,k,n} B_{u,k,h} C_{x,u,y} \n# Create an array of extents (shapes) for each tensor \nextentA = (96, 64, 64, 96) \nextentB = (96, 64, 64) \nextentC = (64, 96, 64) \nextentD = (96, 64, 96, 64)  # expected shape of the output D \n \n# Generate input tensor data directly on GPU \nA_d = cp.random.random(extentA, dtype=cp.float32) \nB_d = cp.random.random(extentB, dtype=cp.float32) \nC_d = cp.random.random(extentC, dtype=cp.float32) \n \n# Set the pathfinder options \noptions = cuquantum.OptimizerOptions() \noptions.slicing.disable_slicing = 1  # disable slicing \noptions.samples = 100                # number of hyper-optimizer samples \n \n# Run the contraction on a CUDA stream \nstream = cp.cuda.Stream() \nD_d, info = cuquantum.contract( \n    'mhkn,ukh,xuy-&gt;mxny', A_d, B_d, C_d, \n    optimize=options, stream=stream, return_info=True) \nstream.synchronize() \nassert D_d.shape == extentD \n \n# Check the optimizer info: estimated cost of the contraction path found \nprint(f\"{info[1].opt_cost\/1e9} GFLOPs\") \n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">As this example shows, all cuTensorNet operations are encapsulated in the single <code>contract<\/code> API. The example prints 14.495514624 GFLOPs: the number of floating-point operations estimated for the contraction path found by the pathfinder. 
To perform these steps one at a time, you can use the <a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/docs.nvidia.com\/cuda\/cuquantum\/python\/api\/generated\/cuquantum.Network.html#cuquantum.Network\"><code>cuquantum.Network<\/code> object<\/a>.<\/p>\n\n\n\n<h3 id=\"low-level_apis&nbsp;&nbsp;\"  class=\"wp-block-heading\">Low-level APIs&nbsp;&nbsp;<a href=\"#low-level_apis&nbsp;&nbsp;\" aria-label=\"Scroll to Low-level APIs&nbsp;&nbsp; section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">As discussed previously, the C and Python APIs are designed to be straightforward and expressive. You can call the pathfinder function to obtain an optimized path, then make a second call to contract the network on the GPU along that path.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For advanced users, the cuTensorNet API exposes all available algorithmic choices to enable research in this field. For example, you can control how many hyper-optimizer samples the pathfinder tries when searching for the best contraction path. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are dozens of parameters that you can modify or control. They are accessible through helper functions, so the simple core API remains unchanged. You can also provide your own contraction path. 
For more information about the lower-level options and examples of how to use them, see <a rel=\"noreferrer noopener\" href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/docs.nvidia.com\/cuda\/cuquantum\/python\/api\/generated\/cuquantum.Network.html#cuquantum.Network\" target=\"_blank\">cuquantum.Network<\/a>.&nbsp;&nbsp;<\/p>\n\n\n\n<h2 id=\"summary\"  class=\"wp-block-heading\">Summary<a href=\"#summary\" aria-label=\"Scroll to Summary section\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The cuTensorNet library of the NVIDIA cuQuantum SDK aims to accelerate tensor network computation on GPUs. In this post, we showed its speedup over state-of-the-art tensor network libraries on key quantum algorithms.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Development is ongoing to improve cuTensorNet and extend it with new algorithmic advancements as well as multi-node, multi-GPU execution.&nbsp;&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The goal of the cuTensorNet library is to provide a useful tool for groundbreaking developments in quantum computing. Do you have feedback or suggestions on how we can improve the cuQuantum libraries? 
Send email to <a href=\"mailto:cuquantum-feedback@nvidia.com\">cuquantum-feedback@nvidia.com<\/a>.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For more information, see the following resources:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/developer.nvidia.com\/cuquantum-sdk\">cuQuantum beta<\/a> (includes cuTensorNet)<\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/docs.nvidia.com\/cuda\/cuquantum\/index.html\" target=\"_blank\">cuQuantum documentation<\/a><\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" href=\"https:\/\/2.zoppoz.workers.dev:443\/https\/github.com\/NVIDIA\/cuQuantum\" target=\"_blank\">NVIDIA\/cuQuantum<\/a> GitHub repo<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>We present benchmarks and usage of cuTensorNet, a cuQuantum library providing high-performance tensor network computations for quantum circuit simulation.<\/p>\n","protected":false},"author":1356,"featured_media":44759,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"publish_to_discourse":"","publish_post_category":"318","wpdc_auto_publish_overridden":"","wpdc_topic_tags":"","wpdc_pin_topic":"","wpdc_pin_until":"","discourse_post_id":"936926","discourse_permalink":"https:\/\/2.zoppoz.workers.dev:443\/https\/forums.developer.nvidia.com\/t\/scaling-quantum-circuit-simulation-with-nvidia-cutensornet\/204917","wpdc_publishing_response":"success","wpdc_publishing_error":"","nv_subtitle":"","ai_post_summary":"<ul><li>The cuTensorNet library from NVIDIA&#039;s cuQuantum SDK accelerates tensor network computations on GPUs, providing a high-performance solution for quantum circuit simulations.<\/li><li>cuTensorNet&#039;s pathfinder component uses a hyper-optimization approach based on graph partitioning to find an optimal contraction path, minimizing the cost of contracting the tensor 
network.<\/li><li>cuTensorNet offers significant speedup over other libraries, such as CuPy, with 8-20x faster contraction execution on a single A100 GPU for key quantum algorithms.<\/li><\/ul>","footnotes":"","_links_to":"","_links_to_target":""},"categories":[503],"tags":[2734,2733,453,608,2735,2377],"coauthors":[2736,2737],"class_list":["post-44452","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-simulation-modeling-design","tag-cuquantum","tag-cutensornet","tag-featured","tag-hpc","tag-quantum-computing","tag-tutorial","tagify_workload-data-center-cloud","tagify_workload-simulation-modeling-design"],"acf":{"post_industry":["HPC \/ Scientific Computing"],"post_products":["cuQuantum","cuTensorNet"],"post_learning_levels":"","post_content_types":["Tutorial"],"post_collections":""},"jetpack_featured_media_url":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-content\/uploads\/2022\/03\/cuTensorNet.png","primary_category":{"category":"Simulation \/ Modeling \/ Design","link":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer.nvidia.com\/blog\/category\/simulation-modeling-design\/","id":503,"data_source":""},"nv_translations":[{"language":"zh_CN","title":"\u5229\u7528 NVIDIA cuTensorNet 
\u8fdb\u884c\u91cf\u5b50\u7535\u8def\u6a21\u62df","post_id":3185}],"jetpack_shortlink":"https:\/\/2.zoppoz.workers.dev:443\/https\/wp.me\/pcCQAL-byY","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/44452","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/users\/1356"}],"replies":[{"embeddable":true,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/comments?post=44452"}],"version-history":[{"count":34,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/44452\/revisions"}],"predecessor-version":[{"id":66679,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/44452\/revisions\/66679"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media\/44759"}],"wp:attachment":[{"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media?parent=44452"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/categories?post=44452"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/tags?post=44452"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/2.zoppoz.workers.dev:443\/https\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/coauthors?post=44452"}],"curi
es":[{"name":"wp","href":"https:\/\/2.zoppoz.workers.dev:443\/https\/api.w.org\/{rel}","templated":true}]}}