- "text": "The performance of Python applications that use TACO can be measured using\nPython's built-in \ntime.perf_counter\n function with minimal changes to the\napplications. As an example, we can benchmark the performance of the\nscientific computing application shown \nhere\n as\nfollows:\n\n\nimport pytaco as pt\nfrom pytaco import compressed, dense\nimport numpy as np\nimport time\n\ncsr = pt.format([dense, compressed])\ndv = pt.format([dense])\n\nA = pt.read(\"pwtk.mtx\", csr)\nx = pt.from_array(np.random.uniform(size=A.shape[1]))\nz = pt.from_array(np.random.uniform(size=A.shape[0]))\ny = pt.tensor([A.shape[0]], dv)\n\ni, j = pt.get_index_vars(2)\ny[i] = A[i, j] * x[j] + z[i]\n\n# Tell TACO to generate code to perform the SpMV computation\ny.compile()\n\n# Benchmark the actual SpMV computation\nstart = time.perf_counter()\ny.compute()\nend = time.perf_counter()\n\nprint(\"Execution time: {0} seconds\".format(end - start))\n\n\n\nIn order to accurately measure TACO's computational performance, \nonly the\ntime it takes to actually perform a computation should be measured. The time\nit takes to generate code under the hood for performing that computation should\nnot be measured\n, since this overhead can be quite variable but can often be\namortized in practice. By default though, TACO will only generate and compile\ncode it needs for performing a computation immediately before it has to\nactually perform the computation. As the example above demonstrates, by\nmanually calling the result tensor's \ncompile\n method, we can tell TACO to\ngenerate code needed for performing the computation before benchmarking starts,\nletting us measure only the performance of the computation itself.\n\n\n\n\nWarning\n\n\npytaco.evaluate\n and \npytaco.einsum\n should not be used to benchmark\nTACO's computational performance, since timing those functions will\ninclude the time it takes to generate code for performing the computation.\n\n\n\n\nThe time it takes to construct the initial input tensors should also not be\nmeasured\n, since again this overhead can often be amortized in practice. By\ndefault, \npytaco.read\n and functions for converting NumPy arrays and SciPy\nmatrices to TACO tensors return fully constructed tensors. If you add nonzero\nelements to an input tensor by calling \ninsert\n though, then \npack\n must also\nbe explicitly invoked before any benchmarking is done:\n\n\nimport pytaco as pt\nfrom pytaco import compressed, dense\nimport numpy as np\nimport random\nimport time\n\ncsr = pt.format([dense, compressed])\ndv = pt.format([dense])\n\nA = pt.read(\"pwtk.mtx\", csr)\nx = pt.tensor([A.shape[1]], dv)\nz = pt.tensor([A.shape[0]], dv)\ny = pt.tensor([A.shape[0]], dv)\n\n# Insert random values into x and z and pack them into dense arrays\nfor k in range(A.shape[1]):\n x.insert([k], random.random())\nx.pack()\nfor k in range(A.shape[0]):\n z.insert([k], random.random())\nz.pack()\n\ni, j = pt.get_index_vars(2)\ny[i] = A[i, j] * x[j] + z[i]\n\ny.compile()\n\nstart = time.perf_counter()\ny.compute()\nend = time.perf_counter()\n\nprint(\"Execution time: {0} seconds\".format(end - start))\n\n\n\nTACO avoids regenerating code for performing the same computation though as\nlong as the computation is redefined with the same index variables and with the\nsame operand and result tensors. 
TACO avoids regenerating code for the same computation, though, as long as the computation is redefined with the same index variables and with the same operand and result tensors. Thus, if your application executes the same computation many times in a loop and the computation is executed on sufficiently large data sets, TACO will naturally amortize the overhead associated with generating code for performing the computation. In such scenarios, it is acceptable to include the initial code generation overhead in the performance measurement:

```python
import pytaco as pt
from pytaco import compressed, dense
import numpy as np
import random
import time

csr = pt.format([dense, compressed])
dv = pt.format([dense])

A = pt.read("pwtk.mtx", csr)
x = pt.tensor([A.shape[1]], dv)
z = pt.tensor([A.shape[0]], dv)
y = pt.tensor([A.shape[0]], dv)

for k in range(A.shape[1]):
    x.insert([k], random.random())
x.pack()
for k in range(A.shape[0]):
    z.insert([k], random.random())
z.pack()

i, j = pt.get_index_vars(2)

# Benchmark the iterative SpMV computation, including the overhead of
# generating code in the first iteration to perform the computation
start = time.perf_counter()
for k in range(1000):
    y[i] = A[i, j] * x[j] + z[i]
    y.evaluate()
    x[i] = y[i]
    x.evaluate()
end = time.perf_counter()

print("Execution time: {0} seconds".format(end - start))
```

**Warning**

In order to avoid regenerating code for a computation, the computation must be redefined with the exact same index variable objects and with the exact same tensor objects for operands and result. In the example above, every loop iteration redefines the computation of `y` and `x` using the same tensor and index variable objects constructed outside the loop, so TACO will only generate code to compute `y` and `x` in the first iteration. If the index variables were instead constructed inside the loop, TACO would regenerate code to compute `y` and `x` in every loop iteration, and the compilation overhead would not be amortized (see the sketch after the note below).

**Note**

As a rough rule of thumb, if a computation takes on the order of seconds or more in total to perform across all invocations with identical operands and result (and is always redefined with identical index variables), then it is acceptable to include the overhead associated with generating code for performing the computation in performance measurements.
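To make the warning above concrete, here is a hypothetical variation of the benchmarking loop in which fresh index variable objects are constructed in every iteration. Per the behavior described above, TACO would then treat each redefinition of `y` as a new computation and regenerate code for it in all 1000 iterations, so the measured time would be dominated by compilation rather than by the SpMV itself:

```python
import pytaco as pt
from pytaco import compressed, dense
import random
import time

csr = pt.format([dense, compressed])
dv = pt.format([dense])

A = pt.read("pwtk.mtx", csr)
x = pt.tensor([A.shape[1]], dv)
z = pt.tensor([A.shape[0]], dv)
y = pt.tensor([A.shape[0]], dv)

for k in range(A.shape[1]):
    x.insert([k], random.random())
x.pack()
for k in range(A.shape[0]):
    z.insert([k], random.random())
z.pack()

start = time.perf_counter()
for k in range(1000):
    # Anti-pattern: new index variable objects every iteration, so TACO
    # regenerates and recompiles the SpMV code on each pass through the loop
    i, j = pt.get_index_vars(2)
    y[i] = A[i, j] * x[j] + z[i]
    y.evaluate()
end = time.perf_counter()

print("Execution time: {0} seconds".format(end - start))
```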