
Commit a4cc51d

Address feedback on tutorials and other nits

1 parent ea0c39d commit a4cc51d

11 files changed: +135, -93 lines changed

docs/source/framework/pytorch_integration/autograd_with_tc.rst

Lines changed: 5 additions & 8 deletions
@@ -6,14 +6,11 @@ a training layer with TC and be able to run backwards as well if the layer is pa
 of a network. In order to write a training layer with TC, you need to follow the
 steps below:

-1. Define your TC language that has two definitions: one for the forward layer
-and the other for the backward layer and pass it to :code:`tc.define` call. In
-addition, also pass :code:`training=True` and the name of the backward TC :code:`backward`.
+1. Define your TC language that has two definitions: one for the forward layer and the other for the backward layer and pass it to :code:`tc.define` call. In addition, also pass :code:`training=True` and the name of the backward TC :code:`backward`.

-2. Create the Input Variables and Parameters. For example, weights should be marked
-as Parameters and the inputs tensors as Variables.
+2. Create the Input Variables and Parameters. For example, weights should be marked as Parameters and the inputs tensors as Variables.

-3. Run the layer and get the output of forward pass
+3. Run the layer and get the output of forward pass.

 4. To see that the backward call works fine, you can call backward on the outputs.

@@ -79,7 +76,7 @@ them, the example for that would be:
 In order to obtain options via autotuning for backward and forward layer, keep reading further.


-Autotuning Training Layer
+Autotuning training layer
 -------------------------

 You can autotune a training layer easily. The forward and backward layers will
@@ -114,7 +111,7 @@ You will find two cache files created: :code:`convolution_train.cuda/options` ha
 options for the forward layer and :code:`convolution_train_backward.cuda/options` file
 has options for the grad layer.

-Reordering Grad Outputs
+Reordering grad outputs
 -----------------------

 In the backward pass, TC uses the list of input tensors in the forward pass and appends
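Taken together, the four steps described in this file amount to roughly the following. This is a minimal sketch assuming the :code:`tc.define(..., training=True, backward=...)` API described above; the matmul TC, its gradient definition, and the tensor sizes are illustrative stand-ins for a real layer, not taken from the commit.

.. code-block:: python

    import torch
    from torch.autograd import Variable
    from torch.nn import Parameter
    import tensor_comprehensions as tc

    # Step 1: one TC string with a forward definition and a backward definition.
    lang = """
    def matmul(float(M, N) A, float(N, K) B) -> (O) {
        O(m, k) +=! A(m, n) * B(n, k)
    }
    def matmul_grad(float(M, N) A, float(N, K) B, float(M, K) O_grad) -> (A_grad, B_grad) {
        A_grad(m, n) +=! O_grad(m, k) * B(n, k)
        B_grad(n, k) +=! O_grad(m, k) * A(m, n)
    }
    """
    matmul = tc.define(lang, training=True, name="matmul", backward="matmul_grad")

    # Step 2: inputs as Variables, weights as Parameters.
    inp = Variable(torch.randn(32, 64).cuda(), requires_grad=True)
    weight = Parameter(torch.randn(64, 128).cuda())

    # Step 3: run the forward pass.
    out = matmul(inp, weight)

    # Step 4: check that the backward pass runs.
    out.sum().backward()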

docs/source/framework/pytorch_integration/autotuning_layers.rst

Lines changed: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
 .. _pytorch_autotune_layers:

-Autotuning Layers
+Autotuning layers
 =================

 TC provides a genetic search based autotuner that can be used to optimize a TC on
@@ -47,7 +47,7 @@ my_layer.autotune

 .. _autotune_parameters:

-Autotuning Parameters
+Autotuning parameters
 ---------------------

 Autotuner exposes various parameters that can be adjusted to control amount of tuning.
@@ -120,7 +120,7 @@ An example for how to pass options:

 .. _autotuner_cache_choices:

-Caching Autotuned options
+Caching autotuned options
 -------------------------

 As user autotunes kernels on given input tensor sizes, user can also cache the options
@@ -195,7 +195,7 @@ For example:
 out2 = matmul(mat1, mat2)


-Using Tuple sizes to Autotune
+Using tuple sizes to autotune
 -----------------------------

 If you want to autotune a kernel on variety of sizes and store the cache for later
@@ -227,7 +227,7 @@ The API description is given below:

 .. autofunction:: decode

-Decoding Example
+Decoding example
 ^^^^^^^^^^^^^^^^

 Below is example describing the above usage:
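As a rough sketch of the autotune-then-run flow this file describes, separate from the file's own examples: the sketch assumes :code:`autotune` returns the best options found and that :code:`cache` takes a file path, as the surrounding docs suggest; the matmul TC and tensor sizes are illustrative.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    lang = """
    def matmul(float(M, N) A, float(N, K) B) -> (O) {
        O(m, k) +=! A(m, n) * B(n, k)
    }
    """
    matmul = tc.define(lang, name="matmul")
    mat1, mat2 = torch.randn(100, 400).cuda(), torch.randn(400, 500).cuda()

    # Tune on these fixed input sizes and persist the result for later runs.
    best_options = matmul.autotune(mat1, mat2, cache="matmul_100_400_500.tc")

    # Run the layer with the tuned options.
    out = matmul(mat1, mat2, options=best_options)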

docs/source/framework/pytorch_integration/debugging.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ In order to use enable these flags, you need to call :code:`tc.GlobalDebugInit`
 and set the proper flags to :code:`true`. All of these flags are :code:`boolean`
 flags that take values :code:`true` or :code:`false`.

-Example Usage
+Example usage
 -------------

 .. code-block:: python
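The hunk above ends where the file's own example begins. As a separate, hedged sketch of what a :code:`tc.GlobalDebugInit` call might look like: the gflags-style argument list and the :code:`--dump_cuda` flag name are assumptions for illustration only, not taken from this commit.

.. code-block:: python

    import tensor_comprehensions as tc

    # Hypothetical flag: ask TC to print the generated CUDA code.
    # Flags are boolean and take the values true or false.
    tc.GlobalDebugInit(["tc", "--dump_cuda=true"])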

docs/source/framework/pytorch_integration/frequently_asked_questions.rst

Lines changed: 2 additions & 2 deletions
@@ -67,8 +67,8 @@ This TC is invalid because :code:`tmp` and :code:`O(n, d)` have cyclic dependenc
 }


-Autotuner FAQ
--------------
+Autotuner
+---------

 At the start of new generation, I see high kernel runtime, Why?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

docs/source/framework/pytorch_integration/getting_started.rst

Lines changed: 3 additions & 1 deletion
@@ -16,7 +16,9 @@ A **few cases** where TC can be useful:

 * you are interested in fusing layers like group convolution, ReLU, FC *or*

-* if you have a different new layer, let's call it :code:`hconv` (a variant of convolution), for which you wish you had an efficient kernel available
+* if you have a different new layer, let's call it :code:`hconv` (a variant of convolution), for which you wish you had an efficient kernel available *or*
+
+* if you have standard operation on different data layouts that you didn't want to use because you couldn't get good kernels for them

 TC makes its very trivial to get CUDA code for such cases and many more. By providing
 TC integration with PyTorch, we hope to make it further easy for PyTorch users

docs/source/framework/pytorch_integration/layers_database.rst

Lines changed: 10 additions & 10 deletions
@@ -27,7 +27,7 @@ An example to do so:
 Pooling Layers
 --------------

-Average Pooling
+Average pooling
 ^^^^^^^^^^^^^^^

 .. code::
@@ -37,7 +37,7 @@ Average Pooling
 }}


-Max Pooling
+Max pooling
 ^^^^^^^^^^^

 .. code::
@@ -46,7 +46,7 @@ Max Pooling
 output(b, c, h, w) max= input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
 }}

-Convolution Layers
+Convolution layers
 ------------------

 Simple Convolution
@@ -99,7 +99,7 @@ Group Convolution Strided
 O(n, g, f, h, w) = O(n, g, f, h, w) + B(g, f)
 }}

-Linear Layers
+Linear layers
 -------------

 Fully Connected layer
@@ -277,7 +277,7 @@ Scale
 O(m, n) = I(m, n) * {s}
 }}

-Fused Layers
+Fused layers
 ------------

 FCRelu
@@ -307,7 +307,7 @@ Small MobileNet
 O2(c2, h, w) = fmax(O2(c2, h, w), 0)
 }

-Normalization Layers
+Normalization layers
 --------------------

 Batch Normalization
@@ -358,7 +358,7 @@ Cosine Similarity

 What operations can not be expressed
 ------------------------------------
-* **Reshaping** Tensors inside the language
-* **Dropout** : RNGs are not suppported inside TC language, because TC doesn't do internal allocations
-* **Strided "tensors"** : input Tensors have to be contiguous. If they are not contiguous, they are made contiguous before passing to the TC backend.
-* **RNNs** : TC language doesn't have loops yet. You can write them unrolled if you want.
+* **Reshape**: Reshaping tensors inside the language.
+* **Dropout**: RNGs are not supported inside TC language, because TC doesn't do internal allocations.
+* **Strided tensors**: Input tensors have to be contiguous. If they are not contiguous, they are made contiguous before passing to the TC backend.
+* **RNNs**: TC language doesn't have loops yet. You can write them unrolled if you want.
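The doubled braces and :code:`{sH}`-style placeholders in these database entries suggest plain Python string formatting before :code:`tc.define` is called; a hedged sketch along those lines, reusing the max pooling entry shown above with illustrative stride and kernel sizes.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # {{ }} survive .format(); only the scalar hyper-parameters are substituted.
    LANG = """
    def maxpool(float(B, C, H, W) input) -> (output) {{
        output(b, c, h, w) max= input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
    }}
    """
    maxpool = tc.define(LANG.format(sH=2, sW=2, kH=2, kW=2), name="maxpool")
    inp = torch.randn(32, 3, 28, 28).cuda()
    out = maxpool(inp)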

docs/source/framework/pytorch_integration/note_about_performance.rst

Lines changed: 9 additions & 7 deletions
@@ -1,7 +1,7 @@
-Note about Performance/Autotuning
-=================================
+Note about Performance / Autotuning
+===================================

-Reuse Outputs
+Reuse outputs
 -------------

 TC depends on a tensor library to do the allocations for temporary variables or output tensors.
@@ -29,19 +29,21 @@ argument when you run the TC. For a concrete example:
 matmul(mat3, mat4, outputs=out) # outputs re-used


-Static sizes for Autotuning
+Static sizes for autotuning
 ---------------------------

 Tensor Comprehensions have an autotuner that uses evolutionary search to find
 faster kernels. TC tries to specialize the kernels to the given input sizes.
 If the sizes are parametric, then the search space will become bigger and the performance
 is not as good static input sizes. Hence, for now, TC takes static input sizes. More
-concretely,
+concretely:
+

 1. you can not tune a kernel for parametric size ranges like batchsize between 16 and 32.

-2. you can tune a kernel let's say :code:`avgpool` for input shape :code:`(16, 32, 24, 23)`
-by simply calling:
+
+2. you can tune a kernel let's say :code:`avgpool` for input shape :code:`(16, 32, 24, 23)` by simply calling:
+

 .. code::

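A short sketch of the output-reuse pattern this file describes; the :code:`outputs=` keyword mirrors the call shown in the hunk above, while the matmul TC and the tensor sizes are illustrative stand-ins.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    lang = """
    def matmul(float(M, N) A, float(N, K) B) -> (O) {
        O(m, k) +=! A(m, n) * B(n, k)
    }
    """
    matmul = tc.define(lang, name="matmul")
    mat1, mat2 = torch.randn(72, 26).cuda(), torch.randn(26, 72).cuda()
    out = matmul(mat1, mat2)           # first call allocates the output tensor

    mat3, mat4 = torch.randn(72, 26).cuda(), torch.randn(26, 72).cuda()
    matmul(mat3, mat4, outputs=out)    # later calls on same-sized inputs reuse it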

docs/source/framework/pytorch_integration/writing_layers.rst

Lines changed: 21 additions & 16 deletions
@@ -57,13 +57,13 @@ There are two ways to set the :code:`Options`:

 * **Autotuning**: You can autotune the kernel the kernel on certain input tensor sizes, cache the options and use them to run the layer. See :ref:`pytorch_autotune_layers` for how to autotune kernels.

-* **Default Mapping**: We provide various default options that can be chosen to closely represent the kernel. THe defaults provided are:
+* **Default Mapping**: We provide various default options that can be chosen to closely represent the kernel. The defaults provided are:

 * :code:`pointwise`: if kernel resembles a pointwise operation
 * :code:`mlp`: if kernel resembles an Linear layer operation
 * :code:`conv`: if kernel resembles a convolution operation
-* :code:`group_conv`: if kernel resembles a convolution operation
-* :code:`naive`: if none of the above, then chose naive Default
+* :code:`group_conv`: if kernel resembles a group convolution operation
+* :code:`naive`: if none of the above, then chose naive default

 An example for how to pass options:

@@ -126,9 +126,12 @@ happens only once and then you can keep running the layer.
 Multiple TC definitions in language
 -----------------------------------

-Let's say you want to define all of your TCs in one string and later keep running
-them. You an do so easily. Every time you want to run a different layer, you can
-make a :code:`tc.define` call and get the layer.
+Let's say you want to define all of your TCs in one string and later use that string
+for running different operations defined in the string. You an do so easily. You
+can define a :code:`lang` variable that holds the TC definition for all your operations.
+Every time you want to run a different operation, you can make a :code:`tc.define` call
+on the :code:`lang` variable, specify the :code:`name` corresponding to the operation
+definition and get the TC layer for it. Below is an example for how to do this:

 .. code-block:: python

@@ -215,7 +218,7 @@ adopt whatever feels more convenient.
 out = avgpool(inp)


-Manually Injecting external CUDA code
+Manually injecting external CUDA code
 -------------------------------------

 If you have an external efficient CUDA code that you want to use rather than
@@ -248,17 +251,19 @@ call. For example:
 a, b = torch.randn(100).cuda(), torch.randn(100).cuda()
 out = add(a, b, grid=[1, 1, 1], block=[100, 1, 1])

-In such cases, please note that TC doesn't modify the injected CUDA kernel. It will
-simply run the kernel injected as is and TC will also not guarantee the performance
-of the kernel. User needs to specify the :code:`grid` and :code:`block` values
-when running the layer and TC will simply use those settings.
+.. note::
+
+In such cases, please note that TC doesn't modify the injected CUDA kernel. It will
+simply run the kernel injected as is and TC will also not guarantee the performance
+of the kernel. User needs to specify the :code:`grid` and :code:`block` values
+when running the layer and TC will simply use those settings.


-Builtin Functions
------------------
+Built-in Functions
+------------------

-TC allows using some CUDA builtin functions as well when defining the TC language.
-During the execution, CUDA API will be called for those builtin functions. For example,
+TC allows using some CUDA built-in functions as well when defining the TC language.
+During the execution, CUDA API will be called for those built-in functions. For example,
 let's say we want to use :code:`fmax` CUDA function in our TC language. An example
 for how this would be done is below:

@@ -275,7 +280,7 @@ for how this would be done is below:
 inp = torch.randn(100, 128).cuda()
 out = relu(inp)

-TC supports only a few builtin CUDA functions and not all. You can find the documentation
+TC only supports a subset of built-in CUDA functions. You can find the documentation
 for these functions at the official CUDA documentation `here <http://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__SINGLE.html#group__CUDA__MATH__SINGLE>`_.
 The functions supported in TC are:

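To make the "multiple TC definitions in one string" and built-in function passages above concrete, a small sketch: only the :code:`name=` selection and the :code:`fmax` call mirror what the text describes; the definitions and sizes are illustrative.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # One lang string with several definitions; tc.define selects one by name.
    lang = """
    def relu(float(B, M) I) -> (O) {
        O(b, m) = fmax(I(b, m), 0)
    }
    def add(float(N) A, float(N) B) -> (O) {
        O(n) = A(n) + B(n)
    }
    """
    relu = tc.define(lang, name="relu")
    add = tc.define(lang, name="add")

    inp = torch.randn(100, 128).cuda()
    out1 = relu(inp)      # fmax resolves to the CUDA built-in at execution time

    a, b = torch.randn(100).cuda(), torch.randn(100).cuda()
    out2 = add(a, b)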

docs/source/introduction.rst

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ More generally the only requirement to integrate TC into a workflow is to use a
 simple tensor library with a few basic functionalities. For more details, see
 :ref:`integrating_ml_frameworks`.

+.. _tc_einstein_notation:
+
 Tensor Comprehension Notation
 -----------------------------
 TC borrow three ideas from Einstein notation that make expressions concise:
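As a one-line illustration of the notation this section goes on to explain, the standard matrix multiplication example used throughout the TC docs, shown here only as a sketch:

.. code-block:: python

    # The reduction index n appears only on the right-hand side, so it is summed over;
    # +=! initializes the accumulator, and output sizes are inferred from the expression.
    lang = """
    def matmul(float(M, N) A, float(N, K) B) -> (O) {
        O(m, k) +=! A(m, n) * B(n, k)
    }
    """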

docs/source/tutorials/index.rst

Lines changed: 6 additions & 4 deletions
@@ -4,22 +4,24 @@ Tensor Comprehensions Tutorials
 **Author**: `Priya Goyal <https://github.com/prigoyal>`_

 Tensor Comprehensions (TC) is a framework agnostic library to **automatically**
-synthesize high-performance Machine Learning kernels. TC relies on
+synthesize high-performance machine learning kernels. TC relies on
 `Halide <https://github.com/halide/Halide>`_ IR to express computation and analysis
 tools to reason about it. TC uses :code:`polyhedral` compilation techniques to
 (semi-)automatically decide how to perform this computation efficiently and produce
 fast code. We also provide TC integration with PyTorch and Caffe2.

+To automatically tune the performance of the kernel, we provide a genetic algorithms
+based **Autotuner** details of which are available at :ref:`pytorch_autotune_layers`.
+
 To read more about Tensor Comprehensions, see our documentation available
 at https://facebookresearch.github.io/TensorComprehensions/ and C++ API documentation is
 available at https://facebookresearch.github.io/TensorComprehensions/api.

 We provide many **python examples** for expressing and running various different ML layers
 with TC. The examples can be found `here <https://github.com/facebookresearch/TensorComprehensions/tree/master/test_python/layers>`_.

-To read more about Framework integrations, checkout our documentation on `PyTorch <https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html>`_ integration
-and `Caffe2 <https://facebookresearch.github.io/TensorComprehensions/framework/caffe2_integration/integration_with_example.html>`_
-integration.
+To read more about Framework integrations, checkout our documentation on `PyTorch integration <https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html>`_
+and `Caffe2 integration <https://facebookresearch.github.io/TensorComprehensions/framework/caffe2_integration/integration_with_example.html>`_.

 If you want to **integrate your framework** with TC, it's easy and the instructions are
 available at https://facebookresearch.github.io/TensorComprehensions/integrating_any_ml_framework.html
