Commit 5f0b6e8

Merge pull request #218 from ShikharJ/Test

* Refactor FastGRNN CUDA Setup
* Update c_reference README.md
* Update pytorch/requirements
* Add FastGRNNCUDA to FastCells and Fix torch.randn() Argument Errors
* Update README

2 parents cbba9f8 + f10b009, commit 5f0b6e8

10 files changed (+52, -40 lines changed)

c_reference/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ and is to be adapted as needed for other embedded platforms.
 The `EdgeML/c_reference/` directory is broadly structured into the following sub-directories:

 - **include/**: Contains the header files for various lower level operators and layers.
-- **models/**: Contains the optimized source code and header files for various models built by stiching together different layers and operators. Also contains the layer weights and hyper-parameters for the corresponding models as well (stored using `Git LFS`).
+- **models/**: Contains the optimized source code and header files for various models built by stiching together different layers and operators. Also contains the layer weights and hyper-parameters for the corresponding models as well (stored using `Git LFS`). (**Note:** Cloning the repo without installing `Git LFS` would fail to clone the actual headers. It's recommended to follow instructions on setting up `LFS` from [here](https://git-lfs.github.com/) before cloning.)
 - **src/**: Contains the optimized source code files for various lower level operators and layers.
 - **tests/**: Contains extensive test cases for individual operators and layers, as well as the implemented models. The executables are generated in the main directory itself, while the test scripts and their configurations can be accessed in the appropriate sub-directories.
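
The note added to `models/` warns that a clone made without Git LFS leaves stubs in place of the real headers. A minimal sketch of how one might detect that situation, assuming the standard Git LFS pointer format (a small text file whose first line is `version https://git-lfs.github.com/spec/v1`). The helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of EdgeML:

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-fetched Git LFS pointer stub."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under `root` that are still LFS pointers, sorted by path."""
    return sorted(str(p) for p in Path(root).rglob("*")
                  if p.is_file() and is_lfs_pointer(p))
```

Calling `find_lfs_pointers` on the `models/` directory and getting a non-empty list would suggest that the LFS objects still need to be fetched (for instance with `git lfs pull`).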

examples/pytorch/FastCells/README.md

Lines changed: 5 additions & 5 deletions
@@ -15,8 +15,8 @@ features like low-rank parameterisation and custom non-linearities. Akin to
 Bonsai and ProtoNN, the three-phase training routine for FastRNN and FastGRNN
 is decoupled from the custom cells to facilitate a plug and play behaviour of
 the custom RNN cells in other architectures (NMT, Encoder-Decoder etc.).
-Additionally, numerically equivalent CUDA-based implementations FastRNNCuda
-and FastGRNNCuda are provided for faster training.
+Additionally, numerically equivalent CUDA-based implementations **FastRNNCUDA**
+and **FastGRNNCUDA** are provided for faster training.
 `edgeml_pytorch.graph.rnn` also contains modified RNN cells of **UGRNNCell**,
 **GRUCell**, and **LSTMCell**, which can be substituted for Fast(G)RNN,
 as well as untrolled RNNs which are equivalent to `nn.LSTM` and `nn.GRU`.
@@ -67,9 +67,9 @@ Final Test Accuracy: 0.9347

 Non-Zeros: 1932 Model Size: 7.546875 KB hasSparse: False
```
-`usps10/` directory will now have a consolidated results file called `FastRNNResults.txt` or
-`FastGRNNResults.txt` depending on the choice of the RNN cell. A directory `FastRNNResults` or
-`FastGRNNResults` with the corresponding models with each run of the code on the `usps10` dataset.
+`usps10/` directory will now have a consolidated results file called `FastRNNResults.txt`,
+`FastGRNNResults.txt` or `FastGRNNCUDAResults.txt` depending on the choice of the RNN cell. A directory `FastRNNResults`,
+`FastGRNNResults` or `FastGRNNCUDAResults` with the corresponding models with each run of the code on the `usps10` dataset.

 Note that the scalars like `alpha`, `beta`, `zeta` and `nu` correspond to the values before
 the application of the sigmoid function.
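
Since the README context above says the reported scalars are pre-sigmoid values, the effective values can be recovered by applying the logistic function. A minimal sketch; the raw numbers below are hypothetical and chosen only for illustration, they are not taken from any run:

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw (pre-sigmoid) values, for illustration only.
raw_scalars = {"zeta": 1.0, "nu": -4.0}

# Effective values after the sigmoid, each strictly between 0 and 1.
effective = {name: sigmoid(value) for name, value in raw_scalars.items()}

for name, value in effective.items():
    print(f"{name}: raw={raw_scalars[name]:+.2f} effective={value:.4f}")
```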

examples/pytorch/FastCells/fastcell_example.py

Lines changed: 5 additions & 0 deletions
@@ -57,6 +57,11 @@ def main():
                                 gate_nonlinearity=gate_non_linearity,
                                 update_nonlinearity=update_non_linearity,
                                 wRank=wRank, uRank=uRank)
+    elif cell == "FastGRNNCUDA":
+        FastCell = FastGRNNCUDACell(inputDims, hiddenDims,
+                                    gate_nonlinearity=gate_non_linearity,
+                                    update_nonlinearity=update_non_linearity,
+                                    wRank=wRank, uRank=uRank)
     elif cell == "FastRNN":
         FastCell = FastRNNCell(inputDims, hiddenDims,
                                update_nonlinearity=update_non_linearity,
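
The hunk above adds one more branch to an `if`/`elif` chain keyed on the cell name. As a possible alternative (a sketch, not the repository's code), the same selection can be expressed as a lookup table. The placeholder classes below only stand in for the real cell classes so that the example runs on its own; `CELL_REGISTRY` and `build_cell` are hypothetical names:

```python
class _PlaceholderCell:
    """Stand-in for the real cell classes; it only records its arguments."""

    def __init__(self, input_dims, hidden_dims, **kwargs):
        self.input_dims = input_dims
        self.hidden_dims = hidden_dims
        self.kwargs = kwargs


class FastGRNNCell(_PlaceholderCell):
    pass


class FastGRNNCUDACell(_PlaceholderCell):
    pass


class FastRNNCell(_PlaceholderCell):
    pass


# Map the command-line cell name to the class that implements it.
CELL_REGISTRY = {
    "FastGRNN": FastGRNNCell,
    "FastGRNNCUDA": FastGRNNCUDACell,
    "FastRNN": FastRNNCell,
}


def build_cell(cell, input_dims, hidden_dims, **kwargs):
    """Instantiate the cell registered under `cell`, or raise ValueError."""
    try:
        cell_cls = CELL_REGISTRY[cell]
    except KeyError:
        known = ", ".join(sorted(CELL_REGISTRY))
        raise ValueError(f"Unknown cell {cell!r}; expected one of: {known}") from None
    return cell_cls(input_dims, hidden_dims, **kwargs)
```

With a table like this, supporting a new cell is a one-line change, although the caller still has to pass only the keyword arguments that the chosen cell accepts.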

examples/pytorch/FastCells/helpermethods.py

Lines changed: 2 additions & 2 deletions
@@ -88,8 +88,8 @@ def getArgs():
                         'train.npy and test.npy')

     parser.add_argument('-c', '--cell', type=str, default="FastGRNN",
-                        help='Choose between [FastGRNN, FastRNN, UGRNN' +
-                        ', GRU, LSTM], default: FastGRNN')
+                        help='Choose between [FastGRNN, FastGRNNCUDA, FastRNN,' +
+                        ' UGRNN, GRU, LSTM], default: FastGRNN')

     parser.add_argument('-id', '--input-dim', type=checkIntNneg, required=True,
                         help='Input Dimension of RNN, each timestep will ' +
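
The hunk above keeps the list of valid cells inside the help string, so it has to be edited by hand whenever a cell is added. One option, shown here as a sketch rather than as the repository's approach, is to let `argparse` enforce the list through its `choices` parameter. `CELL_CHOICES` and `build_parser` are hypothetical names:

```python
import argparse

# Cell names taken from the updated help string in the diff.
CELL_CHOICES = ["FastGRNN", "FastGRNNCUDA", "FastRNN", "UGRNN", "GRU", "LSTM"]


def build_parser():
    """Build a parser whose --cell option rejects unknown cell names."""
    parser = argparse.ArgumentParser(description="Cell selection sketch")
    parser.add_argument("-c", "--cell", type=str, default="FastGRNN",
                        choices=CELL_CHOICES,
                        help="RNN cell to use (default: %(default)s)")
    return parser


args = build_parser().parse_args(["-c", "FastGRNNCUDA"])
print(args.cell)
```

With `choices` set, an unsupported value makes `argparse` exit with a usage error instead of reaching the cell-selection code.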

pytorch/README.md

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ Install appropriate CUDA and cuDNN [Tested with >= CUDA 8.1 and cuDNN >= 6.1]
 ```
 pip install -r requirements-gpu.txt
 pip install -e .
+pip install -e edgeml_pytorch/cuda/
 ```

 **Note**: For using the optimized FastGRNNCUDA implementation, it is recommended to use CUDA v10.1, gcc 7.5 and cuDNN v7.6 and torch==1.4.0. Also, there are some known issues when compiling custom CUDA kernels on Windows [pytorch/#11004](https://github.com/pytorch/pytorch/issues/11004).

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+import setuptools #enables develop
+import os
+from torch.utils.cpp_extension import BuildExtension, CUDAExtension
+from edgeml_pytorch.utils import findCUDA
+
+if findCUDA() is not None:
+    setuptools.setup(
+        name='fastgrnn_cuda',
+        ext_modules=[
+            CUDAExtension('fastgrnn_cuda', [
+                'fastgrnn_cuda.cpp',
+                'fastgrnn_cuda_kernel.cu',
+            ]),
+        ],
+        cmdclass={
+            'build_ext': BuildExtension
+        }
+    )
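
The setup script above builds the extension only when `findCUDA()` returns something other than `None`. The implementation of `edgeml_pytorch.utils.findCUDA` is not part of this diff, so the following is only a guess at what such a probe could look like: a hypothetical `find_cuda_home` that checks the `CUDA_HOME` and `CUDA_PATH` environment variables and then falls back to locating `nvcc` on `PATH`.

```python
import os
import shutil


def find_cuda_home(environ=None):
    """Return a likely CUDA install directory, or None if none is found.

    Hypothetical stand-in; not the EdgeML `findCUDA` implementation.
    """
    environ = os.environ if environ is None else environ

    # 1. Explicit environment variables take priority.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        candidate = environ.get(var)
        if candidate and os.path.isdir(candidate):
            return candidate

    # 2. Otherwise look for the nvcc compiler on PATH.
    nvcc = shutil.which("nvcc", path=environ.get("PATH", ""))
    if nvcc is not None:
        # nvcc normally lives in <cuda_home>/bin/nvcc.
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))

    return None
```

Passing the environment as a parameter keeps the probe easy to exercise without a GPU toolchain installed.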

pytorch/edgeml_pytorch/graph/rnn.py

Lines changed: 16 additions & 12 deletions
@@ -9,8 +9,12 @@

 import edgeml_pytorch.utils as utils

-if utils.findCUDA() is not None:
-    import fastgrnn_cuda
+try:
+    if utils.findCUDA() is not None:
+        import fastgrnn_cuda
+except:
+    print("Running without FastGRNN CUDA")
+    pass


 # All the matrix vector computations of the form Wx are done
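
The hunk above wraps the optional import in a bare `except:`, which also swallows unrelated errors such as `KeyboardInterrupt`. A narrower variant, offered as a sketch and not as the code in this commit, catches only `ImportError` and records whether the extension is available. The helper `optional_import` and the flag `HAS_FASTGRNN_CUDA` are hypothetical, and the `findCUDA()` check from the diff is left out for brevity:

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        logger.warning("Running without %s (%s)", module_name, exc)
        return None


# `fastgrnn_cuda` is the extension named in the diff; this resolves to None
# on machines where the extension has not been built.
fastgrnn_cuda = optional_import("fastgrnn_cuda")
HAS_FASTGRNN_CUDA = fastgrnn_cuda is not None
```

Code that needs the CUDA path can then branch on `HAS_FASTGRNN_CUDA` rather than relying on a `NameError` at call time.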
@@ -351,29 +355,29 @@ def __init__(self, input_size, hidden_size, gate_nonlinearity="sigmoid",
         self._name = name

         if wRank is None:
-            self.W = nn.Parameter(0.1 * torch.randn([hidden_size, input_size], self.device))
+            self.W = nn.Parameter(0.1 * torch.randn([hidden_size, input_size], device=self.device))
             self.W1 = torch.empty(0)
             self.W2 = torch.empty(0)
         else:
             self.W = torch.empty(0)
-            self.W1 = nn.Parameter(0.1 * torch.randn([wRank, input_size], self.device))
-            self.W2 = nn.Parameter(0.1 * torch.randn([hidden_size, wRank], self.device))
+            self.W1 = nn.Parameter(0.1 * torch.randn([wRank, input_size], device=self.device))
+            self.W2 = nn.Parameter(0.1 * torch.randn([hidden_size, wRank], device=self.device))

         if uRank is None:
-            self.U = nn.Parameter(0.1 * torch.randn([hidden_size, hidden_size], self.device))
+            self.U = nn.Parameter(0.1 * torch.randn([hidden_size, hidden_size], device=self.device))
             self.U1 = torch.empty(0)
             self.U2 = torch.empty(0)
         else:
             self.U = torch.empty(0)
-            self.U1 = nn.Parameter(0.1 * torch.randn([uRank, hidden_size], self.device))
-            self.U2 = nn.Parameter(0.1 * torch.randn([hidden_size, uRank], self.device))
+            self.U1 = nn.Parameter(0.1 * torch.randn([uRank, hidden_size], device=self.device))
+            self.U2 = nn.Parameter(0.1 * torch.randn([hidden_size, uRank], device=self.device))

         self._gate_non_linearity = NON_LINEARITY[gate_nonlinearity]

-        self.bias_gate = nn.Parameter(torch.ones([1, hidden_size], self.device))
-        self.bias_update = nn.Parameter(torch.ones([1, hidden_size], self.device))
-        self.zeta = nn.Parameter(self._zetaInit * torch.ones([1, 1], self.device))
-        self.nu = nn.Parameter(self._nuInit * torch.ones([1, 1], self.device))
+        self.bias_gate = nn.Parameter(torch.ones([1, hidden_size], device=self.device))
+        self.bias_update = nn.Parameter(torch.ones([1, hidden_size], device=self.device))
+        self.zeta = nn.Parameter(self._zetaInit * torch.ones([1, 1], device=self.device))
+        self.nu = nn.Parameter(self._nuInit * torch.ones([1, 1], device=self.device))

     @property
     def name(self):
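
The hunk above changes `self.device` from a positional argument to `device=self.device`. In `torch.randn` and `torch.ones` the size may be given as a sequence or as variadic integers, and `device` is keyword-only, so a second positional argument is interpreted as part of the size. The pure-Python stand-in below (`fake_randn`, a hypothetical function, not torch itself) mimics that calling shape to show why the old form fails:

```python
def fake_randn(*size, device=None):
    """Stand-in with the same calling shape as torch.randn (not torch itself)."""
    # Accept either fake_randn(2, 3) or fake_randn([2, 3]).
    if len(size) == 1 and isinstance(size[0], (list, tuple)):
        size = tuple(size[0])
    # Anything left in *size must be an integer dimension.
    if not all(isinstance(dim, int) for dim in size):
        raise TypeError(f"size must be ints, got {size!r}")
    return {"shape": tuple(size), "device": device or "cpu"}


# Keyword form, as in the fixed code: the device is recognised.
ok = fake_randn([4, 3], device="cuda")

# Positional form, as in the old code: "cuda" is swallowed into *size.
try:
    fake_randn([4, 3], "cuda")
    error = None
except TypeError as exc:
    error = str(exc)
```

The real functions report the problem with their own error message, but the cause is the same: only the keyword form reaches the `device` parameter.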

pytorch/requirements-cpu.txt

Lines changed: 2 additions & 2 deletions
@@ -4,6 +4,6 @@ numpy==1.16.4
 pandas==0.23.4
 scikit-learn==0.21.2
 scipy==1.3.0
-torch
-torchvision
+torch==1.4.0
+torchvision==0.5.0
 requests

pytorch/requirements-gpu.txt

Lines changed: 2 additions & 2 deletions
@@ -4,6 +4,6 @@ numpy==1.16.4
 pandas==0.23.4
 scikit-learn==0.21.2
 scipy==1.3.0
-torch
-torchvision
+torch==1.4.0
+torchvision==0.5.0
 requests
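
Both requirements files now pin `torch` and `torchvision` to exact versions. A small sketch of how an environment could be checked against such pins, using only the standard library (`importlib.metadata`, available from Python 3.8). The helper names `parse_pins` and `check_pins` are hypothetical, and the comparison is a plain string match, so a local build tag such as `+cu101` would be reported as a mismatch:

```python
from importlib import metadata


def parse_pins(lines):
    """Map package name -> pinned version for `name==version` lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Unpinned entries such as `requests` are ignored by parse_pins.
pins = parse_pins(["torch==1.4.0", "torchvision==0.5.0", "requests"])
print(check_pins(pins))
```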

pytorch/setup.py

Lines changed: 0 additions & 16 deletions
@@ -1,21 +1,5 @@
 import setuptools #enables develop
 import os
-from torch.utils.cpp_extension import BuildExtension, CUDAExtension
-from edgeml_pytorch.utils import findCUDA
-
-if findCUDA() is not None:
-    setuptools.setup(
-        name='fastgrnn_cuda',
-        ext_modules=[
-            CUDAExtension('fastgrnn_cuda', [
-                'edgeml_pytorch/cuda/fastgrnn_cuda.cpp',
-                'edgeml_pytorch/cuda/fastgrnn_cuda_kernel.cu',
-            ]),
-        ],
-        cmdclass={
-            'build_ext': BuildExtension
-        }
-    )

 setuptools.setup(
     name='edgeml',
