Transformer

This project aims to implement the paper "Attention Is All You Need" in PyTorch 🔥.
The paper proposes the Transformer model, which has revolutionized Seq2Seq tasks. It outperforms the earlier state-of-the-art (SOTA) RNNs in both training time and accuracy.

The reasons are:

The Transformer considers the whole input sequence at once, so it avoids the information bottleneck of RNNs, which process the input sequentially.
The data-in / data-out design, combined with causally masked attention, makes the training process parallelizable.
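
To illustrate the second point, here is a minimal sketch of a causal mask in plain Python. The function name `causal_mask` is hypothetical and not taken from this repository; a PyTorch version would typically build the same lower-triangular pattern with `torch.tril`. Position `i` may only attend to positions `0..i`, which is what lets all target positions be trained in one pass without seeing future tokens.

```python
def causal_mask(size):
    """Return a size x size lower-triangular mask.

    mask[row][col] is True when position `row` is allowed to attend
    to position `col`, i.e. when col is not in the future (col <= row).
    """
    return [[col <= row for col in range(size)] for row in range(size)]


if __name__ == "__main__":
    # Print the mask: "x" marks an allowed position, "." a masked one.
    for row in causal_mask(4):
        print(" ".join("x" if allowed else "." for allowed in row))
```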

Architecture of Transformer

In short, the model consists of two building blocks:

Encoder: builds an embedded feature representation of the input sequence.
Decoder: combines the encoder's feature representation with the previously generated outputs to predict the next token (output).

You can read more details about the Transformer's architecture on my blog soon!!

Test Data (Mock)

I mocked up a dataset to train and validate the Seq2Seq behavior of the Transformer model I've written.
For that, the output sequence has to depend on the order of the input sequence.

So a Reverse Sequence, Inverse Sum dataset is used as the mock data.
Here's an example of how the data is generated. Suppose the input is

x: [3, 9, 5, 6]

We reverse the order first and then apply the Inverse Sum: each number is replaced by the value that, added to it, gives a fixed total. Let's say the total is 17

reversed order  : [6, 5, 9, 3]
inverse sum     : [11, 12, 8, 14]

Finally, we have data like this

x: [3, 9, 5, 6]
y: [11, 12, 8, 14]
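
The two steps above can be sketched in a few lines of Python. This is only an illustration, not necessarily how create_test_dataset.py does it; the function name `make_target` and the parameter name `total` are my own assumptions.

```python
def make_target(x, total=17):
    """Reverse the sequence, then replace each value v with total - v."""
    reversed_x = x[::-1]                     # step 1: reverse the order
    return [total - v for v in reversed_x]   # step 2: inverse sum


if __name__ == "__main__":
    x = [3, 9, 5, 6]
    print("x:", x)
    print("y:", make_target(x))  # y: [11, 12, 8, 14]
```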

In this experiment, I reserved token 1 as the start token, 2 as the stop token, and 0 for padding. After adding the start and stop tokens, the data should look like this

x: [1, 3, 9, 5, 6, 2]
y: [1, 11, 12, 8, 14, 2]
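
As a rough sketch of this step, the helpers below wrap a sequence in start and stop tokens and pad it to a fixed length. The names `PAD`, `START`, `STOP`, `add_special_tokens` and `pad_to` are hypothetical, chosen for illustration rather than taken from the repository. Note that reserving 0, 1 and 2 means the data values themselves have to start at 3 to avoid collisions.

```python
PAD, START, STOP = 0, 1, 2  # reserved token ids described above


def add_special_tokens(seq):
    """Wrap a sequence with the start and stop tokens."""
    return [START] + list(seq) + [STOP]


def pad_to(seq, length):
    """Right-pad a sequence with PAD up to `length` (no-op if already long enough)."""
    return list(seq) + [PAD] * (length - len(seq))


if __name__ == "__main__":
    x = add_special_tokens([3, 9, 5, 6])
    y = add_special_tokens([11, 12, 8, 14])
    print("x:", x)             # x: [1, 3, 9, 5, 6, 2]
    print("y:", y)             # y: [1, 11, 12, 8, 14, 2]
    print("padded:", pad_to(x, 8))
```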

After shifting and modifying the data as described in Attention Is All You Need, this is what the final data looks like.

Encoder input:  [1, 3, 9, 5, 6, 2]
Decoder input:  [1, 11, 12, 8, 14]
Decoder output: [11, 12, 8, 14, 2]
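
The shift shown above can be written as a small helper. This is a sketch under my own naming (`split_for_training` is hypothetical): the decoder input drops the final stop token, and the decoder output drops the leading start token, so at each position the decoder is trained to predict the next token.

```python
def split_for_training(x_tokens, y_tokens):
    """Build (encoder input, decoder input, decoder output) from tokenized x and y.

    Both sequences are expected to already include the start and stop tokens.
    """
    encoder_input = list(x_tokens)
    decoder_input = list(y_tokens[:-1])   # everything except the stop token
    decoder_output = list(y_tokens[1:])   # everything except the start token
    return encoder_input, decoder_input, decoder_output


if __name__ == "__main__":
    enc_in, dec_in, dec_out = split_for_training(
        [1, 3, 9, 5, 6, 2], [1, 11, 12, 8, 14, 2]
    )
    print("Encoder input: ", enc_in)
    print("Decoder input: ", dec_in)
    print("Decoder output:", dec_out)
```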

The Result

Our model reaches 99.99% on this dataset after just a few minutes of training on an NVIDIA 1050 GPU!
