Commit c52e126

Commit message: additional comments
1 parent ba2cb92 commit c52e126

2 files changed: +43 additions, -30 deletions

mkdocs/paper_JOSS/paper.bib

Lines changed: 33 additions & 18 deletions
@@ -326,20 +326,24 @@ @manual{pytorch_widedeep_examples
   note="[Online]. Available at: https://github.com/jrzaurin/pytorch-widedeep/tree/master/examples"
 }
 
-@manual{torchvision_models,
-  title={torchvision models},
-  note="[Online]. Available at: https://pytorch.org/vision/stable/models.html"
+@software{torchvision2016,
+  title = {TorchVision: PyTorch's Computer Vision library},
+  author = {TorchVision maintainers and contributors},
+  year = 2016,
+  journal = {GitHub repository},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/pytorch/vision}}
 }
 
-@manual{torchvision_weight,
-  title={torchvision multi weight support},
-  note="[Online]. Available at: https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/"
+@software{torch_sample,
+  title = {TorchSample: Lightweight pytorch functions for neural network featuremap sampling},
+  author = {TorchSample maintainers and contributors},
+  year = 2017,
+  journal = {GitHub repository},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/ncullen93/torchsample}}
 }
 
-@manual{torch_sample,
-  title={torch sample},
-  note="[Online]. Available at: https://github.com/ncullen93/torchsample"
-}
 
 @misc{chollet2015keras,
   title={Keras},
@@ -348,14 +352,25 @@ @misc{chollet2015keras
   howpublished={\url{https://keras.io}},
 }
 
-@manual{fastai_tokenizer,
-  title={fastai tokenizer},
-  note="[Online]. Available at: https://docs.fast.ai/text.transform.html#BaseTokenizer.tokenizer"
-}
-
-@manual{dl4cv,
-  title={dl4cv},
-  note="[Online]. Available at: https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/"
+@Article{info11020108,
+  AUTHOR = {Howard, Jeremy and Gugger, Sylvain},
+  TITLE = {Fastai: A Layered API for Deep Learning},
+  JOURNAL = {Information},
+  VOLUME = {11},
+  YEAR = {2020},
+  NUMBER = {2},
+  ARTICLE-NUMBER = {108},
+  URL = {https://www.mdpi.com/2078-2489/11/2/108},
+  ISSN = {2078-2489},
+  ABSTRACT = {fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes: a new type dispatch system for Python along with a semantic type hierarchy for tensors; a GPU-optimized computer vision library which can be extended in pure Python; an optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code; a novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training; a new data block API; and much more. We used this library to successfully create a complete deep learning course, which we were able to write more quickly than using previous approaches, and the code was more clear. The library is already in wide use in research, industry, and teaching.},
+  DOI = {10.3390/info11020108}
+}
+
+@Book{adrian2017deep,
+  title={Deep learning for computer vision with python},
+  author={Rosebrock, Adrian},
+  year={2017},
+  publisher={PyImageSearch.com}
 }
 
 @manual{pytorch_widedeep_slack,

mkdocs/paper_JOSS/paper.md

Lines changed: 10 additions & 12 deletions
@@ -38,22 +38,20 @@ With that in mind, we introduce `pytorch-widedeep`, a flexible package for multi
 
 There is a small number of packages available to use DL for tabular data alone (e.g., pytorch-tabular [@joseph2021pytorch], pytorch-tabnet or autogluon-tabular [@erickson2020autogluon]) or that focus mainly on combining text and images (e.g., MMF [@singh2020mmf]). With that in mind, our goal is to provide a modular, flexible, and "_easy-to-use_" framework that allows the combination of a wide variety of models for all data types.
 
-`pytorch-widedeep` is based on Google's Wide and Deep Algorithm [@cheng2016wide], hence its name. However, the library has evolved enormously since its origins, but we prefer to preserve the name for various reasons (the explanation is beyond this paper's scope). The original algorithm is heavily adjusted for multimodal datasets and intended to facilitate the combination of text and images with corresponding tabular data. As opposed to Google's _"Wide and Deep"_ and _"Deep and Cross"_[@wang2017deep] architecture implementations in Keras/Tensorflow, we use the wide/cross and deep model design as an initial building block of PyTorch deep learning models to provide the basis for a plethora of state-of-the-art models and architecture implementations that can be seamlessly assembled with just a few lines of code. Additionally, the individual components do not necessarily have to be a part of the final architecture. The main components of those architectures are shown in \autoref{fig:widedeep_arch}.
+`pytorch-widedeep` is based on Google's Wide and Deep Algorithm [@cheng2016wide], hence its name. The original algorithm is heavily adjusted for multimodal datasets and intended to facilitate the combination of text and images with corresponding tabular data. As opposed to Google's _"Wide and Deep"_ and _"Deep and Cross"_ [@wang2017deep] architecture implementations in Keras/Tensorflow, we use the wide/cross and deep model design as an initial building block of PyTorch deep learning models to provide the basis for a plethora of state-of-the-art models and architecture implementations that can be seamlessly assembled with just a few lines of code. Additionally, the individual components do not necessarily have to be a part of the final architecture. The main components of those architectures are shown in \autoref{fig:widedeep_arch}.
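[Editor's note: the claim that architectures can be "seamlessly assembled with just a few lines of code" can be illustrated with a short sketch. The snippet below is a hedged example, not taken from the paper: it assumes the high-level API names documented for `pytorch-widedeep` (`WidePreprocessor`, `TabPreprocessor`, `Wide`, `TabMlp`, `WideDeep`, `Trainer`), invented column names, and argument names that may differ between library releases.]

    # Hedged sketch: combine a wide and a deeptabular component with the documented
    # pytorch-widedeep high-level API (argument names such as cat_embed_cols may
    # vary across versions); data and column names are invented for illustration.
    import numpy as np
    import pandas as pd
    from pytorch_widedeep import Trainer
    from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
    from pytorch_widedeep.models import Wide, TabMlp, WideDeep

    df = pd.DataFrame({
        "education": ["bachelor", "master", "phd", "bachelor"],
        "relationship": ["single", "married", "single", "married"],
        "age": [25, 40, 33, 52],
        "hours_per_week": [40, 45, 38, 50],
        "target": [0, 1, 0, 1],
    })
    cat_cols, cont_cols = ["education", "relationship"], ["age", "hours_per_week"]

    # preprocess the tabular data for each component
    wide_prep = WidePreprocessor(wide_cols=cat_cols)
    tab_prep = TabPreprocessor(cat_embed_cols=cat_cols, continuous_cols=cont_cols)
    X_wide = wide_prep.fit_transform(df)
    X_tab = tab_prep.fit_transform(df)

    # assemble the components into one model
    model = WideDeep(
        wide=Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1),
        deeptabular=TabMlp(
            column_idx=tab_prep.column_idx,
            cat_embed_input=tab_prep.cat_embed_input,
            continuous_cols=cont_cols,
        ),
    )

    trainer = Trainer(model, objective="binary")
    trainer.fit(X_wide=X_wide, X_tab=X_tab, target=df["target"].values, n_epochs=5)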
 
-![\label{fig:widedeep_arch}](figures/widedeep_arch.png)
+![Main components of the pytorch-widedeep architecture. The blue and green boxes in the figure represent the main data types and their corresponding model components, namely `wide`, `deeptabular`, `deeptext` and `deepimage`. The yellow boxes represent _so-called_ fully-connected (FC) heads, simply MLPs that one can optionally add on top of the main components. These are referred to in the figure as `TextHead` and `ImageHead`. The dashed-line rectangles indicate that the outputs from the components inside are concatenated if a final FC head (referred to as `DeepHead` in the figure) is used. The faded-green `deeptabular` box aims to indicate that the output of the deeptabular component will be concatenated directly with the output of the `deeptext` or `deepimage` components or with the FC heads if these are used. Finally, the arrows indicate the connections, which of course, depend on the final architecture that the user chooses to build. \label{fig:widedeep_arch}](figures/widedeep_arch.png)
 
-The blue and green boxes in the figure represent the main data types and their corresponding model components, namely `wide`, `deeptabular`, `deeptext` and `deepimage`. The yellow boxes represent _so-called_ fully-connected (FC) heads, simply MLPs that one can optionally add on top of the main components. These are referred to in the figure as `TextHead` and `ImageHead`. The dashed-line rectangles indicate that the outputs from the components inside are concatenated if a final FC head (referred to as `DeepHead` in the figure) is used. The faded-green `deeptabular` box aims to indicate that the output of the deeptabular component will be concatenated directly with the output of the `deeptext` or `deepimage` components or with the FC heads if these are used. Finally, the arrows indicate the connections, which of course, depend on the final architecture that the user chooses to build.
+Following the notation of [@cheng2016wide], the expression for the architecture without a `deephead` component can be formulated as:
 
-In math terms, and following the notation in the original paper [@cheng2016wide], the expression for the architecture without a `deephead` component can be formulated as:
 
+$$pred = \sigma(W_{wide}^{T}[x,\phi(x)] + \sum_{i \in \mathcal{I}} W_{i}^{T}a_{i}^{l_f} + b)$$
 
-$$pred = \sigma(W_{wide}^{T}[x,\phi(x)] + W_{deeptabular}^{T}a_{deeptabular}^{l_f} + W_{deeptext}^{T}a_{deeptext}^{l_f} + W_{deepimage}^{T}a_{deepimage}^{l_f} + b)$$
 
+Where $\mathcal{I} = \{deeptabular, deeptext, deepimage \}$, $\sigma$ is the sigmoid function, $W$ are the weight matrices applied to the wide model and to the final activations of the deep models, $a$ are these final activations, $\phi(x)$ are the cross-product transformations of the original features $x$, and $b$ is the bias term.
 
-Where $\sigma$ is the sigmoid function, $W$ are the weight matrices applied to the wide model and to the final activations of the deep models, $a$ are these final activations, $\phi(x)$ are the cross-product transformations of the original features $x$, and $b$ is the bias term.
 
-
-While if there is a `deephead` component, the previous expression turns into:
+If there is a `deephead` component, the previous expression turns into:
 
 
 $$pred = \sigma(W_{wide}^{T}[x,\phi(x)] + W_{deephead}^{T}a_{deephead}^{l_f} + b)$$
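[Editor's note: the two expressions above can be made concrete with a minimal plain-PyTorch sketch. This is not the library's implementation; the class name, dimensions and toy inputs are invented for the example.]

    # Minimal sketch of the prediction equations above in plain PyTorch (not the
    # library's actual code); names and dimensions are illustrative only.
    import torch
    import torch.nn as nn

    class ToyWideDeep(nn.Module):
        def __init__(self, wide_dim, deep_dims):
            super().__init__()
            # wide term: linear model over raw + cross-product features [x, phi(x)]
            self.wide = nn.Linear(wide_dim, 1, bias=False)
            # one projection W_i per deep component (deeptabular, deeptext, deepimage)
            self.deep = nn.ModuleDict(
                {name: nn.Linear(dim, 1, bias=False) for name, dim in deep_dims.items()}
            )
            self.bias = nn.Parameter(torch.zeros(1))  # the bias term b

        def forward(self, x_wide, deep_activations):
            # x_wide holds [x, phi(x)]; deep_activations maps each component name to
            # its final-layer activations a_i^{l_f}
            logit = self.wide(x_wide) + self.bias
            for name, a in deep_activations.items():
                logit = logit + self.deep[name](a)  # W_i^T a_i^{l_f}
            # with a `deephead`, the per-component sum would be replaced by a single
            # W_deephead^T a_deephead^{l_f} term before the sigmoid
            return torch.sigmoid(logit)

    model = ToyWideDeep(wide_dim=10, deep_dims={"deeptabular": 8, "deeptext": 16, "deepimage": 32})
    pred = model(
        torch.randn(4, 10),
        {"deeptabular": torch.randn(4, 8), "deeptext": torch.randn(4, 16), "deepimage": torch.randn(4, 32)},
    )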
@@ -68,7 +66,7 @@ This section will briefly introduce the current model components available for e
 
 ## The `wide` component
 
-This is a linear model for tabular data where the non-linearities are captured via cross-product transformations (see the description in the previous section). This is the simplest of all components, and we consider it very useful as a benchmark when used on its own.
+This is a linear model for tabular data where the non-linearities are captured via cross-product transformations. This is the simplest of all components, and we consider it very useful as a benchmark when used on its own.
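[Editor's note: for readers unfamiliar with cross-product transformations, the toy snippet below (with invented column names) shows the idea: the crossed feature is 1 only when two specific categorical values co-occur, so a purely linear model can weight that interaction directly. It is an explanatory sketch, not library code.]

    # Toy cross-product transformation phi(x): a binary feature that fires only when
    # both original categories take the given values (column names are invented).
    import pandas as pd

    df = pd.DataFrame(
        {"education": ["bachelor", "master", "bachelor"], "occupation": ["sales", "tech", "tech"]}
    )
    df["education_X_occupation"] = (
        (df["education"] == "bachelor") & (df["occupation"] == "tech")
    ).astype(int)  # -> [0, 0, 1]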
 
 
 ## The `deeptabular` component
@@ -77,7 +75,7 @@ Currently, `pytorch-widedeep` offers the following models for the so-called `dee
 
 ## The `deepimage` component
 
-The image-related component is fully integrated with the newest version of torchvision [@torchvision_models] (0.13 at the time of writing). This version has Multi-Weight Support [@torchvision_weight]. Therefore, a variety of model variants are available to use with pre-trained weights obtained with different datasets. Currently, the model variants supported by `pytorch-widedeep` are (i) Resnet [@he2016deep], (ii) Shufflenet [@zhang2018shufflenet], (iii) Resnext [@xie2017aggregated], (iv) Wide Resnet [@zagoruyko2016wide], (v) Regnet [@xu2022regnet], (vi) Densenet [@huang2017densely], (vii) Mobilenet [@howard2017mobilenets], (viii) MNasnet [@tan2019mnasnet], (ix) Efficientnet [@tan2019efficientnet] and (x) Squeezenet [@iandola2016squeezenet].
+The image-related component is fully integrated with the newest version of torchvision [@torchvision2016] (0.13 at the time of writing). This version has Multi-Weight Support. Therefore, a variety of model variants are available to use with pre-trained weights obtained with different datasets. Currently, the model variants supported by `pytorch-widedeep` are (i) Resnet [@he2016deep], (ii) Shufflenet [@zhang2018shufflenet], (iii) Resnext [@xie2017aggregated], (iv) Wide Resnet [@zagoruyko2016wide], (v) Regnet [@xu2022regnet], (vi) Densenet [@huang2017densely], (vii) Mobilenet [@howard2017mobilenets], (viii) MNasnet [@tan2019mnasnet], (ix) Efficientnet [@tan2019efficientnet] and (x) Squeezenet [@iandola2016squeezenet].
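[Editor's note: the Multi-Weight Support mentioned above is torchvision's (>= 0.13) weights-enum API; a brief example, independent of `pytorch-widedeep`, follows.]

    # torchvision >= 0.13 multi-weight API: select a pretrained weight set explicitly
    # and reuse its bundled inference transforms.
    from torchvision.models import resnet18, ResNet18_Weights

    weights = ResNet18_Weights.IMAGENET1K_V1   # or ResNet18_Weights.DEFAULT
    model = resnet18(weights=weights)
    preprocess = weights.transforms()          # matching preprocessing pipeline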
 
 ## The `deeptext` component
 
@@ -96,8 +94,8 @@ Training single or multi-mode models in `pytorch-widedeep` is handled by the dif
 We acknowledge the work of other researchers, engineers, and programmers from the following projects and libraries:
 
 * the `Callbacks` and `Initializers` structure and code is inspired by the torchsample library [@torch_sample], which in itself partially inspired by Keras [@chollet2015keras]
-* the `TextProcessor` class in this library uses the fastai [@fastai_tokenizer] `Tokenizer` and `Vocab`; the code at `utils.fastai_transforms` is a minor adaptation of their code, so it functions within this library; to our experience, their `Tokenizer` is the best in class
-* the `ImageProcessor` class in this library uses code from the fantastic Deep Learning for Computer Vision (DL4CV) [@dl4cv] book by Adrian Rosebrock
+* the `TextProcessor` class in this library uses the fastai [@info11020108] `Tokenizer` and `Vocab`; the code at `utils.fastai_transforms` is a minor adaptation of their code, so it functions within this library; to our experience, their `Tokenizer` is the best in class
+* the `ImageProcessor` class in this library uses code from the fantastic Deep Learning for Computer Vision (DL4CV) [@adrian2017deep] book by Adrian Rosebrock
 * we adjusted and integrated ideas of Label and Feature Distribution Smoothing [@yang2021delving]
 * we adjusted and integrated ZILNloss code written in Tensorflow/Keras [@wang2019deep]
 