Commit c52e126

Commit message: additional comments
1 parent ba2cb92 commit c52e126

2 files changed: +43 additions, -30 deletions

mkdocs/paper_JOSS/paper.bib

Lines changed: 33 additions & 18 deletions
@@ -326,20 +326,24 @@ @manual{pytorch_widedeep_examples
   note="[Online]. Available at: https://github.com/jrzaurin/pytorch-widedeep/tree/master/examples"
 }
 
-@manual{torchvision_models,
-  title={torchvision models},
-  note="[Online]. Available at: https://pytorch.org/vision/stable/models.html"
+@software{torchvision2016,
+  title = {TorchVision: PyTorch's Computer Vision library},
+  author = {TorchVision maintainers and contributors},
+  year = 2016,
+  journal = {GitHub repository},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/pytorch/vision}}
 }
 
-@manual{torchvision_weight,
-  title={torchvision multi weight support},
-  note="[Online]. Available at: https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/"
+@software{torch_sample,
+  title = {TorchSample: Lightweight pytorch functions for neural network featuremap sampling},
+  author = {TorchSample maintainers and contributors},
+  year = 2017,
+  journal = {GitHub repository},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/ncullen93/torchsample}}
 }
 
-@manual{torch_sample,
-  title={torch sample},
-  note="[Online]. Available at: https://github.com/ncullen93/torchsample"
-}
 
 @misc{chollet2015keras,
   title={Keras},
@@ -348,14 +352,25 @@ @misc{chollet2015keras
   howpublished={\url{https://keras.io}},
 }
 
-@manual{fastai_tokenizer,
-  title={fastai tokenizer},
-  note="[Online]. Available at: https://docs.fast.ai/text.transform.html#BaseTokenizer.tokenizer"
-}
-
-@manual{dl4cv,
-  title={dl4cv},
-  note="[Online]. Available at: https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/"
+@Article{info11020108,
+  AUTHOR = {Howard, Jeremy and Gugger, Sylvain},
+  TITLE = {Fastai: A Layered API for Deep Learning},
+  JOURNAL = {Information},
+  VOLUME = {11},
+  YEAR = {2020},
+  NUMBER = {2},
+  ARTICLE-NUMBER = {108},
+  URL = {https://www.mdpi.com/2078-2489/11/2/108},
+  ISSN = {2078-2489},
+  ABSTRACT = {fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes: a new type dispatch system for Python along with a semantic type hierarchy for tensors; a GPU-optimized computer vision library which can be extended in pure Python; an optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code; a novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training; a new data block API; and much more. We used this library to successfully create a complete deep learning course, which we were able to write more quickly than using previous approaches, and the code was more clear. The library is already in wide use in research, industry, and teaching.},
+  DOI = {10.3390/info11020108}
+}
+
+@Book{adrian2017deep,
+  title={Deep learning for computer vision with python},
+  author={Rosebrock, Adrian},
+  year={2017},
+  publisher={PyImageSearch.com}
 }
 
 @manual{pytorch_widedeep_slack,

mkdocs/paper_JOSS/paper.md

Lines changed: 10 additions & 12 deletions
@@ -38,22 +38,20 @@ With that in mind, we introduce `pytorch-widedeep`, a flexible package for multi
 
 There is a small number of packages available to use DL for tabular data alone (e.g., pytorch-tabular [@joseph2021pytorch], pytorch-tabnet or autogluon-tabular [@erickson2020autogluon]) or that focus mainly on combining text and images (e.g., MMF [@singh2020mmf]). With that in mind, our goal is to provide a modular, flexible, and "_easy-to-use_" framework that allows the combination of a wide variety of models for all data types.
 
-`pytorch-widedeep` is based on Google's Wide and Deep Algorithm [@cheng2016wide], hence its name. However, the library has evolved enormously since its origins, but we prefer to preserve the name for various reasons (the explanation is beyond this paper's scope). The original algorithm is heavily adjusted for multimodal datasets and intended to facilitate the combination of text and images with corresponding tabular data. As opposed to Google's _"Wide and Deep"_ and _"Deep and Cross"_[@wang2017deep] architecture implementations in Keras/Tensorflow, we use the wide/cross and deep model design as an initial building block of PyTorch deep learning models to provide the basis for a plethora of state-of-the-art models and architecture implementations that can be seamlessly assembled with just a few lines of code. Additionally, the individual components do not necessarily have to be a part of the final architecture. The main components of those architectures are shown in \autoref{fig:widedeep_arch}.
+`pytorch-widedeep` is based on Google's Wide and Deep Algorithm [@cheng2016wide], hence its name. The original algorithm is heavily adjusted for multimodal datasets and intended to facilitate the combination of text and images with corresponding tabular data. As opposed to Google's _"Wide and Deep"_ and _"Deep and Cross"_ [@wang2017deep] architecture implementations in Keras/Tensorflow, we use the wide/cross and deep model design as an initial building block of PyTorch deep learning models to provide the basis for a plethora of state-of-the-art models and architecture implementations that can be seamlessly assembled with just a few lines of code. Additionally, the individual components do not necessarily have to be a part of the final architecture. The main components of those architectures are shown in \autoref{fig:widedeep_arch}.
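[Editor's note: the claim that architectures can be "seamlessly assembled with just a few lines of code" can be illustrated with a short sketch. The snippet below is a hedged example, not taken from the paper: it assumes the high-level API names documented for `pytorch-widedeep` (`WidePreprocessor`, `TabPreprocessor`, `Wide`, `TabMlp`, `WideDeep`, `Trainer`), invented column names, and argument names that may differ between library releases.]

    # Hedged sketch: combine a wide and a deeptabular component with the documented
    # pytorch-widedeep high-level API (argument names such as cat_embed_cols may
    # vary across versions); data and column names are invented for illustration.
    import numpy as np
    import pandas as pd
    from pytorch_widedeep import Trainer
    from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
    from pytorch_widedeep.models import Wide, TabMlp, WideDeep

    df = pd.DataFrame({
        "education": ["bachelor", "master", "phd", "bachelor"],
        "relationship": ["single", "married", "single", "married"],
        "age": [25, 40, 33, 52],
        "hours_per_week": [40, 45, 38, 50],
        "target": [0, 1, 0, 1],
    })
    cat_cols, cont_cols = ["education", "relationship"], ["age", "hours_per_week"]

    # preprocess the tabular data for each component
    wide_prep = WidePreprocessor(wide_cols=cat_cols)
    tab_prep = TabPreprocessor(cat_embed_cols=cat_cols, continuous_cols=cont_cols)
    X_wide = wide_prep.fit_transform(df)
    X_tab = tab_prep.fit_transform(df)

    # assemble the components into one model
    model = WideDeep(
        wide=Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1),
        deeptabular=TabMlp(
            column_idx=tab_prep.column_idx,
            cat_embed_input=tab_prep.cat_embed_input,
            continuous_cols=cont_cols,
        ),
    )

    trainer = Trainer(model, objective="binary")
    trainer.fit(X_wide=X_wide, X_tab=X_tab, target=df["target"].values, n_epochs=5)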
 
-![\label{fig:widedeep_arch}](figures/widedeep_arch.png)
+![Main components of the pytorch-widedeep architecture. The blue and green boxes in the figure represent the main data types and their corresponding model components, namely `wide`, `deeptabular`, `deeptext` and `deepimage`. The yellow boxes represent _so-called_ fully-connected (FC) heads, simply MLPs that one can optionally add on top of the main components. These are referred to in the figure as `TextHead` and `ImageHead`. The dashed-line rectangles indicate that the outputs from the components inside are concatenated if a final FC head (referred to as `DeepHead` in the figure) is used. The faded-green `deeptabular` box aims to indicate that the output of the deeptabular component will be concatenated directly with the output of the `deeptext` or `deepimage` components or with the FC heads if these are used. Finally, the arrows indicate the connections, which of course, depend on the final architecture that the user chooses to build. \label{fig:widedeep_arch}](figures/widedeep_arch.png)
 
-The blue and green boxes in the figure represent the main data types and their corresponding model components, namely `wide`, `deeptabular`, `deeptext` and `deepimage`. The yellow boxes represent _so-called_ fully-connected (FC) heads, simply MLPs that one can optionally add on top of the main components. These are referred to in the figure as `TextHead` and `ImageHead`. The dashed-line rectangles indicate that the outputs from the components inside are concatenated if a final FC head (referred to as `DeepHead` in the figure) is used. The faded-green `deeptabular` box aims to indicate that the output of the deeptabular component will be concatenated directly with the output of the `deeptext` or `deepimage` components or with the FC heads if these are used. Finally, the arrows indicate the connections, which of course, depend on the final architecture that the user chooses to build.
+Following the notation of [@cheng2016wide], the expression for the architecture without a `deephead` component can be formulated as:
 
-In math terms, and following the notation in the original paper [@cheng2016wide], the expression for the architecture without a `deephead` component can be formulated as:
 
+$$pred = \sigma(W_{wide}^{T}[x,\phi(x)] + \sum_{i \in \mathcal{I}} W_{i}^{T}a_{i}^{l_f} + b)$$
 
-$$pred = \sigma(W_{wide}^{T}[x,\phi(x)] + W_{deeptabular}^{T}a_{deeptabular}^{l_f} + W_{deeptext}^{T}a_{deeptext}^{l_f} + W_{deepimage}^{T}a_{deepimage}^{l_f} + b)$$
 
+Where $\mathcal{I} = \{deeptabular, deeptext, deepimage \}$, $\sigma$ is the sigmoid function, $W$ are the weight matrices applied to the wide model and to the final activations of the deep models, $a$ are these final activations, $\phi(x)$ are the cross-product transformations of the original features $x$, and $b$ is the bias term.
 
-Where $\sigma$ is the sigmoid function, $W$ are the weight matrices applied to the wide model and to the final activations of the deep models, $a$ are these final activations, $\phi(x)$ are the cross-product transformations of the original features $x$, and $b$ is the bias term.
 
-
-While if there is a `deephead` component, the previous expression turns into:
+If there is a `deephead` component, the previous expression turns into:
 
 
 $$pred = \sigma(W_{wide}^{T}[x,\phi(x)] + W_{deephead}^{T}a_{deephead}^{l_f} + b)$$
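[Editor's note: the two expressions above can be made concrete with a minimal plain-PyTorch sketch. This is not the library's implementation; the class name, dimensions and toy inputs are invented for the example.]

    # Minimal sketch of the prediction equations above in plain PyTorch (not the
    # library's actual code); names and dimensions are illustrative only.
    import torch
    import torch.nn as nn

    class ToyWideDeep(nn.Module):
        def __init__(self, wide_dim, deep_dims):
            super().__init__()
            # wide term: linear model over raw + cross-product features [x, phi(x)]
            self.wide = nn.Linear(wide_dim, 1, bias=False)
            # one projection W_i per deep component (deeptabular, deeptext, deepimage)
            self.deep = nn.ModuleDict(
                {name: nn.Linear(dim, 1, bias=False) for name, dim in deep_dims.items()}
            )
            self.bias = nn.Parameter(torch.zeros(1))  # the bias term b

        def forward(self, x_wide, deep_activations):
            # x_wide holds [x, phi(x)]; deep_activations maps each component name to
            # its final-layer activations a_i^{l_f}
            logit = self.wide(x_wide) + self.bias
            for name, a in deep_activations.items():
                logit = logit + self.deep[name](a)  # W_i^T a_i^{l_f}
            # with a `deephead`, the per-component sum would be replaced by a single
            # W_deephead^T a_deephead^{l_f} term before the sigmoid
            return torch.sigmoid(logit)

    model = ToyWideDeep(wide_dim=10, deep_dims={"deeptabular": 8, "deeptext": 16, "deepimage": 32})
    pred = model(
        torch.randn(4, 10),
        {"deeptabular": torch.randn(4, 8), "deeptext": torch.randn(4, 16), "deepimage": torch.randn(4, 32)},
    )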
@@ -68,7 +66,7 @@ This section will briefly introduce the current model components available for e
 
 ## The `wide` component
 
-This is a linear model for tabular data where the non-linearities are captured via cross-product transformations (see the description in the previous section). This is the simplest of all components, and we consider it very useful as a benchmark when used on its own.
+This is a linear model for tabular data where the non-linearities are captured via cross-product transformations. This is the simplest of all components, and we consider it very useful as a benchmark when used on its own.
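[Editor's note: for readers unfamiliar with cross-product transformations, the toy snippet below (with invented column names) shows the idea: the crossed feature is 1 only when two specific categorical values co-occur, so a purely linear model can weight that interaction directly. It is an explanatory sketch, not library code.]

    # Toy cross-product transformation phi(x): a binary feature that fires only when
    # both original categories take the given values (column names are invented).
    import pandas as pd

    df = pd.DataFrame(
        {"education": ["bachelor", "master", "bachelor"], "occupation": ["sales", "tech", "tech"]}
    )
    df["education_X_occupation"] = (
        (df["education"] == "bachelor") & (df["occupation"] == "tech")
    ).astype(int)  # -> [0, 0, 1]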
 
 
 ## The `deeptabular` component
@@ -77,7 +75,7 @@ Currently, `pytorch-widedeep` offers the following models for the so-called `dee
 
 ## The `deepimage` component
 
-The image-related component is fully integrated with the newest version of torchvision [@torchvision_models] (0.13 at the time of writing). This version has Multi-Weight Support [@torchvision_weight]. Therefore, a variety of model variants are available to use with pre-trained weights obtained with different datasets. Currently, the model variants supported by `pytorch-widedeep` are (i) Resnet [@he2016deep], (ii) Shufflenet [@zhang2018shufflenet], (iii) Resnext [@xie2017aggregated], (iv) Wide Resnet [@zagoruyko2016wide], (v) Regnet [@xu2022regnet], (vi) Densenet [@huang2017densely], (vii) Mobilenet [@howard2017mobilenets], (viii) MNasnet [@tan2019mnasnet], (ix) Efficientnet [@tan2019efficientnet] and (x) Squeezenet [@iandola2016squeezenet].
+The image-related component is fully integrated with the newest version of torchvision [@torchvision2016] (0.13 at the time of writing). This version has Multi-Weight Support. Therefore, a variety of model variants are available to use with pre-trained weights obtained with different datasets. Currently, the model variants supported by `pytorch-widedeep` are (i) Resnet [@he2016deep], (ii) Shufflenet [@zhang2018shufflenet], (iii) Resnext [@xie2017aggregated], (iv) Wide Resnet [@zagoruyko2016wide], (v) Regnet [@xu2022regnet], (vi) Densenet [@huang2017densely], (vii) Mobilenet [@howard2017mobilenets], (viii) MNasnet [@tan2019mnasnet], (ix) Efficientnet [@tan2019efficientnet] and (x) Squeezenet [@iandola2016squeezenet].
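[Editor's note: the Multi-Weight Support mentioned above is torchvision's (>= 0.13) weights-enum API; a brief example, independent of `pytorch-widedeep`, follows.]

    # torchvision >= 0.13 multi-weight API: select a pretrained weight set explicitly
    # and reuse its bundled inference transforms.
    from torchvision.models import resnet18, ResNet18_Weights

    weights = ResNet18_Weights.IMAGENET1K_V1   # or ResNet18_Weights.DEFAULT
    model = resnet18(weights=weights)
    preprocess = weights.transforms()          # matching preprocessing pipeline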
 
 ## The `deeptext` component
 
@@ -96,8 +94,8 @@ Training single or multi-mode models in `pytorch-widedeep` is handled by the dif
 We acknowledge the work of other researchers, engineers, and programmers from the following projects and libraries:
 
 * the `Callbacks` and `Initializers` structure and code is inspired by the torchsample library [@torch_sample], which in itself partially inspired by Keras [@chollet2015keras]
-* the `TextProcessor` class in this library uses the fastai [@fastai_tokenizer] `Tokenizer` and `Vocab`; the code at `utils.fastai_transforms` is a minor adaptation of their code, so it functions within this library; to our experience, their `Tokenizer` is the best in class
-* the `ImageProcessor` class in this library uses code from the fantastic Deep Learning for Computer Vision (DL4CV) [@dl4cv] book by Adrian Rosebrock
+* the `TextProcessor` class in this library uses the fastai [@info11020108] `Tokenizer` and `Vocab`; the code at `utils.fastai_transforms` is a minor adaptation of their code, so it functions within this library; to our experience, their `Tokenizer` is the best in class
+* the `ImageProcessor` class in this library uses code from the fantastic Deep Learning for Computer Vision (DL4CV) [@adrian2017deep] book by Adrian Rosebrock
 * we adjusted and integrated ideas of Label and Feature Distribution Smoothing [@yang2021delving]
 * we adjusted and integrated ZILNloss code written in Tensorflow/Keras [@wang2019deep]
 