Skip to content

Commit 3f9cdc6

Browse files
Merge pull request #772 from shashank3959/notebook-update
Update DLE READMEs
2 parents a08c306 + 1f02ed1 commit 3f9cdc6

File tree

52 files changed

+1146
-0
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

52 files changed

+1146
-0
lines changed

PyTorch/Classification/README.md

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# Image Classification
2+
3+
Image classification is the task of categorizing an image into one of several predefined classes, often also giving a probability of the input belonging to a certain class. This task is crucial in understanding and analyzing images, and it comes quite effortlessly to human beings with our complex visual systems. Most powerful image classification models today are built using some form of Convolution Neural Networks (CNNs), which are also the backbone of many other tasks in Computer Vision.
4+
5+
![What is Image Classification?](img/1_image-classification-figure-1.PNG)
6+
7+
[Source](https://github.com/NVlabs/stylegan)
8+
9+
In this overview, we will cover
10+
- Types of image Classification
11+
- How does it work?
12+
- How is the performance evaluated?
13+
- Use cases and applications
14+
- Where to get started
15+
16+
---
17+
## Types of image Classification
18+
Image Classification can be broadly divided into either Binary or Multi-class problems depending on the number of categories. Binary image classification problems entail predicting one of two classes. An example of this would be to predict whether an image is that of a dog or not. A subtly different problem is that of single-class (one vs all) classification, where the goal is to recognize data from one class and reject all other. This is beneficial when there is an overabundance of data from one of the classes, also called a class imbalance.
19+
20+
![Input and Outputs for Image Classification](img/1_image-classification-figure-2.PNG)
21+
22+
In Multi-class classification problems, models categorize instances into one of three or more categories. Multi-class models often also return confidence scores (or probabilities) of an image belonging to each of the possible classes. This should not be confused with multi-label classification, where a model assigns multiple labels to an instance.
23+
24+
---
25+
## How does it work?
26+
In recent years, Convolutional Neural Networks (CNNs) have led the way to massive breakthroughs in Computer Vision. Most state-of-the-art Image Classification models today employ CNNs in some form. Convolutional Layers are the building blocks of CNNs, and similar to Neural Networks they are composed of neurons that learn parameters like weights and biases. Most CNNs are composed of many Convolutional layers that work like feature extractors, and coupled with Fully Connected (FC) layers they learn to identify patterns in images to return confidence scores in different categories.
27+
28+
But what makes Convolutional Networks special? Well, CNNs are built with the assumption that input is in the form of images, and exploiting this fact they can be vastly more efficient than a standard Neural Network for a given level of performance.
29+
30+
![Typical CNN architecture](img/1_image-classification-figure-3.PNG)
31+
32+
Network depth (number of layers) and the number of learnable parameters have been found to be of crucial importance in performance. Top models can typically have over a hundred layers and hundreds of millions of parameters. Much of recent research in visual recognition has been focused around “network engineering”, i.e. designing better architectures, even employing Machine Learning algorithms to search for one, such as in the case of Neural Architecture Search.
33+
34+
---
35+
## How is the performance evaluated?
36+
Image Classification performance is often reported as Top-1 or Top-5 scores. In top-1 score, classification is considered correct if the top predicted class (with the highest predicted probability) matches the true class for a given instance. In top-5, we check if one of the top 5 predictions matches the true class. The score is just the number of correct predictions divided by the total number of instances evaluated.
37+
38+
---
39+
## Use cases and applications
40+
### Categorizing Images in Large Visual Databases
41+
Businesses with visual databases may accumulate large amounts of images with missing tags or meta-data. Unless there is an effective way to organize such images, they may not be much use at all. On the contrary, they may hog precious storage space. Automated image classification algorithms can classify such untagged images into predefined categories. Businesses can avoid expensive manual labor by employing automated image classification algorithms.
42+
43+
A related task is that of Image Organization in smart devices like mobile phones. With Image Classification techniques, images and videos can be organized for improved accessibility.
44+
45+
### Visual Search
46+
Visual Search or Image-based search has risen to popularity over the recent years. Many prominent search engines already provide this feature where users can search for visual content similar to a provided image. This has many applications in the e-commerce and retail industry where users can take a snap and upload an image of a product they are interested in purchasing. This makes the shopping experience much more efficient for customers, and can increase sales for businesses.
47+
48+
49+
### Healthcare
50+
Medical Imaging is about creating visual images of internal body parts for clinical purposes. This includes health monitoring, medical diagnosis, treatment, and keeping organized records. Image Classification algorithms can play a crucial role in Medical Imaging by assisting medical professionals detect presence of illness and having consistency in clinical diagnosis.
51+
52+
---
53+
## Where to get started?
54+
In this Collection, you will find state-of-the-art implementations of Image Classification models and their containers. A good place to get started with Image Classification is with the [ResNet-50](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5) model.
55+
56+
ResNets (Residual Networks) are very popular Convolutional Neural Network architectures built with blocks utilizing skip connections to jump over some layers. As the name suggests, ResNet-50 is a variant that is 50 layers deep! But why do we need these “skip” connections? As it turns out building better CNN architectures is not as simple as stacking more and more layers. In practice, If we just keep adding depth to a CNN, at some point the performance stagnates or may start getting worse. Very deep networks are notoriously difficult to train, because of the vanishing gradient problem. In simpler terms, as the depth increases, repeated multiplications during back-propagation may end up making the gradient vanishingly small. This may prevent weights from changing. In ResNets, the skip connects are meant to act like a “gradient superhighway” allowing the gradient to flow unrestrained thus alleviating the problem of the vanishing gradients. ResNets were very influential in the development of subsequent Convolutional Network architectures, and there is much more to them than the brief summary above!
294 KB
Loading
112 KB
Loading
105 KB
Loading

PyTorch/Detection/README.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
# Object Detection
2+
3+
A natural progression from image classification would be classification and localization of the subject of the image. We can take this idea one step further and localize objects in a given image. Simply put, object detection refers to identifying which object(s) are there in an image.
4+
5+
![](img/2_object-detection-figure-1.png)
6+
7+
Source: [Joseph Redmon, Ali Farhadi, “YOLO9000:Better, Faster, Stronger”](https://arxiv.org/abs/1612.08242)
8+
9+
## Introduction to Object Detection
10+
In this section we will try to answer the following questions:
11+
- What is object detection?
12+
- Why is object detection important?
13+
14+
Object Detection is about not only detecting the presence and location of objects in images and videos, but also categorizing them into everyday objects. Oftentimes, there is a confusion between Image Classification and Object Detection. Simply put, the difference between them is the same as the difference between saying “This is a cat” and pointing to a cat and saying “There is the cat”.
15+
16+
To build autonomous systems, perception is the main challenge to be solved. Perception, in terms of autonomous systems refers to the ability of understanding the surroundings of the autonomous agent. This means that the agent needs to be able to figure out where and what objects are in its immediate vicinity.
17+
18+
Object detection can help keep humans away from toxic environments and hazardous situations. Challenges like garbage segregation, oil rig monitoring, nightly surveillance, cargo port maintenance and other high risk applications can be aided by robots/cameras which can detect objects. Essentially, any environment that requires visual inspection or analysis and is too dangerous for humans, object detection pipelines can be used to shield from any onsite hazard.
19+
20+
21+
## How does it work?
22+
While this has been a topic of research since before Deep Learning became mainstream, the best performing models today use one or more Deep Neural Networks.
23+
24+
Many architectures have networks pretrained on a different, simpler task, like Image Classification. As one can imagine, the inputs to this task can be images or videos, and the outputs are usually a set of bounding box coordinates that enclose each of the detected objects, as well as a class label for each detected object. With advances in research and the use of GPUs, it is possible to have object detection in real time with really impressive accuracies!
25+
26+
![](img/2_object-detection-figure-2.png)
27+
28+
Source: [Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, “SSD: Single Shot MultiBox Detector”](https://arxiv.org/abs/1512.02325)
29+
30+
Single Shot Detector(SSD) is one of the state-of-the-art models for object detection and localization. It is based on a feed-forward convolutional neural network which always yields a fixed set of bounding boxes and a confidence score which represents how confident the network is about the bounding box containing an object. This is followed by a non maximum suppression step which outputs the final detections.
31+
32+
This network can be understood as two networks stacked on top of each other. The first network is a simple convolutional neural network which “extracts important features” which is the same as the image classification networks.
33+
34+
The second network is a multiscale feature map network built using another set of convolutional layers which are progressively smaller in size to allow detections on multiple scales. Simply put, the progressively smaller layers help detect objects of different sizes. Each layer in this set of layers outputs a number of detections and the final layer passes the output to a non maxima suppression which yields a final set of detections.
35+
36+
This Collection contains models and containers for object detection achieving state-of-the-art accuracies, tested and maintained by Nvidia.
37+
38+
39+
## Applications and Use cases
40+
41+
### Autonomous Vehicles
42+
Autonomous vehicles need to perceive and interact with real world objects in order to blend in with the environment. For instance a self-driving car needs to detect other vehicles, pedestrians, objects on the road, traffic signals and any and all obstacles on road and also understand the exact location of these objects. This perception information helps the agent avoid obstacles and understand how to interact with objects like traffic lights.
43+
44+
### Warehouses
45+
Warehouses have many conveyor belts and segregation platforms. These tasks have traditionally been handled manually. As factories and warehouses scale, manually sorting and managing inventory cannot be scaled proportionally. Object detection pipelines deployed on robots can reduce operational friction and enable easy scale up solutions for businesses.
46+
47+
### Surveillance
48+
Surveillance systems typically accumulate large volumes of video data which needs to be analyzed for all sorts of anomalies. Given the number of video sources even a small store has, analysing surveillance data from a large operation is a challenge. Object detection networks can help automate much of the pipeline to highlight sections where there is an object of interest. It can also be trained to identify anomalies in video streams.
49+
50+
### Hazardous tasks
51+
Humans work at waste processing plants, nuclear power plants, oil rigs and around heavy machinery, which tend to be extremely hazardous and dangerous which pose health risks. These tasks essentially require human presence for visual tasks and confirmations which revolve around recognizing objects and relaying locations of objects. Risky tasks like these can be completed with a help of a object detection pipeline deployed on a camera or a robot which can reduce operational risks and costs.
593 KB
Loading
66.3 KB
Loading

PyTorch/LanguageModeling/README.md

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
# Language Modeling
2+
3+
4+
Language modeling (LM) is a natural language processing (NLP) task that determines the probability of a given sequence of words occurring in a sentence.
5+
6+
In an era where computers, smartphones and other electronic devices increasingly need to interact with humans, language modeling has become an indispensable technique for teaching devices how to communicate in natural languages in human-like ways.
7+
8+
But how does language modeling work? And what can you build with it? What are the different approaches, what are its potential benefits and limitations, and how might you use it in your business?
9+
10+
In this guide, you’ll find answers to all of those questions and more. Whether you’re an experienced machine learning engineer considering implementation, a developer wanting to learn more, or a product manager looking to explore what’s possible with natural language processing and language modeling, this guide is for you.
11+
12+
Here’s a look at what we’ll cover:
13+
14+
- Language modeling – the basics
15+
- How does language modeling work?
16+
- Use cases and applications
17+
- Getting started
18+
19+
20+
## Language modeling – the basics
21+
22+
### What is language modeling?
23+
24+
"*Language modeling is the task of assigning a probability to sentences in a language. []
25+
Besides assigning a probability to each sequence of words, the language models also assign a
26+
probability for the likelihood of a given word (or a sequence of words) to follow a sequence
27+
of words.*" Source: Page 105, [Neural Network Methods in Natural Language Processing](http://amzn.to/2wt1nzv), 2017.
28+
29+
30+
### Types of language models
31+
32+
There are primarily two types of Language Models:
33+
34+
- Statistical Language Models: These models use traditional statistical techniques like N-grams, Hidden Markov Models (HMM), and certain linguistic rules to learn the probability distribution of words.
35+
- Neural Language Models: They use different kinds of Neural Networks to model language, and have surpassed the statistical language models in their effectiveness.
36+
37+
"*We provide ample empirical evidence to suggest that connectionist language models are
38+
superior to standard n-gram techniques, except their high computational (training)
39+
complexity.*" Source: [Recurrent neural network based language model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf), 2010.
40+
41+
Given the superior performance of neural language models, we include in the container two popular state-of-the-art neural language models: BERT and Transformer-XL.
42+
43+
### Why is language modeling important?
44+
45+
Language modeling is fundamental in modern NLP applications. It enables machines to understand qualitative information, and enables people to communicate with machines in the natural languages that humans use to communicate with each other.
46+
47+
Language modeling is used directly in a variety of industries, including tech, finance, healthcare, transportation, legal, military, government, and more -- actually, you probably have just interacted with a language model today, whether it be through Google search, engaging with a voice assistant, or using text autocomplete features.
48+
49+
50+
## How does language modeling work?
51+
52+
The roots of modern language modeling can be traced back to 1948, when Claude Shannon
53+
published a paper titled "A Mathematical Theory of Communication", laying the foundation for information theory and language modeling. In the paper, Shannon detailed the use of a stochastic model called the Markov chain to create a statistical model for the sequences of letters in English text. The Markov models, along with n-gram, are still among the most popular statistical language models today.
54+
55+
However, simple statistical language models have serious drawbacks in scalability and fluency because of its sparse representation of language. Overcoming the problem by representing language units (eg. words, characters) as a non-linear, distributed combination of weights in continuous space, neural language models can learn to approximate words without being misled by rare or unknown values.
56+
57+
Therefore, as mentioned above, we introduce two popular state-of-the-art neural language models, BERT and Transformer-XL, in Tensorflow and PyTorch. More details can be found in the [NVIDIA Deep Learning Examples Github Repository ](https://github.com/NVIDIA/DeepLearningExamples)
58+
59+
60+
## Use cases and applications
61+
62+
### Speech Recognition
63+
64+
Imagine speaking a phrase to the phone, expecting it to convert the speech to text. How does
65+
it know if you said "recognize speech" or "wreck a nice beach"? Language models help figure it out
66+
based on the context, enabling machines to process and make sense of speech audio.
67+
68+
69+
### Spelling Correction
70+
71+
Language-models-enabled spellcheckers can point to spelling errors and possibly suggest alternatives.
72+
73+
74+
### Machine translation
75+
76+
Imagine you are translating the Chinese sentence "我在开车" into English. Your translation system gives you several choices:
77+
78+
- I at open car
79+
- me at open car
80+
- I at drive
81+
- me at drive
82+
- I am driving
83+
- me am driving
84+
85+
A language model tells you which translation sounds the most natural.
86+
87+
## Getting started
88+
89+
90+
NVIDIA provides examples for Language Modeling on [Deep Learning Examples Github Repository](https://github.com/NVIDIA/DeepLearningExamples). These examples provide you with easy to consume and highly optimized scripts for both training and inferencing. The quick start guide at our GitHub repository will help you in setting up the environment using NGC Docker Images, download pre-trained models from NGC and adapt the model training and inference for your application/use-case.
91+
92+
These models are tested and maintained by NVIDIA, leveraging mixed precision using tensor cores on our latest GPUs for faster training times while maintaining accuracy.
93+

0 commit comments

Comments
 (0)