Artificial Intelligence Basics


History of AI
-----------------
- the Logic Theorist is considered the first application of artificial intelligence, predating the term “artificial intelligence”
- the Logic Theorist was the brainchild of Herbert Simon and Allen Newell, with some contributions by Cliff Shaw. Simon and Newell were attempting to teach a computer to think.
- the Logic Theorist proved 38 of the first 52 theorems in chapter 2 of Principia Mathematica by Alfred North Whitehead and Bertrand Russell, and for some theorems it found proofs more elegant than the published ones
- the term “artificial intelligence” did not exist until the Dartmouth Summer Research Project on Artificial Intelligence, held at Dartmouth College in the summer of 1956. Computer scientist John McCarthy (the original author of Lisp) coined the term in the proposal for this workshop.


Timeline
------------
1956 - the term "Artificial Intelligence" was coined by John McCarthy at the Dartmouth Summer Research Project
1997 - Garry Kasparov was defeated in chess by "Deep Blue"
2011 - Ken Jennings lost to IBM's "Watson" in Jeopardy
2014 - Generative Adversarial Networks (GANs) were invented
2015 - OpenAI was founded by a group that included Sam Altman, Elon Musk, Greg Brockman, and Ilya Sutskever (previously a research scientist at Google)
2017 - the "Transformer" architecture was introduced, which led to Large Language Models (LLMs) like GPT-3
2019 - OpenAI released GPT-2
2020 - OpenAI released GPT-3
2023 - OpenAI released GPT-4


What is Generative AI?
--------------------------------
- Generative AI is a subset of artificial intelligence
- there are (3) main types of Generative AI

[ Generative Adversarial Networks - GANs ]
- use two neural networks: one called the generator and one called the discriminator.
- The generator network generates fake data based off of the training data set. The discriminator tries to identify fake data.
- These networks are adversarial in nature as the generator attempts to create data that is indistinguishable from the real data and the discriminator attempts to discern if the data is real or fake

[ Variational Autoencoders - VAEs ]
- use two networks as well: one for encoding and one for decoding. The encoding network simplifies the input by reducing the data into a lower-dimensional representation. The decoding network then maps this lower-dimensional representation back to the original data space. The whole point of this is to be able to generate new data by sampling from the lower-dimensional space and decoding the sample

[ Transformer-Based Language Models ]
- learn the distribution of data
- examples are ChatGPT, OpenAI Codex


Core Large Language Model Architecture
-------------------------------------------------------
- LLMs are models that generate text by repeatedly predicting the next token from probabilities learned during training

Recurrent Neural Networks (RNNs) - a class of neural language models that carry a hidden state from one word to the next, which lets them model dependencies within a sequence of text; only able to process words one at a time; this limits how well they can capture relationships across long passages of text

Long Short-Term Memory (LSTM) Networks - an advancement of RNNs that addresses some of their limitations when it comes to remembering context across long sequences

Transformers - quantum leap in advancing LLMs; able to process all tokens of an input sequence in parallel, using self-attention mechanisms to better understand words and how they relate to one another

Transformers have found numerous applications in various aspects of NLP, improving upon the performance of previous models:
  Language translation
  Text summarization
  Sentiment analysis
  Question answering
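
The self-attention idea mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the queries, keys, and values are all just the input vectors themselves; a real Transformer first multiplies the inputs by learned weight matrices, which are omitted here. The function names are made up for this note.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = the inputs.

    Assumption: no learned projection matrices, single head.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs

tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
attended = self_attention(tokens)
```

Each output row is a blend of every token in the sequence, which is why all positions can be computed independently (and so in parallel).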

  [ Transformer Architecture ]

  1) Input Embedding - take raw input and convert each token (character, word, or symbol) into a high-dimensional vector; this representation provides key context around the semantic and syntactic essence of each token; words are translated into numerical arrays (vectors) which serve as digital representations of words or entire sentences. Within these arrays, the meaning and nuanced context of the text are encapsulated, creating a multi-dimensional space where the richness of language is encoded in numerical form (perfect for machine processing)

  // Dense Vectors (Lower Dimensional)
  the = [0.1, 0.2, 0.3]
  cat = [0.4, 0.5, 0.6]
  sat = [0.7, 0.8, 0.9]
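
A rough sketch of the lookup step, assuming a tiny hand-written table that reuses the three vectors above. In a real model the table is learned during training and has thousands of dimensions; `EMBEDDING_TABLE` and `embed_sentence` are hypothetical names.

```python
# Hypothetical, hand-written embedding table (real ones are learned).
EMBEDDING_TABLE = {
    "the": [0.1, 0.2, 0.3],
    "cat": [0.4, 0.5, 0.6],
    "sat": [0.7, 0.8, 0.9],
}

def embed_sentence(sentence):
    # Naive whitespace tokenization; real tokenizers split into sub-words.
    tokens = sentence.lower().split()
    return [EMBEDDING_TABLE[token] for token in tokens]

vectors = embed_sentence("The cat sat")
```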

  2) Positional Encoding - after converting raw input into high-dimensional vectors, we must enrich the vectors with positional information (where words are located in a given sequence); this is done by adding a position-dependent vector to each token's embedding; the goal is to let the model discern the sequential structure of words

  [ [ 0.1, 0.2, 0.3],  [0.4, 0.5, 0.6], [0.7, 0.8, 0.9] ]
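
One common way to do this is the sinusoidal scheme from the original Transformer paper, sketched below under the assumption that the position vector is simply added to each embedding. Many models use learned position embeddings instead, so treat this as one option rather than the method.

```python
import math

def positional_encoding(position, dim):
    # Even indices use sine, odd indices use cosine, with wavelengths
    # that grow geometrically across the vector.
    encoding = []
    for i in range(dim):
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def add_positions(vectors):
    dim = len(vectors[0])
    return [
        [x + p for x, p in zip(vector, positional_encoding(position, dim))]
        for position, vector in enumerate(vectors)
    ]

embedded = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
encoded = add_positions(embedded)
```

After this step, the same word at two different positions no longer has an identical vector.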

  3) Encoder -

  4) Decoder -


  Inference Types
  --------------------
  Zero-Shot - send a request to an LLM without providing any examples; this is a good way to test the adaptability of a model to apply its learning to a task it has not been trained on

  One-Shot - send a request to an LLM while providing ONE example as context for the LLM to consider and base its output on

  Few-Shot - send a request to an LLM while providing MULTIPLE examples as context for the LLM to consider and base its output on
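
The three styles differ only in how many worked examples are placed in the prompt ahead of the real request. A small sketch, assuming an "Input:/Output:" layout that is just one possible convention (`build_shot_prompt` is a made-up helper):

```python
def build_shot_prompt(task, examples=()):
    # Each example is an (input, output) pair shown before the real task.
    lines = []
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
    lines.append(f"Input: {task}")
    lines.append("Output:")
    return "\n".join(lines)

task = "Translate 'cat' to French"

zero_shot = build_shot_prompt(task)
one_shot = build_shot_prompt(task, [("Translate 'dog' to French", "chien")])
few_shot = build_shot_prompt(task, [
    ("Translate 'dog' to French", "chien"),
    ("Translate 'bird' to French", "oiseau"),
    ("Translate 'horse' to French", "cheval"),
])
```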


  Tuning Model Parameters
  -----------------------------------
  When working with foundation models, it is important to know that the type of response returned can be tailored through the parameters that are configured

  [ Controlling Randomness and Creativity ]

  Temperature - controls the randomness in the prediction distribution; lower temp means more predictable responses; higher temp means greater degree of creativity and diversity in responses

  Top P - a sampling strategy (also called nucleus sampling) where the model selects from the smallest set of tokens whose cumulative probability exceeds the threshold P; adds diversity to the model's output while still remaining coherent

  Top K - sampling strategy that limits the model to the top K most likely tokens during generation, balancing creativity and reliability
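
A toy sketch of how temperature, a top-K cutoff, and a cumulative-probability threshold P could be combined when picking the next token. The logits are invented for illustration, and real inference engines may apply these filters in a different order.

```python
import math
import random

def apply_temperature(logits, temperature):
    # Dividing logits by the temperature sharpens (<1) or flattens (>1)
    # the distribution before the softmax.
    scaled = [value / temperature for value in logits.values()]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}

def renormalize(probs):
    total = sum(probs.values())
    return {token: p / total for token, p in probs.items()}

def top_k_filter(probs, k):
    # Keep only the k most likely tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return renormalize(dict(ranked[:k]))

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

def sample(probs, rng):
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Made-up scores for the word after "The cat sat on the ..."
logits = {"mat": 2.0, "sofa": 1.0, "roof": 0.5, "moon": -1.0}
probs = apply_temperature(logits, temperature=0.7)
candidates = top_p_filter(top_k_filter(probs, k=3), p=0.9)
next_token = sample(candidates, random.Random(0))
```

With these numbers the unlikely "moon" is dropped by the top-K step and "roof" by the cumulative threshold, so sampling only ever chooses between the two most plausible tokens.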

  [ Controlling Length ]

  Max Tokens - upper limit on the number of tokens the model can generate in a single response

  Stop Tokens - tokens (or token sequences) that cause the model to stop generating text as soon as they are produced
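
Both length controls amount to early exits from the generation loop. A minimal sketch, assuming the model's output arrives as a plain list of tokens (a stand-in for a real model; `generate` is a hypothetical helper):

```python
def generate(token_stream, max_tokens, stop_tokens):
    output = []
    for token in token_stream:
        # A stop token ends generation and is not included in the output.
        if token in stop_tokens:
            break
        # The max-token limit ends generation even without a stop token.
        if len(output) >= max_tokens:
            break
        output.append(token)
    return output

fake_model_output = ["The", "cat", "sat", "[/INST]", "ignored"]

stopped = generate(fake_model_output, max_tokens=10, stop_tokens={"[/INST]"})
truncated = generate(fake_model_output, max_tokens=2, stop_tokens={"[/INST]"})
```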


What is a Modelfile?
---------------------------

The Ollama Modelfile is a configuration file that defines and manages models on the Ollama platform. Through a Modelfile you can create new models or modify and adjust existing ones to handle special application scenarios. The Ollama Modelfile simplifies the process of managing and running LLMs locally

--------
ollama show llama2:latest --modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:latest

FROM C:\Users\Administrator\.ollama\models\blobs\sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246

TEMPLATE """[INST] <<SYS>><</SYS>>

 [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"
PARAMETER stop "<</SYS>>"
----------

TEMPLATE - defines the prompt template that will be sent to the language model.
[INST] and [/INST] - special delimiters that wrap the user instruction in the Llama 2 prompt format


What are the different Model formats?
--------------------------------------------------

[ .GGUF ]
- used to store models for inference with GGML and other libraries that depend on it, like the very popular llama.cpp or whisper.cpp.
- file format supported by the Hugging Face Hub with features allowing for quick inspection of tensors and metadata within the file.
- designed as a “single-file-format” where a single file usually contains both the configuration attributes, the tokenizer vocabulary and other attributes, as well as all tensors to be loaded in the model
- these files come in different formats according to the quantization type of the file


What is quantization?
---------------------------
- the process of representing a large set of continuous values with a smaller set of discrete values
- the purpose of quantization is often to reduce complexity, save memory, or decrease computation requirements, typically at the cost of some loss in precision.
- this involves "packing down" a model into a manageable size so that it can be transferred and run efficiently
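
A small sketch of the idea, assuming simple symmetric 8-bit quantization with one scale factor for the whole list of weights. GGUF quantization types use more elaborate block-wise schemes, so this only illustrates the trade-off, not that format.

```python
def quantize_int8(weights):
    # Map the largest magnitude to 127 so every value fits in a signed byte.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate floats; the rounding error is the lost precision.
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte instead of four (for 32-bit floats), and the error per weight is at most half of `scale`.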


What is RAG?
-------------------
- Retrieval Augmented Generation
- a method that does not retrain the LLM; instead it retrieves relevant passages from privately hosted or externally hosted data at query time and adds them to the prompt
- best used for creating customer service chat bots, knowledge management use cases, and search engines
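- parametric vs non-parametric memory: parametric memory is the knowledge stored in the model's weights; non-parametric memory is the external data store that is searched at query time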


[ The Three Stages of RAG ]

Indexing
  Retrieval
     Generation

Indexing Pipeline
  Data Loading
  Text Splitting or Chunking
  Convert Text to Embeddings
  Storing Embeddings in Vector Databases

Retrieval Pipeline
  Semantic Search against Vector Database

Generation Pipeline
  Templating a Prompt Holding Retrieved Context
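
The three pipelines can be sketched end to end with the standard library. Everything here is a stand-in: word counts play the role of embeddings, a Python list plays the role of the vector database, and the final prompt would be sent to an LLM rather than printed. All function names are hypothetical.

```python
import math
from collections import Counter

def embed_text(text):
    # Toy "embedding": a bag of lowercase word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text, size):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents, chunk_size):
    # Indexing: load, split into chunks, embed, store.
    return [
        (piece, embed_text(piece))
        for document in documents
        for piece in chunk(document, chunk_size)
    ]

def retrieve(index, question, top_n=1):
    # Retrieval: rank stored chunks by similarity to the question.
    query = embed_text(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_n]]

def build_rag_prompt(question, context_chunks):
    # Generation: template a prompt that holds the retrieved context.
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

documents = [
    "Ollama serves large language models locally on your machine",
    "FAISS is a library for similarity search over dense vectors",
]
index = build_index(documents, chunk_size=20)
question = "What does FAISS search over"
prompt = build_rag_prompt(question, retrieve(index, question))
```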

Evaluation & Monitoring


Embeddings - vector representations of words or sentences

Vectors -


Content Generation

Agentic Workflow

Tools


Serving an LLM Locally
-------------------------------
- install Ollama
- Ollama simplifies downloading, installing, interacting with, and serving a wide range of LLMs locally


Reputable Small Models
--------------------------------
llama 3.2 (1B Parameters) - 2.0 GB


Open Source Vector Databases
------------------------------------------
- Marqo
- FAISS (from Facebook/Meta; a similarity-search library rather than a full database)
- LanceDB