Following are some text-processing steps to consider. You do not need to do all of them; which ones you apply depends on the NLP task you want to perform afterwards. A minimal preprocessing sketch follows the list.
- String Manipulation using Regex
- Tokenization
- Stemming & Lemmatization
- Removing Stopwords
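A minimal preprocessing sketch covering all four steps, assuming NLTK is installed (the exact resource names to download can vary slightly with the NLTK version, and the printed stems/lemmas are illustrative):

```python
# Minimal text-preprocessing sketch with NLTK (assumed installed).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

text = "The striped bats are hanging on their feet, see https://example.com!"

# 1. String manipulation with regex: strip URLs and non-alphabetic characters
text = re.sub(r"https?://\S+", "", text)
text = re.sub(r"[^a-zA-Z\s]", "", text).lower()

# 2. Tokenization
tokens = nltk.word_tokenize(text)

# 3. Stopword removal
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]

# 4. Stemming and lemmatization (in practice you usually pick one of the two)
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])          # e.g. ['stripe', 'bat', 'hang', ...]
print([lemmatizer.lemmatize(t) for t in tokens])  # e.g. ['striped', 'bat', 'hanging', ...]
```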
We need to represent language mathematically, i.e. given a corpus, convert it into numerical form. This mathematical representation is called an embedding (or context vector) and the process is called representation learning. Why do this? Because computers understand only numbers, not text. We can do this in several ways (a short sketch follows the list):
- Via Sentence Embedding
- Via Word Embedding
- Via Character Embedding
- Via Subword Embedding (this is what almost all modern models use)
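A sketch of subword tokenization and embeddings using Hugging Face `transformers`; `bert-base-uncased` is just a stand-in model, and the exact subword split depends on the tokenizer's vocabulary:

```python
# Subword tokenization + turning text into vectors with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Subword embedding: a rarer word gets split into known pieces
print(tokenizer.tokenize("tokenization"))   # e.g. ['token', '##ization']

# Turn a sentence into token ids, then into contextual vectors
inputs = tokenizer("NLP turns text into numbers.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

token_vectors = outputs.last_hidden_state    # one vector per subword token
sentence_vector = token_vectors.mean(dim=1)  # crude sentence embedding via mean pooling
print(token_vectors.shape, sentence_vector.shape)
```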
- With foundation models that can do multiple tasks, you only need prompting to solve a single downstream task.
- But prompting often does not work well; this is called the HALLUCINATION PROBLEM. The model sometimes gives wrong answers to prompted questions (in cases where such a task was not covered during the training of the multitask foundation model).
- To reduce this hallucination problem you can fine-tune the foundation model for specific tasks (a minimal SFT sketch follows). More about this here
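A minimal SFT sketch written as a plain PyTorch loop over (prompt, answer) pairs; the model name (`gpt2`) and the tiny in-memory dataset are placeholders, and a real setup would batch the data and mask the prompt tokens out of the loss:

```python
# Toy supervised fine-tuning (SFT) loop for a causal LM on (prompt, answer) pairs.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"                       # stand-in for a foundation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Task-specific (prompt, answer) pairs, e.g. for a summarization downstream task
pairs = [
    ("Summarize: The cat sat on the mat all day.", "A cat rested on a mat."),
    ("Summarize: Rain fell over the city for hours.", "It rained for hours."),
]

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for prompt, answer in pairs:
        batch = tokenizer(prompt + " " + answer, return_tensors="pt")
        # For causal-LM SFT, the labels are the input ids themselves
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```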
- Above we saw how to fine-tune a foundation LLM for different downstream tasks using SFT, but for all those tasks we can also fine-tune it using RL.
- It has been shown that it is better to first fine-tune the LLM on a task using SFT and then fine-tune it on the same task using RL; this gives better outcomes. A toy RL fine-tuning sketch follows.
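A toy REINFORCE-style sketch of RL fine-tuning (real pipelines typically use PPO or a similar algorithm): sample a completion from the current model, score it with a reward function, and push up the log-probability of the sampled tokens in proportion to the reward. The reward function below is a placeholder that simply rewards short answers:

```python
# Toy REINFORCE-style RL fine-tuning step on top of an SFT model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def reward_fn(text: str) -> float:
    return 1.0 if len(text.split()) < 20 else -1.0   # placeholder reward

prompt = "Summarize: The cat sat on the mat all day."
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a completion from the current policy
gen_ids = model.generate(prompt_ids, do_sample=True, max_new_tokens=30,
                         pad_token_id=tokenizer.eos_token_id)
completion = tokenizer.decode(gen_ids[0, prompt_ids.shape[1]:])
reward = reward_fn(completion)

# REINFORCE update: minimize -reward * log p(sampled completion | prompt)
logits = model(gen_ids).logits[:, :-1, :]
log_probs = torch.log_softmax(logits, dim=-1)
token_logps = log_probs.gather(2, gen_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
loss = -reward * token_logps[:, prompt_ids.shape[1] - 1:].sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```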
- Once OpenAI built ChatGPT, they found that if asked about harmful activities, e.g. 'tell me techniques to make rat poison at home', it would answer such questions too. If provoked, it would also use curse words / …. It lacked HUMAN ETHICS, and in the wrong hands this could lead to bigger concerns. Hence researchers wanted to ALIGN the LLM outputs with human preferences.
- This is called the PREFERENCE PROBLEM.
- Above we saw how to fine-tune an LLM using RL; here we fine-tune on a preference dataset created by humans, hence the name RLHF (Reinforcement Learning from Human Feedback).
- Methods that solve the preference problem are called preference alignment. There are two ways to do so:
- Fine-tuning the LLM with human preferences using Reinforcement Learning – RLHF algorithm
- Fine-tuning the LLM with human preferences using Supervised Learning – DPO algorithm (loss sketched below)
- More information available here
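A sketch of the DPO loss on a single (chosen, rejected) preference pair; the four per-sequence log-probabilities passed in at the bottom are placeholder numbers that you would normally compute from the policy model and a frozen reference model:

```python
# DPO loss on one preference pair, given per-sequence log-probabilities.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # How much more the policy prefers each answer than the reference model does
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # DPO: maximize the margin between chosen and rejected, scaled by beta
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))

loss = dpo_loss(torch.tensor(-12.0), torch.tensor(-15.0),
                torch.tensor(-13.0), torch.tensor(-14.0))
print(loss)
```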
Now we fine-tune the LLM so that it can do tool / function calling; more on this here. A minimal sketch of the idea is below.
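A minimal sketch of tool / function calling: the tool is described with a JSON schema, the fine-tuned model is expected to emit a JSON call instead of prose, and the application parses that call and runs the real function. The model output here is hard-coded to keep the example self-contained:

```python
# Tool / function calling: schema -> model emits a JSON call -> app executes it.
import json

def get_weather(city: str) -> str:
    return f"22°C and sunny in {city}"      # placeholder implementation

tools = {"get_weather": get_weather}

tool_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# What a function-calling LLM might return for "What's the weather in Paris?"
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
result = tools[call["name"]](**call["arguments"])
print(result)   # 22°C and sunny in Paris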
There are many agent frameworks; some of the good ones are LangGraph and AutoGen (which supports communication between LLMs). Two key papers, and a toy ReAct-style loop, are below:
- ReAct Paper
- Reflection Paper
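A toy ReAct-style agent loop, where the model alternates Thought / Action / Observation until it emits a final answer; `call_llm` and `search` are placeholders for a real LLM client and a real tool, and the parsing is deliberately simplistic:

```python
# Toy ReAct-style loop: Thought -> Action -> Observation -> ... -> Final Answer.
def search(query: str) -> str:
    return "stub search result for: " + query   # placeholder tool

TOOLS = {"search": search}

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM client; this stub acts once, then answers.
    if "Observation:" in prompt:
        return " The observation answers the question. Final Answer: 42"
    return " I should look this up. Action: search[the question]"

def react_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript + "Thought:")      # model reasons, then acts
        transcript += "Thought:" + step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            # Expected format: "Action: tool_name[argument]"
            name, arg = step.split("Action:")[-1].strip().split("[", 1)
            observation = TOOLS[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "No answer found within the step budget."

print(react_agent("What is the answer to everything?"))   # 42
```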