Dolly 2.0 Free ChatGPT-like Model for Commercial Use - How To Install And Use Locally On Your PC
Full tutorial link > https://www.youtube.com/watch?v=ku6UvK1bsp4
Databricks’ #Dolly v2 is a free, open source, commercially usable ChatGPT-style #AI model. Dolly 2.0 could spark a new wave of fully open source LLMs similar to #ChatGPT. The open source community is working hard to bring up a model that can compete with GPT-4. Our discord: https://bit.ly/SECoursesDiscord
If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews
https://www.youtube.com/playlist?list=PL_pbwdIyffsnkay6X91BWb9rrfLATUMr3
Playlist of StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
Gist file used in the video. The scripts are shared there
https://gist.github.com/FurkanGozukara/c8eb2e2213a30182edb25333faea4dc5
Model link on Hugging Face databricks/dolly-v2-12b
https://huggingface.co/databricks/dolly-v2-12b
How To Install Python, Setup Virtual Environment VENV
bitsandbytes Windows fork
https://github.com/Keith-Hon/bitsandbytes-windows
databricks/dolly-v2-7b
https://huggingface.co/databricks/dolly-v2-7b
databricks/dolly-v2-3b
https://huggingface.co/databricks/dolly-v2-3b
00:00:00 Introduction to how to install and use Databricks’ Dolly v2
00:01:12 This video is more about teaching a man how to fish than giving him a fish
00:01:26 I am sharing a Gradio interface to use Dolly v2 model with performance optimization
00:01:52 How to download / clone a big repository from Hugging Face
00:02:41 How to make a venv and install Dolly 2 model running requirements
00:05:19 Requirements installed. How to run Dolly v2 to do inference / text generation
00:07:14 How to download and use Gradio script to do inference with Dolly v2
00:08:16 How to use quantization: loading Hugging Face models in 8-bit
00:08:24 How to install and use bitsandbytes on Windows
00:08:55 How to run Gradio interface of Dolly v2
00:10:58 How to improve results you get from Dolly v2
00:11:55 How to use Microsoft Visual Studio to quickly run Python apps and debug them
00:12:21 How to change Python environment in Microsoft Visual Studio Community Free Edition
00:13:04 How to debug a Python application in Microsoft Visual Studio
Databricks’ dolly-v2-12b, an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine tuning records databricks-dolly-15k generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA and summarization. dolly-v2-12b is not a state-of-the-art model, but does exhibit surprisingly high quality instruction following behavior not characteristic of the foundation model on which it is based.
Model Overview
dolly-v2-12b is a 12 billion parameter causal language model created by Databricks that is derived from EleutherAI’s Pythia-12b and fine-tuned on a ~15K record instruction corpus generated by Databricks employees and released under a permissive license (CC-BY-SA)
Performance Limitations
dolly-v2-12b is not a state-of-the-art generative language model and, though quantitative benchmarking is ongoing, is not designed to perform competitively with more modern model architectures or models subject to larger pretraining corpuses.
The Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community.
In particular, dolly-v2-12b struggles with: syntactically complex prompts, programming problems, mathematical operations, factual errors, dates and times, open-ended question answering, hallucination, enumerating lists of specific length, stylistic mimicry, having a sense of humor, etc.
Dataset Limitations
Like all language models, dolly-v2-12b reflects the content and limitations of its training corpuses.
The Pile: GPT-J’s pre-training corpus contains content mostly collected from the public internet, and like most web-scale datasets, it contains content many users would find objectionable. As such, the model is likely to reflect these shortcomings, potentially overtly in the case it is explicitly asked to produce objectionable content, and sometimes subtly, as in the case of biased or harmful implicit associations.
databricks-dolly-15k: The training data on which dolly-v2-12b is instruction tuned represents natural language instructions generated by Databricks employees during a period spanning March and April 2023 and includes passages from Wikipedia as references passages for instruction categories like closed QA and summarization.
-
00:00:00 Greetings everyone.
-
00:00:01 In this video I will introduce you and show you how to use Dolly version 2.
-
00:00:08 Databricks’ dolly-v2-12b, a 12 billion parameter model, is an instruction-following large language
-
00:00:13 model trained on the Databricks machine learning platform that is licensed for commercial use.
-
00:00:20 Now this is very important.
-
00:00:22 This model can be used for commercial purposes and is based on the Pythia 12 billion parameter
-
00:00:30 model.
-
00:00:31 Dolly is trained on 15,000 instruction/response fine tuning records generated by Databricks
-
00:00:38 employees in capability domains from the Instruct GPT paper including brainstorming, classification,
-
00:00:46 closed question answering, generation, information extraction, open question
-
00:00:51 answering, and summarization.
-
00:00:53 This new large language model is also covered by Ars Technica.
-
00:00:58 The title is "A really big deal: Dolly is a free, open source, ChatGPT-style AI model."
-
00:01:04 Their subtitle is Dolly version 2 could spark a new wave of fully open source LLMs similar
-
00:01:10 to ChatGPT.
-
00:01:12 So this video will not be like "give a man a fish and you feed him for a day."
-
00:01:16 It will be the opposite:
-
00:01:18 "Teach a man to fish and you feed him for a lifetime."
-
00:01:21 So I will teach you how to use such models whenever a new one gets released.
-
00:01:26 Moreover, I am sharing a Gradio script so that you will be able to use this model with
-
00:01:32 a Gradio interface, as well as other models that will be released in the future.
-
00:01:38 To be able to learn the fundamental concept of this video, you only need to know how to
-
00:01:44 use Python, how to install it, and how to create virtual environment folders.
-
00:01:49 We will begin with cloning this repository into our drive.
-
00:01:53 So I will clone the repository inside my Dolly video folder by opening a CMD window.
-
00:01:59 Git clone and the URL of the repository.
-
00:02:01 It will download all of the files.
-
00:02:05 There is one tricky issue here.
-
00:02:07 The big model file is 24 gigabytes and when you are cloning that, it won't show you the
-
00:02:13 download progress.
-
00:02:14 Therefore, once all of the small files are downloaded like this, you can close this window,
-
00:02:20 download this big file manually, click it, and in here click download and you will see
-
00:02:26 the progress of the download like this.
-
00:02:28 You can also click Downloads in your browser and you will see the progress of the download
-
00:02:33 here as well.
-
00:02:34 So once the download is completed, move the model file into your folder like this.
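As a side note not shown in the video: if you prefer a download with visible progress bars instead of the browser, the huggingface_hub library can fetch the whole repository, large files included. The destination folder below is only an example path.
```python
# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Downloads every file of the repository, showing progress for the large model shards.
snapshot_download(
    repo_id="databricks/dolly-v2-12b",
    local_dir="D:/dolly_video/dolly-v2-12b",  # example destination folder
)
```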
-
00:02:39 Then the instruction is posted on their Hugging Face page.
-
00:02:44 We need to make a virtual environment folder and install these requirements.
-
00:02:50 So now I will do that.
-
00:02:51 To do it, I will make a virtual environment in this folder.
-
00:02:55 Start a CMD.
-
00:02:56 I have posted everything you need in this folder.
-
00:02:59 You don't need to memorize or type them manually.
-
00:03:02 We are going to use Python 3.10.6 version.
-
00:03:05 Usually all of these models work with the Python 3.10 version.
-
00:03:10 So if you use the 3.11 version, you may run into problems.
-
00:03:14 So this is the command we are going to use to generate our virtual environment.
-
00:03:20 Just run it in the folder where you want the virtual environment folder to be generated.
-
00:03:24 Once the command is executed, you will see a folder generated like this.
-
00:03:28 Inside there, go to the scripts, open another CMD window here like this.
-
00:03:34 Type activate.
-
00:03:35 Now this virtual environment folder is activated.
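The exact commands are in the text file shared with the video; as a rough sketch, the sequence on Windows looks like the lines below. The folder name venv is just an example, and you may need the full path to the Python 3.10 executable if several Python versions are installed.
```
python -m venv venv
cd venv\Scripts
activate
```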
-
00:03:38 So these models are working with Torch libraries.
-
00:03:41 Therefore, first I will install Torch.
-
00:03:44 To install Torch, we go to the official website of PyTorch.
-
00:03:47 I will install the Torch as shown here, the latest version.
-
00:03:51 So just copy and paste it here and hit enter.
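For reference, at the time of recording the command copied from pytorch.org looked roughly like the line below; the CUDA tag in the URL changes over time, so always copy the current command from the site.
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```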
-
00:03:54 It is important that we are currently in the activated virtual environment folder.
-
00:04:00 So whatever we install will only be effective in here and will not cause any problems or
-
00:04:06 conflicts with other installations that we use.
-
00:04:09 We also need to install the latest CUDA version.
-
00:04:13 So to do that, you can type download and install CUDA.
-
00:04:16 You will get this link.
-
00:04:18 Go there, select your Windows version, select exe (local), then download and install it.
-
00:04:24 When installing, you can just click next, next, next.
-
00:04:27 You don't need to do anything.
-
00:04:28 After you have installed CUDA, you should see the CUDA path in your environment variables
-
00:04:33 like I am showing right now.
-
00:04:35 Also, you should see CUDA path in here as well.
-
00:04:38 This is the screen when you click edit path variable here.
-
00:04:41 I have explained all of this in this video.
-
00:04:45 So you should definitely watch it.
-
00:04:47 The Torch installation has been completed.
-
00:04:50 Now we need to install Accelerate and Transformers.
-
00:04:54 You see, they allow any version of Accelerate greater than 0.12.
-
00:04:59 However, they require a certain version of Transformers.
-
00:05:03 So first, let's copy this command and execute it in our activated virtual environment folder.
-
00:05:09 By the way, you can just remove the version pin and it will get installed like this.
-
00:05:14 Then let's do the same for the Transformers version.
-
00:05:18 If you want to see the full output, just install it like this.
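For reference, the install line from the model card looked roughly like the one below at the time; the exact version pins may have changed since, so copy the current command from the Hugging Face page.
```
pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
```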
-
00:05:21 Okay both requirements are installed and now we are able to execute this command and get
-
00:05:27 the results.
-
00:05:29 Since we have downloaded the model manually, we don't need trust remote code.
-
00:05:33 Moreover, you should use bfloat16 to reduce VRAM usage.
-
00:05:37 This way, I am able to use it on my 24 gigabyte RTX 3090.
-
00:05:44 However, I will show you a very neat trick.
-
00:05:49 Transformers models support loading in 8-bit by using bitsandbytes, and this
-
00:05:55 significantly reduces the VRAM usage.
-
00:05:58 So let me demonstrate first by using this code they have given in here as an example.
-
00:06:05 Let's go back to our installation folder.
-
00:06:07 Let's make a demo.py file.
-
00:06:11 Yes.
-
00:06:12 Let's open it with Notepad++, copy paste the command.
-
00:06:15 Since we have downloaded the model previously, I will change the path.
-
00:06:19 Okay when you are working with Python code, you need to use forward slash like this.
-
00:06:24 Okay.
-
00:06:25 Now we need to use this generate text model to generate a text.
-
00:06:29 So I am going to copy-paste the command here like this, and we need to get the result of
-
00:06:35 generate_text like this and print the result to the screen like this.
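Put together, demo.py ends up looking roughly like this; the local model path is only an example, and depending on your transformers version you may or may not need trust_remote_code:
```python
import torch
from transformers import pipeline

# Example path to the locally cloned model folder (note the forward slashes).
model_path = "D:/dolly_video/dolly-v2-12b"

# bfloat16 keeps the weights at roughly 2 bytes per parameter, which fits a 24 GB GPU.
generate_text = pipeline(
    model=model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res)
```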
-
00:06:41 Now, when I run my script I should see the answer to "explain to me the difference between
-
00:06:46 nuclear fission and fusion."
-
00:06:48 Okay we got the results quickly.
-
00:06:53 So these are the results.
-
00:06:54 Let me copy paste it here so you can read it better.
-
00:06:57 However, this script is not optimal at all because each time we change this prompt,
-
00:07:04 we have to restart the entire thing and reload the model into VRAM.
-
00:07:08 Therefore, I have prepared an amazing Gradio script.
-
00:07:12 This script is posted on my GitHub Gist repository; the link will be in the description.
-
00:07:17 Let's download it.
-
00:07:18 Click raw, right click, save as, into the folder where you have installed it or wherever
-
00:07:22 you want.
-
00:07:23 Let's just type Dolly gradio.py.
-
00:07:27 The extension is important.
-
00:07:29 So right click and edit with Notepad++.
-
00:07:32 So what you need to change in this script is the model path,
-
00:07:36 if you have downloaded your model. Then we will also use load_in_8bit, which is quantization,
-
00:07:42 because quantization will significantly reduce the VRAM usage and will make the model
-
00:07:48 run faster.
-
00:07:49 Also, there is one other parameter that you can change, which is max_new_tokens.
-
00:07:54 This will directly impact the duration of the model inference and also the output length
-
00:08:00 that you are going to get. For demonstration,
-
00:08:02 let's make it 256 like this.
-
00:08:06 Okay, for using quantization, we need to set the load_in_8bit parameter to True like this.
-
00:08:11 There is an example page in the Hugging Face documentation.
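The pattern in the Hugging Face documentation is roughly the following; it needs bitsandbytes and accelerate installed, and the model path here is just an example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "D:/dolly_video/dolly-v2-12b"  # example local path

tokenizer = AutoTokenizer.from_pretrained(model_path)
# load_in_8bit quantizes the weights to 8-bit as they are loaded,
# roughly halving VRAM usage compared to bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    load_in_8bit=True,
)
```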
-
00:08:14 So to be able to use load_in_8bit, we need bitsandbytes, and to use bitsandbytes,
-
00:08:21 we need to install a specific version,
-
00:08:23 because the bitsandbytes installation on Windows does not work out of the box.
-
00:08:28 So there is a bitsandbytes Windows GitHub repository that is forked from the original
-
00:08:34 bitsandbytes repository.
-
00:08:35 This fork is made by Keith-Hon and we are going to use it.
-
00:08:39 This command is also available in the gist file.
-
00:08:41 The link of this gist file will be in the description.
-
00:08:44 So it's a public gist file.
-
00:08:46 Let's execute this command in our activated virtual environment.
-
00:08:49 Okay, it is installing the bitsandbytes Windows version right now.
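The exact command is in the gist and should be preferred; purely as an illustration, installing the fork directly from GitHub with pip typically looks like this:
```
pip install git+https://github.com/Keith-Hon/bitsandbytes-windows.git
```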
-
00:08:54 And let's run the Gradio interface and let's see how it is working.
-
00:08:57 So I will also change the model path for this video demonstration, which is this one.
-
00:09:03 Let's fix the backslashes like this.
-
00:09:06 Okay.
-
00:09:07 For running this Gradio, we still need to be inside our activated virtual environment.
-
00:09:11 Then right click and copy path.
-
00:09:13 Type Python, paste the path, hit enter.
-
00:09:16 Okay.
-
00:09:17 To be able to use Gradio, we also need to install Gradio first.
-
00:09:21 To install Gradio: pip install gradio.
-
00:09:24 Gradio is a very small package to install.
-
00:09:27 Okay Gradio has been installed.
-
00:09:28 I am still inside my activated virtual environment.
-
00:09:31 This is important.
-
00:09:32 Just type python and paste the path of the file that you saved as the Gradio script.
-
00:09:38 Hit enter.
-
00:09:39 For preparing this Gradio script, I have spent quite a bit of time.
-
00:09:42 It is really optimized.
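The real script is in the linked gist. Purely to illustrate the idea (this is not the gist code), a minimal Gradio wrapper around the same pipeline could look like this, with the model path and token limit as example values:
```python
import torch
import gradio as gr
from transformers import pipeline

# Load the model once at startup; every prompt afterwards reuses the weights already in VRAM.
generate_text = pipeline(
    model="D:/dolly_video/dolly-v2-12b",  # example local path
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

def answer(prompt: str) -> str:
    # max_new_tokens controls both the output length and the inference time.
    result = generate_text(prompt, max_new_tokens=256)
    return result[0]["generated_text"]

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(lines=4, label="Prompt"),
    outputs=gr.Textbox(label="Response"),
)
demo.launch()
```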
-
00:09:44 If you don't have a good graphics card, I think it also works on CPU.
-
00:09:48 However, it may run very slowly.
-
00:09:50 Therefore, you may need to use the models with fewer parameters.
-
00:09:55 To find the models with fewer parameters, just click Databricks.
-
00:09:59 And in here you will see: Dolly version 2, 7 billion parameters.
-
00:10:03 Dolly version 2, 12 billion parameters.
-
00:10:06 Dolly version 2, 3 billion parameters.
-
00:10:09 And Dolly version 1.
-
00:10:10 So the Dolly version 2, 3 billion parameters is the smallest one.
-
00:10:14 This version should work even on GPUs with very low VRAM when using 8-bit quantization,
-
00:10:21 load_in_8bit.
-
00:10:23 Okay once the Gradio code is running, you will get a URL like this.
-
00:10:28 Just copy it, paste it into your browser, and you will get this Gradio interface.
-
00:10:33 Now you can repeatedly ask questions.
-
00:10:37 So let's ask the first question.
-
00:10:39 Instead of making a peanut butter and jelly sandwich, what else could I combine peanut
-
00:10:44 butter with in a sandwich?
-
00:10:45 Give five ideas.
-
00:10:46 You see, even when I am recording a video, this is the VRAM usage currently with the
-
00:10:52 largest model.
-
00:10:54 Okay.
-
00:10:55 This is the results we get.
-
00:10:56 However, we can improve this.
-
00:10:58 How?
-
00:10:59 I have improved the Gradio code: instead of loading in 8-bit, we can load like
-
00:11:05 this by using torch.bfloat16.
-
00:11:08 We can also increase the max length so it will be able to produce better output.
-
00:11:13 So this is the latest version of the Gradio code.
-
00:11:17 It is also updated on the Gist file.
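In other words, the change boils down to loading in bfloat16 instead of 8-bit and raising the token limit; a small sketch with example values:
```python
import torch
from transformers import pipeline

# Load the weights in bfloat16 instead of using load_in_8bit=True ...
generate_text = pipeline(
    model="D:/dolly_video/dolly-v2-12b",  # example local path
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# ... and allow a longer answer by raising the token limit.
result = generate_text("How do I make a campfire?", max_new_tokens=1024)
print(result[0]["generated_text"])
```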
-
00:11:20 Let's restart and see the results we are going to get with this version.
-
00:11:24 Okay we got a much better answer this time.
-
00:11:26 So it looks like instead of using 8-bit, we should use the 7 billion parameter or 3 billion
-
00:11:32 parameter model if we don't have a sufficient amount of VRAM.
-
00:11:35 And these are the results with 7 billion parameters model.
-
00:11:39 This time, I have set the max length to 1024 and the model has 7 billion parameters.
-
00:11:45 It looks like the model repeats itself when you set the max length tokens very high like
-
00:11:51 this.
-
00:11:52 Activating the virtual environment folder each time is somewhat bothersome, and also if you want
-
00:11:57 to do debugging, you can use Microsoft Visual Studio.
-
00:12:01 So open Microsoft Visual Studio.
-
00:12:02 You can download it from the internet.
-
00:12:04 Click create new project.
-
00:12:06 Select Python application.
-
00:12:08 Click next.
-
00:12:09 Select your folder as you wish.
-
00:12:11 Give a name.
-
00:12:12 Click create.
-
00:12:13 Okay.
-
00:12:14 What we are going to do is we will change the Python environment to our installed Python
-
00:12:18 environment instead of the default one.
-
00:12:21 So right click the Python environment.
-
00:12:23 Click add environment.
-
00:12:24 In here select existing environment and click custom.
-
00:12:28 Click this three dots icon.
-
00:12:31 Find your installed virtual environment.
-
00:12:33 Mine is in here.
-
00:12:34 Select the main folder of virtual environment.
-
00:12:37 Select folder and it will automatically load the virtual environment.
-
00:12:42 You see the parameters are now displayed on the screen.
-
00:12:44 Click add.
-
00:12:45 Then you see now this is my Python environment and this is the py file.
-
00:12:51 Just copy paste the gradio file I have posted on the gist.
-
00:12:55 Change the parameters as you wish.
-
00:12:57 Let's set the max length to 256.
-
00:12:59 This way you can also debug the application and see what is happening.
-
00:13:04 For example, let's put a breakpoint here, and the advantage of this is that when you just press
-
00:13:10 F5, it will automatically activate the virtual environment and run your default .py file here.
-
00:13:17 So you see the Microsoft Visual Studio started the Python script that we set as default.
-
00:13:23 The gradio is loaded.
-
00:13:25 Let's refresh the URL and try "how do I make a campfire" this time.
-
00:13:29 Click generate.
-
00:13:31 Since we set the max length lower,
-
00:13:33 it becomes much faster compared to a bigger max length.
-
00:13:38 Microsoft Visual Studio Python applications are a great way to run custom Python scripts.
-
00:13:43 Okay, you see it.
-
00:13:45 It was really fast, and now we have hit the breakpoint.
-
00:13:48 With the breakpoint I can see what is actually happening.
-
00:13:51 Just hover your mouse over the parameters and you will see what is inside them.
-
00:13:57 This is really great.
-
00:13:58 You can see what is happening actually.
-
00:14:00 You can click, view and see what is happening.
-
00:14:02 So Microsoft Visual Studio is the tool that I use to debug Python applications and Python
-
00:14:08 scripts.
-
00:14:09 Then you can remove or put breakpoints like this.
-
00:14:11 Just remove them.
-
00:14:13 Hit F5, and this is the result we get for "how do I make a campfire" with the 7 billion parameter
-
00:14:18 model.
-
00:14:19 So in this video, I wanted to teach you how to catch a fish instead of giving you a fish.
-
00:14:25 This is a general workflow for running models hosted on a GitHub repository or Hugging
-
00:14:32 Face.
-
00:14:33 By the way, this answer is pretty decent also.
-
00:14:36 If you have enjoyed, please like, subscribe.
-
00:14:38 Also, leave a comment.
-
00:14:40 I have excellent tutorials on my channel, as you are seeing right now, related to GPT, ChatGPT,
-
00:14:47 artificial intelligence, machine learning, Stable Diffusion, and other stuff.
-
00:14:52 If you click Join and support us on YouTube, I would appreciate it very much.
-
00:14:56 If you click Patreon and support us on Patreon, I would appreciate that very much as well.
-
00:15:01 Your Patreon support is extremely important for me.
-
00:15:04 In the comments, tell me what you want to see next.
-
00:15:07 Also, in the description of the video, you will find the link for the gist file.
-
00:15:13 As I showed in this video, that is where the script used in this video is shared.
-
00:15:16 It will open the gist repository hosted on GitHub.
-
00:15:19 Moreover, you will find our Discord link and Patreon link as well.
-
00:15:23 Join our discord.
-
00:15:24 Let's discuss these topics together.
-
00:15:26 You can also ask me questions on Discord.
-
00:15:29 The links will also be posted in our pinned comment on the video, so you can check
-
00:15:34 the pinned comment as well.
-
00:15:36 Thank you so much for your support!
-
00:15:38 Hopefully see you in another awesome video!
-
00:15:40 You can also click our playlist and check out all the playlists that we have.
-
00:15:42 Hopefully see you later.
