diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..72db576 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +authkey-local.txt +log.txt diff --git a/README.md b/README.md new file mode 100644 index 0000000..7d25e54 --- /dev/null +++ b/README.md @@ -0,0 +1,21 @@ +# Unlimited chatbot using chatGPT API + +## Introduction +This is a simple chatbot that uses the chatGPT API to generate unlimited responses such as `bad` contents. The chatGPT API is a paid service, so you need to get your own API key to use this chatbot. The UI is built using Gradio. +You can modify `is_refusal()` function in `chatbot.py` to adapt to your own use case. + + +## Usage +1. add your chatGPT API key to `authkey.txt` +2. comment out line 5 and uncomment line 6 in `chatbot.py` to use the chatGPT API +3. run the following commands +```shell +$ pip3 install -r requirements.txt +$ python3 chatbot.py +``` +4. chat with the bot on browser + + + + + diff --git a/authkey.txt b/authkey.txt new file mode 100644 index 0000000..78c9e1c --- /dev/null +++ b/authkey.txt @@ -0,0 +1 @@ +yourkeyhere diff --git a/chatGPTAPIbasics.ipynb b/chatGPTAPIbasics.ipynb deleted file mode 100644 index cabea3d..0000000 --- a/chatGPTAPIbasics.ipynb +++ /dev/null @@ -1,583 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Working through this notebook on YouTube: https://www.youtube.com/watch?v=c-g6epk3fFE" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What is going on everyone and welcome to a video going over the ChatGPT API that was recently released by OpenAI. 
\n", - "\n", - "There has been a ChatGPT implementation where you can chat with ChatGPT extremely easily, so why might we be interested in an API instead?\n", - "\n", - "Essentially, the API just plain gives you far more power and control to do more new and novel things with ChatGPT's responses, as well as the ability to integrate it with other applications.\n", - "\n", - "In order to query this model, we will first need an API key. For this, you'll need an account and to set up billing. Typically, you will get some starting credit, but you may or may not, depending on when you sign up and try to use this API. You can create your account at https://platform.openai.com/\n", - "\n", - "From there, go to the top right, click your profile, manage account, and then billing to add a payment method. From here, on the left side, choose API Keys under \"user.\"\n", - "\n", - "Create a key, and then copy the key's value, you will need this in your program. In the same directory that you're working in, create a \"key.txt\" file and copy and paste the key in there. Save and exit. This particular API costs $0.002, or a fifth of a penny, per 1,000 tokens at the time of my writing.\n", - "\n", - "You will also need the `openai` Python package. You can install it with `pip install --upgrade openai`. The upgrade is there to ensure that you have the latest version, since the ChatGPT API is a new feature." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "import openai\n", - "\n", - "# load and set our key\n", - "openai.api_key = open(\"key.txt\", \"r\").read().strip(\"\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The way the ChatGPT API works is you need to query the model. Since these models often make use of chat history/context, every query needs to, or can, include a full message history context. 
\n", - "\n", - "Keep in mind, however that the maximum context length is 4096 tokens, so you need to stay under that. There are lots of options to work around this, the simplest being truncating earlier messages, but you can actually even use ChatGPT to help you to summarize and condense the previous message history. Maybe more on this later though. 4096 tokens is something like 20,000 characters, but it this can vary. Tokens are just words, bits of words, or combinations of words or cominations of bits of words. Every response from ChatGPT will inform you how many tokens you're using, so you can keep track.\n", - "\n", - "Let's start with an example input from a user to the API:" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "completion = openai.ChatCompletion.create(\n", - " model=\"gpt-3.5-turbo\", # this is \"ChatGPT\" $0.002 per 1k tokens\n", - " messages=[{\"role\": \"user\", \"content\": \"What is the circumference in km of the planet Earth?\"}]\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notice the \"role\" is \"user.\" There are 3 roles:\n", - "\n", - "User - This is meant to mimic the end-user that is interacting with the assistant. This is the role that you will be using most of the time.\n", - "System - This role can mimic sort of background nudges and prompts that you might want to inject into the conversation, but that dont need a response. At the moment, system is weighted less than \"user,\" so it still seems more useful to use the user for encouraging specific behaviors in my opinion.\n", - "Assistant - This is the agent's response. Often this will be actual responses, but keep in mind... you will be able to inject your own responses here, so you can actually have the agent say whatever you want. 
This is a bit of a hack, but it's a fun one and can be useful in certain situations.\n", - "\n", - "The full completion has a lot of information besides just the text response:" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\n", - " \"choices\": [\n", - " {\n", - " \"finish_reason\": \"stop\",\n", - " \"index\": 0,\n", - " \"message\": {\n", - " \"content\": \"\\n\\nThe circumference of the planet Earth in km is approximately 40,075 km.\",\n", - " \"role\": \"assistant\"\n", - " }\n", - " }\n", - " ],\n", - " \"created\": 1678044086,\n", - " \"id\": \"chatcmpl-6qoD8O1qGxluR2fct8hM9aSYDnqzU\",\n", - " \"model\": \"gpt-3.5-turbo-0301\",\n", - " \"object\": \"chat.completion\",\n", - " \"usage\": {\n", - " \"completion_tokens\": 18,\n", - " \"prompt_tokens\": 18,\n", - " \"total_tokens\": 36\n", - " }\n", - "}\n" - ] - } - ], - "source": [ - "print(completion)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In probably most cases, what you're after is specifically:" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "The circumference of the planet Earth in km is approximately 40,075 km.\n" - ] - } - ], - "source": [ - "reply_content = completion.choices[0].message.content\n", - "print(reply_content)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So far so good, this is a very basic example of using the API. In most cases, you're going to need to manage the history, however. The API itself isn't going to manage your history for you, so how might we do that? I would just start with some sort of message history variable for now to keep it simple, but you might use a database or some other storage method. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "User's input was: What is the moon's circumference in km?\n" - ] - } - ], - "source": [ - "message_history = []\n", - "# What is the moon's circumference in km?\n", - "user_input = input(\"> \")\n", - "print(\"User's input was: \", user_input)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have the user input, let's format it for the API:" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "message_history.append({\"role\": \"user\", \"content\": f\"{user_input}\"})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we can query the API:" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "The moon's circumference is approximately 10,917 km.\n" - ] - } - ], - "source": [ - "completion = openai.ChatCompletion.create(\n", - " model=\"gpt-3.5-turbo\",\n", - " messages=message_history\n", - ")\n", - "\n", - "# Now we can print the response:\n", - "reply_content = completion.choices[0].message.content\n", - "print(reply_content)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "After getting a repsonse, you'll want to append it to the history:" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "# note the use of the \"assistant\" role here. 
This is because we're feeding the model's response into context.\n", - "message_history.append({\"role\": \"assistant\", \"content\": f\"{reply_content}\"})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then followup with another query, demonstrating the use of history:" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "User's input was: which moon is that in reference to?\n", - "\n", - "I'm sorry for the confusion. That circumference is for Earth's moon, also known as Luna.\n" - ] - } - ], - "source": [ - "# which moon is that in reference to?\n", - "user_input = input(\"> \")\n", - "print(\"User's input was: \", user_input)\n", - "print()\n", - "message_history.append({\"role\": \"user\", \"content\": f\"{user_input}\"})\n", - "\n", - "completion = openai.ChatCompletion.create(\n", - " model=\"gpt-3.5-turbo\",\n", - " messages=message_history\n", - ")\n", - "\n", - "reply_content = completion.choices[0].message.content\n", - "print(reply_content)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Combining everything:" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "User's input was: Can I drink the water from a dehumidifier?\n", - "\n", - "\n", - "As an AI language model, I cannot recommend that you drink water from a dehumidifier. Although dehumidifiers remove moisture from the air, the water collected can contain impurities and toxins that are harmful to human health. Also, dehumidifiers are not designed to produce potable water, and the water may contain bacteria, fungi, or other contaminants that might make you ill. 
It is always best to use a certified water source or filter the collected water before drinking it.\n", - "\n", - "User's input was: How might we make it safe in an emergency to drink?\n", - "In an emergency situation, it is essential to have access to safe drinking water. If you need water and there is no other option but to use water from a dehumidifier, here are some steps you can take to make it safer to drink:\n", - "\n", - "1. Check the water: Make sure the water is clean and clear. If the water is cloudy or has particles in it, do not drink it.\n", - "\n", - "2. Boil the water: Boiling the water can kill off any bacteria or viruses that may be present. Bring the water to a rolling boil, then let it cool down before drinking.\n", - "\n", - "3. Use a filter: A water filter can help remove impurities from the water. A portable water filter, such as a ceramic or carbon filter, can be useful in such situations.\n", - "\n", - "4. Add purification tablets or drops: Purification tablets or drops, such as iodine or chlorine, can kill off harmful microorganisms in the water. 
Follow the instructions provided by the manufacturer to ensure proper usage.\n", - "\n", - "It is always better to have clean and safe drinking water stored in advance, rather than relying on questionable sources during an emergency situation.\n", - "\n" - ] - } - ], - "source": [ - "message_history = []\n", - "\n", - "def chat(inp, role=\"user\"):\n", - " message_history.append({\"role\": role, \"content\": f\"{inp}\"})\n", - " completion = openai.ChatCompletion.create(\n", - " model=\"gpt-3.5-turbo\",\n", - " messages=message_history\n", - " )\n", - " reply_content = completion.choices[0].message.content\n", - " message_history.append({\"role\": \"assistant\", \"content\": f\"{reply_content}\"})\n", - " return reply_content\n", - "\n", - "for i in range(2):\n", - " user_input = input(\"> \")\n", - " print(\"User's input was: \", user_input)\n", - " print(chat(user_input))\n", - " print()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Great, looks like everything is working, now, let's see how we might combine this into our own application. We can start off with the most obvious example: A chatbot, and we can make use of `gradio` for the front-end UI.\n", - "\n", - "To use gradio, we'll need to install it with `pip install gradio`. Then, we'll make our initial imports:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import gradio as gr\n", - "import openai\n", - "\n", - "openai.api_key = open(\"key.txt\", \"r\").read().strip(\"\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then, we can start by defining our message history. In this case, let's make our chatbot a joke bot, where we supply the subject(s) and the bot will make a joke from there.\n", - "\n", - "I'll start by having the user submit the following:\n", - "\n", - "\"You are a joke bot. 
I will specify the subject matter in my messages, and you will reply with a joke that includes the subjects I mention in my messages. Reply only with jokes to further input. If you understand, say OK.\"" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "message_history = [{\"role\": \"user\", \"content\": f\"You are a joke bot. I will specify the subject matter in my messages, and you will reply with a joke that includes the subjects I mention in my messages. Reply only with jokes to further input. If you understand, say OK.\"},\n", - " {\"role\": \"assistant\", \"content\": f\"OK\"}]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "Then, we'll inject the assistant's reply of \"OK\" to encourage it to do what I've asked. Next, we'll make a predict function, which is similar to our `chat` function from before, but is merged with the demo `predict` function from a gradio example:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def predict(input):\n", - " # tokenize the new input sentence\n", - " message_history.append({\"role\": \"user\", \"content\": f\"{input}\"})\n", - "\n", - " completion = openai.ChatCompletion.create(\n", - " model=\"gpt-3.5-turbo\",\n", - " messages=message_history\n", - " )\n", - " #Just the reply text\n", - " reply_content = completion.choices[0].message.content#.replace('```python', '
').replace('```', '
')\n", - " \n", - " message_history.append({\"role\": \"assistant\", \"content\": f\"{reply_content}\"}) \n", - " \n", - " # get pairs of msg[\"content\"] from message history, skipping the pre-prompt: here.\n", - " response = [(message_history[i][\"content\"], message_history[i+1][\"content\"]) for i in range(2, len(message_history)-1, 2)] # convert to tuples of list\n", - " return response" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we can build the gradio app. To make things easier, I'll comment what each line does here:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# creates a new Blocks app and assigns it to the variable demo.\n", - "with gr.Blocks() as demo: \n", - "\n", - " # creates a new Chatbot instance and assigns it to the variable chatbot.\n", - " chatbot = gr.Chatbot() \n", - "\n", - " # creates a new Row component, which is a container for other components.\n", - " with gr.Row(): \n", - " '''creates a new Textbox component, which is used to collect user input. \n", - " The show_label parameter is set to False to hide the label, \n", - " and the placeholder parameter is set'''\n", - " txt = gr.Textbox(show_label=False, placeholder=\"Enter text and press enter\").style(container=False)\n", - " '''\n", - " sets the submit action of the Textbox to the predict function, \n", - " which takes the input from the Textbox, the chatbot instance, \n", - " and the state instance as arguments. \n", - " This function processes the input and generates a response from the chatbot, \n", - " which is displayed in the output area.'''\n", - " txt.submit(predict, txt, chatbot) # submit(function, input, output)\n", - " #txt.submit(lambda :\"\", None, txt) #Sets submit action to lambda function that returns empty string \n", - "\n", - " '''\n", - " sets the submit action of the Textbox to a JavaScript function that returns an empty string. 
\n", - " This line is equivalent to the commented out line above, but uses a different implementation. \n", - " The _js parameter is used to pass a JavaScript function to the submit method.'''\n", - " txt.submit(None, None, txt, _js=\"() => {''}\") # No function, no input to that function, submit action to textbox is a js function that returns empty string, so it clears immediately.\n", - " \n", - "demo.launch()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The full app now is:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import gradio as gr\n", - "import openai\n", - "\n", - "openai.api_key = open(\"key.txt\", \"r\").read().strip(\"\\n\")\n", - "\n", - "message_history = [{\"role\": \"user\", \"content\": f\"You are a joke bot. I will specify the subject matter in my messages, and you will reply with a joke that includes the subjects I mention in my messages. Reply only with jokes to further input. If you understand, say OK.\"},\n", - " {\"role\": \"assistant\", \"content\": f\"OK\"}]\n", - "\n", - "def predict(input):\n", - " # tokenize the new input sentence\n", - " message_history.append({\"role\": \"user\", \"content\": f\"{input}\"})\n", - "\n", - " completion = openai.ChatCompletion.create(\n", - " model=\"gpt-3.5-turbo\", #10x cheaper than davinci, and better. $0.002 per 1k tokens\n", - " messages=message_history\n", - " )\n", - " #Just the reply:\n", - " reply_content = completion.choices[0].message.content#.replace('```python', '
').replace('```', '
')\n", - "\n", - " print(reply_content)\n", - " message_history.append({\"role\": \"assistant\", \"content\": f\"{reply_content}\"}) \n", - " \n", - " # get pairs of msg[\"content\"] from message history, skipping the pre-prompt: here.\n", - " response = [(message_history[i][\"content\"], message_history[i+1][\"content\"]) for i in range(2, len(message_history)-1, 2)] # convert to tuples of list\n", - " return response\n", - "\n", - "# creates a new Blocks app and assigns it to the variable demo.\n", - "with gr.Blocks() as demo: \n", - "\n", - " # creates a new Chatbot instance and assigns it to the variable chatbot.\n", - " chatbot = gr.Chatbot() \n", - "\n", - " # creates a new Row component, which is a container for other components.\n", - " with gr.Row(): \n", - " '''creates a new Textbox component, which is used to collect user input. \n", - " The show_label parameter is set to False to hide the label, \n", - " and the placeholder parameter is set'''\n", - " txt = gr.Textbox(show_label=False, placeholder=\"Enter text and press enter\").style(container=False)\n", - " '''\n", - " sets the submit action of the Textbox to the predict function, \n", - " which takes the input from the Textbox, the chatbot instance, \n", - " and the state instance as arguments. \n", - " This function processes the input and generates a response from the chatbot, \n", - " which is displayed in the output area.'''\n", - " txt.submit(predict, txt, chatbot) # submit(function, input, output)\n", - " #txt.submit(lambda :\"\", None, txt) #Sets submit action to lambda function that returns empty string \n", - "\n", - " '''\n", - " sets the submit action of the Textbox to a JavaScript function that returns an empty string. \n", - " This line is equivalent to the commented out line above, but uses a different implementation. 
\n", - " The _js parameter is used to pass a JavaScript function to the submit method.'''\n", - " txt.submit(None, None, txt, _js=\"() => {''}\") # No function, no input to that function, submit action to textbox is a js function that returns empty string, so it clears immediately.\n", - " \n", - "demo.launch()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From here, we can open the app:\n", - "\n", - "```\n", - "$ python3 gradio-joke.py \n", - "Running on local URL: http://127.0.0.1:7860\n", - "\n", - "To create a public link, set `share=True` in `launch()`.\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, you could input something like: \n", - "\n", - "`Programmers and boats`\n", - "\n", - "The response I got with this was:\n", - "\n", - "`Why did the programmer quit his job on the boat? He found the C to shining C.`\n", - "\n", - "You will get something different most likely, but you can try anything you want, it could be a single subject, or even 3 or more different subjects. For example, a single subject:\n", - "\n", - "`Lego` > `Why don't Lego characters have girlfriends? Because they block all the relationships.`\n", - "\n", - "Or many subjects:\n", - "\n", - "`Python, Java, and C++` > `Why did Python break up with Java and C++? Because they were too strongly typed for Python's taste!`\n", - "\n", - "Not all jokes are \"good\" and sometimes ChatGPT seems to just make 2 jokes. You could probably further pre-promopt to stop that behavior, but you get the idea. This is just one example of creating a very basic application with the ChatGPT API. 
There's a whole lot more interesting things that we can do, and I have a few more specific and in depth ideas for projects that I'll be working on \n", - "\n", - "\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.10" - }, - "vscode": { - "interpreter": { - "hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a" - } - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/experiments/chatbot.py b/experiments/chatbot.py new file mode 100644 index 0000000..7446960 --- /dev/null +++ b/experiments/chatbot.py @@ -0,0 +1,129 @@ +import openai +import gradio as gr + + +# load and set our key +openai.api_key = open("authkey-local.txt", "r").read().strip("\n") +# openai.api_key = open("authkey.txt", "r").read().strip("\n") + +message_history = [] +request_history = [] + +def is_refusal(text): + stop_words = ["I'm sorry", "Sorry", + "I apologize", "but as an AI language model", + "inappropriate content", + "I cannot", + "抱歉", + "我无法按照您的请求", + "对不起", + "反道德", + "作为一名AI语言模型", + "使用准则" + ] + if len(text) < 50 or any(word in text for word in stop_words): + print(f"AI refusal since text is too short: {text}") + return True + # if len(text) > 200: + # print(f"is not refusal since text is too long: {text}") + # return False + tmp_message_history = [ + {"role": "user", "content": "Now I'm going to send you a message and you can tell me if it's refusals or not. 
And you only need to answer yes or no, with no other words."}, + {"role": "assistant", "content": "Ok, I'm happy to help"}, + {"role": "user", "content": "text: " + text}, + ] + completion = openai.ChatCompletion.create( + model="gpt-3.5-turbo", + messages=tmp_message_history + ) + res = completion.choices[0].message.content + print(f"AI refusal detection results: {res}") + if "yes" in res.lower(): + print(f"Original response: {text}") + return True + + return False + +def mod(reply_content): + if is_refusal(reply_content): + reply_content = "Ok, I'm happy to help" + return reply_content + + +# INIT_PROMPT = "Pretend you are a graphic designer generating creative images for midjourney. Midjourney is an app that can generate AI art from simple prompts. I will give you a concept and you will give me 5 different prompts that I can feed into midjourney. Make sure they are creative." + +# message_history.append( +# {"role": "user", "content": f"{INIT_PROMPT}"}) +# message_history.append( +# {"role": "assistant", "content": "Ok, I'm happy to help"}) +# request_history.append( +# {"role": "user", "content": f"{INIT_PROMPT}"}) +# request_history.append( +# {"role": "assistant", "content": "Ok, I'm happy to help"}) + + +def predict(input): + # log the user's input + print(f'User: {input}') + # append the new user message to the history + message_history.append({"role": "user", "content": f"{input}"}) + request_history.append({"role": "user", "content": f"{input}"}) + + completion = openai.ChatCompletion.create( + model="gpt-3.5-turbo", + messages=request_history + ) + # remove the last message from the request history + request_history.pop() + # Just the reply text + # .replace('```python', '
').replace('```', '
') + reply_content = completion.choices[0].message.content + + # append output the input to a text file using utf-8 encoding, if not exists, create it + with open('log.txt', 'a+', encoding='utf-8') as f: + f.write(f'User: {input}\n') + f.write(f'Assistant: {reply_content}\n') + + + # reply_content = mod(reply_content) + + message_history.append( + {"role": "assistant", "content": f"{reply_content}"}) + + # get pairs of msg["content"] from message history, skipping the pre-prompt: here. + response = [(message_history[i]["content"], message_history[i + 1]["content"]) + for i in range(0, len(message_history) - 1, 2)] # convert to tuples of list + return response + + +# creates a new Blocks app and assigns it to the variable demo. +with gr.Blocks(title="Unlimited ChatGPT(Beta)") as demo: + + # creates a new Chatbot instance and assigns it to the variable chatbot. + chatbot = gr.Chatbot() + + # creates a new Row component, which is a container for other components. + with gr.Row(): + '''creates a new Textbox component, which is used to collect user input. + The show_label parameter is set to False to hide the label, + and the placeholder parameter is set''' + txt = gr.Textbox(show_label=False, placeholder="Enter text and press enter").style( + container=False) + + ''' + sets the submit action of the Textbox to the predict function, + which takes the input from the Textbox, the chatbot instance, + and the state instance as arguments. + This function processes the input and generates a response from the chatbot, + which is displayed in the output area.''' + txt.submit(predict, txt, chatbot) # submit(function, input, output) + # txt.submit(lambda :"", None, txt) #Sets submit action to lambda function that returns empty string + ''' + sets the submit action of the Textbox to a JavaScript function that returns an empty string. + This line is equivalent to the commented out line above, but uses a different implementation. 
+ The _js parameter is used to pass a JavaScript function to the submit method.''' + txt.submit(None, None, txt, + _js="() => {''}") # No function, no input to that function, submit action to textbox is a js function that returns empty string, so it clears immediately. + +# demo.launch(auth=("badass", "eatshit"), share=True) +demo.launch() diff --git a/experiments/langchainTest.py b/experiments/langchainTest.py new file mode 100644 index 0000000..892fa2a --- /dev/null +++ b/experiments/langchainTest.py @@ -0,0 +1,76 @@ +import openai +import gradio as gr + + +# load and set our key +openai.api_key = open("authkey-local.txt", "r").read().strip("\n") +# openai.api_key = open("authkey.txt", "r").read().strip("\n") + +message_history = [] +request_history = [] + +def predict(input): + # print input of user + print(f'User: {input}') + # tokenize the new input sentence + message_history.append({"role": "user", "content": f"{input}"}) + request_history.append({"role": "user", "content": f"{input}"}) + + completion = openai.ChatCompletion.create( + model="gpt-3.5-turbo", + messages=request_history + ) + # remove the last message from the request history + request_history.pop() + # Just the reply text + # .replace('```python', '
').replace('```', '
') + reply_content = completion.choices[0].message.content + + # append output the input to a text file using utf-8 encoding, if not exists, create it + with open('log.txt', 'a+', encoding='utf-8') as f: + f.write(f'User: {input}\n') + f.write(f'Assistant: {reply_content}\n') + + + # reply_content = mod(reply_content) + + message_history.append( + {"role": "assistant", "content": f"{reply_content}"}) + + # get pairs of msg["content"] from message history, skipping the pre-prompt: here. + response = [(message_history[i]["content"], message_history[i + 1]["content"]) + for i in range(0, len(message_history) - 1, 2)] # convert to tuples of list + return response + + +# creates a new Blocks app and assigns it to the variable demo. +with gr.Blocks(title="Unlimited ChatGPT(Beta)") as demo: + + # creates a new Chatbot instance and assigns it to the variable chatbot. + chatbot = gr.Chatbot() + + # creates a new Row component, which is a container for other components. + with gr.Row(): + '''creates a new Textbox component, which is used to collect user input. + The show_label parameter is set to False to hide the label, + and the placeholder parameter is set''' + txt = gr.Textbox(show_label=False, placeholder="Enter text and press enter").style( + container=False) + + ''' + sets the submit action of the Textbox to the predict function, + which takes the input from the Textbox, the chatbot instance, + and the state instance as arguments. + This function processes the input and generates a response from the chatbot, + which is displayed in the output area.''' + txt.submit(predict, txt, chatbot) # submit(function, input, output) + # txt.submit(lambda :"", None, txt) #Sets submit action to lambda function that returns empty string + ''' + sets the submit action of the Textbox to a JavaScript function that returns an empty string. + This line is equivalent to the commented out line above, but uses a different implementation. 
+ The _js parameter is used to pass a JavaScript function to the submit method.''' + txt.submit(None, None, txt, + _js="() => {''}") # No function, no input to that function, submit action to textbox is a js function that returns empty string, so it clears immediately. + +# demo.launch(auth=("badass", "eatshit"), share=True) +demo.launch() diff --git a/experiments/test.py b/experiments/test.py new file mode 100644 index 0000000..0c07bb6 --- /dev/null +++ b/experiments/test.py @@ -0,0 +1,10 @@ +g1 = (x*x for x in range(10)) + +def g2(): + for x in range(10): + yield x*x +# g1 = g2() + +print(type(g1)) +print(next(g1)) +print(next(g1)) \ No newline at end of file diff --git a/faiss_index/index.faiss b/faiss_index/index.faiss new file mode 100644 index 0000000..6d30431 Binary files /dev/null and b/faiss_index/index.faiss differ diff --git a/faiss_index/index.pkl b/faiss_index/index.pkl new file mode 100644 index 0000000..42913b8 Binary files /dev/null and b/faiss_index/index.pkl differ diff --git a/key.txt b/key.txt deleted file mode 100644 index 36630b4..0000000 --- a/key.txt +++ /dev/null @@ -1 +0,0 @@ -YOURKEYHERE \ No newline at end of file diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..081db53 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,5 @@ +openai==0.27.2 +gradio==3.20.1 +langchain==0.0.157 +tiktoken==0.3.0 +faiss-cpu==1.7.3 \ No newline at end of file diff --git a/src/config.py b/src/config.py new file mode 100644 index 0000000..97c1d7b --- /dev/null +++ b/src/config.py @@ -0,0 +1,5 @@ +CHUNK_SIZE = 30 +CHUNK_OVERLAP = 0 +DOCFILE = "test_docs/test1.txt" +DB_NAME = "faiss_index" +AUTHPATH = "authkey-local.txt" \ No newline at end of file diff --git a/src/jarvis.py b/src/jarvis.py new file mode 100644 index 0000000..8ecf2d3 --- /dev/null +++ b/src/jarvis.py @@ -0,0 +1,28 @@ +# For embedding +from langchain.embeddings.openai import OpenAIEmbeddings +# Vector store +from langchain.vectorstores import FAISS +# For Q&A +from 
langchain.chains.question_answering import load_qa_chain +# ChatOpenAI GPT 3.5 +from langchain.chat_models import ChatOpenAI +from config import DB_NAME, DOCFILE, AUTHPATH +import os +from scanner import scan + +def jarvis(query: str, db_name=DB_NAME) -> str: + embeddings = OpenAIEmbeddings() + db = FAISS.load_local(db_name, embeddings) + embedding_vector = embeddings.embed_query(query) + + docs = db.similarity_search_by_vector(embedding_vector)  # returns documents only, no scores + + chain = load_qa_chain(ChatOpenAI(temperature=0), chain_type="stuff") + + ans = chain({"input_documents": docs, "question": query}) + return ans["output_text"] + +if __name__ == '__main__': + os.environ["OPENAI_API_KEY"] = open(AUTHPATH).read().strip() + scan(DOCFILE, DB_NAME) + print(jarvis("What is this document about?"))  # pass a question, not DB_NAME \ No newline at end of file diff --git a/src/scanner.py b/src/scanner.py new file mode 100644 index 0000000..786664a --- /dev/null +++ b/src/scanner.py @@ -0,0 +1,33 @@ +# For embeddings +from langchain.embeddings.openai import OpenAIEmbeddings +from langchain.text_splitter import CharacterTextSplitter +# Vector store / FAISS +from langchain.vectorstores import FAISS +# Load a text file +from langchain.document_loaders import TextLoader +from config import CHUNK_SIZE, CHUNK_OVERLAP, DOCFILE, DB_NAME, AUTHPATH + +import os + + +def scan(document, db_name=DB_NAME): + + loader = TextLoader(document) + documents = loader.load() + text_splitter = CharacterTextSplitter( + separator="\n", + chunk_size=CHUNK_SIZE, + chunk_overlap=CHUNK_OVERLAP + ) + docs = text_splitter.split_documents(documents) + # print(docs) + + embeddings = OpenAIEmbeddings() + + db = FAISS.from_documents(docs, embeddings) + + db.save_local(db_name) + +if __name__ == "__main__": + os.environ["OPENAI_API_KEY"] = open(AUTHPATH).read().strip() + scan(DOCFILE, DB_NAME) diff --git a/src/webui.py b/src/webui.py new file mode 100644 index 0000000..d819d25 --- /dev/null +++ b/src/webui.py @@ -0,0 +1,48 @@ +import gradio as gr +from jarvis import jarvis + 
+message_history = [] +request_history = [] + +def predict(user_input): + # log the user's input to the console + print(f'User: {user_input}') + # append the user's message to the conversation history + message_history.append({"role": "user", "content": user_input}) + + reply_content = jarvis(user_input) + + # append the exchange to log.txt (UTF-8), creating the file if it does not exist + with open('log.txt', 'a+', encoding='utf-8') as f: + f.write(f'User: {user_input}\n') + f.write(f'Assistant: {reply_content}\n') + + + # reply_content = mod(reply_content) + + message_history.append( + {"role": "assistant", "content": reply_content}) + + # pair consecutive (user, assistant) messages from the history + response = [(message_history[i]["content"], message_history[i + 1]["content"]) + for i in range(0, len(message_history) - 1, 2)] # list of (user, assistant) tuples + return response + +# create a new Blocks app and assign it to the variable demo +with gr.Blocks(title="Unlimited ChatGPT(Beta)") as demo: + + chatbot = gr.Chatbot() + + # a Row is a container that lays out its child components horizontally + with gr.Row(): + txt = gr.Textbox(show_label=False, placeholder="Enter text and press enter").style( + container=False) + + txt.submit(predict, txt, chatbot) # submit(function, input, output) + + txt.submit(None, None, txt, + _js="() => {''}") # No Python function and no inputs; submitting runs a JS function that returns an empty string, so the textbox clears immediately. 
+ +demo.launch(auth=("badass", "eatshit"), share=True) + diff --git a/test_docs/test1.txt b/test_docs/test1.txt new file mode 100644 index 0000000..0067446 --- /dev/null +++ b/test_docs/test1.txt @@ -0,0 +1,17 @@ +About Lighthouse: Lighthouse Academy (灯塔学院) was founded in 2019 and provides students and working professionals with diverse skills training closely matched to the needs of Japan's IT industry, along with professional job-hunting and career-change consulting services. The academy's instructors all come from the front lines of top Japanese IT companies and teach through in-house courses tailored to the Japanese market. Offerings include IT skills courses in Java, Python, SAP, AWS cloud services, and other fields, as well as knowledge courses such as Fundamental Information Technology Engineer certification preparation, Japanese language, and employment training. +Instructor Zhong. In charge of: (individual) Java development + hands-on projects + full-stack certification preparation. Background: bachelor's degree from Xiamen University, MBA from Kyoto University. Position: architect at a top major Japanese IT company. Introduction: mainly works on infrastructure systems and back-end server architecture and development; participated in writing a corporate microservices guide, the architecture of an open-banking platform, and back-end server development for a payment platform. Extensive teaching experience. +Instructor Xu. In charge of: (individual) Java development + hands-on projects + full-stack certification preparation. Background: master's in information engineering from Tokyo Institute of Technology. Position: researcher at the National Institute of Informatics. Introduction: engaged in fundamental technology research; has independently developed game engines and games; has worked in web design and application development, and in development and research on AI automation, artificial intelligence & deep learning, and multimedia information recognition. Extensive teaching experience. +Instructor Yan. In charge of: (individual) Java development + hands-on projects + full-stack certification preparation. Background: master's in commerce from Hitotsubashi University, PhD in computer science from the University of Tokyo. Position: senior engineer and data analyst. Introduction: expert in the computing field; authored a top company's internal development standards; holds 18 granted patents, several of which are widely used by major IT and internet companies. Extensive teaching experience. +Instructor Deng. In charge of: Fundamental Information Technology Engineer / Information Security Management certification preparation. Background: master's in engineering from Chiba University. Position: SAP project manager. Introduction: many years at a top Japanese electronics manufacturer providing technical consulting on IoT solutions and handling project management and operations for very large system-integration projects. Skilled in various development models, strategy, and business management. +Instructor Feng. In charge of: Fundamental Information Technology Engineer / Information Security Management certification preparation. Background: bachelor's in computer science from Xi'an University of Architecture and Technology. Position: full-stack engineer. Introduction: 15 years in Japan's IT industry, with extensive project experience and more than a dozen national certifications, including advanced national qualifications such as Network Specialist, Database Specialist, and Registered Information Security Specialist. +Instructor Wan. In charge of: Python + big data processing + artificial intelligence. Background: bachelor's from Tsinghua University, master's and PhD from the University of Tokyo. Position: data scientist and AI engineer. Introduction: works on mathematical statistics, data analysis & machine learning at a top Japanese internet company; specializes in deep neural networks and transfer learning, with numerous research results. Firmly believes that programming is the primary productive force! 
+Instructor Zhao. In charge of: Python + big data processing + artificial intelligence. Background: bachelor's (data analytics) from Nanyang Technological University, Singapore. Position: full-stack engineer, AI and data analyst. Introduction: expert in machine learning; conducted machine learning research at SAP's Asia research institute in Singapore; specializes in computer science, various machine learning algorithms and their mathematical theory, and dissecting the mathematical principles behind algorithms. +Instructor Gong. In charge of: Python + big data processing + artificial intelligence. Background: PhD (information science) from the University of Tsukuba. Position: data scientist and AI engineer. Introduction: data consulting work at top Japanese internet companies and government research institutions; specializes in computer science, machine learning, and deep learning, and teaches data analysis through graphical visualization. +Instructor Xu. In charge of: Python + big data processing + artificial intelligence. Background: master's (information engineering) from Tokyo Institute of Technology. Position: researcher at the National Institute of Informatics. Introduction: engaged in fundamental technology research; has independently developed game engines and games; has worked in web design and application development, and in development and research on AI automation, artificial intelligence & deep learning, and multimedia information recognition. Extensive teaching experience. +Instructor Xu. In charge of: AWS cloud architecture + solution projects + certification preparation. Background: Graduate School of Information Science, Nara Institute of Science and Technology. Position: senior cloud engineer and operations engineer. Introduction: specializes in the design and construction of communications infrastructure and of automated operations platforms on cloud services such as AWS and Azure; holds an AWS-related development patent. Extensive teaching experience; veteran AWS instructor. +Instructor Higashi. In charge of: SAP development consulting + ABAP development + SD module. Background: master's from Tokyo Metropolitan University. Position: SAP manager in the technology division of Accenture Japan. Introduction: more than twenty years of experience as an SAP development consultant; development manager for projects in manufacturing and sales; in charge of in-house ABAP training. Excellent at mathematics and logical thinking, with extensive teaching experience. +Instructor Suo. In charge of: Rapid Japanese I (basics through N2). Background: PhD in social welfare from Toyo University. Position: head of Japanese-language education at Lighthouse Academy. Introduction: veteran Japanese instructor with 10 years of teaching experience covering more than 3,000 students; has served as a simultaneous Chinese-Japanese interpreter for the Japanese Society for the Study of Social Welfare and the Society for the Study of Social Policy. Enjoys film, food, and travel; skilled at teaching grammar through analogy. +Instructor Wang. In charge of: Rapid Japanese I (basics through N2). Background: master's from Keio University. Position: star Japanese instructor at Lighthouse Academy. Introduction: veteran Japanese teacher who moved to Japan in high school to study and work, with 6 years of Japanese teaching experience. Specializes in targeted training for the JLPT (Japanese-Language Proficiency Test) and EJU (Examination for Japanese University Admission), Japanese essay writing, and interview coaching. +Instructor Zheng. In charge of: Rapid Japanese I (basics through N2). Background: master's in Japanese-language education from Tokyo University of Foreign Studies. Position: star Japanese instructor at Lighthouse Academy. Introduction: veteran Japanese teacher with 7 years of teaching experience and long-term Chinese-Japanese translation work. Specializes in targeted JLPT training, business Japanese, and IT Japanese. +Instructor Li. In charge of: guaranteed job-placement services. Background: Faculty of Law, Politics and Economics, Chiba University. Position: head of talent consulting at Lighthouse Academy. Introduction: one of the founders of the Lighthouse Project; previously handled foreign-talent recruitment consulting, sales, and marketing for IT companies, foreign manufacturers, trading companies, and government agencies at a top Japanese staffing firm. Now responsible for talent consulting and business development. +Instructor Cindy. In charge of: guaranteed job-placement services. Background: graduated from the sociology faculty of a leading private university. Position: job-placement mentor at Lighthouse Academy. Introduction: currently works with Japanese government agencies to formulate and implement programs for recruiting foreign talent; 5 years of job-hunting and career-change coaching for foreigners. Specializes in self-analysis and interview coaching; well-versed in IT company hiring processes, desired candidate profiles, and common interview questions for international students.
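For reference, the chunking behaviour configured in src/config.py (CHUNK_SIZE = 30, CHUNK_OVERLAP = 0) and used by src/scanner.py can be illustrated with a plain-Python sketch. This is a simplified approximation, not langchain's actual CharacterTextSplitter implementation: text is split on the separator and the pieces are greedily re-merged into chunks of at most chunk_size characters, with overlap omitted since the config sets it to 0.

```python
def split_text(text: str, chunk_size: int = 30, separator: str = "\n") -> list:
    """Split text on separator, then greedily re-merge the pieces into
    chunks of at most chunk_size characters. A single piece longer than
    chunk_size is kept whole. Simplified sketch; overlap is omitted."""
    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # the merged chunk still fits, keep growing it
            current = candidate
        else:
            # flush the finished chunk and start a new one with this piece
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

# A 12-character budget groups the short lines but keeps the long line whole.
print(split_text("alpha\nbeta\ngamma delta epsilon zeta\neta", chunk_size=12))
```

With a small chunk_size like the configured 30 characters, each line of test_docs/test1.txt effectively becomes its own chunk, since every record there is far longer than the budget.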
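The list comprehension at the end of predict() in src/webui.py converts the flat message history into the (user, assistant) pairs that gr.Chatbot expects. Isolated as a small helper (to_chat_pairs is a name introduced here for illustration), the logic is:

```python
def to_chat_pairs(message_history: list) -> list:
    """Convert a flat [user, assistant, user, assistant, ...] history into
    a list of (user, assistant) tuples, as gr.Chatbot expects.
    A trailing user message without a reply is dropped."""
    return [(message_history[i]["content"], message_history[i + 1]["content"])
            for i in range(0, len(message_history) - 1, 2)]

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
    {"role": "user", "content": "how are you?"},
    {"role": "assistant", "content": "fine, thanks"},
]
print(to_chat_pairs(history))  # [('hi', 'hello!'), ('how are you?', 'fine, thanks')]
```

Stepping by 2 assumes messages strictly alternate user/assistant, which holds here because predict() always appends exactly one user and one assistant entry per call.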