Use custom dataset as response source #200
Signed-off-by: Qifan Deng <[email protected]>
| Preview with LibreChat, using 46e5d1e as the backend: @mayabar @irar2, this PR is big. I refactored a lot while implementing the custom dataset, so please don't hesitate to tell me what to fix or improve. I also created a Hugging Face organization to host the converted dataset. Let me know which accounts should own it. Thanks! You have probably observed that if the prompt does not hit the dataset, the lookup has to go through all the keys, which takes some time, thus affecting the desired  | 
| 
 Passed the question forward, I'll keep you updated | 
    | Hi Qifan, | 
| 
 @mayabar Yes, I expected concerns. Perhaps I should have discussed this before the implementation. For each request, a hash is generated from the full messages (chat completion) or from the prompt as given (text completion). The hash is used to look up responses. If no response is found, or the number of tokens is greater than max tokens, the desired number of tokens is used to look up responses instead. The "desired number of tokens" is generated using the existing code. If responses are found in the dataset, one of them is selected at random. Otherwise, a response is randomly generated using the existing implementation. 
 The main purpose of using a prepared dataset is performance, along with some other considerations. 
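
 A minimal sketch of the lookup flow described above, for illustration only. The function names (`request_key`, `pick_response`), the two in-memory dictionaries, and the reading of "greater than max tokens" as "discard responses longer than `max_tokens`" are assumptions, not the PR's actual code, which lives in the simulator itself.

```python
import hashlib
import json
import random


def request_key(messages):
    """Hash the full chat messages (or a plain prompt string) with SHA-256.

    Serializing via JSON is an assumption made for this sketch.
    """
    payload = json.dumps(messages, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def pick_response(messages, max_tokens, desired_tokens,
                  by_hash, by_length, generate_random, rng=random):
    """Return a response (a list of tokens) for the request.

    by_hash:   hypothetical dict mapping request hash -> list of responses
    by_length: hypothetical dict mapping token count -> list of responses
    generate_random: stand-in for the existing random generator
    """
    # Step 1: exact lookup by request hash.
    candidates = by_hash.get(request_key(messages), [])
    # Drop responses that would exceed max_tokens (assumed interpretation).
    if max_tokens is not None:
        candidates = [r for r in candidates if len(r) <= max_tokens]
    # Step 2: nothing usable, so look up by the desired number of tokens.
    if not candidates:
        candidates = by_length.get(desired_tokens, [])
    # Step 3: pick one at random, or fall back to random generation.
    if candidates:
        return rng.choice(candidates)
    return generate_random(desired_tokens)
```

 With one stored response per table, a hit returns the stored response, a hit that is too long falls through to the length table, and a miss on both tables calls the generator.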
 
 Yes, the utility lives with the dataset in the Hugging Face repo. Briefly, it computes a SHA-256 hash over the full messages and saves the result in a SQLite database. The file is called  To wrap up, the implementation is mainly for performance. I can add a config option that lets the user decide the behaviour, with “loading dataset” in memory. Let me know what you think. | 
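
A rough sketch of what such a conversion utility could look like, assuming a single table keyed by the SHA-256 of the messages. The table name `responses`, its columns, the helper names, and the whitespace-based token count are all hypothetical and are not taken from the actual utility or dataset.

```python
import hashlib
import json
import sqlite3


def prompt_hash(messages):
    """SHA-256 over the JSON-serialized messages (serialization is assumed)."""
    payload = json.dumps(messages, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_db(conn, conversations):
    """Store (messages, response) pairs in a hypothetical `responses` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "prompt_hash TEXT NOT NULL, "
        "n_tokens INTEGER NOT NULL, "
        "response TEXT NOT NULL)"
    )
    # Indexes let hash and token-count lookups avoid a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hash ON responses (prompt_hash)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tokens ON responses (n_tokens)")
    rows = [
        # Whitespace splitting is a placeholder for a real tokenizer.
        (prompt_hash(messages), len(response.split()), response)
        for messages, response in conversations
    ]
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)
    conn.commit()


def lookup(conn, messages):
    """Return every stored response whose key matches the request hash."""
    cur = conn.execute(
        "SELECT response FROM responses WHERE prompt_hash = ?",
        (prompt_hash(messages),),
    )
    return [row[0] for row in cur.fetchall()]
```

Passing `sqlite3.connect(":memory:")` as `conn` keeps the whole table in memory, while a file path keeps it on disk. Which of the two the simulator should use is the open question in this thread.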
| 
 @pancak3 Thanks a lot for setting this up and for preparing the dataset + README — really helpful! | 
| 
 Sure, I am waiting for Maya's team's decision on whether to merge the use of the remote dataset in this feature. After the decision, we will know whether to keep the HF org I created. If not, I will delete it. Otherwise, the dataset repo may need to be created in the official llm-d HF org. We will see. May I have the link to the official HF org? I failed to find it, so I created  | 
| 
 Deleted, https://huggingface.co/organizations/llm-d | 
    | @mayabar please let me know how you'd like to proceed with this PR when you have a moment. Thanks! | 
| @pancak3 Hi Qifan, sorry for the slow response, we are in a holiday period ;) I'm going to review your changes | 
| 
 Sorry, I did not know. No worries! Let's discuss this after your holiday. Enjoy🏖️ | 
| I'm working today, so we can discuss any open issues | 
| @pancak3 I just noticed that I forgot to finish the review, and all my comments have been in a pending state for the last 3 days 🤦‍♀️ | 
| 
 It happens. :) This PR is big, so let me know when you've finished the review, and then I'll start to address them. Thanks! | 
| I had pending comments, submitted them about 1 hour ago | 
| Hi Maya, I have resolved the above comments. Also added a config  | 
@pancak3 thanks for your updates, I added some comments
| /lgtm | 

/resolve #196