Optimize multiple concurrent LLM calls #1073
              
NicolaZomer asked this question in Q&A (Unanswered)
-
I guess you might be thinking of batching. As far as I know, this PR will bring it to the project:
-
Hi everyone!
I would like to know if there is an efficient way to optimize multiple LLM calls. Let's say I need to make 10 independent requests to the same LLM, instantiated with llama-cpp-python. Is there a more efficient way than doing it sequentially? Can we manage the workload or parallelize it, or do you know of any other strategies that might help?
Thank you!
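
To make the sequential versus parallel distinction concrete, here is a minimal sketch of dispatching 10 independent prompts either one after another or through a thread pool. It is only an illustration under stated assumptions: `complete_fn` and `fake_complete` are hypothetical stand-ins, not part of llama-cpp-python, and the sketch does not claim any speed-up. Threads only help if the backend can overlap work, and whether a single in-process model instance can safely be called from several threads at once is something to verify before swapping in a real model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_sequential(prompts: List[str], complete_fn: Callable[[str], str]) -> List[str]:
    """Baseline: send one request after another."""
    return [complete_fn(p) for p in prompts]


def run_concurrent(
    prompts: List[str],
    complete_fn: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Submit all prompts to a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(complete_fn, prompts))


def fake_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (assumption, not a library API).
    return f"echo: {prompt}"


prompts = [f"request {i}" for i in range(10)]
sequential_results = run_sequential(prompts, fake_complete)
parallel_results = run_concurrent(prompts, fake_complete)
```

Both helpers return results in the same order as the input prompts, so the concurrent version can be swapped in for the sequential one and timed against it with the real backend.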