Description
I was looking for a way to run inference across multiple devices when I came across your project. I previously tried Distributed Llama (https://github.com/b4rtaz/distributed-llama) and got it working, but frankly it was a lot of work. I was hoping your project would be easier to use, and from the instructions it seems to be.
Could you expand the README.md file to be more specific about the purpose of the project? I would like to know whether Cake is aimed only at giving users the capability to run inference on large models that don't fit on a single device, or whether it also distributes the work in a parallel way, resulting in increased inference speed.
This information would help users determine beforehand whether their goals align with what your project aims to do.