Commit 5f5a199
[GenAI] Add readme to Microsoft.ML.GenAI.Phi (dotnet#7206)
* add readme
* Update src/Microsoft.ML.GenAI.Phi/README.md (six review commits, each co-authored by Luis Quintanilla <[email protected]>)

Co-authored-by: Luis Quintanilla <[email protected]>
1 parent 9ffb3a3 commit 5f5a199

1 file changed: 119 additions, 0 deletions
# Microsoft.ML.GenAI.Phi

A TorchSharp implementation of the Microsoft Phi series of models for GenAI.
## Supported models

The following Phi models are supported and tested:

- [x] [Phi-2](https://huggingface.co/microsoft/phi-2)
- [x] [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [x] [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- [ ] [Phi-3-small-8k-instruct](https://huggingface.co/microsoft/Phi-3-small-8k-instruct)
- [ ] [Phi-3-small-128k-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)
- [ ] [Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct)
- [ ] [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)
- [ ] [Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)
## Getting started with Semantic Kernel

### Download the model weights (e.g. Phi-3-mini-4k-instruct) from Hugging Face

```bash
# make sure you have git-lfs installed
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
```
### Load the model

```csharp
// load model config and create the model
var weightFolder = "/path/to/Phi-3-mini-4k-instruct";
var configName = "config.json";
var config = JsonSerializer.Deserialize<Phi3Config>(File.ReadAllText(Path.Combine(weightFolder, configName)));
var model = new Phi3ForCausalLM(config);

// load tokenizer
var tokenizerModelName = "tokenizer.model";
var tokenizer = Phi3TokenizerHelper.FromPretrained(Path.Combine(weightFolder, tokenizerModelName));

// load weights
model.LoadSafeTensors(weightFolder);

// initialize device
var device = "cuda";
if (device == "cuda")
{
    torch.InitializeDeviceType(DeviceType.CUDA);
}

// create causal language model pipeline
var pipeline = new CausalLMPipeline<Tokenizer, Phi3ForCausalLM>(tokenizer, model, device);
```
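As an aside on what `LoadSafeTensors` consumes: the safetensors format is deliberately simple — an 8-byte little-endian header length, then a JSON header mapping each tensor name to its dtype, shape, and byte offsets, followed by the raw tensor data. A minimal Python sketch of reading just the header (illustrative only; the C# library handles this for you, and `read_safetensors_header` is a hypothetical helper name):

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file.

    Layout: an 8-byte little-endian header length, then that many bytes
    of JSON mapping tensor name -> {dtype, shape, data_offsets},
    followed by the raw tensor bytes.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))
```

This can be handy for inspecting which tensors a checkpoint contains before loading it.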
### Add the pipeline as an `IChatCompletionService` to Semantic Kernel

```csharp
var kernel = Kernel.CreateBuilder()
    .AddGenAIChatCompletion(pipeline)
    .Build();
```
### Chat with the model

```csharp
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var chatHistory = new ChatHistory();
chatHistory.AddSystemMessage("you are a helpful assistant");
chatHistory.AddUserMessage("write a C# program to calculate the factorial of a number");
await foreach (var response in chatService.GetStreamingChatMessageContentsAsync(chatHistory))
{
    Console.Write(response);
}
```
## Getting started with AutoGen.Net

### Follow the same steps as above to download the model weights and load the model

### Create a `Phi3Agent` from the pipeline

```csharp
var agent = new Phi3Agent(pipeline, name: "assistant")
    .RegisterPrintMessage();
```
### Chat with the model

```csharp
var task = """
    write a C# program to calculate the factorial of a number
    """;

await agent.SendAsync(task);
```
### More examples

Please refer to [Microsoft.ML.GenAI.Samples](./../../docs/samples/Microsoft.ML.GenAI.Samples/) for more examples.
## Dynamic loading

It's recommended to run model inference on a GPU; the Phi-3-mini-4k-instruct model requires at least 8 GB of GPU memory when fully loaded.

If you don't have enough GPU memory, you can dynamically load the model weights into GPU memory instead. Here is how it works behind the scenes:

- When the model is initialized, the size of each layer is calculated and stored in a dictionary.
- When the model weights are loaded, each layer is assigned to a device (CPU or GPU) based on the size of the layer and the remaining memory on that device. If there is not enough memory on the device, the layer is loaded into CPU memory.
- During inference, a layer that resides in CPU memory is moved to GPU memory right before it runs and moved back to CPU memory afterwards.
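The placement step described above amounts to a greedy first-fit assignment over a device preference order. The sketch below captures the idea (Python for brevity; this is illustrative only, not the library's actual implementation — `infer_device_map` and the layer names are made up):

```python
def infer_device_map(layer_sizes, device_budgets, device_order):
    """Greedily assign each layer to the first device with enough free memory.

    layer_sizes: dict of layer name -> size in bytes, in model order
    device_budgets: dict of device name -> available bytes
    device_order: preference order, e.g. ["cuda", "cpu", "disk"]
    """
    remaining = dict(device_budgets)
    device_map = {}
    for layer, size in layer_sizes.items():
        for device in device_order:
            if remaining.get(device, 0) >= size:
                device_map[layer] = device
                remaining[device] -= size
                break
        else:
            raise MemoryError(f"no device can hold layer {layer!r}")
    return device_map

# toy example: 1000 bytes of GPU memory, 2000 bytes of CPU memory
layers = {"embed": 300, "block.0": 500, "block.1": 500, "head": 300}
budgets = {"cuda": 1000, "cpu": 2000}
print(infer_device_map(layers, budgets, ["cuda", "cpu"]))
# → {'embed': 'cuda', 'block.0': 'cuda', 'block.1': 'cpu', 'head': 'cpu'}
```

Note that because the assignment is greedy and in model order, a large early layer can push all later layers off the GPU even when reordering could pack them better.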
Here is how to enable dynamic loading of the model:

### Step 1: Infer the device placement for each layer

You can compute the device placement for each layer using the `InferDeviceMapForEachLayer` API. The resulting `deviceMap` is a key-value dictionary where the key is the layer name and the value is the device name (e.g. "cuda" or "cpu").

```csharp
// manually set up the available memory on each device
var deviceSizeMap = new Dictionary<string, long>
{
    ["cuda"] = modelSizeOnCudaInGB * 1L * 1024 * 1024 * 1024,
    ["cpu"] = modelSizeOnMemoryInGB * 1L * 1024 * 1024 * 1024,
    ["disk"] = modelSizeOnDiskInGB * 1L * 1024 * 1024 * 1024,
};

var deviceMap = model.InferDeviceMapForEachLayer(
    devices: ["cuda", "cpu", "disk"],
    deviceSizeMapInByte: deviceSizeMap);
```
### Step 2: Load model weights using the `ToDynamicLoadingModel` API

Once the `deviceMap` has been computed, pass it to the `ToDynamicLoadingModel` API to load the model weights.

```csharp
model = model.ToDynamicLoadingModel(deviceMap, "cuda");
```
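At inference time, the dynamically loaded model shuttles CPU-resident layers onto the GPU just before they run and back afterwards, as described earlier. A framework-free Python sketch of that move-run-move pattern (illustrative only; `DynamicLayer` and `FakeLayer` are made-up names, not the library's types):

```python
class DynamicLayer:
    """Run a layer on `run_device` while keeping its weights on `home_device`."""

    def __init__(self, layer, home_device, run_device="cuda"):
        self.layer = layer
        self.home_device = home_device
        self.run_device = run_device

    def __call__(self, x):
        offloaded = self.home_device != self.run_device
        if offloaded:
            self.layer.to(self.run_device)       # move weights onto the GPU
        try:
            return self.layer(x)
        finally:
            if offloaded:
                self.layer.to(self.home_device)  # move weights back out

class FakeLayer:
    """Stand-in for a real layer; records every device move."""

    def __init__(self):
        self.moves = []

    def to(self, device):
        self.moves.append(device)

    def __call__(self, x):
        return x * 2

layer = FakeLayer()
wrapped = DynamicLayer(layer, home_device="cpu", run_device="cuda")
print(wrapped(3), layer.moves)  # → 6 ['cuda', 'cpu']
```

The trade-off is bandwidth for capacity: every call to an offloaded layer pays a host-to-device transfer, which is why fully resident GPU inference is still recommended when memory allows.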
