Replies: 2 comments 2 replies
-
So I figured out how to create a custom Phi3 provider:

```csharp
using Microsoft.SemanticKernel;
using System.Text;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Services;
using Microsoft.ML.OnnxRuntimeGenAI;
using System.Runtime.CompilerServices;
using System.Diagnostics;

namespace Phi3SemanticKernel
{
    public class Phi3ChatCompletionService : IChatCompletionService
    {
        private static Model? model;
        private static MultiModalProcessor? processor;

        public Phi3ChatCompletionService(string modelPath)
        {
            model = new Model(modelPath);
            processor = new MultiModalProcessor(model);
        }

        private async IAsyncEnumerable<StreamingChatMessageContent> Answer(ChatHistory history, [EnumeratorCancellation] CancellationToken cancellationToken = default)
        {
            if (processor is not null)
            {
                await Task.Delay(1).ConfigureAwait(false);
                using var tokenizerStream = processor.CreateStream();

                // Build a Phi-3 style chat prompt from the chat history.
                StringBuilder prompt = new StringBuilder();
                foreach (var item in history)
                    prompt.Append($"<|{item.Role}|>{item.Content}<|end|>");
                prompt.Append("<|assistant|>");
                var fullPrompt = prompt.ToString();

                var inputTensors = processor.ProcessImages(fullPrompt, null);
                using GeneratorParams generatorParams = new GeneratorParams(model);
                generatorParams.SetSearchOption("max_length", 3072);
                generatorParams.SetInputs(inputTensors);

                // Generate the response token by token and stream each decoded piece back.
                using var generator = new Generator(model, generatorParams);
                while (!generator.IsDone())
                {
                    generator.ComputeLogits();
                    generator.GenerateNextToken();
                    var seq = generator.GetSequence(0)[^1];
                    var str = tokenizerStream.Decode(seq);

                    if (cancellationToken.IsCancellationRequested)
                        break;

                    yield return new StreamingChatMessageContent(AuthorRole.Assistant, str);
                    await Task.Yield();

                    if (cancellationToken.IsCancellationRequested)
                        break;
                }
            }
        }

        IReadOnlyDictionary<string, object?> IAIService.Attributes => throw new NotImplementedException();

        // Non-streaming completion is not implemented yet; only the streaming path below is used.
        Task<IReadOnlyList<ChatMessageContent>> IChatCompletionService.GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)
        {
            throw new NotImplementedException();
        }

        IAsyncEnumerable<StreamingChatMessageContent> IChatCompletionService.GetStreamingChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)
        {
            return Answer(chatHistory, cancellationToken);
        }
    }
}
```

However, I'm not quite sure where to continue to implement function calling, which is what I'm really interested in with Semantic Kernel.
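For completeness, this is roughly how the service above can be wired into a kernel and streamed from (a minimal sketch — the model path and prompt are just placeholders):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Phi3SemanticKernel;

// Path to a local Phi-3 ONNX model folder (placeholder).
var modelPath = @"C:\models\Phi-3-mini-4k-instruct-onnx";

var builder = Kernel.CreateBuilder();
// Register the custom provider as the chat completion service.
builder.Services.AddSingleton<IChatCompletionService>(new Phi3ChatCompletionService(modelPath));
var kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("Explain what ONNX Runtime GenAI is in one sentence.");

// Only the streaming path is implemented above, so stream the reply.
await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(history))
{
    Console.Write(chunk.Content);
}
```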
-
We have created the ONNX connector, but at this time we don't have samples created. We'll have these published shortly. https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/Connectors/Connectors.Onnx
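Until the samples land, a rough sketch of the intended usage would look something like this. Note that `AddOnnxRuntimeGenAIChatCompletion` and its parameters are an assumption based on the connector's naming pattern, so double-check against the package:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Assumed builder extension from Microsoft.SemanticKernel.Connectors.Onnx;
// verify the exact method name and parameters in the package.
var builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatCompletion(
    modelId: "phi-3",                                      // assumed parameter
    modelPath: @"C:\models\Phi-3-mini-4k-instruct-onnx");  // local ONNX model folder (placeholder)
var kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var reply = await chat.GetChatMessageContentAsync("Hello from a local Phi-3 model!");
Console.WriteLine(reply);
```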
-
Every sample I see requires you to fire up llama and run a localhost website to access your local models. This makes for an awkward developer and user experience, and it also risks hitting port conflicts.
When I use onnxruntime-genai I can load and use my Phi3 model directly, without needing a web service running. Is this possible with Semantic Kernel?
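For example, this is all it takes with onnxruntime-genai alone (a minimal sketch against the Microsoft.ML.OnnxRuntimeGenAI C# API; the model path and prompt are placeholders):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Load a local Phi-3 ONNX model directly -- no server, no localhost port.
using var model = new Model(@"C:\models\Phi-3-mini-4k-instruct-onnx");
using var tokenizer = new Tokenizer(model);

var prompt = "<|user|>Why is the sky blue?<|end|><|assistant|>";
using var tokens = tokenizer.Encode(prompt);

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 512);
generatorParams.SetInputSequences(tokens);

// Run generation in-process and print the decoded output.
using var generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
}
Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
```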
This issue might be intended to address that, but it has no description, so it's hard to tell: #6619 @matthewbolanos