Commit 7f9b56a

chatbot v10 has a Server class
1 parent ef432b7 commit 7f9b56a

File tree: 3 files changed (+382 -2 lines)

notebooks/tps/chatbot/.teacher/README-chatbot-corrige-nb.md

Lines changed: 31 additions & 1 deletion
@@ -472,7 +472,37 @@ rather than offering a "hard-wired" list of models as in the *starter co

in my implementation I chose to "cache" this result, so as not to request this
list from the same server several times (this list changes very, very
rarely...); but that is optional; on the other hand, it would be nice for users
to keep, whenever possible, the selected model when switching servers...

## v10 (optional): a `Server` class

in this version, I suggest you **create a `Server` class** that
encapsulates the logic for interacting with a given server; for now we
have two servers that implement the same API, but we can imagine that in
the future we will need to talk to other kinds of servers, which
implement a different API (for example `litellm`, which we have also
deployed at Inria)

this is why in this v10, I suggest you keep the features unchanged, but
create an `OllamaServer` class that inherits from `AbstractServer` and
implements the following methods:

```python
class AbstractServer:
    """
    an abstract server class
    """
    def list_models(self) -> list[str]:
        pass
    def generate_blocking(self, prompt, model, streaming) -> list[str]:
        """
        non-streaming generation - returns a list of text chunks
        """
        pass
    def generate_streaming(self, prompt, model, streaming) -> Iterator[str]:
        """
        streaming generation - yields text chunks
        """
        pass
```
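
As an illustration (not part of this commit), here is a minimal sketch of how
such a server object could be exercised outside of the GUI; it assumes the
`OllamaServer(name, url, username=None, password=None)` constructor used in the
new Python file below, and the model name and prompt are just placeholders:

```python
# minimal sketch - the CPU server needs no credentials
server = OllamaServer(name="CPU slow", url="http://ollama.pl.sophia.inria.fr:8080")

# the models known to that server
print(server.list_models())

model = "some-model"            # placeholder: pick one of the names listed above
prompt = "why is the sky blue ?"

# blocking flavour: get all the chunks at once, then join them
print("".join(server.generate_blocking(prompt, model, False)))

# streaming flavour: display the chunks as they arrive
for chunk in server.generate_streaming(prompt, model, True):
    print(chunk, end="", flush=True)
```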

## plenty of possible improvements

Lines changed: 320 additions & 0 deletions
@@ -0,0 +1,320 @@
"""
instead of using a hard-wired list of models,
we fetch the list of supported models at the server
at the api/tags endpoint using GET
"""

import json
from typing import Iterator

import requests
import flet as ft

# in this version we create servers as INSTANCES of CLASSES
# so we can encapsulate the logic to interact with them
#
# rationale is to be able to talk with servers that implement other APIs
# e.g. litellm that has also been deployed and on more servers

# we keep the idea of specifying our available servers as this dictionary
# but below we'll use this to create actual server INSTANCES

SERVER_SPECS = {
    # this one is fast because it has GPUs,
    # but it requires a login / password
    'GPU': {
        "name": "GPU fast",
        "url": "https://ollama-sam.inria.fr",
        "username": "Bob",
        "password": "hiccup",
    },
    # this one is slow because it has no GPUs,
    # but it does not require a login / password
    'CPU': {
        "name": "CPU slow",
        "url": "http://ollama.pl.sophia.inria.fr:8080",
    },
}

TITLE = "My first Chatbot 10"

class AbstractServer:
    """
    an abstract server class
    """
    def list_models(self) -> list[str]:
        pass
    def generate_blocking(self, prompt, model, streaming) -> list[str]:
        """
        non-streaming generation - returns a list of text chunks
        """
        pass
    def generate_streaming(self, prompt, model, streaming) -> Iterator[str]:
        """
        streaming generation - yields text chunks
        """
        pass


class OllamaServer(AbstractServer):
    """
    for servers that comply with ollama's API
    """
    def __init__(self, name, url, username=None, password=None):
        self.name = name
        self.url = url
        self.username = username
        self.password = password

    def _authenticate_extra_args(self) -> dict:
        auth_args = {}
        if self.username is not None:
            auth_args = {
                'auth': (self.username, self.password)
            }
        return auth_args
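
    # note: this dict is splatted into the requests calls below, so that
    # requests.get(url, **auth_args) amounts to requests.get(url, auth=(username, password)),
    # i.e. plain HTTP Basic authentication; with no username we pass no extra argument at all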

    def list_models(self):
        url = f"{self.url}/api/tags"
        auth_args = self._authenticate_extra_args()
        answer = requests.get(url, **auth_args)
        print("HTTP status code:", answer.status_code)
        raw = answer.json()
        return raw['models']
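
    # note on list_models(): the JSON body returned by api/tags looks roughly like
    #   {"models": [{"name": "<model-name>", ...}, ...]}
    # so this method returns a list of model *records* (dicts), not just names;
    # fetch_models() further down extracts the 'name' field from each record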

    def generate_blocking(self, prompt, model, streaming):
        url = f"{self.url}/api/generate"
        auth_args = self._authenticate_extra_args()
        payload = {'model': model, 'prompt': prompt}
        result = []

        answer = requests.post(url, json=payload, **auth_args)
        print("HTTP status code:", answer.status_code)
        if answer.status_code != 200:
            print("not 200, aborting")
            return result
        for line in answer.text.split("\n"):
            # splitting artefacts can be ignored
            if not line:
                continue
            # there should be no exception, but just in case...
            try:
                data = json.loads(line)
                # the last JSON chunk contains statistics and is not a message
                if data['done']:
                    break
                result.append(data['response'])
            except Exception as e:
                print(f"Exception {type(e)=}, {e=}")
        return result
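
    # note on generate_blocking(): the response body contains one JSON object per line, e.g.
    #   {"response": "<text chunk>", "done": false}  ...  {"done": true, ...statistics...}
    # hence the json.loads() on each line, and the early break on the final 'done' record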


    def generate_streaming(self, prompt, model, streaming):
        url = f"{self.url}/api/generate"
        auth_args = self._authenticate_extra_args()
        payload = {'model': model, 'prompt': prompt}
        result = []

        answer = requests.post(url, json=payload, stream=True, **auth_args)
        print("HTTP status code:", answer.status_code)
        if answer.status_code != 200:
            print("not 200, aborting")
            return
        for line in answer.iter_lines():
            if not line:
                continue
            try:
                data = json.loads(line)
                if data['done']:
                    return
                yield data['response']
            except Exception as e:
                print(f"Exception {type(e)=}, {e=}")


SERVERS = {}
for key, spec in SERVER_SPECS.items():
    SERVERS[key] = OllamaServer(
        name=spec['name'],
        url=spec['url'],
        username=spec.get('username', None),
        password=spec.get('password', None),
    )
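
# note: at this point SERVERS maps the same keys as SERVER_SPECS ('GPU' and 'CPU')
# to OllamaServer instances; the rest of the app only ever goes through SERVERS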

class History(ft.Column):
    """
    the history is a column of text messages
    where prompts and answers alternate
    """

    def __init__(self, app):
        super().__init__(
            [ft.TextField(
                label="Type a message...",
                on_submit=lambda event: app.send_request(event),
                fill_color="lightgrey",
            )],
            scroll=ft.ScrollMode.AUTO,
            auto_scroll=True,
            expand=True,
        )

    # insert material - prompt or answer - to allow for different styles
    def add_prompt(self, message):
        self._add_entry(message, "prompt")
    def add_answer(self, message):
        self._add_entry(message, "answer")
    def _add_entry(self, message, kind):
        display = ft.Text(value=message)
        display.color = "blue" if kind == "prompt" else "green"
        display.size = 20 if kind == "prompt" else 16
        display.italic = kind == "prompt"
        self.controls.insert(-1, display)

    # we always insert in the penultimate position
    # given that the last item in controls is the prompt TextField
    def add_chunk(self, chunk):
        self.controls[-2].value += chunk
    def current_prompt(self):
        return self.controls[-1].value

    def enable_prompt(self):
        self.controls[-1].disabled = False
    def disable_prompt(self):
        self.controls[-1].disabled = True

class ChatbotApp(ft.Column):

    def __init__(self):
        # we keep a cache of available models on each server
        self.models_per_server = {}

        header = ft.Text(value=TITLE, size=40)

        self.streaming = ft.Checkbox(label="streaming", value=True)
        # will be populated later
        self.model = ft.Dropdown(
            # options=[],
            width=300,
        )
        self.server = ft.Dropdown(
            options=[ft.dropdown.Option(server) for server in ("CPU", "GPU")],
            value="GPU",
            width=100,
            on_change=lambda event: self.update_models(),
        )

        self.submit = ft.ElevatedButton("Send", on_click=self.send_request)

        self.history = History(self)

        row = ft.Row(
            [self.streaming, self.model, self.server, self.submit],
            alignment=ft.MainAxisAlignment.CENTER,
        )
        super().__init__(
            [header, row, self.history],
            horizontal_alignment=ft.CrossAxisAlignment.CENTER,
            expand=True,
        )

        # go fetch the relevant models for the selected server
        # as explained below, at this point we are not yet in the page
        # so we cannot yet call update() at this point
        self.update_models(update=False)

    def fetch_models(self):
        # already fetched ?
        if self.server.value in self.models_per_server:
            return
        server_instance = SERVERS[self.server.value]
        models = server_instance.list_models()

        # for usability: sort the models alphabetically
        models.sort(key=lambda record: record['name'])
        for model in models:
            print(model)
        model_names = [record['name'] for record in models]
        self.models_per_server[self.server.value] = model_names
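
        # note: the cache maps a server key to a plain list of model names,
        # e.g. {'GPU': ['<model-name>', ...]}, so switching back to a server
        # we have already queried does not hit the network again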

    def update_models(self, *, update=True):
        # preserve current setting as far as possible
        current_model = self.model.value
        self.fetch_models()
        available_models = self.models_per_server[self.server.value]
        # replace the current options with the new ones
        self.model.options = [
            ft.dropdown.Option(model) for model in available_models
        ]
        # preserve setting if possible, otherwise pick first one
        if current_model in available_models:
            self.model.value = current_model
        else:
            # xxx somehow the first model on GPU - all-minilm:22m-l6-v2-fp16
            # returns an error saying the model does not support generate
            # so, as a workaround, find the first model that does not start with all-
            self.model.value = next(
                model for model in available_models if not model.startswith("all-")
            )
        # a subtle point here: because we call update_models in the constructor,
        # and because at that time the app is not yet in the page,
        # we cannot call update() in that circumstance;
        # BUT since this method is also bound to the 'change' event on the server widget,
        # in that circumstance we do need to update
        if update:
            self.update()

    def send_request(self, _event):
        # disable the button to prevent double submission
        self.submit.disabled = True
        self.history.disable_prompt()
        self.send_request_2(_event)
        self.submit.disabled = False
        self.history.enable_prompt()
        self.update()


    # send the prompt to the server and display the answer
    def send_request_2(self, _event):
        model = self.model.value
        prompt = self.history.current_prompt()
        server_instance = SERVERS[self.server.value]

        # record the question asked
        self.history.add_prompt(prompt)
        # create placeholder for the answer
        self.history.add_answer("")
        # update UI
        self.update()

        # send the request
        streaming = self.streaming.value

        print(f"Sending message to {server_instance.name}, {model=}, {streaming=}, {prompt=}")

        # streaming or non streaming
        if not streaming:
            # not streaming = blocking
            # generate_blocking already filters out the final 'done' record
            # and returns a plain list of text chunks
            answers = server_instance.generate_blocking(prompt, model, streaming)
            for text in answers:
                self.history.add_chunk(text)
            self.update()
        else:
            # streaming version
            answers = server_instance.generate_streaming(prompt, model, streaming)
            for text in answers:
                self.history.add_chunk(text)
                self.update()


def main(page: ft.Page):
    page.title = TITLE

    chatbot = ChatbotApp()
    page.add(chatbot)


ft.app(target=main)
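
Since the script ends with `ft.app(target=main)`, it should run like a regular flet program: launch it with `python` on the file, or with `flet run`.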

notebooks/tps/chatbot/README-chatbot-nb.md

Lines changed: 31 additions & 1 deletion
@@ -458,7 +458,37 @@ rather than offering a "hard-wired" list of models as in the *starter co

This hunk is the exact same change as in `.teacher/README-chatbot-corrige-nb.md`
above: right after the paragraph about caching the list of models, and just
before the "## plenty of possible improvements" section, it adds the
"## v10 (optional): a `Server` class" heading, the two paragraphs motivating
the `Server` abstraction, and the `AbstractServer` code block.

0 commit comments
