Commit a16cde1

Author: leoguillaume
feat: qdrant grist migration
1 parent f8aaee6 commit a16cde1

16 files changed, with 282 additions and 102 deletions

README.md

Lines changed: 4 additions & 20 deletions
@@ -1,7 +1,7 @@
11
<div id="toc"><ul align="center" style="list-style: none">
22
<summary><h1>Albert API</h1></summary>
33

4-
![Code Coverage](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/etalab-ia/albert-api/174-tests-CI-CD/.github/badges/coverage.json)
4+
![codecov](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/etalab-ia/albert-api/174-tests-CI-CD/.github/badges/coverage.json)
55

66
<br>
77
<a href="https://github.com/etalab-ia/albert-api/blob/main/CHANGELOG.md"><b>Changelog</b></a> | <a href="https://albert.api.etalab.gouv.fr/documentation"><b>Documentation</b></a> | <a href="https://albert.api.etalab.gouv.fr/playground"><b>Playground</b></a> | <a href="https://albert.api.etalab.gouv.fr/status"><b>Status</b></a> | <a href="https://albert.api.etalab.gouv.fr/swagger"><b>Swagger</b></a> <br><br>
@@ -18,10 +18,6 @@ Ce framework, destiné à un environnement de production soumis à des contraint
1818

1919
Following the conventions defined by OpenAI, Albert API exposes endpoints that can be called with the [official OpenAI Python client](https://github.com/openai/openai-python/tree/main). This makes it easy to integrate with third-party libraries such as [Langchain](https://www.langchain.com/) or [LlamaIndex](https://www.llamaindex.ai/).
2020

21-
## 🔑 Access
22-
23-
If you are a public organization, you can request an Albert API access key by filling out the [form on the ALLiaNCE website](https://alliance.numerique.gouv.fr/albert/).
24-
2521
## 📫 API Gateway
2622

2723
Albert API acts as a proxy in front of generative AI API clients and load-balances requests across them:
@@ -101,22 +97,10 @@ Vous trouverez ici des ressources de documentation :
10197
- [Documentation technique de l'API](./docs)
10298
- [Repository HuggingFace](https://huggingface.co/AgentPublic)
10399

104-
## 🧑‍💻 Contribute to the project
105-
106-
Albert API is an open source project; you can contribute by reading our [contribution guide](./CONTRIBUTING.md).
107-
108-
## 🚀 Installation
100+
## 🚀 Quickstart
109101

110102
To deploy Albert API on your own infrastructure, follow the [documentation](./docs/deployment.md).
111103

112-
### Quickstart
113-
114-
1. Fill in the *[config.example.yml](./config.example.yml)* file at the root of the repository with the configuration of your model APIs (see the [deployment documentation](./docs/deployment.md) for more information).
115-
116-
2. Fill in the *[compose.yml](./compose.yml)* file at the root of the repository with the environment variables required for the UI.
117-
118-
3. Deploy the API with Docker using the [compose.yml](../compose.yml) file at the root of the repository.
104+
## 🧑‍💻 Contribute to the project
119105

120-
```bash
121-
docker compose --file compose.prod.yml up --detach
122-
```
106+
Albert API is an open source project; you can contribute by reading our [contribution guide](./CONTRIBUTING.md).
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
"""remove unique name on token and collection
2+
3+
Revision ID: a4ac45e7c990
4+
Revises: 8c5ae2a3d4d0
5+
Create Date: 2025-04-15 15:13:33.886841
6+
7+
"""
8+
9+
from typing import Sequence, Union
10+
11+
from alembic import op
12+
13+
14+
# revision identifiers, used by Alembic.
15+
revision: str = "a4ac45e7c990"
16+
down_revision: Union[str, None] = "8c5ae2a3d4d0"
17+
branch_labels: Union[str, Sequence[str], None] = None
18+
depends_on: Union[str, Sequence[str], None] = None
19+
20+
21+
def upgrade() -> None:
22+
"""Upgrade schema."""
23+
# ### commands auto generated by Alembic - please adjust! ###
24+
op.drop_constraint("unique_collection_name_per_user", "collection", type_="unique")
25+
op.drop_constraint("unique_token_name_per_user", "token", type_="unique")
26+
# ### end Alembic commands ###
27+
28+
29+
def downgrade() -> None:
30+
"""Downgrade schema."""
31+
# ### commands auto generated by Alembic - please adjust! ###
32+
op.create_unique_constraint("unique_token_name_per_user", "token", ["user_id", "name"])
33+
op.create_unique_constraint("unique_collection_name_per_user", "collection", ["user_id", "name"])
34+
# ### end Alembic commands ###
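
Review note: the migration above drops the per-user uniqueness constraints on names. A minimal sketch of what that changes, using an in-memory SQLite database from the standard library rather than the project's actual database. The table layout is a simplified, hypothetical stand-in for the real `collection` table, and `can_insert_duplicate` is a name made up for this illustration.

```python
import sqlite3


def can_insert_duplicate(with_unique_constraint: bool) -> bool:
    """Return True if two rows sharing (user_id, name) can both be inserted."""
    # Hypothetical, simplified "collection" table; the real one has more columns.
    constraint = (
        ", CONSTRAINT unique_collection_name_per_user UNIQUE (user_id, name)"
        if with_unique_constraint
        else ""
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE collection ("
            "id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT"
            f"{constraint})"
        )
        conn.execute("INSERT INTO collection (user_id, name) VALUES (1, 'docs')")
        # Same user, same name: rejected only while the constraint exists.
        conn.execute("INSERT INTO collection (user_id, name) VALUES (1, 'docs')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


before_upgrade = can_insert_duplicate(with_unique_constraint=True)
after_upgrade = can_insert_duplicate(with_unique_constraint=False)
print(before_upgrade, after_upgrade)
```

One consequence worth keeping in mind: once duplicate names exist, re-creating the unique constraints in `downgrade()` would be expected to fail until the duplicates are cleaned up.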

app/endpoints/collections.py

Lines changed: 6 additions & 7 deletions
@@ -1,17 +1,16 @@
11
from typing import Union
22

3-
from fastapi import APIRouter, Depends, Path, Query, Request, Response, Security
3+
from fastapi import APIRouter, Body, Depends, Path, Query, Request, Response, Security
44
from fastapi.responses import JSONResponse
55
from sqlalchemy.ext.asyncio import AsyncSession
66

77
from app.helpers import Authorization
8-
from app.schemas.collections import Collection, CollectionRequest, CollectionUpdateRequest, Collections
8+
from app.schemas.collections import Collection, CollectionRequest, Collections, CollectionUpdateRequest
99
from app.sql.session import get_db as get_session
1010
from app.utils.exceptions import CollectionNotFoundException
1111
from app.utils.lifespan import context
1212
from app.utils.variables import ENDPOINT__COLLECTIONS
1313

14-
1514
router = APIRouter()
1615

1716

@@ -101,7 +100,7 @@ async def delete_collections(
101100
async def update_collection(
102101
request: Request,
103102
collection: int = Path(..., description="The collection ID"),
104-
body: CollectionUpdateRequest = None,
103+
body: CollectionUpdateRequest = Body(..., description="The collection to update."),
105104
session: AsyncSession = Depends(get_session),
106105
) -> Response:
107106
"""
@@ -114,9 +113,9 @@ async def update_collection(
114113
session=session,
115114
user_id=request.app.state.user.id,
116115
collection_id=collection,
117-
name=body.name if body else None,
118-
visibility=body.visibility if body else None,
119-
description=body.description if body else None,
116+
name=body.name,
117+
visibility=body.visibility,
118+
description=body.description,
120119
)
121120

122121
return Response(status_code=204)

app/helpers/_authorization.py

Lines changed: 10 additions & 0 deletions
@@ -57,6 +57,9 @@ async def __call__(
5757
if request.url.path.startswith(f"/v1{ENDPOINT__CHAT_COMPLETIONS}") and request.method == "POST":
5858
await self._check_chat_completions_post(user=user, role=role, limits=limits, request=request)
5959

60+
if request.url.path.startswith(f"/v1{ENDPOINT__COLLECTIONS}") and request.method == "PATCH":
61+
await self._check_collections_patch(user=user, role=role, limits=limits, request=request)
62+
6063
if request.url.path.startswith(f"/v1{ENDPOINT__COLLECTIONS}") and request.method == "POST":
6164
await self._check_collections_post(user=user, role=role, limits=limits, request=request)
6265

@@ -180,6 +183,13 @@ async def _check_chat_completions_post(self, user: User, role: Role, limits: Dic
180183
if body.get("search", False):
181184
await self._check_limits(user=user, limits=limits, model=body.get("search_args", {}).get("model", None))
182185

186+
async def _check_collections_patch(self, user: User, role: Role, limits: Dict[str, UserModelLimits], request: Request) -> None:
187+
body = await request.body()
188+
body = json.loads(body)
189+
190+
if body.get("visibility") == CollectionVisibility.PUBLIC and PermissionType.CREATE_PUBLIC_COLLECTION not in role.permissions:
191+
raise InsufficientPermissionException("Missing permission to update collection visibility to public.")
192+
183193
async def _check_collections_post(self, user: User, role: Role, limits: Dict[str, UserModelLimits], request: Request) -> None:
184194
body = await request.body()
185195
body = json.loads(body)
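
Review note: the new `_check_collections_patch` rule can be exercised in isolation. The sketch below reproduces only the condition from the diff. The enum values, the exception class, and the function names are hypothetical stand-ins for the project's own definitions, and the empty-body guard is an addition for this illustration that is not in the commit.

```python
import json
from enum import Enum
from typing import Set


class CollectionVisibility(str, Enum):
    # Hypothetical values; the project's enum may use different strings.
    PRIVATE = "private"
    PUBLIC = "public"


class PermissionType(str, Enum):
    # Hypothetical value, for illustration only.
    CREATE_PUBLIC_COLLECTION = "create_public_collection"


class InsufficientPermissionException(Exception):
    """Stand-in for the project's exception of the same name."""


def check_collections_patch(raw_body: bytes, permissions: Set[PermissionType]) -> None:
    # Assumption: treat an empty body as "no fields to update".
    body = json.loads(raw_body) if raw_body else {}
    wants_public = body.get("visibility") == CollectionVisibility.PUBLIC
    if wants_public and PermissionType.CREATE_PUBLIC_COLLECTION not in permissions:
        raise InsufficientPermissionException("Missing permission to update collection visibility to public.")


def is_allowed(raw_body: bytes, permissions: Set[PermissionType]) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        check_collections_patch(raw_body, permissions)
        return True
    except InsufficientPermissionException:
        return False


rename_only = is_allowed(b'{"name": "renamed"}', set())
publish_without_permission = is_allowed(b'{"visibility": "public"}', set())
publish_with_permission = is_allowed(b'{"visibility": "public"}', {PermissionType.CREATE_PUBLIC_COLLECTION})
print(rename_only, publish_without_permission, publish_with_permission)
```

Renaming a collection or keeping it private passes the check regardless of role. Only a switch to public visibility requires the permission, which mirrors the existing POST check.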

app/helpers/_documentmanager.py

Lines changed: 12 additions & 7 deletions
@@ -180,8 +180,8 @@ async def create_document(self, session: AsyncSession, user_id: int, collection_
180180
except NoResultFound:
181181
raise CollectionNotFoundException()
182182

183-
file_name = file.filename.strip()
184-
file_extension = file_name.rsplit(".", maxsplit=1)[-1]
183+
document_name = file.filename.strip()
184+
file_extension = document_name.rsplit(".", maxsplit=1)[-1]
185185

186186
try:
187187
document = self._parse(file=file, file_extension=file_extension)
@@ -198,19 +198,18 @@ async def create_document(self, session: AsyncSession, user_id: int, collection_
198198
raise ChunkingFailedException(detail=f"Chunking failed: {e}")
199199

200200
result = await session.execute(
201-
statement=insert(table=DocumentTable).values(name=file_name, collection_id=collection_id).returning(DocumentTable.id)
201+
statement=insert(table=DocumentTable).values(name=document_name, collection_id=collection_id).returning(DocumentTable.id)
202202
)
203203
document_id = result.scalar_one()
204204
await session.commit()
205205

206206
client = self.qdrant_model.get_client(endpoint=ENDPOINT__EMBEDDINGS)
207207
for i, chunk in enumerate(chunks):
208-
chunk.metadata["document_part"] = f"{i + 1}/{len(chunks)}"
209208
chunk.metadata["collection_id"] = collection.id
210-
chunk.metadata["collection_name"] = collection.name
211209
chunk.metadata["document_id"] = document_id
212-
chunk.metadata["document_name"] = file_name
210+
chunk.metadata["document_name"] = document_name
213211
chunk.metadata["document_created_at"] = round(time.time())
212+
chunk.metadata["document_part"] = f"{i + 1}/{len(chunks)}"
214213
try:
215214
await self._upsert(chunks=chunks, collection_id=collection_id, model_client=client)
216215
except Exception as e:
@@ -279,7 +278,13 @@ async def delete_document(self, session: AsyncSession, user_id: int, document_id
279278
await self.qdrant.delete(collection_name=str(document.collection_id), points_selector=FilterSelector(filter=filter))
280279

281280
async def get_chunks(
282-
self, session: AsyncSession, user_id: int, document_id: int, chunk_id: Optional[int] = None, offset: int = 0, limit: int = 10
281+
self,
282+
session: AsyncSession,
283+
user_id: int,
284+
document_id: int,
285+
chunk_id: Optional[int] = None,
286+
offset: int = 0,
287+
limit: int = 10,
283288
) -> List[Chunk]:
284289
# check if document exists
285290
result = await session.execute(
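
Review note: a small sketch of how the extension parsing and the per-chunk metadata from `create_document` behave after this change. The helper names are hypothetical, and the timestamp is computed once per document here for simplicity, whereas the diff calls `time.time()` for each chunk.

```python
import time
from typing import Any, Dict, List


def file_extension(filename: str) -> str:
    """Mirror the diff: take whatever follows the last dot of the stripped name."""
    document_name = filename.strip()
    return document_name.rsplit(".", maxsplit=1)[-1]


def build_chunk_metadata(document_name: str, collection_id: int, document_id: int, n_chunks: int) -> List[Dict[str, Any]]:
    """Build one metadata dict per chunk with the keys kept by this commit."""
    created_at = round(time.time())
    return [
        {
            "collection_id": collection_id,
            "document_id": document_id,
            "document_name": document_name,
            "document_created_at": created_at,
            # 1-based position of the chunk within the document, e.g. "2/3".
            "document_part": f"{i + 1}/{n_chunks}",
        }
        for i in range(n_chunks)
    ]


metadata = build_chunk_metadata("report.final.pdf", collection_id=7, document_id=42, n_chunks=3)
print(file_extension(" report.final.pdf "), [m["document_part"] for m in metadata])
```

Note that `rsplit` returns the whole name when the file name contains no dot, so a file called `README` yields the extension `README`. Also, `collection_name` is no longer stored in the chunk metadata, so renaming a collection does not leave stale names in the vector store.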

app/sql/models.py

Lines changed: 0 additions & 4 deletions
@@ -95,8 +95,6 @@ class Token(Base):
9595

9696
user = relationship(argument="User", backref=backref(name="token", cascade="all, delete-orphan"))
9797

98-
__table_args__ = (UniqueConstraint("user_id", "name", name="unique_token_name_per_user"),)
99-
10098

10199
class Collection(Base):
102100
__tablename__ = "collection"
@@ -111,8 +109,6 @@ class Collection(Base):
111109

112110
user = relationship(argument="User", backref=backref(name="collection", cascade="all, delete-orphan"))
113111

114-
__table_args__ = (UniqueConstraint("user_id", "name", name="unique_collection_name_per_user"),)
115-
116112

117113
class Document(Base):
118114
__tablename__ = "document"

app/tests/test_collections.py

Lines changed: 23 additions & 40 deletions
@@ -10,7 +10,7 @@
1010

1111
@pytest.mark.usefixtures("client")
1212
class TestCollections:
13-
def test_create_private_collection_with_user(self, client: TestClient):
13+
def test_create_private_collection(self, client: TestClient):
1414
params = {"name": f"test_collection_{str(uuid4())}", "visibility": CollectionVisibility.PRIVATE}
1515
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
1616
assert response.status_code == 201, response.text
@@ -30,7 +30,7 @@ def test_create_private_collection_with_user(self, client: TestClient):
3030
assert collection["name"] == params["name"]
3131
assert collection["visibility"] == CollectionVisibility.PRIVATE
3232

33-
def test_get_one_collection_with_user(self, client: TestClient):
33+
def test_get_one_collection(self, client: TestClient):
3434
collection_name = f"test_collection_{str(uuid4())}"
3535
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
3636
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
@@ -44,17 +44,6 @@ def test_get_one_collection_with_user(self, client: TestClient):
4444
collection = response.json()
4545
assert collection["name"] == collection_name
4646

47-
def test_create_private_collection_already_existing_name(self, client: TestClient):
48-
collection_name = f"test_collection_{str(uuid4())}"
49-
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
50-
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
51-
assert response.status_code == 201, response.text
52-
53-
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
54-
55-
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
56-
assert response.status_code == 400, response.text
57-
5847
def test_patch_collection_name(self, client: TestClient):
5948
collection_name = f"test_collection_{str(uuid4())}"
6049
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
@@ -73,7 +62,7 @@ def test_patch_collection_name(self, client: TestClient):
7362
collection = response.json()
7463
assert collection["name"] == new_collection_name
7564

76-
def test_format_collection_with_user(self, client: TestClient):
65+
def test_format_collection(self, client: TestClient):
7766
params = {"name": f"test_collection_{str(uuid4())}", "visibility": CollectionVisibility.PRIVATE}
7867
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
7968
assert response.status_code == 201, response.text
@@ -97,12 +86,17 @@ def test_format_collection_with_user(self, client: TestClient):
9786
collection = response.json()
9887
Collection(**collection) # test output format
9988

100-
def test_create_public_collection_with_user(self, client: TestClient):
89+
def test_create_public_collection_without_permissions(self, client: TestClient):
10190
params = {"name": f"test_collection_{str(uuid4())}", "visibility": CollectionVisibility.PUBLIC}
10291
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
10392
assert response.status_code == 403, response.text
10493

105-
def test_create_public_collection_with_admin(self, client: TestClient):
94+
def test_patch_public_collection_without_permissions(self, client: TestClient):
95+
params = {"name": f"test_collection_{str(uuid4())}", "visibility": CollectionVisibility.PUBLIC}
96+
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
97+
assert response.status_code == 403, response.text
98+
99+
def test_create_public_collection_with_permissions(self, client: TestClient):
106100
collection_name = f"test_collection_{str(uuid4())}"
107101
params = {"name": collection_name, "visibility": CollectionVisibility.PUBLIC}
108102
response = client.post_with_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
@@ -121,15 +115,17 @@ def test_create_public_collection_with_admin(self, client: TestClient):
121115
assert collection["name"] == collection_name
122116
assert collection["visibility"] == CollectionVisibility.PUBLIC
123117

124-
def test_create_already_existing_collection_with_user(self, client: TestClient):
118+
def test_patch_public_collection_with_permissions(self, client: TestClient):
125119
collection_name = f"test_collection_{str(uuid4())}"
126120
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
127-
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
121+
response = client.post_with_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
128122
assert response.status_code == 201, response.text
129123

130-
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
131-
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
132-
assert response.status_code == 400, response.text
124+
collection_id = response.json()["id"]
125+
126+
params = {"visibility": CollectionVisibility.PUBLIC}
127+
response = client.patch_with_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}/{collection_id}", json=params)
128+
assert response.status_code == 204, response.text
133129

134130
def test_view_collection_of_other_user(self, client: TestClient):
135131
collection_name = f"test-collection_{str(uuid4())}"
@@ -139,12 +135,8 @@ def test_view_collection_of_other_user(self, client: TestClient):
139135

140136
collection_id = response.json()["id"]
141137

142-
response = client.get_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}")
143-
collections = response.json()
144-
assert response.status_code == 200, response.text
145-
146-
collections = [collection["id"] for collection in collections["data"] if collection["id"] == collection_id]
147-
assert len(collections) == 0
138+
response = client.get_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}/{collection_id}")
139+
assert response.status_code == 404, response.text
148140

149141
def test_view_public_collection_of_other_user(self, client: TestClient):
150142
collection_name = f"test-collection_{str(uuid4())}"
@@ -154,19 +146,10 @@ def test_view_public_collection_of_other_user(self, client: TestClient):
154146

155147
collection_id = response.json()["id"]
156148

157-
response = client.get_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}")
158-
collections = response.json()
149+
response = client.get_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}/{collection_id}")
159150
assert response.status_code == 200, response.text
160151

161-
collections = [collection for collection in collections["data"] if collection["id"] == collection_id]
162-
assert len(collections) == 1
163-
164-
collection = collections[0]
165-
assert collection["name"] == collection_name
166-
assert collection["owner"] == "test-user-admin"
167-
assert collection["visibility"] == CollectionVisibility.PUBLIC
168-
169-
def test_delete_private_collection_with_user(self, client: TestClient):
152+
def test_delete_private_collection_without_permissions(self, client: TestClient):
170153
collection_name = f"test-collection_{str(uuid4())}"
171154
params = {"name": collection_name, "visibility": CollectionVisibility.PRIVATE}
172155
response = client.post_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
@@ -184,7 +167,7 @@ def test_delete_private_collection_with_user(self, client: TestClient):
184167
collections = [collection for collection in collections["data"] if collection["id"] == collection_id]
185168
assert len(collections) == 0
186169

187-
def test_delete_public_collection_with_user(self, client: TestClient):
170+
def test_delete_public_collection_without_permissions(self, client: TestClient):
188171
collection_name = f"test-collection_{str(uuid4())}"
189172
params = {"name": collection_name, "visibility": CollectionVisibility.PUBLIC}
190173
response = client.post_with_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)
@@ -199,7 +182,7 @@ def test_delete_public_collection_with_user(self, client: TestClient):
199182
response = client.delete_without_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}/{collection_id}")
200183
assert response.status_code == 404, response.text
201184

202-
def test_delete_public_collection_with_admin(self, client: TestClient):
185+
def test_delete_public_collection_with_permissions(self, client: TestClient):
203186
collection_name = f"test-collection_{str(uuid4())}"
204187
params = {"name": collection_name, "visibility": CollectionVisibility.PUBLIC}
205188
response = client.post_with_permissions(url=f"/v1{ENDPOINT__COLLECTIONS}", json=params)

docs/architecture.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ flowchart LR
2121
2222
subgraph **app/clients**
2323
redisclient[Redis - ConnectionPool]
24-
sqlclient[SQLDatabaseClient]
24+
sqlclient[SQLAlchemy - AsyncSession]
2525
qdrantclient[Qdrant - AsyncQdrantClient]
2626
internetclient[BraveInternetClient<br>DuckduckgoInternetClient]
2727
modelclient@{ shape: processes, label: "VllmModelClient<br>TeiModelClient<br>AlbertModelClient<br>OpenaiModelClient" }

docs/assets/deployment_001.png

145 KB

docs/assets/iam_001.png

1.58 MB
Loading
