18 commits
3bac39b  🔧(compose) configure external network for communication with search (sampaccoud, Jul 23, 2025)
b04ee70  ✨(backend) add dummy content to demo documents (sampaccoud, Aug 6, 2025)
c04fc7d  ✨(backend) add document search indexer (sampaccoud, Jul 24, 2025)
1268bb2  ✨(backend) add async triggers to enable document indexation with find (sampaccoud, Aug 6, 2025)
65911d8  🔧(compose) Add some ignore for docker-compose local overrides (joehybird, Aug 13, 2025)
a050557  ✨(backend) add unit test for the 'index' command (joehybird, Aug 13, 2025)
add57c6  ✨(backend) add document search view (joehybird, Aug 13, 2025)
f9e2d28  ✨(backend) improve search indexer service configuration (joehybird, Sep 11, 2025)
49213ca  ✨(backend) refactor indexation signals and fix circular import issues (joehybird, Sep 12, 2025)
3d455a7  ✨(backend) add fallback search & default ordering (joehybird, Sep 17, 2025)
a3afa74  ✨(backend) Index partially empty documents (joehybird, Sep 22, 2025)
5bcdd4c  ✨(backend) Index deleted documents (joehybird, Sep 24, 2025)
7cd276f  🔧(backend) force a valid key for token storage in development mode (joehybird, Oct 1, 2025)
108b100  🔧(backend) setup Docs app dockers to work with Find (joehybird, Oct 6, 2025)
fa31d31  🔧(backend) force a valid key for token storage in development mode (joehybird, Oct 7, 2025)
812885c  ✨(backend) some refactor of indexer classes & modules (joehybird, Oct 7, 2025)
2b92d41  ✨(backend) throttle indexation tasks instead of debounce (simplier) (joehybird, Oct 14, 2025)
bc01b05  WIP 💩(front) hack to use the fulltext search api (joehybird, Oct 8, 2025)
5 changes: 5 additions & 0 deletions .gitguardian.yaml
@@ -0,0 +1,5 @@
secret:
ignored_matches:
- name:
match: "na1hhus-OLhq9mb9SO3R-8E4dONuMnqpZSY_SX8xcFk="
version: 2
4 changes: 4 additions & 0 deletions .gitignore
@@ -43,6 +43,10 @@ venv.bak/
env.d/development/*.local
env.d/terraform

# Docker
compose.override.yml
docker/auth/*.local

# npm
node_modules

3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -33,6 +33,9 @@ and this project adheres to
- ♿ update labels and shared document icon accessibility #1442
- 🍱(frontend) Fonts GDPR compliants #1453
- ♻️(service-worker) improve SW registration and update handling #1473
- ✨(backend) add async indexation of documents on save (or access save) #1276
- ✨(backend) add debounce mechanism to limit indexation jobs #1276
- ✨(api) add API route to search for indexed documents in Find #1276

### Fixed

4 changes: 4 additions & 0 deletions Makefile
@@ -247,6 +247,10 @@ demo: ## flush db then create a demo for load testing purpose
@$(MANAGE) create_demo
.PHONY: demo

index: ## index all documents to remote search
@$(MANAGE) index
.PHONY: index

# Nota bene: Black should come after isort just in case they don't agree...
lint: ## lint back-end python sources
lint: \
6 changes: 6 additions & 0 deletions bin/fernetkey
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

# shellcheck source=bin/_config.sh
source "$(dirname "${BASH_SOURCE[0]}")/_config.sh"

_dc_run app-dev python -c 'from cryptography.fernet import Fernet;import sys; sys.stdout.write("\n" + Fernet.generate_key().decode() + "\n");'
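The script above shells into the dev container to run `Fernet.generate_key()`. A Fernet key is simply 32 random bytes, url-safe base64-encoded, so an equivalent sketch needs only the standard library (this mirrors what `cryptography` produces; it is not the project's script):

```python
import base64
import os

def generate_fernet_key():
    """Generate 32 random bytes and url-safe base64-encode them,
    matching the format of cryptography's Fernet.generate_key()."""
    return base64.urlsafe_b64encode(os.urandom(32))

# The result is a 44-character url-safe base64 string, suitable for
# settings such as OIDC_STORE_REFRESH_TOKEN_KEY below.
print(generate_fernet_key().decode())
```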
18 changes: 18 additions & 0 deletions compose.yml
@@ -72,6 +72,11 @@ services:
- env.d/development/postgresql.local
ports:
- "8071:8000"
networks:
default: {}
lasuite-net:
aliases:
- impress
volumes:
- ./src/backend:/app
- ./data/static:/data/static
@@ -92,6 +97,9 @@ services:
command: ["celery", "-A", "impress.celery_app", "worker", "-l", "DEBUG"]
environment:
- DJANGO_CONFIGURATION=Development
networks:
- default
- lasuite-net
env_file:
- env.d/development/common
- env.d/development/common.local
@@ -107,6 +115,11 @@
image: nginx:1.25
ports:
- "8083:8083"
networks:
default: {}
lasuite-net:
aliases:
- nginx
volumes:
- ./docker/files/etc/nginx/conf.d:/etc/nginx/conf.d:ro
depends_on:
@@ -217,3 +230,8 @@ services:
kc_postgresql:
condition: service_healthy
restart: true

networks:
lasuite-net:
name: lasuite-net
driver: bridge
16 changes: 15 additions & 1 deletion env.d/development/common
@@ -49,6 +49,14 @@ LOGOUT_REDIRECT_URL=http://localhost:3000
OIDC_REDIRECT_ALLOWED_HOSTS=["http://localhost:8083", "http://localhost:3000"]
OIDC_AUTH_REQUEST_EXTRA_PARAMS={"acr_values": "eidas1"}

# Store OIDC tokens in the session
OIDC_STORE_ACCESS_TOKEN = True
OIDC_STORE_REFRESH_TOKEN = True # Store the encrypted refresh token in the session.

# Must be a valid Fernet key (32 url-safe base64-encoded bytes)
# To create one, use the bin/fernetkey command.
OIDC_STORE_REFRESH_TOKEN_KEY = "na1hhus-OLhq9mb9SO3R-8E4dONuMnqpZSY_SX8xcFk="

# AI
AI_FEATURE_ENABLED=true
AI_BASE_URL=https://openaiendpoint.com
@@ -68,4 +76,10 @@ Y_PROVIDER_API_BASE_URL=http://y-provider-development:4444/api/
Y_PROVIDER_API_KEY=yprovider-api-key

# Theme customization
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15
THEME_CUSTOMIZATION_CACHE_TIMEOUT=15

# Indexer
SEARCH_INDEXER_CLASS="core.services.search_indexers.SearchIndexer"
SEARCH_INDEXER_SECRET=find-api-key-for-docs-with-exactly-50-chars-length # Key generated by create_demo in Find app.
SEARCH_INDEXER_URL="http://find:8000/api/v1.0/documents/index/"
SEARCH_INDEXER_QUERY_URL="http://find:8000/api/v1.0/documents/search/"
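`SEARCH_INDEXER_CLASS` is a dotted import path resolved at runtime. A minimal sketch of how such a setting is typically turned into a class object (the helper name `load_class` is hypothetical, not the project's actual loader):

```python
import importlib

def load_class(dotted_path):
    """Resolve a dotted path such as
    'core.services.search_indexers.SearchIndexer' into the class it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# Hypothetical usage mirroring what get_document_indexer() might do:
#   indexer_class = load_class(settings.SEARCH_INDEXER_CLASS)
#   indexer = indexer_class() if settings.SEARCH_INDEXER_URL else None
```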
10 changes: 10 additions & 0 deletions src/backend/core/api/serializers.py
@@ -888,3 +888,13 @@ class MoveDocumentSerializer(serializers.Serializer):
choices=enums.MoveNodePositionChoices.choices,
default=enums.MoveNodePositionChoices.LAST_CHILD,
)


class SearchDocumentSerializer(serializers.Serializer):
    """Serializer for fulltext search requests through the Find application"""

q = serializers.CharField(required=True, allow_blank=False, trim_whitespace=True)
page_size = serializers.IntegerField(
required=False, min_value=1, max_value=50, default=20
)
page = serializers.IntegerField(required=False, min_value=1, default=1)
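The serializer requires a non-blank, trimmed `q`, bounds `page_size` to 1–50 (default 20), and requires `page >= 1` (default 1). A plain-Python sketch of the same validation rules, for illustration only (the real code is the DRF serializer above):

```python
def validate_search_params(params):
    """Validate search query params the way SearchDocumentSerializer does:
    'q' required and non-blank after trimming, page_size in [1, 50]
    defaulting to 20, page >= 1 defaulting to 1."""
    q = str(params.get("q", "")).strip()
    if not q:
        raise ValueError("'q' is required and may not be blank")

    page_size = int(params.get("page_size", 20))
    if not 1 <= page_size <= 50:
        raise ValueError("page_size must be between 1 and 50")

    page = int(params.get("page", 1))
    if page < 1:
        raise ValueError("page must be >= 1")

    return {"q": q, "page_size": page_size, "page": page}
```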
78 changes: 78 additions & 0 deletions src/backend/core/api/viewsets.py
@@ -21,6 +21,7 @@
from django.db.models.functions import Left, Length
from django.http import Http404, StreamingHttpResponse
from django.urls import reverse
from django.utils.decorators import method_decorator
from django.utils.functional import cached_property
from django.utils.text import capfirst, slugify
from django.utils.translation import gettext_lazy as _
@@ -31,6 +32,7 @@
from csp.constants import NONE
from csp.decorators import csp_update
from lasuite.malware_detection import malware_detection
from lasuite.oidc_login.decorators import refresh_oidc_access_token
from rest_framework import filters, status, viewsets
from rest_framework import response as drf_response
from rest_framework.permissions import AllowAny
@@ -47,6 +49,10 @@
from core.services.converter_services import (
YdocConverter,
)
from core.services.search_indexers import (
get_document_indexer,
get_visited_document_ids_of,
)
from core.tasks.mail import send_ask_for_access_mail
from core.utils import extract_attachments, filter_descendants

@@ -373,6 +379,7 @@ class DocumentViewSet(
list_serializer_class = serializers.ListDocumentSerializer
trashbin_serializer_class = serializers.ListDocumentSerializer
tree_serializer_class = serializers.ListDocumentSerializer
search_serializer_class = serializers.ListDocumentSerializer

def get_queryset(self):
"""Get queryset performing all annotation and filtering on the document tree structure."""
@@ -1044,6 +1051,77 @@ def duplicate(self, request, *args, **kwargs):
{"id": str(duplicated_document.id)}, status=status.HTTP_201_CREATED
)

def _simple_search_queryset(self, params):
"""
Returns a queryset filtered by the content of the document title
"""
text = params.validated_data["q"]

        # As in the 'list' view, we get a prefiltered queryset (deleted docs are excluded)
queryset = self.get_queryset()
filterset = DocumentFilter({"title": text}, queryset=queryset)

if not filterset.is_valid():
raise drf.exceptions.ValidationError(filterset.errors)

return filterset.filter_queryset(queryset)

def _fulltext_search_queryset(self, indexer, token, user, params):
"""
        Returns a queryset from the results of the Find fulltext search
"""
text = params.validated_data["q"]
queryset = models.Document.objects.all()

# Retrieve the documents ids from Find.
results = indexer.search(
text=text,
token=token,
visited=get_visited_document_ids_of(queryset, user),
page=params.validated_data.get("page", 1),
page_size=params.validated_data.get("page_size", 20),
)

return queryset.filter(pk__in=results)

@drf.decorators.action(detail=False, methods=["get"], url_path="search")
@method_decorator(refresh_oidc_access_token)
def search(self, request, *args, **kwargs):
"""
Returns a DRF response containing the filtered, annotated and ordered document list.

Applies filtering based on request parameter 'q' from `SearchDocumentSerializer`.
        Depending on the configuration it can be:
- A fulltext search through the opensearch indexation app "find" if the backend is
enabled (see SEARCH_INDEXER_CLASS)
- A filtering by the model field 'title'.

The ordering is always by the most recent first.
"""
access_token = request.session.get("oidc_access_token")
user = request.user

params = serializers.SearchDocumentSerializer(data=request.query_params)
params.is_valid(raise_exception=True)

indexer = get_document_indexer()

if indexer:
queryset = self._fulltext_search_queryset(
indexer, token=access_token, user=user, params=params
)
else:
            # The indexer is not configured, we fall back to a simple icontains
            # filter on the model field 'title'.
queryset = self._simple_search_queryset(params)

return self.get_response_for_queryset(
queryset.order_by("-updated_at"),
context={
"request": request,
},
)

@drf.decorators.action(detail=True, methods=["get"], url_path="versions")
def versions_list(self, request, *args, **kwargs):
"""
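A client reaches the new action with a GET on the documents collection. A small sketch building such a request URL; the `/api/v1.0/documents/search/` prefix is an assumption inferred from the viewset's `url_path="search"` and the Find URLs configured earlier, so adjust it to your routing:

```python
import urllib.parse

def build_search_url(base, q, page=1, page_size=20):
    """Build the documents search URL with properly encoded query params,
    matching the fields accepted by SearchDocumentSerializer."""
    query = urllib.parse.urlencode({"q": q, "page": page, "page_size": page_size})
    return f"{base}/api/v1.0/documents/search/?{query}"
```

Results come back as a paginated document list ordered by `-updated_at`, whichever backend (Find or the title fallback) served the query.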
22 changes: 15 additions & 7 deletions src/backend/core/apps.py
@@ -1,11 +1,19 @@
"""Impress Core application"""
# from django.apps import AppConfig
# from django.utils.translation import gettext_lazy as _

from django.apps import AppConfig
from django.utils.translation import gettext_lazy as _

# class CoreConfig(AppConfig):
# """Configuration class for the impress core app."""

# name = "core"
# app_label = "core"
# verbose_name = _("impress core application")
class CoreConfig(AppConfig):
"""Configuration class for the impress core app."""

name = "core"
app_label = "core"
verbose_name = _("Impress core application")

def ready(self):
"""
Import signals when the app is ready.
"""
# pylint: disable=import-outside-toplevel, unused-import
from . import signals # noqa: PLC0415
40 changes: 40 additions & 0 deletions src/backend/core/management/commands/index.py
@@ -0,0 +1,40 @@
"""
Handle search setup that needs to be done at bootstrap time.
"""

import logging
import time

from django.core.management.base import BaseCommand, CommandError

from core.services.search_indexers import get_document_indexer

logger = logging.getLogger("docs.search.bootstrap_search")


class Command(BaseCommand):
"""Index all documents to remote search service"""

help = __doc__

def handle(self, *args, **options):
"""Launch and log search index generation."""
indexer = get_document_indexer()

if not indexer:
raise CommandError("The indexer is not enabled or properly configured.")

logger.info("Starting to regenerate Find index...")
start = time.perf_counter()

try:
count = indexer.index()
except Exception as err:
raise CommandError("Unable to regenerate index") from err

duration = time.perf_counter() - start
logger.info(
"Search index regenerated from %d document(s) in %.2f seconds.",
count,
duration,
)
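The command wraps the reindex in a timed, logged run using `time.perf_counter()`. The same pattern in isolation (the helper `timed_run` is illustrative, not part of the PR):

```python
import logging
import time

logger = logging.getLogger("docs.search.bootstrap_search")

def timed_run(operation, label):
    """Run `operation`, log how long it took, and return (result, duration),
    mirroring the perf_counter pattern used by the `index` command."""
    start = time.perf_counter()
    result = operation()
    duration = time.perf_counter() - start
    logger.info("%s completed in %.2f seconds.", label, duration)
    return result, duration
```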
47 changes: 25 additions & 22 deletions src/backend/core/models.py
@@ -430,32 +430,35 @@ def __init__(self, *args, **kwargs):
def save(self, *args, **kwargs):
"""Write content to object storage only if _content has changed."""
super().save(*args, **kwargs)

if self._content:
file_key = self.file_key
bytes_content = self._content.encode("utf-8")
self.save_content(self._content)

# Attempt to directly check if the object exists using the storage client.
try:
response = default_storage.connection.meta.client.head_object(
Bucket=default_storage.bucket_name, Key=file_key
)
except ClientError as excpt:
# If the error is a 404, the object doesn't exist, so we should create it.
if excpt.response["Error"]["Code"] == "404":
has_changed = True
else:
raise
def save_content(self, content):
"""Save content to object storage."""

file_key = self.file_key
bytes_content = content.encode("utf-8")

# Attempt to directly check if the object exists using the storage client.
try:
response = default_storage.connection.meta.client.head_object(
Bucket=default_storage.bucket_name, Key=file_key
)
except ClientError as excpt:
# If the error is a 404, the object doesn't exist, so we should create it.
if excpt.response["Error"]["Code"] == "404":
has_changed = True
else:
# Compare the existing ETag with the MD5 hash of the new content.
has_changed = (
response["ETag"].strip('"')
!= hashlib.md5(bytes_content).hexdigest() # noqa: S324
)
raise
else:
# Compare the existing ETag with the MD5 hash of the new content.
has_changed = (
response["ETag"].strip('"') != hashlib.md5(bytes_content).hexdigest() # noqa: S324
)

if has_changed:
content_file = ContentFile(bytes_content)
default_storage.save(file_key, content_file)
if has_changed:
content_file = ContentFile(bytes_content)
default_storage.save(file_key, content_file)

def is_leaf(self):
"""
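The extracted `save_content` skips the upload when nothing changed, by comparing the stored object's ETag with the MD5 of the new content (for non-multipart S3 uploads, the ETag is the MD5 hex digest in double quotes). The decision logic in isolation:

```python
import hashlib

def content_has_changed(existing_etag, new_content):
    """Return True when the object should be (re)written: either it does
    not exist yet (existing_etag is None, the head_object 404 case) or its
    ETag differs from the MD5 of the new content."""
    if existing_etag is None:
        return True
    new_digest = hashlib.md5(new_content.encode("utf-8")).hexdigest()  # noqa: S324
    return existing_etag.strip('"') != new_digest
```

MD5 is used here only as a change-detection checksum to match S3's ETag format, not for security.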