Commit 1ed7393

[PERF] Improve persistence storage performance (#5217)
1 parent ed5825b commit 1ed7393

4 files changed: 11 additions, 18 deletions

argilla-server/docker/quickstart/Dockerfile

Lines changed: 6 additions & 8 deletions
@@ -5,6 +5,12 @@ FROM ${ARGILLA_SERVER_IMAGE}:${ARGILLA_VERSION}
 
 USER root
 
+# Copy Argilla distribution files
+COPY scripts/start_quickstart_argilla.sh /home/argilla
+COPY scripts/start_argilla_server.sh /home/argilla
+COPY Procfile /home/argilla
+COPY requirements.txt /packages/requirements.txt
+
 RUN apt-get update && apt-get install -y \
 apt-transport-https \
 gnupg \
@@ -16,12 +22,6 @@ RUN wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | gpg --dearmo
 # Add Elasticsearch repository
 RUN echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | tee /etc/apt/sources.list.d/elastic-8.x.list
 
-# Copy Argilla distribution files
-COPY scripts/start_quickstart_argilla.sh /home/argilla
-COPY scripts/start_argilla_server.sh /home/argilla
-COPY Procfile /home/argilla
-COPY requirements.txt /packages/requirements.txt
-
 RUN \
 # Indicate that this is a quickstart deployment
 echo -e "{ \"deployment\": \"quickstart\" }" > /opt/venv/lib/python3.10/site-packages/argilla_server/static/deployment.json && \
@@ -70,6 +70,4 @@ ENV ARGILLA_WORKSPACE=$ADMIN_USERNAME
 
 ENV UVICORN_PORT=6900
 
-ENV REINDEX_DATASETS=0
-
 CMD ["/bin/bash", "start_quickstart_argilla.sh"]

argilla-server/docker/quickstart/README.md

Lines changed: 0 additions & 5 deletions
@@ -132,8 +132,3 @@ This will run the latest quickstart docker image with 3 users `owner`, `admin`,
 - `ANNOTATOR_PASSWORD`: This sets a custom password for login into the app with the `argilla` username. The default password
 is `12345678`. By setting up a custom password you can use your own password to login into the app.
 - `ARGILLA_WORKSPACE`: The name of a workspace that will be created and used by default for admin and annotator users. The default value will be the one defined by `ADMIN_USERNAME` environment variable.
-- `LOAD_DATASETS`: This variables will allow you to load sample datasets. The default value will be `full`. The
-supported values for this variable is as follows:
-1. `single`: Load single datasets for Feedback task.
-2. `full`: Load all the sample datasets for NLP tasks (Feedback, TokenClassification, TextClassification, Text2Text)
-3. `none`: No datasets being loaded.

argilla-server/docker/quickstart/config/elasticsearch.yml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 cluster.name: "docker-cluster"
 network.host: 0.0.0.0
-path.data: "/data/elasticsearch"
+path.data: "/usr/share/elasticsearch/data"
+path.logs: "/usr/share/elasticsearch/logs"
 discovery.type: single-node
 xpack.security.enabled: false
 xpack.security.transport.ssl.enabled: false
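
To see which directories Elasticsearch will use after this change (for example, when deciding where to mount a volume), the two `path.*` settings can be read back from the config file. The following is a minimal sketch, assuming the file stays a flat list of `key: value` entries as in the hunk above. `yaml_value` is a hypothetical helper written for this illustration and is not part of Argilla or Elasticsearch; the sample file only reproduces the lines shown in the diff.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a top-level `key: value` entry
# from a flat YAML file, with any double quotes around the value removed.
yaml_value() {
    awk -v key="$1" -F': *' '$1 == key { gsub(/"/, "", $2); print $2 }' "$2"
}

# Sample config reproducing the path settings introduced by this commit.
config="$(mktemp)"
cat > "$config" <<'EOF'
cluster.name: "docker-cluster"
network.host: 0.0.0.0
path.data: "/usr/share/elasticsearch/data"
path.logs: "/usr/share/elasticsearch/logs"
EOF

# Print each configured directory.
for key in path.data path.logs; do
    echo "$key -> $(yaml_value "$key" "$config")"
done

rm -f "$config"
```

Inside a running container, the same helper could be pointed at the real `elasticsearch.yml` and each result checked with `[ -w "$dir" ]`. The file's location inside the image is not shown in this commit, so it would need to be confirmed first.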

argilla-server/docker/quickstart/scripts/start_argilla_server.sh

Lines changed: 3 additions & 4 deletions
@@ -31,10 +31,9 @@ python -m argilla_server database users create \
 --role annotator \
 --workspace "$ARGILLA_WORKSPACE"
 
-if [ "$REINDEX_DATASETS" == "true" ] || [ "$REINDEX_DATASETS" == "1" ]; then
-echo "Reindexing existing datasets"
-python -m argilla_server search-engine reindex
-fi
+# Forcing reindex on restart since elasticsearch data could be allocated in a non-persistent volume
+echo "Reindexing existing datasets"
+python -m argilla_server search-engine reindex
 
 # Start Argilla
 echo "Starting Argilla"
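
The behavioural effect of this hunk, combined with the removal of `ENV REINDEX_DATASETS=0` from the Dockerfile, can be sketched as follows. Before the commit the reindex ran only when `REINDEX_DATASETS` was `true` or `1`, so the image default of `0` skipped it. After the commit the variable is no longer consulted and the reindex runs on every start. The function names below are hypothetical and exist only for this illustration; the real `python -m argilla_server search-engine reindex` call is replaced by a printed label so the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper mirroring the condition removed from
# start_argilla_server.sh: succeeds only for "true" or "1".
legacy_should_reindex() {
    [ "$1" = "true" ] || [ "$1" = "1" ]
}

# Report what the old and the new start-up logic do for a given flag value.
describe_reindex() {
    flag="$1"
    if legacy_should_reindex "$flag"; then
        old="reindex"
    else
        old="skip"
    fi
    # After the commit the flag is ignored and the reindex always runs.
    echo "REINDEX_DATASETS=${flag}: before=${old} after=reindex"
}

describe_reindex 0
describe_reindex 1
describe_reindex true
```

Deployments that previously set `REINDEX_DATASETS` explicitly may still pass the variable, but this script no longer reads it.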
