Commit a533f07

improvement: add ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE and ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW (#5220)
# Description

After testing a high number of concurrent requests using PostgreSQL, I received the following error:

```
QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00
```

This PR adds two environment variables, `ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE` and `ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW`, so that the pool size and max overflow can be configured.

Refs #5000

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

- [x] Manually testing with PostgreSQL.

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

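
The error message quoted in the description encodes the pool limits: a pool of size 5 with an overflow of 10 hands out at most 15 connections at once, and further requests wait until the timeout expires. A minimal sketch of that arithmetic, assuming both settings are non-negative and finite; the helper name `max_simultaneous_connections` is hypothetical and is not part of Argilla or SQLAlchemy:

```python
def max_simultaneous_connections(pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections a queue-style pool hands out at once.

    Hypothetical helper for illustration only. It assumes both values are
    non-negative and finite, and it does not model the wait timeout.
    """
    if pool_size < 0 or max_overflow < 0:
        raise ValueError("this sketch only covers non-negative limits")
    return pool_size + max_overflow


# Limits reported in the error message (size 5, overflow 10).
before = max_simultaneous_connections(pool_size=5, max_overflow=10)
# Defaults introduced by this commit (size 15, overflow 10).
after = max_simultaneous_connections(pool_size=15, max_overflow=10)
print(before, after)  # 15 25
```

Under these assumptions, the new defaults raise the ceiling from 15 to 25 simultaneous connections per server process.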
1 parent 9daefee commit a533f07

6 files changed

+80
-13
lines changed

argilla-server/CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ These are the section headers that we use:
 ### Added
 
 - Added new `ARGILLA_DATABASE_SQLITE_TIMEOUT` environment variable allowing to set transactions timeout for SQLite. ([#5213](https://github.com/argilla-io/argilla/pull/5213))
+- Added new `ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE` environment variable allowing to set the number of connections to keep open inside the database connection pool. ([#5220](https://github.com/argilla-io/argilla/pull/5220))
+- Added new `ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW` environment variable allowing to set the number of connections that can be opened above and beyond the `ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE` setting. ([#5220](https://github.com/argilla-io/argilla/pull/5220))
 
 ### Fixed

argilla-server/src/argilla_server/constants.py

Lines changed: 4 additions & 0 deletions
@@ -16,6 +16,7 @@
 WORKSPACE_HEADER_NAME = "X-Argilla-Workspace"
 
 DATABASE_SQLITE = "sqlite"
+DATABASE_POSTGRESQL = "postgresql"
 
 SEARCH_ENGINE_ELASTICSEARCH = "elasticsearch"
 SEARCH_ENGINE_OPENSEARCH = "opensearch"
@@ -26,6 +27,9 @@
 
 DEFAULT_DATABASE_SQLITE_TIMEOUT = 15
 
+DEFAULT_DATABASE_POSTGRESQL_POOL_SIZE = 15
+DEFAULT_DATABASE_POSTGRESQL_MAX_OVERFLOW = 10
+
 DEFAULT_MAX_KEYWORD_LENGTH = 128
 DEFAULT_TELEMETRY_KEY = "C6FkcaoCbt78rACAgvyBxGBcMB3dM3nn"

argilla-server/src/argilla_server/database.py

Lines changed: 2 additions & 1 deletion
@@ -51,7 +51,8 @@ def set_sqlite_pragma(dbapi_connection, connection_record):
     cursor.close()
 
 
-async_engine = create_async_engine(settings.database_url, connect_args=settings.database_connect_args)
+async_engine = create_async_engine(settings.database_url, **settings.database_engine_args)
+
 AsyncSessionLocal = async_sessionmaker(autocommit=False, expire_on_commit=False, bind=async_engine)
 

argilla-server/src/argilla_server/settings.py

Lines changed: 34 additions & 2 deletions
@@ -27,13 +27,16 @@
 
 from argilla_server.constants import (
     DATABASE_SQLITE,
+    DATABASE_POSTGRESQL,
     DEFAULT_LABEL_SELECTION_OPTIONS_MAX_ITEMS,
     DEFAULT_MAX_KEYWORD_LENGTH,
     DEFAULT_SPAN_OPTIONS_MAX_ITEMS,
     DEFAULT_TELEMETRY_KEY,
     DEFAULT_DATABASE_SQLITE_TIMEOUT,
     SEARCH_ENGINE_ELASTICSEARCH,
     SEARCH_ENGINE_OPENSEARCH,
+    DEFAULT_DATABASE_POSTGRESQL_POOL_SIZE,
+    DEFAULT_DATABASE_POSTGRESQL_MAX_OVERFLOW,
 )
 from argilla_server.pydantic_v1 import BaseSettings, Field, root_validator, validator
 
@@ -75,7 +78,19 @@ class Settings(BaseSettings):
 
     home_path: Optional[str] = Field(description="The home path where argilla related files will be stored")
     base_url: Optional[str] = Field(description="The default base url where server will be deployed")
+
     database_url: Optional[str] = Field(description="The database url that argilla will use as data store")
+    # https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.pool_size
+    database_postgresql_pool_size: Optional[int] = Field(
+        default=DEFAULT_DATABASE_POSTGRESQL_POOL_SIZE,
+        description="The number of connections to keep open inside the database connection pool",
+    )
+    # https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.max_overflow
+    database_postgresql_max_overflow: Optional[int] = Field(
+        default=DEFAULT_DATABASE_POSTGRESQL_MAX_OVERFLOW,
+        description="The number of connections that can be opened above and beyond the pool_size setting",
+    )
+    # https://docs.python.org/3/library/sqlite3.html#sqlite3.connect
     database_sqlite_timeout: Optional[int] = Field(
         default=DEFAULT_DATABASE_SQLITE_TIMEOUT,
         description="SQLite database connection timeout in seconds",
@@ -227,9 +242,19 @@ def old_dataset_records_index_name(self) -> str:
         return index_name.replace("<NAMESPACE>", f".{ns}")
 
     @property
-    def database_connect_args(self) -> Dict:
+    def database_engine_args(self) -> Dict:
         if self.database_is_sqlite:
-            return {"timeout": self.database_sqlite_timeout}
+            return {
+                "connect_args": {
+                    "timeout": self.database_sqlite_timeout,
+                },
+            }
+
+        if self.database_is_postgresql:
+            return {
+                "pool_size": self.database_postgresql_pool_size,
+                "max_overflow": self.database_postgresql_max_overflow,
+            }
 
         return {}
 
@@ -240,6 +265,13 @@ def database_is_sqlite(self) -> bool:
 
         return self.database_url.lower().startswith(DATABASE_SQLITE)
 
+    @property
+    def database_is_postgresql(self) -> bool:
+        if self.database_url is None:
+            return False
+
+        return self.database_url.lower().startswith(DATABASE_POSTGRESQL)
+
     @property
     def search_engine_is_elasticsearch(self) -> bool:
         return self.search_engine == SEARCH_ENGINE_ELASTICSEARCH
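
The `database_engine_args` property in the settings.py diff picks engine keyword arguments from the database URL prefix. The standalone sketch below mirrors that selection with the standard library only, so it can be run without Argilla installed. The function name `engine_args_for`, the example URLs, and the handling of a missing URL (returning an empty dict) are assumptions made for illustration:

```python
from typing import Any, Dict, Optional


def engine_args_for(
    database_url: Optional[str],
    pool_size: int = 15,
    max_overflow: int = 10,
    sqlite_timeout: int = 15,
) -> Dict[str, Any]:
    """Return engine keyword arguments chosen by URL prefix.

    Hypothetical stand-in for the property in the diff, not Argilla's API.
    """
    # Assumption: a missing URL is treated as "no extra arguments".
    url = (database_url or "").lower()

    if url.startswith("sqlite"):
        # SQLite: the timeout is forwarded to the driver via connect_args.
        return {"connect_args": {"timeout": sqlite_timeout}}

    if url.startswith("postgresql"):
        # PostgreSQL: pool limits are passed to the engine itself.
        return {"pool_size": pool_size, "max_overflow": max_overflow}

    return {}


# Example URLs are illustrative only.
print(engine_args_for("postgresql+asyncpg://user:pass@localhost/argilla"))
print(engine_args_for("sqlite+aiosqlite:///argilla.db"))
```

Returning a full keyword dictionary, rather than only `connect_args`, is what lets the caller unpack it with `**` and pass pool options that are not driver-level connection arguments.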

argilla-server/tests/unit/commons/test_settings.py

Lines changed: 30 additions & 10 deletions
@@ -59,16 +59,6 @@ def test_settings_default_database_url(monkeypatch):
     assert settings.database_url == f"sqlite+aiosqlite:///{settings.home_path}/argilla.db?check_same_thread=False"
 
 
-def test_settings_database_sqlite_timeout(monkeypatch):
-    monkeypatch.setenv("ARGILLA_DATABASE_SQLITE_TIMEOUT", "3")
-
-    assert Settings().database_sqlite_timeout == 3
-
-
-def test_settings_default_database_sqlite_timeout():
-    assert Settings().database_sqlite_timeout == 15
-
-
 @pytest.mark.parametrize(
     "url, expected_url",
     [
@@ -82,3 +72,33 @@ def test_settings_database_url(url: str, expected_url: str, monkeypatch):
     monkeypatch.setenv("ARGILLA_DATABASE_URL", url)
 
     assert Settings().database_url == expected_url
+
+
+def test_settings_default_database_sqlite_timeout():
+    assert Settings().database_sqlite_timeout == 15
+
+
+def test_settings_database_sqlite_timeout(monkeypatch):
+    monkeypatch.setenv("ARGILLA_DATABASE_SQLITE_TIMEOUT", "3")
+
+    assert Settings().database_sqlite_timeout == 3
+
+
+def test_settings_default_database_postgresql_pool_size():
+    assert Settings().database_postgresql_pool_size == 15
+
+
+def test_settings_database_postgresql_pool_size(monkeypatch):
+    monkeypatch.setenv("ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE", "42")
+
+    assert Settings().database_postgresql_pool_size == 42
+
+
+def test_settings_default_database_postgresql_max_overflow():
+    assert Settings().database_postgresql_max_overflow == 10
+
+
+def test_settings_database_postgresql_max_overflow(monkeypatch):
+    monkeypatch.setenv("ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW", "12")
+
+    assert Settings().database_postgresql_max_overflow == 12

docs/_source/getting_started/installation/configurations/server_configuration.md

Lines changed: 8 additions & 0 deletions
@@ -73,6 +73,14 @@ The following environment variables are useful only when SQLite is used:
 
 - `ARGILLA_DATABASE_SQLITE_TIMEOUT`: How many seconds the connection should wait before raising an `OperationalError` when a table is locked. If another connection opens a transaction to modify a table, that table will be locked until the transaction is committed. (Defaut: `15` seconds).
 
+#### PostgreSQL
+
+The following environment variables are useful only when PostgreSQL is used:
+
+- `ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE`: The number of connections to keep open inside the database connection pool (Default: `15`).
+
+- `ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW`: The number of connections that can be opened above and beyond `ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE` setting (Default: `10`).
+
 #### Elasticsearch and Opensearch
 
 - `ARGILLA_ELASTICSEARCH`: URL of the connection endpoint of the Elasticsearch instance (Default: `http://localhost:9200`).
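
The two PostgreSQL variables documented in the diff above are plain integers with defaults of `15` and `10`. As a rough illustration of how such values resolve, the sketch below reads them from a mapping using only the standard library. Argilla itself loads them through its pydantic-based `Settings` class, so the helper `read_pool_settings` is a hypothetical stand-in, not the project's code:

```python
import os
from typing import Mapping, Tuple


def read_pool_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (pool_size, max_overflow), falling back to the documented defaults.

    Hypothetical helper; it only mimics the env-var names and defaults
    described in the documentation diff.
    """
    pool_size = int(env.get("ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE", "15"))
    max_overflow = int(env.get("ARGILLA_DATABASE_POSTGRESQL_MAX_OVERFLOW", "10"))
    return pool_size, max_overflow


# An explicit mapping is passed so the example does not depend on the real environment.
print(read_pool_settings({}))  # (15, 10)
print(read_pool_settings({"ARGILLA_DATABASE_POSTGRESQL_POOL_SIZE": "42"}))  # (42, 10)
```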
