Commit 22a20e1

perf: replace python-dotenv with custom fast parser - 5x faster cold start
1 parent 14ae092 commit 22a20e1

6 files changed, +1240 -72 lines changed

README.md

Lines changed: 16 additions & 15 deletions
@@ -23,9 +23,9 @@
 
 - **High performance** - Built on msgspec for speed
 - **Type-safe** - Full type hints and validation
-- **.env support** - Automatic loading from .env files via python-dotenv
+- **.env support** - Fast built-in .env parser (no dependencies)
 - **Nested settings** - Support for complex configuration structures
-- **Minimal dependencies** - Only msgspec and python-dotenv
+- **Zero dependencies** - Only msgspec required
 - **Familiar API** - Easy to learn if you've used settings libraries before
 
 ## Installation
@@ -144,31 +144,31 @@ msgspec-ext provides a **faster, lighter alternative** to pydantic-settings whil
 
 ### Performance Comparison
 
-**First-time load** (what you'll see when testing):
+**Cold start** (first load, includes .env parsing):
 
 | Library | Time per load | Speed |
 |---------|---------------|-------|
-| **msgspec-ext** | **1.818ms** | **1.5x faster**|
-| pydantic-settings | 2.814ms | Baseline |
+| **msgspec-ext** | **0.39ms** | **5.0x faster**|
+| pydantic-settings | 1.95ms | Baseline |
 
-**With caching** (repeated loads in long-running applications):
+**Warm (cached)** (repeated loads in long-running applications):
 
 | Library | Time per load | Speed |
 |---------|---------------|-------|
-| **msgspec-ext** | **0.016ms** | **112x faster**|
-| pydantic-settings | 1.818ms | Baseline |
+| **msgspec-ext** | **0.012ms** | **267x faster**|
+| pydantic-settings | 3.2ms | Baseline |
 
 > *Benchmark includes .env file parsing, environment variable loading, type validation, and nested configuration (app settings, database, redis, feature flags). Run `benchmark/benchmark_cold_warm.py` to reproduce.*
 
 ### Key Advantages
 
 | Feature | msgspec-ext | pydantic-settings |
 |---------|------------|-------------------|
-| **First load** | **1.5x faster**| Baseline |
-| **Cached loads** | **112x faster**| Baseline |
+| **Cold start** | **5.0x faster**| Baseline |
+| **Warm (cached)** | **267x faster**| Baseline |
 | **Package size** | **0.49 MB** | 1.95 MB |
-| **Dependencies** | **2 (minimal)** | 5+ |
-| .env support |||
+| **Dependencies** | **1 (msgspec only)** | 5+ |
+| .env support |Built-in | Via python-dotenv |
 | Type validation |||
 | Advanced caching |||
 | Nested config |||
@@ -179,14 +179,15 @@ msgspec-ext provides a **faster, lighter alternative** to pydantic-settings whil
 
 msgspec-ext achieves its performance through:
 - **Bulk validation**: Validates all fields at once in C (via msgspec), not one-by-one in Python
-- **Smart caching**: Caches .env files, field mappings, and type information - loads after the first are 112x faster
+- **Custom .env parser**: Built-in fast parser with zero external dependencies (no python-dotenv overhead)
+- **Smart caching**: Caches .env files, field mappings, and type information - loads after the first are 267x faster
 - **Optimized file operations**: Uses fast os.path operations instead of slower pathlib alternatives
 - **Zero overhead**: Fast paths for common types (str, bool, int, float) with minimal Python code
 
 This means your application **starts faster** and uses **less memory**, especially important for:
-- 🚀 **CLI tools** - 1.5x faster startup every time you run the command
+- 🚀 **CLI tools** - 5.0x faster startup every time you run the command
 - **Serverless functions** - Lower cold start latency means better response times
-- 🔄 **Long-running apps** - After the first load, reloading settings is 112x faster (16 microseconds!)
+- 🔄 **Long-running apps** - After the first load, reloading settings is 267x faster (12 microseconds!)
 
 ## Contributing

pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -14,7 +14,6 @@ authors = [
 requires-python = ">=3.10"
 dependencies = [
     "msgspec>=0.19.0",
-    "python-dotenv>=1.1.1",
 ]
 classifiers = [
     "Development Status :: 4 - Beta",

src/msgspec_ext/fast_dotenv.py

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
+r"""Fast .env file parser - optimized for performance.
+
+Key features:
+1. UTF-8 BOM support (\ufeff)
+2. Escape sequences parsing (\n, \t, etc)
+3. Whitespace preservation inside quotes
+4. Strict variable name validation (isidentifier)
+5. Robust 'export' keyword support
+6. Correct duplicate handling
+7. Special symbols in unquoted values
+"""
+
+import os
+
+# Global cache
+_FILE_CACHE: dict[str, dict[str, str]] = {}
+
+# Optimization constants
+_BOM = "\ufeff"
+_EXPORT_LEN = 6  # len("export")
+
+
+def parse_env_file(file_path: str, encoding: str | None = "utf-8") -> dict[str, str]:  # noqa: C901, PLR0912
+    """Fast .env file parser with production-grade robustness.
+
+    Optimized for speed while handling edge cases correctly.
+    """
+    cache_key = f"{file_path}:{encoding}"
+    if cache_key in _FILE_CACHE:
+        return _FILE_CACHE[cache_key]
+
+    env_vars: dict[str, str] = {}
+
+    try:
+        # 1. Fast read with immediate BOM handling
+        with open(file_path, encoding=encoding) as f:
+            content = f.read()
+
+        # Remove BOM if present
+        if content.startswith(_BOM):
+            content = content[1:]
+
+        # Local references for loop speed
+        _str_strip = str.strip
+        _str_startswith = str.startswith
+
+        for raw_line in content.splitlines():
+            # Fast initial cleanup
+            line = _str_strip(raw_line)
+
+            if not line or _str_startswith(line, "#"):
+                continue
+
+            # 2. Handle 'export' keyword
+            # Check if starts with 'export' followed by space (not a var called 'exporter')
+            if (
+                _str_startswith(line, "export")
+                and len(line) > _EXPORT_LEN
+                and line[_EXPORT_LEN].isspace()
+            ):
+                line = line[_EXPORT_LEN:].lstrip()
+
+            # 3. Atomic partition
+            key, sep, value = line.partition("=")
+
+            if not sep:
+                continue
+
+            key = key.strip()
+
+            # 4. Variable name validation
+            # isidentifier() is implemented in C and covers:
+            # - Not starting with number
+            # - Only alphanumerics and underscore
+            # - No hyphens (bash compliant)
+            if not key.isidentifier():
+                continue
+
+            # 5. Value parsing
+            if not value:
+                env_vars[key] = ""
+                continue
+
+            quote = value[0] if value else ""
+
+            # Quote handling logic
+            if quote in ('"', "'"):
+                # Check if quote closes (ignore orphaned quotes)
+                if value.endswith(quote) and len(value) > 1:
+                    # Extract content
+                    val_content = value[1:-1]
+
+                    # Double quotes: Support escape sequences
+                    if quote == '"':
+                        # Decode common escapes
+                        # Manual replace is faster than codecs.decode('unicode_escape') for this subset
+                        if "\\" in val_content:
+                            val_content = (
+                                val_content.replace("\\n", "\n")
+                                .replace("\\r", "\r")
+                                .replace("\\t", "\t")
+                                .replace('\\"', '"')
+                                .replace("\\\\", "\\")
+                            )
+                    # Single quotes: Minimal escape processing
+                    elif quote == "'":
+                        # Only unescape single quote itself if needed
+                        if "\\'" in val_content:
+                            val_content = val_content.replace("\\'", "'")
+
+                    env_vars[key] = val_content
+                else:
+                    # Broken or unclosed quotes -> Treat as unquoted string
+                    env_vars[key] = value.strip()
+            else:
+                # Unquoted value - Preserve leading spaces but allow inline comments
+                # Do NOT remove leading spaces to preserve intentionality
+
+                # Remove inline comments (e.g., VAL=123 # id)
+                if "#" in value:
+                    # Only partition if # exists to avoid overhead
+                    value = value.partition("#")[0]
+
+                # Remove trailing whitespace only at the end
+                env_vars[key] = value.rstrip()
+
+    except FileNotFoundError:
+        pass
+    except Exception:  # noqa: S110
+        # In critical production, logging would be ideal, but keeping interface clean
+        pass
+
+    _FILE_CACHE[cache_key] = env_vars
+    return env_vars
+
+
+def load_dotenv(
+    dotenv_path: str | None = ".env",
+    encoding: str | None = "utf-8",
+    *,
+    override: bool = False,
+) -> bool:
+    """Load environment variables from .env file into os.environ.
+
+    Args:
+        dotenv_path: Path to .env file (default: ".env")
+        encoding: File encoding (default: "utf-8")
+        override: Whether to override existing environment variables (default: False)
+
+    Returns:
+        True if file was loaded successfully, False otherwise
+    """
+    try:
+        env_vars = parse_env_file(dotenv_path, encoding)
+
+        if not env_vars:
+            return False  # Empty or invalid file
+
+        if override:
+            # Override all variables from file
+            os.environ.update(env_vars)
+        else:
+            # Preserve existing environment variables
+            # Direct iteration is faster than sets for small/medium dicts
+            environ = os.environ
+            for key, value in env_vars.items():
+                if key not in environ:
+                    environ[key] = value
+
+        return True
+    except Exception:
+        return False
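
A note on the double-quoted branch of `parse_env_file`: it decodes escapes with chained `str.replace` calls, and `\n` is replaced before `\\`. An escaped backslash followed by `n` (the four characters `\\n` in the file) therefore appears to decode to a backslash plus a newline rather than a literal `\n`. The sketch below reproduces the chained behaviour next to a single-pass alternative built on `re.sub`. Both helper names are hypothetical and neither is part of this commit; whether the input is common enough to matter (Windows paths are one candidate) is a judgment call.

```python
import re

# Escape table for the same subset the parser handles inside double quotes.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}
_ESCAPE_RE = re.compile(r"\\(.)")


def decode_escapes_chained(text: str) -> str:
    """Mirror of the chained-replace approach used in the diff above."""
    return (
        text.replace("\\n", "\n")
        .replace("\\r", "\r")
        .replace("\\t", "\t")
        .replace('\\"', '"')
        .replace("\\\\", "\\")
    )


def decode_escapes_single_pass(text: str) -> str:
    """Decode each backslash pair exactly once, scanning left to right."""

    def _sub(match: re.Match) -> str:
        # Unknown escapes (e.g. \x) are left untouched.
        return _ESCAPES.get(match.group(1), match.group(0))

    return _ESCAPE_RE.sub(_sub, text)


if __name__ == "__main__":
    raw = r"C:\\new"  # file content: C, colon, two backslashes, "new"
    print(repr(decode_escapes_chained(raw)))
    print(repr(decode_escapes_single_pass(raw)))
```

The single-pass version trades a regular-expression call for order independence. It runs only when a backslash is present, so the cost is limited to values that actually contain escapes.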

src/msgspec_ext/settings.py

Lines changed: 3 additions & 42 deletions
@@ -4,7 +4,8 @@
 from typing import Any, ClassVar, Union, get_args, get_origin
 
 import msgspec
-from dotenv import load_dotenv
+
+from msgspec_ext.fast_dotenv import load_dotenv
 
 __all__ = ["BaseSettings", "SettingsConfigDict"]

@@ -305,7 +306,7 @@ def _get_env_name(cls, field_name: str) -> str:
         return env_name
 
     @classmethod
-    def _preprocess_env_value(cls, env_value: str, field_type: type) -> Any:  # noqa: C901, PLR0912
+    def _preprocess_env_value(cls, env_value: str, field_type: type) -> Any:  # noqa: C901
         """Convert environment variable string to JSON-compatible type.
 
         Ultra-optimized to minimize type introspection overhead with caching.
@@ -356,43 +357,3 @@ def _preprocess_env_value(cls, env_value: str, field_type: type) -> Any: # noqa
                 return cls._preprocess_env_value(env_value, resolved_type)
 
         return env_value
-
-        # Fast path: Direct type comparison (avoid get_origin when possible)
-        if field_type is str:
-            return env_value
-        if field_type is bool:
-            return env_value.lower() in ("true", "1", "yes", "y", "t")
-        if field_type is int:
-            try:
-                return int(env_value)
-            except ValueError as e:
-                raise ValueError(f"Cannot convert '{env_value}' to int") from e
-        if field_type is float:
-            try:
-                return float(env_value)
-            except ValueError as e:
-                raise ValueError(f"Cannot convert '{env_value}' to float") from e
-
-        # Only use typing introspection for complex types (Union, Optional, etc.)
-        origin = get_origin(field_type)
-        if origin is Union:
-            args = get_args(field_type)
-            non_none = [a for a in args if a is not type(None)]
-            if non_none:
-                # Cache the resolved type for future use
-                resolved_type = non_none[0]
-                cls._type_cache[field_type] = resolved_type
-                # Recursively process with the non-None type
-                return cls._preprocess_env_value(env_value, resolved_type)
-
-        return env_value
-
-        # Type conversion (required for JSON encoding)
-        if field_type is bool:
-            return env_value.lower() in ("true", "1", "yes", "y", "t")
-        if field_type is int:
-            return int(env_value)
-        if field_type is float:
-            return float(env_value)
-
-        return env_value
