This example demonstrates how to create a custom provider plugin that extends LangExtract with your own model backend.
Note: This is an example included in the LangExtract repository for reference. It is not part of the LangExtract package and won't be installed when you pip install langextract.
Automated Creation: Instead of manually copying this example, use the provider plugin generator script:
python scripts/create_provider_plugin.py MyProvider --with-schemaThis will create a complete plugin structure with all boilerplate code ready for customization.
custom_provider_plugin/
├── pyproject.toml # Package configuration and metadata
├── README.md # This file
├── langextract_provider_example/ # Package directory
│ ├── __init__.py # Package initialization
│ ├── provider.py # Custom provider implementation
│ └── schema.py # Custom schema implementation (optional)
└── test_example_provider.py # Test script
@lx.providers.registry.register(
r'^gemini', # Pattern for model IDs this provider handles
)
class CustomGeminiProvider(lx.inference.BaseLanguageModel):
def __init__(self, model_id: str, **kwargs):
# Initialize your backend client
def infer(self, batch_prompts, **kwargs):
# Call your backend API and return results[project.entry-points."langextract.providers"]
custom_gemini = "langextract_provider_example:CustomGeminiProvider"This entry point allows LangExtract to automatically discover your provider.
Providers can optionally implement custom schemas for structured output:
Flow: Examples → from_examples() → to_provider_config() → Provider kwargs → Inference
class CustomProviderSchema(lx.schema.BaseSchema):
@classmethod
def from_examples(cls, examples_data, attribute_suffix="_attributes"):
# Analyze examples to find patterns
# Build schema based on extraction classes and attributes seen
return cls(schema_dict)
def to_provider_config(self):
# Convert schema to provider kwargs
return {
"response_schema": self._schema_dict,
"enable_structured_output": True
}
@property
def supports_strict_mode(self):
# True = valid JSON output, no markdown fences needed
return TrueThen in your provider:
class CustomProvider(lx.inference.BaseLanguageModel):
@classmethod
def get_schema_class(cls):
return CustomProviderSchema # Tell LangExtract about your schema
def __init__(self, **kwargs):
# Receive schema config in kwargs when use_schema_constraints=True
self.response_schema = kwargs.get('response_schema')
def infer(self, batch_prompts, **kwargs):
# Use schema during API calls
if self.response_schema:
config['response_schema'] = self.response_schema# Navigate to this example directory first
cd examples/custom_provider_plugin
# Install in development mode
pip install -e .
# Test the provider (must be run from this directory)
python test_example_provider.pySince this example registers the same pattern as the default Gemini provider, you must explicitly specify it:
import langextract as lx
# Create a configured model with explicit provider selection
config = lx.factory.ModelConfig(
model_id="gemini-2.5-flash",
provider="CustomGeminiProvider",
provider_kwargs={"api_key": "your-api-key"}
)
model = lx.factory.create_model(config)
# Note: Passing model directly to extract() is coming soon.
# For now, use the model's infer() method directly or pass parameters individually:
result = lx.extract(
text_or_documents="Your text here",
model_id="gemini-2.5-flash",
api_key="your-api-key",
prompt_description="Extract key information",
examples=[...]
)
# Coming soon: Direct model passing
# result = lx.extract(
# text_or_documents="Your text here",
# model=model, # Planned feature
# prompt_description="Extract key information"
# )# Copy this example directory
cp -r examples/custom_provider_plugin/ ~/langextract-myprovider/
# Rename the package directory
cd ~/langextract-myprovider/
mv langextract_provider_example langextract_myproviderEdit pyproject.toml:
- Change
name = "langextract-myprovider" - Update description and author information
- Change entry point:
myprovider = "langextract_myprovider:MyProvider"
Edit provider.py:
- Change class name from
CustomGeminiProvidertoMyProvider - Update
@register()patterns to match your model IDs - Replace Gemini API calls with your backend
- Add any provider-specific parameters
Edit schema.py:
- Rename to
MyProviderSchema - Customize
from_examples()for your extraction format - Update
to_provider_config()for your API requirements - Set
supports_strict_modebased on your capabilities
# Install in development mode
pip install -e .
# Test your provider
python -c "
import langextract as lx
lx.providers.load_plugins_once()
print('Provider registered:', any('myprovider' in str(e) for e in lx.providers.registry.list_entries()))
"- Test that your provider loads and handles basic inference
- Verify schema support works (if implemented)
- Test error handling for your specific API
# Build package
python -m build
# Upload to PyPI
twine upload dist/*Share with the community:
- Submit a PR to add your provider to the Community Providers Registry
- Open an issue on LangExtract GitHub to announce your provider and get feedback
- Forgetting to trigger plugin loading - Plugins load lazily, use
load_plugins_once()in tests - Pattern conflicts - Avoid patterns that conflict with built-in providers
- Missing dependencies - List all requirements in
pyproject.toml - Schema mismatches - Test schema generation with real examples
- Not handling None schema - Provider must clear schema when
apply_schema(None)is called (see provider.py for implementation)
Apache License 2.0