Semantic operators provide a high-level, declarative API for performing common data transformation tasks using natural language. Inspired by LOTUS-style semantic operations, these operators enable you to work with structured and unstructured data using LLM-powered transformations.
Agentics semantic operators bridge the gap between traditional data manipulation (like pandas operations) and LLM-powered semantic understanding. Each operator accepts either an AG (Agentics) or a pandas DataFrame as input and returns the same type, making them easy to integrate into existing data pipelines.
| Operator | Description |
|---|---|
| `sem_map` | Map each record using a natural language instruction |
| `sem_filter` | Keep records that match a natural language predicate |
| `sem_agg` | Aggregate across all records (e.g., for summarization) |
`sem_map` transforms each record in your dataset according to natural language instructions, mapping source data to a target schema.
```python
async def sem_map(
    source: AG | pd.DataFrame,
    target_type: Type[BaseModel] | str,
    instructions: str,
    merge_output: bool = True,
    **kwargs,
) -> AG | pd.DataFrame
```

**Parameters**

- `source` (AG | pd.DataFrame): Input data to be mapped
- `target_type` (Type[BaseModel] | str): Target schema for the output
    - If a Pydantic `BaseModel` subclass: used directly as the target type
    - If a `str`: a Pydantic model is created dynamically with a single string field
- `instructions` (str): Natural language description of how to transform the data
- `merge_output` (bool, default `True`):
    - `True`: merge the mapped fields back into the original source records
    - `False`: return only the mapped output
- `**kwargs`: Additional arguments forwarded to the `AG()` constructor (e.g., model configuration, batching)

**Returns**

- `AG | pd.DataFrame`: An `AG` or `DataFrame` containing the transformed data following `target_type`
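When `target_type` is a plain string, a one-field Pydantic model is generated on the fly. A minimal sketch of that behavior using Pydantic's `create_model` (the field name, `Optional[str]` type, and default are assumptions; the actual `sem_map` internals may differ):

```python
from typing import Optional, Type
from pydantic import BaseModel, create_model

def string_target_to_model(name: str) -> Type[BaseModel]:
    """Sketch: build a single-string-field Pydantic model from a string
    target type. Illustrative only; not the Agentics implementation."""
    return create_model(name.capitalize(), **{name: (Optional[str], None)})

# Roughly what sem_map(..., target_type="category") would construct
Category = string_target_to_model("category")
record = Category(category="positive")
print(record.category)  # -> positive
```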
```python
import pandas as pd
from typing import Optional
from pydantic import BaseModel, Field

from agentics.core.semantic_operators import sem_map

# Sample data
df = pd.DataFrame({
    'review': [
        'This product is amazing! Best purchase ever.',
        'Terrible quality, broke after one day.',
        'It works okay, nothing special.'
    ]
})

# Define target schema
class Sentiment(BaseModel):
    sentiment: Optional[str] = Field(None, description="The sentiment of the review (e.g., positive, negative, neutral)")
    confidence: Optional[float] = Field(None, description="Confidence score of the sentiment analysis between 0 and 1")

# Map reviews to sentiment
result = await sem_map(
    source=df,
    target_type=Sentiment,
    instructions="Analyze the sentiment of the review and provide a confidence score between 0 and 1."
)
```
The output includes the original `review` column plus the mapped `sentiment` and `confidence` columns:

```
                                         review sentiment  confidence
0  This product is amazing! Best purchase ever.  positive        0.85
1        Terrible quality, broke after one day.  negative        0.99
2               It works okay, nothing special.   neutral        0.85
```

```python
# Using a string target type for simpler cases
result = await sem_map(
    source=df,
    target_type="category",
    instructions="Classify the review into one of: positive, negative, neutral"
)
```

`sem_filter` filters records based on a natural language predicate, keeping only those that satisfy the condition.
```python
async def sem_filter(
    source: AG | pd.DataFrame,
    predicate_template: str,
    **kwargs
) -> AG | pd.DataFrame
```

**Parameters**

- `source` (AG | pd.DataFrame): Input data to be filtered
- `predicate_template` (str): Natural language condition or LangChain-style template
    - Can use `{field}` placeholders to reference source fields
    - Or provide a plain text predicate
- `**kwargs`: Additional arguments forwarded to the `AG()` constructor

**Returns**

- `AG | pd.DataFrame`: Filtered data containing only records that satisfy the predicate
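With a LangChain-style template, the `{field}` placeholders are filled from each record before the predicate is evaluated by the LLM. The substitution step can be sketched in plain Python (the helper name is hypothetical, not part of the Agentics API):

```python
def render_predicate(predicate_template: str, record: dict) -> str:
    """Fill {field} placeholders in a predicate template from a record.

    Hypothetical helper illustrating LangChain-style template filling;
    the real sem_filter implementation may differ.
    """
    return predicate_template.format(**record)

record = {"product": "Laptop", "description": "High-performance gaming laptop"}
prompt = render_predicate(
    "The {product} described as '{description}' is suitable for gaming",
    record,
)
print(prompt)
# -> The Laptop described as 'High-performance gaming laptop' is suitable for gaming
```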
```python
import pandas as pd

from agentics.core.semantic_operators import sem_filter

df = pd.DataFrame({
    'product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],
    'description': [
        'High-performance gaming laptop with RGB keyboard',
        'Budget smartphone with basic features',
        'Premium tablet with stylus support',
        '4K monitor for professional work'
    ]
})

# Filter for premium/high-end products
result = await sem_filter(
    source=df,
    predicate_template="The product is premium or high-end"
)
```
```python
print(result)
```

```
   product                                       description
0   Laptop  High-performance gaming laptop with RGB keyboard
1   Tablet                Premium tablet with stylus support
2  Monitor                  4K monitor for professional work
```

```python
# Use field placeholders in the predicate
result = await sem_filter(
    source=df,
    predicate_template="The {product} described as '{description}' is suitable for gaming"
)
```

`sem_agg` aggregates data across all records to produce a summary or consolidated output.
```python
async def sem_agg(
    source: AG | pd.DataFrame,
    target_type: Type[BaseModel] | str,
    instructions: str = None,
    **kwargs,
) -> AG | pd.DataFrame
```

**Parameters**

- `source` (AG | pd.DataFrame): Input data to be aggregated
- `target_type` (Type[BaseModel] | str): Schema for the aggregated output
- `instructions` (str, optional): Natural language description of the aggregation
- `**kwargs`: Additional arguments forwarded to the `AG()` constructor

**Returns**

- `AG | pd.DataFrame`: Aggregated result (typically a single record or summary)
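A many:1 aggregation typically concatenates all records into a single context that the LLM then summarizes against the target schema. A rough sketch of that context-building step (the function name and formatting are assumptions, not the actual Agentics internals):

```python
def build_aggregation_context(records: list[dict], instructions: str) -> str:
    """Concatenate all records into one prompt for a many:1 aggregation.

    Hypothetical sketch; the real sem_agg prompt format may differ.
    """
    lines = [f"- {rec}" for rec in records]
    return instructions + "\n" + "\n".join(lines)

records = [{"review": "Great product"}, {"review": "Not worth the price"}]
context = build_aggregation_context(records, "Summarize all reviews:")
print(context)
```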
```python
import pandas as pd
from pydantic import BaseModel

from agentics.core.semantic_operators import sem_agg

df = pd.DataFrame({
    'review': [
        'Great product, very satisfied!',
        'Good quality but expensive',
        'Not worth the price',
        'Excellent, would buy again',
        'Decent but has some issues'
    ]
})

class ReviewSummary(BaseModel):
    overall_sentiment: str
    key_themes: list[str]
    recommendation: str

# Aggregate all reviews into a summary
result = await sem_agg(
    source=df,
    target_type=ReviewSummary,
    instructions="Summarize all reviews, identify key themes, and provide an overall recommendation"
)

print(result)
# Returns a single record with aggregated insights
```

```python
class Statistics(BaseModel):
    total_count: int
    positive_count: int
    negative_count: int
    average_sentiment: str

result = await sem_agg(
    source=df,
    target_type=Statistics,
    instructions="Count total reviews, positive reviews, negative reviews, and determine average sentiment"
)
```

Choose the operator that matches the shape of your transformation:

- `sem_map`: use for 1:1 transformations (each input → one output)
- `sem_filter`: use for selecting subsets based on conditions
- `sem_agg`: use for many:1 transformations (all inputs → one summary)
```python
# ❌ Vague
instructions = "Process the data"

# ✅ Clear and specific
instructions = """
Extract the product name, price, and category from each description.
Normalize prices to USD. Categorize products as: Electronics, Clothing, or Home Goods.
"""
```

```python
# For simple extractions, use string types
sem_map(
    ...,
    target_type="category_name",
    ...
)

# For structured outputs, use Pydantic models
class Product(BaseModel):
    name: str
    price: float
    category: str

sem_map(
    ...,
    target_type=Product,
    ...
)
```

```python
# Configure batch size for large datasets
result = await sem_map(
    source=large_df,
    target_type=MyType,
    instructions="...",
    amap_batch_size=50  # Process 50 records at a time
)
```

```python
# Operators work with both types
df_result = await sem_filter(df, "condition")  # Returns DataFrame
ag_result = await sem_filter(ag, "condition")  # Returns AG
```

Semantic operators support batching for efficient processing of large datasets:
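Conceptually, batching partitions the input into fixed-size chunks so each LLM call handles at most `amap_batch_size` records. The chunking itself can be sketched in plain Python (this helper is illustrative, not the Agentics implementation):

```python
from typing import Iterator

def batched(records: list, batch_size: int) -> Iterator[list]:
    """Yield successive fixed-size chunks of records.

    Illustrative sketch of how amap_batch_size partitions the input;
    the real Agentics batching logic may differ.
    """
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

chunks = list(batched(list(range(45)), 20))
print([len(c) for c in chunks])  # -> [20, 20, 5]
```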
```python
result = await sem_map(
    source=df,
    target_type=MyType,
    instructions="...",
    amap_batch_size=20  # Default for sem_filter
)
```

Semantic operators integrate seamlessly with other Agentics features:
```python
# Filter → Map → Aggregate pipeline
filtered = await sem_filter(df, "High-value customers")
mapped = await sem_map(filtered, CustomerProfile, "Extract profile details")
summary = await sem_agg(mapped, Summary, "Summarize customer segments")
```

- 👉 Semantic Operators Tutorial - Code examples
- 👉 Agentics (AG) for data modeling patterns and typed state containers
- 👉 Index