diff --git a/examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb b/examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb
index 32f9386bc7..8297114396 100644
--- a/examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb
+++ b/examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb
@@ -11,59 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Table of Contents"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "1. **[Executive Summary](#1-executive-summary)**\n",
- " - 1.1. Purpose and Audience\n",
- " - 1.2. Key Takeaways\n",
- "2. **[How to Use this Cookbook](#2-how-to-use-this-cookbook)**\n",
- " - 2.1. Pre-requisites\n",
- "3. **[Creating a Temporally-Aware Knowledge Graph with a Temporal Agent](#3-creating-a-temporally-aware-knowledge-graph-with-a-temporal-agent)**\n",
- " - 3.1. Introducing our Temporal Agent\n",
- " - 3.1.1. Key enhancements introduced in this cookbook\n",
- " - 3.1.2. The Temporal Agent Pipeline\n",
- " - 3.1.3. Selecting the right model for a Temporal Agent \n",
- " - 3.2. Building our Temporal Agent Pipeline\n",
- " - 3.2.1. Load transcripts\n",
- " - 3.2.2. Creating a Semantic Chunker\n",
- " - 3.2.3. Laying the Foundations for our Temporal Agent\n",
- " - 3.2.4. Statement Extraction\n",
- " - 3.2.5. Temporal Range Extraction\n",
- " - 3.2.6. Creating our Triplets\n",
- " - 3.2.7. Temporal Event\n",
- " - 3.2.8. Defining our Temporal Agent\n",
- " - 3.2.9. Entity Resolution\n",
- " - 3.2.10. Invalidation agent\n",
- " - 3.2.11. Putting it all together\n",
- " - 3.3. Knowledge Graphs\n",
- " - 3.3.1. Building our Knowledge Graph with NetworkX\n",
- " - 3.3.2. NetworkX versus Neo4j in Production\n",
- " - 3.4. Evaluation and Suggested Feature Additions\n",
- " - 3.4.1. Temporal Agent\n",
- " - 3.4.2. Invalidation Agent\n",
- "4. **[Multi-Step Retrieval Over a Knowledge Graph](#4-multi-step-retrieval-over-a-knowledge-graph)**\n",
- " - 4.1. Building our Retrieval Agent\n",
- " - 4.1.1. Imports\n",
- " - 4.1.2. (Re-)Initialize OpenAI Client\n",
- " - 4.1.3. (Re-)Load our Temporal Knowledge Graph\n",
- " - 4.1.4. Planner\n",
- " - 4.1.5. Function Calling\n",
- " - 4.1.6. Retriever\n",
- " - 4.1.7. Selecting the right model for Multi-Step Knowledge-Graph Retrieval\n",
- " - 4.2. Elevating your Retrieval System\n",
- "5. **[Prototype to Production](#5-prototype-to-production)**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# 1. Executive summary\n",
+ "# 1. Executive Summary\n",
"---"
]
},
@@ -6288,14 +6236,14 @@
"- **Scaling retrieval agents**\n",
"- **Safeguards**\n",
"\n",
- "This section serves as a walkthrough of key considerations and best practices to ensure your temporally-aware knowledge graph can operate reliably in a real-world environment. A more detailed [Prototype to Production Appendix section](./Appendix.ipynb) can be found in the repository for this cookbook.\n",
+ "This section serves as a walkthrough of key considerations and best practices to ensure your temporally-aware knowledge graph can operate reliably in a real-world environment. A more detailed [Prototype to Production Appendix section](#a-prototype-to-production) can be found towards the end of this cookbook.\n",
"\n",
"
\n",
"\n",
"- \n",
" Storing and Retrieving High-Volume Graph Data
\n",
" \n",
- " Appendix section A.1. \"Storing and Retrieving High-Volume Graph Data\"\n",
+ " Appendix section A.1. \"Storing and Retrieving High-Volume Graph Data\"\n",
"
\n",
" \n",
" Manage scalability through thoughtful schema design, sharding, and partitioning. Clearly define entities, relationships, and ensure schema flexibility for future evolution. Use high-cardinality fields like timestamps for efficient data partitioning.\n",
@@ -6305,7 +6253,7 @@
"
- \n",
" Temporal Validity & Versioning
\n",
" \n",
- " Appendix section A.1.2. \"Temporal Validity & Versioning\"\n",
+ " Appendix section A.1.2. \"Temporal Validity & Versioning\"\n",
"
\n",
" \n",
" Include temporal markers (valid_from, valid_to) for each statement. Maintain historical records non-destructively by marking outdated facts as inactive and indexing temporal fields for efficient queries.\n",
@@ -6315,7 +6263,7 @@
"
- \n",
" Indexing & Semantic Search
\n",
" \n",
- " Appendix section A.1.3. \"Indexing & Semantic Search\"\n",
+ " Appendix section A.1.3. \"Indexing & Semantic Search\"\n",
"
\n",
" \n",
" Utilize B-tree indexes for efficient temporal querying. Leverage PostgreSQL’s pgvector extension for semantic search with approximate nearest-neighbor algorithms like ivfflat, ivfpq, and hnsw to optimize query speed and memory usage.\n",
@@ -6325,7 +6273,7 @@
"
- \n",
" Managing and Pruning Datasets
\n",
" \n",
- " Appendix section A.2. \"Managing and Pruning Datasets\"\n",
+ " Appendix section A.2. \"Managing and Pruning Datasets\"\n",
"
\n",
" \n",
" Establish TTL and archival policies for data retention based on source reliability and relevance. Implement automated archival tasks and intelligent pruning with relevance scoring to optimize graph size.\n",
@@ -6335,7 +6283,7 @@
"
- \n",
" Concurrent Ingestion Pipeline
\n",
" \n",
- " Appendix section A.3. \"Implementing Concurrency in the Ingestion Pipeline\"\n",
+ " Appendix section A.3. \"Implementing Concurrency in the Ingestion Pipeline\"\n",
"
\n",
" \n",
" Implement batch processing with separate, scalable pipeline stages for chunking, extraction, invalidation, and entity resolution. Optimize throughput and parallelism to manage ingestion bottlenecks.\n",
@@ -6345,7 +6293,7 @@
"
- \n",
" Minimizing Token Costs
\n",
" \n",
- " Appendix section A.4. \"Minimizing Token Cost\"\n",
+ " Appendix section A.4. \"Minimizing Token Cost\"\n",
"
\n",
" \n",
" Use caching strategies to avoid redundant API calls. Adopt service tiers like OpenAI's flex option to reduce costs and replace expensive model queries with efficient embedding and nearest-neighbor search.\n",
@@ -6355,7 +6303,7 @@
"
- \n",
" Scaling Retrieval Agents
\n",
" \n",
- " Appendix section A.5. \"Scaling and Productionizing our Retrieval Agent\"\n",
+ " Appendix section A.5. \"Scaling and Productionizing our Retrieval Agent\"\n",
"
\n",
" \n",
" Use a controller and traversal workers architecture to handle multi-hop queries. Implement parallel subgraph extraction, dynamic traversal with chained reasoning, caching, and autoscaling for high performance.\n",
@@ -6365,7 +6313,7 @@
"
- \n",
" Safeguards & Verification
\n",
" \n",
- " Appendix section A.6. \"Safeguards\"\n",
+ " Appendix section A.6. \"Safeguards\"\n",
"
\n",
" \n",
" Deploy multi-layered output verification, structured logging, and monitoring to ensure data integrity and operational reliability. Track critical metrics and perform regular audits.\n",
@@ -6375,7 +6323,7 @@
"
- \n",
" Prompt Optimization
\n",
" \n",
- " Appendix section A.7. \"Prompt Optimization\"\n",
+ " Appendix section A.7. \"Prompt Optimization\"\n",
"
\n",
" \n",
" Optimize LLM interactions with personas, few-shot prompts, chain-of-thought methods, dynamic context management, and automated A/B testing of prompt variations for continuous performance improvement.\n",
@@ -6418,6 +6366,643 @@
"- [Danny Wigg](https://www.linkedin.com/in/dannywigg/)\n",
"- [Shikhar Kwatra](https://www.linkedin.com/in/shikharkwatra/)"
]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Appendix\n",
+ "---\n",
+ "\n",
+ "Within this appendix, you'll find a more in-depth *Prototype to Production* section."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f70ab4a-e04a-44aa-8e75-f9c15be22c5f",
+ "metadata": {},
+ "source": [
+ "## A. Prototype to Production"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b97b1677-7c77-4bad-bdfe-b541901c3b81",
+ "metadata": {},
+ "source": [
+ "### A.1. Storing and Retrieving High-Volume Graph Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ada5b662-f493-41e3-a5f5-85f92dec61c0",
+ "metadata": {},
+ "source": [
+ "#### A.1.1. Data Volume & Schema Complexity"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8a1009ec-9885-486a-ab69-5d86720389a6",
+ "metadata": {},
+ "source": [
+ "As your dataset scales to millions or even billions of nodes and edges, managing performance and maintainability becomes critical. This requires thoughtful approaches to both schema design and data partitioning:\n",
+ "\n",
+ "
\n",
+ " - \n",
+ " Schema design for growth and change
\n",
+ " \n",
+ " Clearly define core entity types (e.g., Person
, Organization
, Event
) and relationships. Design the schema with versioning and flexibility in mind, enabling future schema evolution with minimal downtime.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Sharding & partitioning
\n",
+ " \n",
+ " Use high-cardinality fields (such as timestamps or unique entity IDs) for partitioning to preserve query performance as data volume grows. This is particularly important for temporally-aware data. For example:\n",
+ "
\n",
+ " \n",
+ " ```sql \n",
+ " CREATE TABLE statements (\n",
+ " statement_id UUID PRIMARY KEY,\n",
+ " entity_id UUID NOT NULL,\n",
+ " text TEXT NOT NULL,\n",
+ " valid_from TIMESTAMP NOT NULL,\n",
+ " valid_to TIMESTAMP,\n",
+ " status VARCHAR(16) DEFAULT 'active',\n",
+ " embedding VECTOR(1536),\n",
+ " ...\n",
+ " ) PARTITION BY RANGE (valid_from);\n",
+ " ```\n",
+ " \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f91fa408-9b9c-4fc5-9471-c0ccbc49bbae",
+ "metadata": {},
+ "source": [
+ "#### A.1.2. Temporal Validity & Versioning"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb172a41-c085-4fc3-b7b4-80d22c3dbabf",
+ "metadata": {},
+ "source": [
+ "In our temporal knowledge graph, each statement includes temporal markers (e.g., `valid_from`, `valid_to`). \n",
+ "\n",
+ "\n",
+ " - \n",
+ " Preserve history non-destructively
\n",
+ " \n",
+ " Avoid deleting or overwriting records. Instead mark outdated facts as inactive by setting a status
(e.g., inactive
).\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Optimize for temporal access
\n",
+ " \n",
+ " Index temporal fields (valid_from
, valid_to
) to support efficient querying of both current and historical states.\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "\n",
+ "\n",
+ "#### Example: Non-Destructive Updates\n",
+ "\n",
+ "Rather than removing or overwriting a record, update its status and close its validity window:\n",
+ "\n",
+ "```sql\n",
+ "UPDATE statements\n",
+ "SET status = 'inactive', valid_to = '2025-03-15T00:00:00Z'\n",
+ "WHERE statement_id = '...' AND entity_id = '...';\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "58caf1c8-49ae-46a9-bb16-8c767c21a362",
+ "metadata": {},
+ "source": [
+ "#### A.1.3. Indexing & Semantic Search"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e982b9c7-611c-4e29-9b56-d52fbdc844b1",
+ "metadata": {},
+ "source": [
+ "##### Temporal Indexes\n",
+ "To support efficient temporal queries create B-tree indexes on `valid_from` and `valid_to`. A 'B-tree' index is a tree data structure that keeps data sorted to facilitate fast lookups, range queries, and ordered scans in logarithmic time. It's the default index type in many relational databases. \n",
+ "\n",
+ "```sql\n",
+ "CREATE INDEX ON statements (valid_from);\n",
+ "CREATE INDEX ON statements (valid_to);\n",
+ "```\n",
+ "##### Semantic search with pgvector\n",
+ "Storing vector embeddings in PostgreSQL (via the `pgvector` extension) enables similarity-based retrieval via semantic search. This follows a two-step process:\n",
+ "1. Store high-dimensional vectors that represent the semantic meaning of the text. These can be created with embedding models such as OpenAI's `text-embedding-3-small` and `text-embedding-3-large`\n",
+ "2. Use Approximate Nearest-Neighbour (ANN) for efficient similarity matching at scale\n",
+ "\n",
+ "There are several different indexing options available in pgvector, each with different purposes. These indexing options are described in more detail, along with in-depth implementation steps in the [README on the Github repository for pgvector](https://github.com/pgvector/pgvector/blob/master/README.md).\n",
+ "| Index Type
| Build Time
| Query Speed
| Memory Usage
| Accuracy
| Recommended Scale
| Notes |\n",
+ "|-------------------------------------|--------------------------------------|----------------------------------------|-----------------------------------------|-----------------------------------|----------------------------------------------|-------|\n",
+ "| **flat**
| Minimal
| Slow
(linear scan)
| Low
| 100%
(exact)
| Very small
(< 100 K vectors)
| No approximate indexing—scans all vectors. Best for exact recall on small collections |\n",
+ "| **ivfflat**
| Moderate
| Fast when tuned
| Moderate
| High
(tunable)
| Small to Medium
(100 K–200 M)
| Uses inverted file indexing. Query-time parameters control trade-offs |\n",
+ "| **ivfpq**
| High
| Very fast
| Low
(quantized)
| Slightly lower
than ivfflat
| Medium to Large
(1 M–500 M)
| Combines inverted files with product quantization for lower memory use |\n",
+ "| **hnsw**
| Highest
| Fastest
(esp. at scale)
| High
(in-memory)
| Very high
| Large to Very Large
(100 M–Billions+)
| Builds a hierarchical navigable graph. Ideal for latency-sensitive, high-scale systems |\n",
+ "\n",
+ "\n",
+ "##### Tuning parameters for vector indexing\n",
+ "\n",
+ "`ivfflat`\n",
+ "* `lists`: Number of partitions (e.g., 100)\n",
+ "* `probes`: Number of partitions to scan at query time (e.g., 10-20), controls recall vs. latency\n",
+ "\n",
+ "`ivfpq`\n",
+ "* `subvectors`: Number of blocks to quantize (e.g., 16)\n",
+ "* `bits`: Number of bits per block (e.g., 8)\n",
+ "* `probes`: Same as in `ivfflat`\n",
+ "\n",
+ "`hnsw`\n",
+ "* `M`: Max connections per node (e.g., 16)\n",
+ "* `ef_construction`: Build-time dynamic candidate list size (e.g., 200)\n",
+ "* `ef_search`: Queyr-time candidate pool (e.g., 64-128)\n",
+ "\n",
+ "##### Best practices\n",
+ "- `flat` for debugging or small datasets\n",
+ "- `ivfflat` when you want tunable accuracy with good speed\n",
+ "- `ivfpq` when memory efficieny is critical\n",
+ "- `hnsw` when optimizing for lowest latency on massive collections\n",
+ "\n",
+ "##### Other vector database options in the ecosystem\n",
+ "\n",
+ "| Vector DB | Key Features | Pros | Cons |\n",
+ "| ------------ | ------------------------------------------------------------ | ------------------------------------------- | --------------------------------------------------------------- |\n",
+ "| **Pinecone** | Fully managed, serverless; supports HNSW and SPANN | Auto-scaling, SLA-backed, easy to integrate | Vendor lock-in; cost escalates at scale |\n",
+ "| **Weaviate** | GraphQL API, built-in modules for encoding and vectorization | Hybrid queries (metadata + vector), modular | Production deployment requires Kubernetes |\n",
+ "| **Milvus** | Supports GPU indexing; IVF, HNSW, ANNOY | High performance at scale, dynamic indexing | Operational complexity; separate system |\n",
+ "| **Qdrant** | Lightweight, real-time updates, payload filtering | Simple setup, good hybrid query support | Lacks native relational joins; eventual consistency in clusters |\n",
+ "| **Vectara** | Managed with semantic ranking and re-ranking | Strong relevance features; easy integration | Proprietary; limited index control |\n",
+ "\n",
+ "##### Choosing the Right Vector Store\n",
+ "\n",
+ "| Scale
| Recommendation
| Details |\n",
+ "|--------------------------------|------------------------------------------|---------|\n",
+ "| **Small to Medium Scale**
(less than 100M vectors)
| PostgreSQL + pgvector
with `ivfflat` index
| Often sufficient for moderate workloads. Recommended settings: `lists = 100–200`, `probes = 10–20`. |\n",
+ "| **Large Scale**
(100M – 1B+ vectors)
| Milvus or Qdrant
| Suitable for high-throughput workloads, especially when GPU-accelerated indexing or sub-millisecond latency is needed. |\n",
+ "| **Hybrid Scenarios**
| PostgreSQL for metadata
+ dedicated vector DB
| Use PostgreSQL for entity metadata storage and a vector DB (e.g., Milvus, Qdrant) for similarity search. Synchronize embeddings using CDC pipelines (e.g., Debezium). |\n",
+ "\n",
+ "For more detailed information, check out the [OpenAI cookbook on vector databases](https://cookbook.openai.com/examples/vector_databases/readme).\n",
+ "\n",
+ "##### Durable disk storage and backup\n",
+ "For some cases, especially those requiring high availability or state recovery across restarts, it may be worth persisting state to reliable disk storage and implementing a backup strategy. \n",
+ "\n",
+ "If durability is a concern, consider using persistent disks with regular backups or syncing state to external storage. While not necessary for all deployments, it can provide a valuable safeguard against data loss or operational disruption in environments where consistency and fault tolerance matter."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7e12f78a-eb4f-44ab-9dc1-2fea0e4d7090",
+ "metadata": {},
+ "source": [
+ "### A.2. Managing and Pruning Datasets"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52acca83-7a99-436f-86ac-5cfe6b93ba86",
+ "metadata": {},
+ "source": [
+ "#### A.2.1. TTL (Time-to-Live) and Archival Policies"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6e6bbf33-4f3e-4536-a806-1b1a7d84add7",
+ "metadata": {},
+ "source": [
+ "Establish clear policies to determine which facts should be retained indefinitely (e.g., legally required records for regulators) and which can be archived after a defined period (e.g., statements sourced from social media more than one year old).\n",
+ "\n",
+ "Key practices to include:\n",
+ "\n",
+ " - \n",
+ " Automated Archival Jobs
\n",
+ " \n",
+ " Set up a background task that periodically queries for records with e.g., valid_to < NOW() - INTERVAL 'X days'
and moves them to an archival table for long-term storage.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Source-Specific Retention Policies
\n",
+ " \n",
+ " Tailor retention durations by data source or entity type. For example, high-authority sources like government publications may warrant longer retention than less reliable data such as scraped news headlines or user-generated content.\n",
+ "
\n",
+ " \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9eefcbf8-8cba-40b3-888f-58b7f723ce5a",
+ "metadata": {},
+ "source": [
+ "#### A.2.2. Relevance Scoring and Intelligent Pruning"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c6e905c-b91a-48d0-a84b-cff5e8f6dbf1",
+ "metadata": {},
+ "source": [
+ "As your knowledge graph grows, the utility of many facts will decline. To keep the graph focused and maximise performance: \n",
+ "\n",
+ " - \n",
+ " Index a Relevance Score
\n",
+ " \n",
+ " Introduce a numeric relevance_score
column (or columns) that incorporate metrics such as recency, source trustworthiness, and production query frequency.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Automated Pruning Logic
\n",
+ " \n",
+ " Schedule a routine job to prune or archive facts falling below a predefined relevance threshold.\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "\n",
+ "\n",
+ "#### Advanced Relevance-Based Graph Reduction\n",
+ "\n",
+ "Efficiently reducing the size of a knowledge graph is important when scaling. [A 2024 survey](https://arxiv.org/pdf/2402.03358) categorizes techniques into **sparsification**, **coarsening**, and **condensation**—all aimed at shrinking the graph while preserving task-critical semantics. These methods offer substantial runtime and memory gains on large-scale KGs.\n",
+ "\n",
+ "Example implementation pattern:\n",
+ "\n",
+ " - \n",
+ " Score Each Triple
\n",
+ " \n",
+ " Compute a composite relevance_score
, for example:\n",
+ "
\n",
+ " relevance_score = β1 * recency_score + β2 * source_trust_score + β3 * retrieval_count
\n",
+ " \n",
+ " Where:\n",
+ "
\n",
+ " \n",
+ " recency_score
: exponential decay from valid_from
\n",
+ " source_trust_score
: source-domain trust value \n",
+ " retrieval_count
: production query frequency \n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Apply a Reduction Strategy
\n",
+ " \n",
+ " - Sparsify: Select and retain only the most relevant edges or nodes based on criteria like centrality, spectral similarity, or embedding preservation
\n",
+ " - Coarsen: Group low-importance or semantically similar nodes into super-nodes and aggregate their features and connections
\n",
+ " - Condense: Construct a task-optimized mini-graph from scratch
\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Validate in Shadow Mode
\n",
+ " \n",
+ " Log and compare outputs from the pruned vs. original graph before routing production traffic.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Re-Score Regularly
\n",
+ " \n",
+ " Recompute relevance (e.g., nightly) to ensure new or frequently accessed facts surface back to the top.\n",
+ "
\n",
+ " \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c98d8110-9ac9-4247-bd54-a4dccd3cf61d",
+ "metadata": {},
+ "source": [
+ "### A.3. Implementing Concurrency in the Ingestion Pipeline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "66e144a0-9837-478a-b112-cfa297eb664a",
+ "metadata": {},
+ "source": [
+ "Moving from prototype to production often requires you to transform your linear processing pipeline into a concurrent, scalable pipeline. Instead of processing documents sequentially (document → chunking → statement extraction → entity extraction → statement invalidation → entity resolution), implement a staged pipeline where each phase can scale independently.\n",
+ "\n",
+ "Design your pipeline with a series of specialized stages, each with its own queue and worker pool. This allows you to scale bottlenecks independently and maintain system reliability under varying loads. \n",
+ "\n",
+ "\n",
+ " - \n",
+ " Batch Chunking
\n",
+ " \n",
+ " Begin by collecting documents in batches of e.g., 100–500 using a job queue like Redis or Amazon SQS. Process these documents in parallel, splitting each into their respective chunks. The chunking stage should often optimize for I/O parallelization as document reading is often the bottleneck. You can then store the chunks and their respective metadata in your chunk_store
table, using bulk insert operations to minimize overhead.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Statement and Entity Extraction
\n",
+ " \n",
+ " Pull chunks in batches of e.g., 50–100 and send them to your chosen LLM (e.g., GPT-4.1-mini) using parallel API requests. Implement rate limiting with semaphores or other methods to stay safely within OpenAI's API limits whilst maximizing your throughputs. We've covered rate limiting in more detail in our cookbook on How to handle rate limits. Once extracted, you can then write these to the relevant table in your database.\n",
+ "
\n",
+ " \n",
+ " You can then similarly group the statements we've just extracted into batches, and run the entity extraction processes in a similar vein before storing them.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Statement Invalidation
\n",
+ " \n",
+ " Group extracted statement IDs by their associated entity clusters (e.g., all statements related to a specific entity like “Acme Corp.”). Send each cluster to your LLM (e.g., GPT-4.1-mini) in parallel to assess which statements are outdated or superseded. Use the model’s output to update the status
field in your statements
table—e.g., setting status = 'inactive'
. Parallelize invalidation jobs for performance and consider scheduling periodic sweeps for consistency.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Entity Resolution
\n",
+ " \n",
+ " Take batches of newly extracted entity mentions and compute embeddings using your model’s embedding endpoint. Insert these into your entity_registry
table, assigning each a provisional or canonical entity_id
. Perform approximate nearest-neighbor (ANN) searches using pgvector
to identify near-duplicates or aliases. You can then update the entities
table with resolved canonical IDs, ensuring downstream tasks reference unified representations.\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "\n",
+ "\n",
+ "### Advantages of Batch Processing\n",
+ "* Throughput – Batching reduces the overhead of individual API calls and database transactions.\n",
+ "\n",
+ "* Parallelism – Each stage can horizontally scale: you can run multiple worker processes for chunking, extraction, invalidation, etc., each reading from a queue.\n",
+ "\n",
+ "* Backpressure & Reliability – If one stage becomes slow (e.g., statement invalidation during a sudden data surge), upstream stages can buffer more items in the queue until capacity frees up.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "43cc5595-ae38-47e9-b90f-7b3f1bd004dc",
+ "metadata": {},
+ "source": [
+ "### A.4. Minimizing Token Cost"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "23ff03de-eeed-4bcf-9c1b-18bb8b449353",
+ "metadata": {},
+ "source": [
+ "#### A.4.1. Prompt Caching"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0f5a4127-058a-4cf9-9a35-a928658a99fd",
+ "metadata": {},
+ "source": [
+ "Avoid redundant API calls by memoizing responses to brittle sub-prompts.\n",
+ "\n",
+ "Implementation Strategy:\n",
+ "- **Cache Frequent Queries**: For example, repeated prompts like \"Extract entities from this statement\" on identicial statements\n",
+ "- **Use Hash Keys**: Generate a unique cache key using the MD5 hash of the statement text: `md5(statement_text)`\n",
+ "- **Storage Options**: Redis for scalable persistence or in-memory LRU cache for simplicity and speed\n",
+ "- **Bypass API Calls**: If a statement is found in cache, skip the API call"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "81de3bdc-8cc3-4016-8cd7-01754dd4b04d",
+ "metadata": {},
+ "source": [
+ "#### A.4.2. Service Tier: Flex"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "664765a2-6274-4285-b0c0-b1fb620cdb5a",
+ "metadata": {},
+ "source": [
+ "Utilize the `service_tier=flex` parameter in the OpenAI Responses SDK to enable partial completions and reduce costs.\n",
+ "\n",
+ "API Configuration:\n",
+ "```json\n",
+ "{\n",
+ " \"model\": \"o4-mini\",\n",
+ " \"prompt\": \"\",\n",
+ " \"service_tier\": \"flex\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "Cost Benefits:\n",
+ "- Charges only for generated tokens, not prompt tokens\n",
+ "- Can reduce costs by up to 40% for short extractions (e.g., single-sentence entity lists)\n",
+ "\n",
+ "You can learn more about the power of Flex processing and how to utilise it in the [API documentation for Flex processing](https://platform.openai.com/docs/guides/flex-processing?api-mode=responses)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "12c39a74-8e08-4d62-bf09-8603c397b6f2",
+ "metadata": {},
+ "source": [
+ "#### A.4.3. Minimize \"Chattiness\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "05891cac-6f02-40c9-91e8-e19e0fb11e3a",
+ "metadata": {},
+ "source": [
+ "Replace expensive text-generation calls with more efficient alternatives where possible.\n",
+ "\n",
+ "Alternative approach:\n",
+ "- Use embeddings endpoint (cheaper per token) combined with pgvector nearest-neighbor search\n",
+ "- Instead of asking the model \"Which existing statement is most similar?\", compute embeddings once and query directly in Postgres\n",
+ "- This approach is particularly effective for semantic similarity tasks\n",
+ "\n",
+ "**Benefits:**\n",
+ "- Lower cost per operation\n",
+ "- Faster query response times\n",
+ "- Reduced API dependency for similarity searches"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22affb30-4da1-41aa-9ae4-a285b7f2e060",
+ "metadata": {},
+ "source": [
+ "### A.5. Scaling and Productionizing our Retrieval Agent"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c8fb79ab-d623-4d11-bdff-70af141cf16b",
+ "metadata": {},
+ "source": [
+ "Once your graph is populated, you need a mechanism to answer multi-hop queries at scale. This requires:\n",
+ "\n",
+ "\n",
+ " - \n",
+ " Agent Architecture
\n",
+ " \n",
+ " - Controller Agent (Frontend): Receives a user question (e.g., “What events led to Acme Corp.’s IPO?”), then decomposes it into sub-questions or traversal steps.
\n",
+ " - Traversal Worker Agents: Each worker can perform a local graph traversal (e.g., “Find all facts where Acme Corp. has EventType = Acquisition between 2020–2025”), possibly in parallel on different partitions of the graph.
\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Parallel Subgraph Extraction
\n",
+ " \n",
+ " - Partition the graph by entity ID hash (e.g., modulo 16). For a given query, identify which partitions are likely to contain relevant edges, then dispatch traversal tasks in parallel to each worker.
\n",
+ " - Workers return partial subgraphs (nodes + edges), and the Controller Agent merges them.
\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Chained LLM Reasoning
\n",
+ " \n",
+ " For multi-hop questions, the Controller can prompt a model (e.g., GPT-4.1) with the partial subgraph and ask “Which next edge should I traverse?” This allows dynamic, context-aware traversal rather than blind breadth-first search.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Caching and Memoization
\n",
+ " \n",
+ " For frequently asked queries or subgraph patterns, cache the results (e.g., in Redis or a Postgres Materialized View) with a TTL equal to the fact’s valid_to
date, so that subsequent requests hit the cache instead of re-traversing.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Load Balancing & Autoscaling
\n",
+ " \n",
+ " Deploy the Traversal Worker Agents in a Kubernetes cluster with Horizontal Pod Autoscalers. Use CPU and memory metrics (and average queue length) to scale out during peak usage.\n",
+ "
\n",
+ " \n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28442f85-dfe6-444b-8e19-a9c84e1b23d5",
+ "metadata": {},
+ "source": [
+ "### A.6. Safeguards"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "21583d8e-06e1-4e56-a304-0bf2a4095dc0",
+ "metadata": {},
+ "source": [
+ "#### A.6.1 Multi-Layered Output Verification"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "40f741a4-ff66-4e2d-a3bc-db6ef1f04a26",
+ "metadata": {},
+ "source": [
+ "Run a lightweight validation pipeline to ensure outputs are as desired. Some examples of what can be included in this:\n",
+ "* Check that dates conform to `ISO-8601`\n",
+ "* Verify that entity types match your controlled vocabulary (e.g., if the model outputs an unexpected label, flag for manual review)\n",
+ "* Deploy a \"sanity-check\" function call to a smaller, cheaper model to verify the consistency of outputs (for example, “Does this statement parse correctly as a Fact? Yes/No.”)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1ebdc4fd-bba2-49d2-9e58-18a79c7af467",
+ "metadata": {},
+ "source": [
+ "#### A.6.2. Audit Logging & Monitoring"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5157ecd3-67af-485b-9d0a-a86f5d51d110",
+ "metadata": {},
+ "source": [
+ "- Implement structured logging with configurable verbosity levels (e.g., debug, info, warn, error)\n",
+ "- Store input pre-processing steps, intermediate outputs, and final results with full tracing, such as that offered via [OpenAI's tracing](https://platform.openai.com/traces)\n",
+ "- Track token throughput, latency, and error rates\n",
+ "- Monitor data quality metrics where possible, such as document or statement coverage, temporal resolution rates, and more\n",
+ "- Measure business-related metrics such as user numbers, average message volume, and user satisfaction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b65f7222-cf56-4b35-a052-a5bf0ab99984",
+ "metadata": {},
+ "source": [
+ "### A.7. Prompt Optimization"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f0dc981e-2a27-4ebe-8917-8eebf725f344",
+ "metadata": {},
+ "source": [
+ "\n",
+ " - \n",
+ " Personas
\n",
+ " \n",
+ " Introducing a persona to the model is an effective way to drive performance. Once you have narrowed down the specialism of the component you are developing the prompt for, you can create a persona in the system prompt that helps to shape the model's behaviour. We used this in our planner model to create a system prompt like this:\n",
+ "
\n",
+ " initial_planner_system_prompt = (\n",
+ " \"You work for the leading financial firm, ABC Incorporated, one of the largest financial firms in the world. \"\n",
+ " \"Due to your long and esteemed tenure at the firm, various equity research teams will often come to you \"\n",
+ " \"for guidance on research tasks they are performing. Your expertise is particularly strong in the area of \"\n",
+ " \"ABC Incorporated's proprietary knowledge base of earnings call transcripts. This contains details that have been \"\n",
+ " \"extracted from the earnings call transcripts of various companies with labelling for when these statements are, or \"\n",
+ " \"were, valid. You are an expert at providing instructions to teams on how to use this knowledge graph to answer \"\n",
+ " \"their research queries. \\n\"\n",
+ ")
\n",
+ " \n",
+ " Persona prompts can become much more developed and specific than this, but this should provide an insight into what this looks like in practice.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Few-Shot Prompting and Chain-of-Thought
\n",
+ " \n",
+ " For extraction-related tasks, such as statement extraction, a concise few-shot prompt (2–5 examples) will typically deliver higher precision than a zero-shot prompt at a marginal increase in cost.\n",
+ "
\n",
+ " \n",
+ " For e.g., temporal reconciliation tasks, chain-of-thought methods where you guide the model through comparison logic are more appropriate. This can look like:\n",
+ "
\n",
+ " Example 1: [Old fact], [New fact] → Invalidate\n",
+ "Example 2: [Old fact], [New fact] → Coexist\n",
+ "Now: [Old fact], [New fact] →
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Dynamic Prompting & Context Management
\n",
+ " \n",
+ " You can also lean on other LLMs or more structured methods to prune and prepare material that will be dynamically passed to prompts. We saw an example of this when building the tools for our retriever above, where the timeline_generation
tool sorts the retrieved material before passing it back to the central orchestrator.\n",
+ "
\n",
+ " \n",
+ " Steps to clean up the context or compress it mid-run can also be highly effective for longer-running queries.\n",
+ "
\n",
+ " \n",
+ "\n",
+ " - \n",
+ " Template Library & A/B Testing
\n",
+ " \n",
+ " Maintain a set of prompt templates in a version-controlled directory (e.g., prompts/statement_extraction.json
, prompts/entity_extraction.json
) to enable you to audit past changes and revert if necessary. You can utilize OpenAI's reusuable prompts for this. In the OpenAI dashboard, you can develop reusable prompts to use in API requests. This enables you to build and evaluate your prompts, deploying updated and improved versions without ever changing the code.\n",
+ "
\n",
+ " \n",
+ " Automate A/B testing by periodically sampling extracted facts from the pipeline, re-running them through alternative prompts, and comparing performance scores (you can track this in a separate evaluation harness).\n",
+ "
\n",
+ " \n",
+ " Track key performance indicators (KPIs) such as extraction latency, error rates, and invalidation accuracy.\n",
+ "
\n",
+ " \n",
+ " If any metric drifts beyond a threshold (e.g., invalidation accuracy drops below 90%), trigger an alert and roll back to a previous prompt version.\n",
+ "
\n",
+ " \n",
+ "
\n"
+ ]
}
],
"metadata": {
@@ -6436,7 +7021,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.11.8"
+ "version": "3.12.11"
}
},
"nbformat": 4,
diff --git a/registry.yaml b/registry.yaml
index b5448a7dbe..278c35de00 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -147,15 +147,16 @@
path: examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb
date: 2025-07-22
authors:
- - shikhar-cyber
- dwigg-openai
+ - shikhar-cyber
- Alex Heald
- Douglas Adams
- Rishabh Sagar
tags:
- knowledge-graphs
- - temporal-agents
- - RAG
+ - retrieval
+ - functions
+ - responses
- title: Using Evals API on Image Inputs
path: examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb