Pinecone Upsert Example

Complete example showing how to upsert Skill Seekers documents to Pinecone and perform semantic search.

What This Example Does

  1. Creates a Pinecone serverless index
  2. Loads Skill Seekers-generated documents (LangChain format)
  3. Generates embeddings with OpenAI
  4. Upserts documents to Pinecone with metadata
  5. Demonstrates semantic search capabilities
  6. Provides interactive search mode
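
Steps 2 and 3 work on the LangChain-format file from Skill Seekers. A minimal sketch of loading it and computing the category breakdown shown in Step 2 — assuming the file is a JSON list of `{"page_content": ..., "metadata": {...}}` objects (check your actual output, as the exact schema is an assumption here):

```python
import json
from collections import Counter

def load_documents(path):
    """Load LangChain-format documents: a JSON list of
    {"page_content": ..., "metadata": {...}} objects."""
    with open(path) as f:
        return json.load(f)

def category_breakdown(documents):
    """Count documents per metadata category, as printed in Step 2."""
    return Counter(doc["metadata"].get("category", "unknown") for doc in documents)
```

For example, `category_breakdown(load_documents("output/django-langchain.json"))` produces the `{'api': 38, 'guides': 45, ...}` summary seen in the quickstart output.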

Prerequisites

# Install dependencies
pip install pinecone-client openai

# Set API keys
export PINECONE_API_KEY=your-pinecone-api-key
export OPENAI_API_KEY=sk-...

Generate Documents

First, generate LangChain-format documents using Skill Seekers:

# Option 1: Use preset config (e.g., Django)
skill-seekers scrape --config configs/django.json
skill-seekers package output/django --target langchain

# Option 2: From GitHub repo
skill-seekers github --repo django/django --name django
skill-seekers package output/django --target langchain

# Output: output/django-langchain.json

Run the Example

cd examples/pinecone-upsert

# Run the quickstart script
python quickstart.py

What You'll See

  1. Index creation (if it doesn't exist)
  2. Documents loaded with category breakdown
  3. Batch upsert with progress tracking
  4. Example queries demonstrating semantic search
  5. Interactive search mode for your own queries

Example Output

============================================================
PINECONE UPSERT QUICKSTART
============================================================

Step 1: Creating Pinecone index...
✅ Index created: skill-seekers-demo

Step 2: Loading documents...
✅ Loaded 180 documents
   Categories: {'api': 38, 'guides': 45, 'models': 42, 'overview': 1, ...}

Step 3: Upserting to Pinecone...
Upserting 180 documents...
Batch size: 100
  Upserted 100/180 documents...
  Upserted 180/180 documents...
✅ Upserted all documents to Pinecone
   Total vectors in index: 180

Step 4: Running example queries...
============================================================

QUERY: How do I create a Django model?
------------------------------------------------------------
  Score: 0.892
  Category: models
  Text: Django models are Python classes that define the structure of your database tables...

  Score: 0.854
  Category: api
  Text: To create a model, inherit from django.db.models.Model and define fields...

============================================================
INTERACTIVE SEMANTIC SEARCH
============================================================
Search the documentation (type 'quit' to exit)

Query: What are Django views?

Features Demonstrated

  • Serverless Index - Auto-scaling Pinecone infrastructure
  • Batch Upserts - Efficient bulk loading (100 docs/batch)
  • Metadata Filtering - Category-based search filters
  • Semantic Search - Vector similarity matching
  • Interactive Mode - Real-time query interface

Files in This Example

  • quickstart.py - Complete working example
  • README.md - This file
  • requirements.txt - Python dependencies

Cost Estimate

For 1000 documents:

  • Embeddings: ~$0.01 (OpenAI ada-002)
  • Storage: ~$0.03/month (Pinecone serverless)
  • Queries: ~$0.025 per 100k queries

Total first month: ~$0.04 + query costs
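
As a rough sanity check on the embedding figure (assuming ~100 tokens per document, which is an assumption about average chunk size, and ada-002's rate of $0.10 per 1M tokens):

```python
def embedding_cost_usd(n_docs, tokens_per_doc=100, price_per_million=0.10):
    """Estimate the one-time OpenAI embedding cost.
    tokens_per_doc is an assumed average chunk size; price_per_million
    is ada-002's rate of $0.10 per 1M tokens."""
    return n_docs * tokens_per_doc / 1_000_000 * price_per_million

print(f"${embedding_cost_usd(1000):.2f}")  # prints $0.01 for 1000 documents
```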

Customization Options

Change Index Name

INDEX_NAME = "my-custom-index"  # Line 215

Adjust Batch Size

batch_upsert(index, openai_client, documents, batch_size=50)  # Line 239
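
`batch_upsert` is a helper defined in quickstart.py; a minimal sketch of what such a helper can look like (the embedding model, ID scheme, and vector format here are assumptions, not the script's exact implementation):

```python
def chunked(items, size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_upsert(index, openai_client, documents, batch_size=100):
    """Embed and upsert documents in batches to stay under request limits."""
    total = 0
    for batch in chunked(documents, batch_size):
        texts = [doc["page_content"] for doc in batch]
        response = openai_client.embeddings.create(
            model="text-embedding-ada-002", input=texts
        )
        vectors = [
            {
                "id": f"doc_{total + i}",  # hypothetical ID scheme
                "values": item.embedding,
                "metadata": doc["metadata"],
            }
            for i, (doc, item) in enumerate(zip(batch, response.data))
        ]
        index.upsert(vectors=vectors)
        total += len(batch)
        print(f"  Upserted {total}/{len(documents)} documents...")
```

Smaller batches trade throughput for smaller individual requests, which helps when hitting rate limits.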

Filter by Category

matches = semantic_search(
    index=index,
    openai_client=openai_client,
    query="your query",
    category="models"  # Only search in "models" category
)
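
`semantic_search` is likewise a helper from quickstart.py; under the hood, the category argument presumably becomes a Pinecone metadata filter. A hedged sketch of that translation (function and field names are assumptions):

```python
def build_filter(category=None):
    """Translate an optional category into a Pinecone metadata filter."""
    if category is None:
        return None
    return {"category": {"$eq": category}}

def semantic_search(index, openai_client, query, category=None, top_k=5):
    """Embed the query text and run a filtered similarity search."""
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    return index.query(
        vector=embedding,
        top_k=top_k,
        filter=build_filter(category),
        include_metadata=True,
    ).matches
```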

Use Different Embedding Model

# In create_embeddings() function
response = openai_client.embeddings.create(
    model="text-embedding-3-small",  # Cheaper than ada-002, same 1536 dimensions
    input=texts
)

# The index dimension must match the embedding model
# (1536 for both text-embedding-ada-002 and text-embedding-3-small)
create_index(pc, INDEX_NAME, dimension=1536)

Troubleshooting

"Index already exists"

  • Normal message if you've run the script before
  • The script will reuse the existing index

"PINECONE_API_KEY not set"

"OPENAI_API_KEY not set"

"Documents not found"

  • Make sure you've generated documents first (see "Generate Documents" above)
  • Check the DOCS_PATH in quickstart.py matches your output location

"Rate limit exceeded"

  • OpenAI or Pinecone rate limit hit
  • Reduce batch_size: batch_size=50 or batch_size=25
  • Add delays between batches
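
One way to add those delays is a small retry-with-backoff wrapper around each batch call (a generic sketch, not part of quickstart.py):

```python
import time

def with_backoff(fn, *args, retries=5, base_delay=1.0, **kwargs):
    """Call fn, retrying with exponential backoff on failures.
    In real use, narrow the except clause to the SDK's rate-limit error."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if attempt == retries - 1:
                raise
            delay = base_delay * 2 ** attempt
            print(f"Request failed ({exc}); retrying in {delay:.0f}s...")
            time.sleep(delay)
```

For example, `with_backoff(index.upsert, vectors=batch)` retries a failed batch with 1s, 2s, 4s... pauses instead of aborting the whole run.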

Advanced Usage

Load Existing Index

import os

from openai import OpenAI
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("skill-seekers-demo")

# Embed the query first; the vector dimension must match the index
openai_client = OpenAI()
query_embedding = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input="How do I create a Django model?"
).data[0].embedding

# Query immediately (no need to re-upsert)
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True
)

Update Existing Documents

# Upsert with same ID to update
index.upsert(vectors=[{
    "id": "doc_123",
    "values": new_embedding,
    "metadata": updated_metadata
}])

Delete Documents

# Delete by ID
index.delete(ids=["doc_123", "doc_456"])

# Delete by metadata filter (pod-based indexes only; serverless indexes,
# like the one this example creates, don't support filter-based deletes)
index.delete(filter={"category": {"$eq": "deprecated"}})

# Delete all (namespace)
index.delete(delete_all=True)

Use Namespaces

# Upsert to namespace
index.upsert(vectors=vectors, namespace="production")

# Query specific namespace
results = index.query(
    vector=query_embedding,
    namespace="production",
    top_k=5
)

Need help? GitHub Discussions