Citation Analysis System

Important: This is sample code for demonstration and educational purposes. It is not intended for production use without additional security review and testing. You should work with your security and compliance teams to meet your organizational requirements before deploying to production environments.

A serverless citation analysis system that queries multiple AI models with web search capabilities, captures their responses and citations, crawls cited pages using Amazon Bedrock AgentCore, and stores comprehensive results in DynamoDB for analysis.

Architecture

The system uses AWS Step Functions to orchestrate the workflow and is deployed using AWS CDK in TypeScript.

Components

Search Lambda: Queries OpenAI, Perplexity, Gemini, and Claude with keywords
Deduplication Lambda: Normalizes URLs and deduplicates citations across providers
Crawler Lambda: Crawls cited pages using Bedrock AgentCore browser tools
Step Functions: Orchestrates the workflow
DynamoDB: Stores search results, citations, and crawled content
Secrets Manager: Securely stores API keys

Project Structure

citation-analysis-system/
├── bin/                        # CDK app entry point
├── lib/                        # CDK stack and constructs
│   ├── citation-analysis-stack.ts
│   └── constructs/auth.ts      # Cognito auth construct
├── lambda/
│   ├── api/                    # API endpoint handlers (20+ endpoints)
│   ├── search/                 # AI provider search + brand extraction
│   ├── deduplication/          # Citation deduplication
│   ├── crawler/                # Web crawling with AgentCore
│   ├── generate-summary/       # Execution summary generation
│   ├── parse-keywords/         # Keyword file parsing
│   ├── layer/                  # Shared Lambda layer
│   └── shared/                 # Common utilities and decorators
├── web/                        # React frontend (Vite + Tailwind)
│   └── src/
│       ├── components/         # UI components by feature
│       ├── hooks/              # Custom React hooks
│       └── types/              # TypeScript types
├── scripts/                    # Deployment and build scripts
└── cdk.json

Prerequisites

Node.js 20+ and npm
Python 3.12
AWS CLI configured with appropriate permissions
AWS CDK CLI (npm install -g aws-cdk)
At least one API key from: OpenAI (recommended), Perplexity, Google Gemini, or Anthropic Claude
Docker recommended for building Lambda layers (falls back to cross-compilation if unavailable)

Setup

Quick Start (Automated Deployment)

Use the automated deployment script for a streamlined setup:

./scripts/deploy.sh

This script will:

Check all prerequisites (Node.js, Python, AWS CLI, CDK)
Install Node.js and Python dependencies
Build the Lambda Layer
Compile TypeScript
Check/bootstrap CDK if needed
Deploy the CDK stack
Verify the deployment
Display next steps

Manual Setup (Advanced)

⚠️ Important: Manual setup requires a two-pass deployment. The web dashboard needs configuration values (API Gateway URL, Cognito IDs) that only exist after the first deployment. If you skip steps or run them out of order, you'll encounter CORS errors. For most use cases, npm run deploy handles this automatically.

Expand manual setup instructions

1. Install Dependencies

# Install CDK dependencies
npm install

# Install Python dependencies for each Lambda
cd lambda/search && pip install -r requirements.txt -t .
cd ../deduplication && pip install -r requirements.txt -t .
cd ../crawler && pip install -r requirements.txt -t .

2. Bootstrap CDK (First Time Only)

cdk bootstrap

3. First Deploy (creates the stack)

cdk deploy

4. Build the Web Dashboard

./scripts/build-web.sh

This fetches the API Gateway URL and Cognito configuration from CloudFormation and builds the React dashboard with the correct config.

5. Second Deploy (uploads the configured dashboard)

cdk deploy

The first deploy creates the infrastructure so the build script can fetch the configuration. The second deploy uploads the correctly configured dashboard to S3.

Configure Your First Analysis

After deployment:

Open the dashboard URL (shown in CDK outputs)
Log in with your credentials (admin creates accounts via CLI — see User Registration below)
Go to Settings > Providers to enter your API keys for each AI provider you want to use
Go to Settings > Brand Tracking to configure brand detection:
- Select an industry preset (hotels, airlines, retail, etc.) or choose Custom
- Add your brand names under First Party Brands - these are tracked as "yours"
- Add competitor brands under Competitors - these are tracked for comparison
- The visibility dashboard calculates your share of voice based on this configuration
Go to Keywords to add search queries you want to analyze (e.g., "best hotels in Barcelona")
Click Run Analysis to start your first analysis run

Visualization Dashboard

After deployment, access the real-time visualization dashboard at the CloudFront URL provided in the CDK outputs:

# Get the dashboard URL from CDK outputs
aws cloudformation describe-stacks \
  --stack-name CitationAnalysisStack \
  --query 'Stacks[0].Outputs[?OutputKey==`DashboardUrl`].OutputValue' \
  --output text

Dashboard Features

Dashboard overview with real-time statistics, provider comparison, and brand mentions

Real-time Statistics: Auto-refreshes every 30 seconds
Provider Comparison: Bar chart showing citation counts by AI provider
Brand Analysis: Pie chart of brand mentions across AI providers
Top Citations: Table of most frequently cited URLs
Recent Searches: Latest search results with timestamps
Filtering: Filter by keyword or provider

Brand mention tracking with sentiment analysis and ranking context

Citation gap analysis showing sources that cite competitors but not you

Content Studio generating optimised content based on citation gaps

The dashboard is a static React application hosted on S3 and served via CloudFront, with data fetched from API Gateway endpoints backed by Lambda functions.

Usage Guide

Step 1: Configure AI Providers

Go to Settings > AI Providers. The system supports four providers:

Provider	Model	Description
OpenAI	GPT-5 mini	Native web search via Responses API
Perplexity	Sonar	Real-time web search
Google Gemini	Flash 2.0	Google Search grounding
Anthropic Claude	Sonnet 4.5	Web search tool

You need at least one provider configured with a valid API key. OpenAI is recommended as the primary provider. The system skips providers with placeholder or invalid keys and falls through to the next available one.

Paid API keys are required. This tool analyses the AI providers you want to track, which means it calls their APIs at scale. Free tiers are designed for single API calls and will hit rate limits immediately under any real analysis workload. A small amount of paid credit per provider ($5-10) is more than enough for regular use.

For each provider you want to use:

Click Add Key and enter your API key
Toggle the switch to enable the provider
Only enabled providers with configured keys will be used during analysis

API keys are stored securely in AWS Secrets Manager.

Step 2: Configure Brand Tracking

Go to Settings > Brand Tracking. This is essential for visibility scoring.

Select an industry preset (hotels, airlines, retail, technology, etc.) or choose Custom
Add First Party Brands – Your brands. These are tracked as "yours" in visibility calculations.
Add Competitor Brands – Brands you want to compare against.
Optionally use Expand Brands to discover related brands via AI, or Find Competitors to discover competitors automatically.

Without brand tracking configured, the Visibility dashboard will show 0.0 scores even if data exists.

Step 3: Add Keywords

Go to Settings > Keywords. Add search queries that represent how customers search for your products:

Use natural language queries: "best hotels in Barcelona", "enterprise project management software"
Add multiple keywords to track different aspects of your market
Keywords are used when running analysis

Step 4: Configure Query Prompts (Personas)

Go to Settings > Query Prompts. Define persona-based search templates that shape how AI engines are queried:

Each prompt contains a {keyword} placeholder that gets substituted during analysis
Example: "As a parent travelling with 3 young kids, {keyword}. What are the best family-friendly options?"
Create multiple personas to see how different traveler types get different AI recommendations
Enable/disable individual prompts to control which run during analysis
Each analysis run executes all enabled prompts × all keywords × all providers

Step 5: Run Analysis

Go to Operations > Run Analysis:

Select which keywords to analyze (or all)
Click Start Analysis
Monitor progress in the execution view
Analysis typically takes 3-5 minutes depending on keyword count

For recurring analysis, go to Operations > Schedule to set up automated runs with EventBridge.

Step 6: Explore Results

After analysis completes, explore your data:

Insights Section:

Dashboard – Overview stats: total searches, citations, pages crawled
Visibility – Your visibility score vs competitors, with 30-day trends. Shows share of voice by provider.
Brand Mentions – Every mention of your brand and competitors with sentiment (positive/negative/neutral) and ranking context
Citations – All URLs cited by AI models, ranked by frequency. Click any URL to see breakdown by keyword and provider.
Prompt Insights – How AI models frame their responses to your keywords
Citation Gaps – Sources that cite competitors but not you. This is your PR/content target list.
Action Center – Prioritized recommendations based on your data

Research Section:

Keyword Research – Expand your keyword list using AI with live web search. Analyze competitor websites to discover keywords they target. Both operations run asynchronously — results appear after 30-60 seconds.

Content Section:

Content Studio – Generate content briefs and outlines based on citation gap analysis, with output language selection (English, Spanish, French, or any language). Helps you create content that AI models are more likely to cite.

Data Section:

Recent Searches – Your analysis history with full AI responses
Raw Responses – Browse raw JSON responses and screenshots stored in S3

Technical Details

AI Model Configuration

The default OpenAI model is gpt-5-mini (cost-effective, supports web search). To use a different model:

Go to Settings > AI Providers
The model can be overridden per provider via the ProviderConfig DynamoDB table
Set the model field for any provider to change the default (e.g., gpt-5.2 for higher quality)

Content generation (Content Studio) uses Amazon Bedrock Claude Haiku 4.5 — this runs on your AWS account and doesn't require an external API key.

Retry Logic

All API clients implement exponential backoff (5 retries, ~35s max wait)
Handles rate limits (429), server errors (5xx), timeouts, and connection errors
Step Functions retry on Lambda-level failures only

Concurrency

Keyword processing: 3 concurrent
Citation crawling: 3 concurrent per keyword

Security

User Registration

By default, self-registration is disabled — only administrators can create user accounts via the Cognito console or AWS CLI. This is the recommended setting for production deployments.

To allow users to sign up themselves, change selfSignUpEnabled in lib/constructs/auth.ts:

const userPool = new UserPool(this, "userPool", {
  selfSignUpEnabled: true,  // set to false for admin-only invites
  ...
});

Then redeploy with cdk deploy. When self-registration is disabled, invite users via:

aws cognito-idp admin-create-user \
  --user-pool-id <your-pool-id> \
  --username user@example.com \
  --user-attributes Name=email,Value=user@example.com \
  --desired-delivery-mediums EMAIL

Other Security Features

API keys encrypted in AWS Secrets Manager
DynamoDB encryption at rest
IAM least-privilege roles
CloudWatch audit trails

Federated Identity with Azure AD (Entra ID)

To allow users to sign in with their corporate Azure AD credentials instead of Cognito-managed passwords:

Register an application in Azure AD: Go to Azure Portal > App registrations > New registration. Set the redirect URI to https://<your-cognito-domain>.auth.<region>.amazoncognito.com/oauth2/idpresponse.
Note the Azure AD details: Application (client) ID, Directory (tenant) ID, and create a client secret under Certificates & secrets.
Add Azure AD as a Cognito identity provider: In the AWS Console, go to Cognito > User Pools > your pool > Sign-in experience > Add identity provider > OIDC. Configure:
- Provider name: AzureAD
- Client ID: your Azure application client ID
- Client secret: your Azure client secret
- Issuer URL: https://login.microsoftonline.com/<tenant-id>/v2.0
- Scopes: openid email profile
- Map attributes: email → email, sub → username
Configure the Cognito hosted UI domain: Under App integration > Domain, set up a Cognito domain or custom domain.
Update the app client: Under App integration > App clients, enable the AzureAD identity provider and configure the allowed OAuth flows (Authorization code grant) and scopes (openid, email, profile).
Update the frontend: In lib/constructs/auth.ts, add the identity provider to the user pool client's supportedIdentityProviders and update the OAuth callback URLs if needed.

For detailed steps, see the AWS documentation on adding OIDC identity providers to Cognito.

Development

npm run build          # Build TypeScript
npm run watch          # Watch mode
npm run synth          # Synthesize CloudFormation
npm run deploy         # Full deployment (recommended)
npm run deploy:full    # Deploy + clear CloudFront cache
npm run clear-cache    # Clear CloudFront cache only

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.eslint-rules		.eslint-rules
.github		.github
bin		bin
docs		docs
lambda		lambda
lib		lib
scripts		scripts
web		web
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
SECURITY.md		SECURITY.md
THIRD_PARTY_LICENSES.md		THIRD_PARTY_LICENSES.md
architecture-diagram.png		architecture-diagram.png
cdk.json		cdk.json
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

Citation Analysis System

Architecture

Components

Project Structure

Prerequisites

Setup

Quick Start (Automated Deployment)

Manual Setup (Advanced)

1. Install Dependencies

2. Bootstrap CDK (First Time Only)

3. First Deploy (creates the stack)

4. Build the Web Dashboard

5. Second Deploy (uploads the configured dashboard)

Configure Your First Analysis

Visualization Dashboard

Dashboard Features

Usage Guide

Step 1: Configure AI Providers

Step 2: Configure Brand Tracking

Step 3: Add Keywords

Step 4: Configure Query Prompts (Personas)

Step 5: Run Analysis

Step 6: Explore Results

Technical Details

AI Model Configuration

Retry Logic

Concurrency

Security

User Registration

Other Security Features

Federated Identity with Azure AD (Entra ID)

Development

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Uh oh!

Contributors

Uh oh!

Languages