MTC Patent Analytics - Component Documentation

Last Updated: July 26, 2025
Repository: mtc-patent-analytics
Purpose: Detailed technical documentation of key components and modules


📦 Production Python Packages

PizNet - Patent Intelligence Platform

Path: /piznet/
Type: Full-featured Python package
Installation: pip install -e .

Architecture Overview

piznet/
├── analyzers/           # Analysis modules
│   ├── technology.py    # Technology trend analysis
│   ├── regional.py      # Geographic analysis
│   ├── applicant_analyzer.py  # Applicant intelligence
│   ├── family_analyzer.py     # Patent family analysis
│   └── trends.py        # Trend identification
├── data_access/         # Data provider integrations
│   ├── patstat_client.py      # PATSTAT database client
│   ├── ops_client.py          # EPO OPS API client
│   ├── ipc_database_client.py # IPC classification client
│   ├── cpc_database_client.py # CPC classification client
│   └── nuts_mapper.py         # Geographic mapping
├── processors/          # Data processing modules
│   ├── geographic.py    # Geographic data processing
│   ├── classification.py # Patent classification processing
│   ├── citation.py      # Citation network processing
│   └── pipeline.py      # Processing pipeline orchestration
├── visualizations/      # Visualization components
│   ├── charts.py        # Chart generation
│   ├── maps.py         # Geographic visualization
│   ├── dashboards.py   # Dashboard creation
│   └── factory.py      # Visualization factory pattern
└── config/             # Configuration management
    ├── api_config.yaml
    ├── database_config.yaml
    └── visualization_config.yaml

Key Components

Analyzers (analyzers/)

  • TechnologyAnalyzer - Patent technology trend analysis with IPC/CPC classification
  • RegionalAnalyzer - Geographic patent distribution with NUTS region support
  • ApplicantAnalyzer - Applicant intelligence and organization analysis
  • FamilyAnalyzer - Patent family relationship mapping
  • TrendsAnalyzer - Time-series trend identification and forecasting

Data Access (data_access/)

  • PatstatClient - Direct PATSTAT database integration with query optimization
  • OpsClient - EPO OPS API wrapper with authentication and rate limiting
  • IPCDatabaseClient - IPC classification hierarchy access
  • CPCDatabaseClient - CPC classification hierarchy access
  • NutsMapper - European NUTS region mapping and geographic enrichment

Processors (processors/)

  • GeographicProcessor - Geographic data enrichment and standardization
  • ClassificationProcessor - Patent classification parsing and hierarchy resolution
  • CitationProcessor - Citation network analysis and relationship mapping
  • Pipeline - Orchestration framework for multi-stage analysis workflows
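
The classification parsing and hierarchy resolution attributed to ClassificationProcessor can be illustrated with a small sketch. It splits an IPC/CPC symbol such as `H04L 12/28` into section, class, subclass, main group and group. The function name `parse_ipc_symbol` and the returned keys are assumptions made for illustration and are not taken from the PizNet code.

```python
import re
from typing import Dict

# Matches symbols such as "H04L 12/28" or "H04L12/28".
# Sections A-H are IPC; Y is accepted as well because CPC adds it.
_SYMBOL_PATTERN = re.compile(
    r"^(?P<section>[A-HY])(?P<cls>\d{2})(?P<subclass>[A-Z])\s*"
    r"(?P<main_group>\d{1,4})/(?P<subgroup>\d{2,6})$"
)


def parse_ipc_symbol(symbol: str) -> Dict[str, str]:
    """Split a classification symbol into its hierarchy levels (hypothetical helper)."""
    match = _SYMBOL_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognised classification symbol: {symbol!r}")
    section = match["section"]
    ipc_class = section + match["cls"]
    subclass = ipc_class + match["subclass"]
    main_group = f"{subclass} {int(match['main_group'])}"
    return {
        "section": section,
        "class": ipc_class,
        "subclass": subclass,
        "main_group": f"{main_group}/00",  # main groups are written with /00
        "group": f"{main_group}/{match['subgroup']}",
    }


levels = parse_ipc_symbol("h04l12/28")
```

Each returned level is a prefix of the next, so the result can be used directly to walk up the hierarchy from a group to its section.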

Visualizations (visualizations/)

  • ChartsFactory - Plotly-based chart generation with customizable themes
  • MapsVisualizer - Geographic visualization with Folium integration
  • DashboardBuilder - Interactive dashboard creation with filtering
  • Factory - Visualization factory pattern for consistent output

Configuration System

YAML-driven configuration with environment-specific overrides:

  • API Configuration - Database connections and API credentials
  • Processing Configuration - Analysis parameters and thresholds
  • Visualization Configuration - Chart styling and output formats
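
One common way to implement environment-specific overrides is a recursive merge of an override mapping onto a base mapping. The sketch below uses plain dictionaries in place of parsed YAML; `merge_config` and the `"print"` theme are hypothetical, while the `charts` keys mirror the visualization configuration shown later in this document. It is not necessarily how PizNet resolves overrides.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `override` win, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale.
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for parsed YAML files; the override value is made up for illustration.
base_config = {"charts": {"theme": "modern", "color_palette": "patent_analytics"}}
environment_override = {"charts": {"theme": "print"}}

config = merge_config(base_config, environment_override)
```

Only the keys named in the override change, so an environment file can stay small and list just its differences from the base configuration.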

PatIntelli - Patent Intelligence Framework

Path: /patintelli/
Type: Data provider abstraction framework

Architecture Overview

patintelli/
├── src/
│   ├── config/          # Configuration management
│   │   ├── manager.py   # Configuration manager
│   │   └── providers.py # Data provider configuration
│   ├── data_providers/  # Data source abstractions
│   │   ├── base.py      # Base data provider interface
│   │   └── __init__.py
│   ├── analyzers/       # Analysis modules
│   │   ├── base.py      # Base analyzer interface
│   │   └── regional.py  # Regional analysis implementation
│   └── processors/      # Data processing
│       ├── base.py      # Base processor interface
│       └── search.py    # Search and query processing
├── config/              # YAML configuration files
│   ├── analysis.yaml
│   ├── data_providers.yaml
│   ├── processing.yaml
│   └── visualization.yaml
└── tests/               # Test suite
    └── test_real_connections.py

Key Components

Configuration Management (src/config/)

  • ConfigManager - Centralized configuration loading and validation
  • ProvidersConfig - Data provider registration and initialization
  • Environment-specific configuration with validation

Data Provider Abstraction (src/data_providers/)

  • BaseDataProvider - Abstract interface for all data sources
  • Pluggable architecture for adding new data sources
  • Standardized query interface across different databases
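
A pluggable provider layer of this kind is often built from an abstract base class plus a name-based registry. The following is a minimal sketch under that assumption. Only the class name `BaseDataProvider` comes from this document; the `query` method, `register_provider`, `create_provider` and `InMemoryProvider` are hypothetical and do not describe the actual PatIntelli interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List, Type


class BaseDataProvider(ABC):
    """Illustrative abstract interface; the method name is an assumption."""

    @abstractmethod
    def query(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return all records matching every key/value pair in `criteria`."""


_REGISTRY: Dict[str, Type[BaseDataProvider]] = {}


def register_provider(name: str) -> Callable[[Type[BaseDataProvider]], Type[BaseDataProvider]]:
    """Class decorator that makes a provider available under `name`."""
    def decorator(cls: Type[BaseDataProvider]) -> Type[BaseDataProvider]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register_provider("in_memory")
class InMemoryProvider(BaseDataProvider):
    """Toy provider backed by a list of dicts, used here instead of a real database."""

    def __init__(self, records: Iterable[Dict[str, Any]]) -> None:
        self._records = list(records)

    def query(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [r for r in self._records
                if all(r.get(k) == v for k, v in criteria.items())]


def create_provider(name: str, **kwargs: Any) -> BaseDataProvider:
    """Instantiate a registered provider by name."""
    if name not in _REGISTRY:
        raise ValueError(f"Unknown data provider: {name!r}")
    return _REGISTRY[name](**kwargs)


provider = create_provider(
    "in_memory",
    records=[{"country": "DE", "year": 2020}, {"country": "FR", "year": 2020}],
)
german_records = provider.query({"country": "DE"})
```

New sources are added by defining another subclass and decorating it; calling code only ever sees `query`, which is what keeps the interface uniform across databases.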

Analysis Framework (src/analyzers/)

  • BaseAnalyzer - Abstract analyzer interface with common functionality
  • RegionalAnalyzer - Geographic analysis implementation
  • Extensible framework for domain-specific analysis modules

Processing Pipeline (src/processors/)

  • BaseProcessor - Abstract processor for data transformation
  • SearchProcessor - Query processing and result standardization
  • Chainable processing components for complex workflows

DeepTechFinder - University Patent Analysis

Path: /deeptechfinder/
Type: Specialized analysis toolkit for German universities

Architecture Overview

deeptechfinder/
├── src/
│   ├── core/            # Core functionality
│   │   ├── university_engine.py  # Main analysis engine
│   │   ├── config.py    # Configuration management
│   │   └── exceptions.py # Custom exceptions
│   ├── analysis/        # Analysis modules
│   ├── etl/            # Extract, Transform, Load
│   ├── export/         # Data export utilities
│   ├── utils/          # Utility functions
│   └── visualization/  # Visualization components
├── cli/                # Command-line interface
│   └── main.py         # CLI entry point
├── config/             # Configuration files
│   └── settings.yaml   # Application settings
├── data/               # Input data files
│   └── EPO_DeepTechFinder_20250513_DE_Uni_Top100.csv
└── legacy/             # Legacy notebooks and scripts
    ├── notebooks/      # Analysis notebooks
    ├── output/         # Generated results
    └── scripts/        # Python scripts

Key Features

University Analysis Engine (src/core/university_engine.py)

  • Processing of EPO Deep Tech Finder university data
  • Automated university-specific patent analysis
  • Integration with EPO OPS API for patent enrichment

EPO OPS Integration

  • Critical Discovery: Patent numbers in the Deep Tech Finder data for German universities are application numbers, not publication numbers
  • Endpoint: published-data/application/epodoc/EP{number}/biblio
  • Format Processing: Handle leading zeros in the numbers of patents from the 2000s
  • Authentication: OAuth2 with credentials management
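
The number formatting described above could look roughly like the sketch below, which builds the relative endpoint path from a raw application number. The endpoint template is taken from this section. The helper name `build_biblio_path` and the padding width of 8 digits are assumptions (for example, to restore zeros lost when a CSV column is read as an integer); the exact format OPS expects should be checked against the OPS documentation.

```python
import re
from typing import Union

# Relative path as given in this section; the service base URL is left out.
BIBLIO_TEMPLATE = "published-data/application/epodoc/EP{number}/biblio"


def build_biblio_path(raw_number: Union[str, int], width: int = 8) -> str:
    """Normalise an application number and build the biblio path (hypothetical helper).

    `width` is an assumed fixed length used to restore dropped leading zeros.
    """
    text = str(raw_number).strip().upper()
    if text.startswith("EP"):
        text = text[2:]
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"Unexpected application number: {raw_number!r}")
    # zfill pads on the left and never truncates longer numbers.
    return BIBLIO_TEMPLATE.format(number=text.zfill(width))


path_from_int = build_biblio_path(123456)
path_from_str = build_biblio_path("EP00123456")
```

Both calls produce the same path, which is the point of normalising before the request is made.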

Data Processing

  • CSV data loading and validation
  • Patent number formatting and standardization
  • Bibliographic data extraction and enrichment
  • Export to multiple formats (CSV, JSON, PDF reports)

Legacy Analysis (legacy/notebooks/)

  • University-specific analysis notebooks (TU Dresden, Humboldt, etc.)
  • Comprehensive output data with complete bibliographic information
  • Visual analysis results and technology mapping

🌐 Web Applications

IPC Tree Explorer

Path: /ipc-tree-explorer/
Type: Modern SvelteKit web application
Credit: Matze

Technology Stack

Frontend: SvelteKit + TypeScript
Styling: TailwindCSS
Visualization: D3.js + Custom components
Build: Vite
Package Manager: pnpm
Testing: Playwright

Architecture Overview

ipc-tree-explorer/
├── src/
│   ├── routes/          # SvelteKit routes
│   │   ├── +page.svelte # Main application page
│   │   ├── +layout.svelte # Application layout
│   │   └── Header.svelte  # Navigation header
│   ├── lib/             # Library modules
│   │   ├── stores.js    # Svelte stores for state management
│   │   ├── ipc_class_*.ts # IPC classification data
│   │   └── cpc_class_*.ts # CPC classification data
│   └── component/       # Reusable components
├── static/              # Static assets
│   ├── ipccpc/         # Classification data (JSON)
│   ├── fonts/          # IBM Plex fonts
│   └── mtc_logo.svg    # Branding assets
└── tests/              # Playwright tests

Key Features

Visualization Modes

  • Radial Tree - Hierarchical classification exploration
  • Sankey Diagrams - Flow visualization between classification levels
  • Circle Packing - Nested hierarchy visualization
  • Interactive Navigation - Real-time exploration and filtering

Performance Optimization

  • Optimized for 15,000+ nodes
  • Lazy loading of classification data
  • Efficient state management with Svelte stores
  • Progressive enhancement for large datasets

Classification Support

  • Real-time switching between IPC and CPC systems
  • Complete hierarchy browsing (Section → Class → Subclass → Group)
  • Search and filtering capabilities
  • Export functionality for visualization states

IPC Browser

Path: /ipc-browser/
Type: SQLite-based classification browser
Credit: Tatjana, Johnny, Marc

Architecture Overview

ipc-browser/
├── ipc/                # Classification data
│   └── EN_ipc_scheme_20250101.xml
├── patent-classification-2025.db # SQLite database
├── ipc_database_builder.ipynb    # Database creation
└── ipc_plotly_visualization.ipynb # Visualization examples

Key Components

Database Layer

  • SQLite storage for IPC 2025 classification scheme
  • Optimized schema for hierarchical data
  • Fast querying with indexing on classification levels
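
As an illustration of a hierarchical schema with an index on the classification level, the sketch below stores each entry with a pointer to its parent and walks a subtree with a recursive query. The table name `ipc_entry`, its columns and the index names are hypothetical and are not read from patent-classification-2025.db; the titles are abbreviated.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ipc_entry (
    symbol TEXT PRIMARY KEY,
    level  INTEGER NOT NULL,  -- 1=section, 2=class, 3=subclass, 4=group
    parent TEXT REFERENCES ipc_entry(symbol),
    title  TEXT
);
CREATE INDEX idx_ipc_level  ON ipc_entry(level);
CREATE INDEX idx_ipc_parent ON ipc_entry(parent);
""")

conn.executemany(
    "INSERT INTO ipc_entry VALUES (?, ?, ?, ?)",
    [
        ("H", 1, None, "Electricity"),
        ("H04", 2, "H", "Electric communication technique"),
        ("H04L", 3, "H04", "Transmission of digital information"),
        ("H04W", 3, "H04", "Wireless communication networks"),
    ],
)


def descendants(connection: sqlite3.Connection, symbol: str) -> List[str]:
    """Return every entry below `symbol`, ordered by level and then symbol."""
    query = """
    WITH RECURSIVE subtree(symbol, level) AS (
        SELECT symbol, level FROM ipc_entry WHERE parent = ?
        UNION ALL
        SELECT e.symbol, e.level
        FROM ipc_entry AS e JOIN subtree AS s ON e.parent = s.symbol
    )
    SELECT symbol FROM subtree ORDER BY level, symbol
    """
    return [row[0] for row in connection.execute(query, (symbol,))]


below_h = descendants(conn, "H")
```

The index on `parent` serves the drill-down query, while the index on `level` serves listings such as "all subclasses".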

Visualization Layer

  • Plotly-based interactive tree visualization
  • Hierarchical browsing with drill-down capabilities
  • Classification statistics and distribution analysis

Data Processing

  • XML parsing of official IPC schemes
  • Database population and validation
  • Classification hierarchy construction

📊 Specialized Analysis Modules

REE Analysis - Rare Earth Elements

Path: /ree_analysis/
Type: Domain-specific patent landscape analysis

Development Phases

ree_analysis/
├── 0-main/             # Core development
├── 1-input/            # Reference materials
├── 2-enhanced/         # Advanced workflows
├── 3-livedemo-template/ # Base templates
├── 4-livedemo*/        # Trial demonstrations
└── 5-archive/          # Historical development

Key Components

Live Demo Templates (3-livedemo-template/)

  • base_patent_notebook.ipynb - Foundation notebook template
  • ree_citation_analysis_prompt.md - Analysis methodology

Trial Runs (4-livedemo*/)

  • Progressive development iterations with timestamps
  • Complete session documentation and error resolution
  • Enhanced analysis workflows and market intelligence integration

Analysis Modules

  • Citation Analysis - Patent citation network mapping
  • Geographic Intelligence - Regional REE patent activity
  • Market Correlation - Patent-market data integration
  • Business Intelligence - Executive reporting and insights

FamilyTree - Patent Family Visualization

Path: /familytree/
Type: Patent family analysis toolkit
Credit: Anonymous EPO Examiner

Core Components

familytree/
├── patent_analysis/     # Core modules
│   ├── family_record.py    # Family data structures
│   ├── patent_processor.py # Patent data processing
│   ├── tree_creation.py    # Tree generation logic
│   ├── tree_processor.py   # Tree manipulation
│   └── helpers.py          # Utility functions
├── Divitree.ipynb      # Main visualization notebook
├── images/             # Example visualizations
└── README.md           # Documentation

Key Features

Family Processing

  • Patent family relationship detection
  • Priority claim analysis and mapping
  • Family member identification and classification
  • Temporal relationship analysis
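
One possible way to detect family relationships from priority claims is to treat applications and the priorities they claim as nodes of a graph and collect its connected components. The sketch below does this with a small union-find structure. `group_families` and the identifiers in the example are made up for illustration, and this grouping rule is an assumption rather than a description of what the familytree modules do.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def group_families(priorities: Dict[str, Iterable[str]]) -> List[Set[str]]:
    """Group applications that are linked, directly or indirectly, by priority claims."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for application, claimed in priorities.items():
        find(application)  # register applications without any priority claim
        for priority in claimed:
            union(application, priority)

    families: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        families[find(node)].add(node)
    return sorted(families.values(), key=sorted)


# Placeholder identifiers, not real application numbers.
claims = {"EP1": ["DE1"], "US1": ["DE1"], "JP1": ["JP0"], "DE1": []}
families = group_families(claims)
```

Here `EP1` and `US1` end up in the same family because both claim `DE1`, while `JP1` forms a separate family with its own priority.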

Visualization Generation

  • Interactive family tree creation
  • Relationship mapping with visual indicators
  • Export capabilities for presentations
  • Integration with patent databases

Regional Mappings - Geographic Analysis

Path: /regionalmappings/
Type: Geographic patent analysis
Presented: EPO Patent Knowledge Forum 2024

Key Components

regionalmappings/
├── notebooks/          # Analysis notebooks
│   ├── patstat_nuts_de.ipynb # German NUTS analysis
│   └── patstat_nuts_de_structured.ipynb # Structured analysis
├── mappings/           # Geographic data
│   ├── NUTS_RG_20M_2013_4326.geojson # EU boundaries
│   └── nuts_mapping.csv # Mapping tables
├── output/             # Generated results
└── patentknowledgeforum2024.ipynb # Main presentation

Key Features

NUTS Integration

  • European NUTS region boundaries (GeoJSON)
  • District-level patent activity mapping
  • Administrative hierarchy support (NUTS 1/2/3)
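
NUTS codes are hierarchical by prefix: a two-letter country code followed by one character per level. That makes rolling district-level counts up to a higher level a matter of truncating the code, as the sketch below shows. The helper names `nuts_levels` and `roll_up` and the counts in the example are hypothetical.

```python
import re
from collections import Counter
from typing import Dict

# Two-letter country code plus up to three level characters.
_NUTS_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{0,3}$")


def nuts_levels(code: str) -> Dict[int, str]:
    """Map each available NUTS level (0-3) to the matching prefix of `code`."""
    code = code.strip().upper()
    if not _NUTS_PATTERN.match(code):
        raise ValueError(f"Not a NUTS code: {code!r}")
    return {level: code[: 2 + level] for level in range(len(code) - 1)}


def roll_up(counts: Dict[str, int], level: int) -> Dict[str, int]:
    """Aggregate counts keyed by NUTS code to the given level.

    Raises KeyError if a code is coarser than the requested level.
    """
    totals: Counter = Counter()
    for code, count in counts.items():
        totals[nuts_levels(code)[level]] += count
    return dict(totals)


hierarchy = nuts_levels("de212")
# Made-up patent counts per NUTS 3 region.
by_nuts1 = roll_up({"DE212": 5, "DE21A": 2, "DE300": 4}, level=1)
```

The same function yields federal-state totals (level 1) or intermediate regions (level 2) from a single NUTS 3 table.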

Visualization Tools

  • PyGWalker integration for interactive exploration
  • Federal state comparison and ranking
  • Temporal trend analysis with geographic overlay
  • Custom visualization configurations

Analysis Capabilities

  • Regional patent concentration analysis
  • University patent activity by region
  • Technology distribution mapping
  • Comparative regional performance

🔧 Classification & Database Tools

IPC OPS - EPO API Integration

Path: /ipc-ops/
Type: EPO OPS API toolkit

Core Components

ipc-ops/
├── auth.py             # Authentication utilities
├── ipc_query.py        # Query implementation
├── ipc_query_interactive_tutorial.ipynb # Tutorial
└── .env                # API credentials (local)

Key Features

Authentication Management

  • OAuth2 token handling and refresh
  • Credential storage and security
  • Rate limiting and quota management
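
Token handling and refresh usually comes down to caching the access token and requesting a new one shortly before it expires. The sketch below shows that pattern without any network code. The `TokenCache` class and its parameters are hypothetical and are not taken from auth.py, and `fetch_token` stands in for whatever function performs the real OAuth2 request.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and refresh it shortly before expiry (illustrative)."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        margin: float = 30.0,
    ) -> None:
        # fetch_token returns (token, lifetime_in_seconds); supplied by the caller.
        self._fetch = fetch_token
        self._clock = clock
        self._margin = margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one if the cached one is near expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token


# Demonstration with a fake clock and a fake token source.
now = [0.0]
issued = []


def fake_fetch() -> Tuple[str, float]:
    issued.append(now[0])
    return f"token-{len(issued)}", 100.0


cache = TokenCache(fake_fetch, clock=lambda: now[0], margin=10.0)
first = cache.get()
now[0] = 50.0
still_cached = cache.get()
now[0] = 95.0
refreshed = cache.get()
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.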

Query Interface

  • IPC classification data retrieval
  • Patent bibliographic data access
  • Structured response parsing
  • Error handling and retry logic
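
Retry logic for transient API failures is commonly implemented as a bounded loop with exponential backoff. The following is a generic sketch of that technique; the `retry` function, its defaults and the choice of exception types are assumptions and do not reflect the behaviour of ipc_query.py.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demonstration: a call that fails twice, then succeeds; delays are recorded
# instead of actually sleeping.
outcomes: List[str] = ["fail", "fail", "ok"]
delays: List[float] = []


def flaky() -> str:
    outcome = outcomes.pop(0)
    if outcome == "fail":
        raise ConnectionError("simulated transient failure")
    return outcome


result = retry(flaky, sleep=delays.append)
```

Errors outside `retry_on` propagate immediately, so programming mistakes are not masked by repeated attempts.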

IPC PATSTAT - Database Integration

Path: /ipc-patstat/
Type: PATSTAT-based analysis

Components

ipc-patstat/
├── IPC_Subclass_Analysis.ipynb # Main analysis
├── output/             # Generated results
│   ├── ipc_analysis_overview_*.png
│   ├── ipc_section_summary_*.csv
│   └── ipc_subclass_*.csv
└── README.md           # Documentation

Analysis Capabilities

  • IPC subclass distribution analysis
  • Technology trend identification
  • Classification statistics and visualization
  • PATSTAT Global database integration
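
A subclass distribution can be derived from full IPC symbols by counting their first four characters, with the first character giving the section. The sketch below shows the idea on a hand-written list. `summarise_ipc` is a hypothetical helper, the symbols are sample input rather than query results, and the notebook itself may compute the distribution differently (for example directly in SQL).

```python
from collections import Counter
from typing import Iterable, Tuple


def summarise_ipc(symbols: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count occurrences per IPC subclass (4 characters) and per section (1 character)."""
    subclass_counts: Counter = Counter()
    section_counts: Counter = Counter()
    for symbol in symbols:
        cleaned = symbol.strip().upper()
        if len(cleaned) < 4:
            continue  # too short to contain a subclass
        subclass_counts[cleaned[:4]] += 1
        section_counts[cleaned[0]] += 1
    return subclass_counts, section_counts


sample = ["H04L 12/28", "H04L 29/06", "H04W 4/00", "A61K 9/00"]
by_subclass, by_section = summarise_ipc(sample)
top_subclass = by_subclass.most_common(1)
```

The two counters correspond to the kind of per-subclass and per-section summaries listed under output/ above.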

📚 Training & Educational Components

Training Materials

Path: /training/ & /input/TIP_Notebooks/
Type: Educational ecosystem
Credit: EPO and WIPO

PATSTAT In-Depth

input/TIP_Notebooks/patstat_in_depth/
├── patstat_global/     # TLS table analysis (30+ notebooks)
├── patstat-register/   # REG table analysis (25+ notebooks)
└── Readme.md           # Documentation

Coverage:

  • Complete PATSTAT table documentation
  • TLS tables (PATSTAT Global) analysis
  • REG (Register) tables exploration
  • Query patterns and best practices

EP Fulltext Training

input/TIP_Notebooks/training_code_ep_fulltext/
├── epFulltext_1.ipynb  # Introduction
├── epFulltext_2.ipynb  # Text processing
├── epFulltext_3.ipynb  # Analysis workflows
├── epFulltext_4.ipynb  # Advanced techniques
└── epFulltext__exercise_1.ipynb # Exercises

PATSTAT Basics

training/patstat/
├── Patstat_1.ipynb     # Database introduction
├── Patstat_2.ipynb     # Basic queries
├── Patstat_3.ipynb     # Advanced analysis
└── Patstat_4.ipynb     # Visualization

PATLIB Demonstration Archive

Path: /patlib/
Type: Staff training and demonstration materials

Archive Structure

patlib/archive/
├── trial_run_*/        # Timestamped sessions
├── enhancements/       # Advanced workflows
│   ├── documentation/  # Session transcripts
│   ├── guides/         # Training guides
│   ├── notebooks/      # Enhanced analysis
│   └── outputs/        # Generated results
└── trial_run_*_*/      # Specialized trials

Key Features

  • Session Documentation - Complete development transcripts
  • Error Solutions - Troubleshooting guides and resolutions
  • Progressive Enhancement - Iterative improvement workflows
  • Business Intelligence - Executive reporting templates

🔗 Integration & Configuration

Configuration Management

PizNet Configuration

# api_config.yaml
database:
  patstat_connection: "connection_string"
  ops_credentials: "oauth_config"

# visualization_config.yaml
charts:
  theme: "modern"
  color_palette: "patent_analytics"
  
# geographic_config.yaml
nuts_regions:
  geojson_path: "mappings/nuts_boundaries.geojson"
  mapping_table: "mappings/nuts_mapping.csv"

PatIntelli Configuration

# data_providers.yaml
providers:
  patstat:
    type: "database"
    connection: "patstat_config"
  ops:
    type: "api"
    authentication: "oauth2"

# analysis.yaml
workflows:
  regional_analysis:
    steps: ["data_extract", "geographic_enrich", "visualize"]
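
A workflow definition like the one above can be executed by looking up each step name in a table of functions and passing a shared state from one step to the next. The sketch below assumes that design. The step functions, the `STEPS` table, `run_workflow` and the sample records are all hypothetical placeholders; only the workflow name and step names come from the configuration shown here.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


def data_extract(state: State) -> State:
    # Placeholder: a real step would query a data provider.
    return {**state, "records": [{"nuts": "DE212"}, {"nuts": "DE300"}]}


def geographic_enrich(state: State) -> State:
    # Derive the country from the first two characters of the NUTS code.
    enriched = [{**r, "country": r["nuts"][:2]} for r in state["records"]]
    return {**state, "records": enriched}


def visualize(state: State) -> State:
    # Placeholder: a real step would render a chart or map.
    return {**state, "summary": f"{len(state['records'])} records"}


STEPS: Dict[str, Callable[[State], State]] = {
    "data_extract": data_extract,
    "geographic_enrich": geographic_enrich,
    "visualize": visualize,
}


def run_workflow(config: Dict[str, Any], name: str, state: Optional[State] = None) -> State:
    """Run the steps listed for workflow `name`, threading the state through each."""
    current: State = dict(state or {})
    for step_name in config["workflows"][name]["steps"]:
        if step_name not in STEPS:
            raise KeyError(f"No step registered under {step_name!r}")
        current = STEPS[step_name](current)
    return current


# Dict equivalent of the analysis.yaml fragment above.
analysis_config = {
    "workflows": {
        "regional_analysis": {
            "steps": ["data_extract", "geographic_enrich", "visualize"],
        }
    }
}

result = run_workflow(analysis_config, "regional_analysis")
```

Keeping the step order in configuration means a workflow can be reordered or shortened without touching the step implementations.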

Environment Configuration

  • API Credentials - Secure credential storage
  • Database Connections - Environment-specific configurations
  • Output Paths - Configurable result destinations
  • Processing Parameters - Adjustable analysis thresholds

📊 Data Flow Architecture

Data Sources

  1. PATSTAT Database - Global patent statistical data
  2. EPO OPS API - Real-time patent data access
  3. Geographic Data - NUTS regions and administrative boundaries
  4. Market Data - USGS and commodity information
  5. Classification Data - IPC/CPC hierarchies and definitions

Processing Pipeline

  1. Data Extraction - Multi-source data retrieval
  2. Standardization - Format normalization and validation
  3. Enrichment - Geographic and classification enhancement
  4. Analysis - Domain-specific analytical processing
  5. Visualization - Interactive chart and map generation
  6. Export - Multiple format output (CSV, JSON, PDF, HTML)

Output Formats

  • Interactive Visualizations - HTML with Plotly/D3.js
  • Static Charts - PNG/SVG for presentations
  • Data Exports - CSV/Excel for further analysis
  • Reports - PDF executive summaries
  • Geographic Outputs - GeoJSON for GIS integration

This component documentation describes each module's architecture, key features, and integration patterns. For implementation details, refer to each module's CLAUDE.md and README.md files.