
Troubleshooting

Rain Zhang edited this page Nov 6, 2025 · 2 revisions


Table of Contents

  1. Introduction
  2. Common Installation Issues
  3. Configuration Problems
  4. Device Communication Issues
  5. Metadata Service Connectivity
  6. Storage Backend Issues
  7. Error Code Reference
  8. Diagnostic Tools and Logging
  9. Performance Troubleshooting
  10. Support Resources

Introduction

The Post-Quantum WebAuthn Platform is an authentication system that combines standard WebAuthn/FIDO2 protocols with post-quantum cryptographic algorithms. This guide covers common issues encountered while installing, configuring, and operating the platform.

The platform consists of several key components:

  • WebAuthn Server: Flask-based web server handling authentication requests
  • HID Layer: Hardware interface for USB/FIDO devices
  • Metadata Service: FIDO Alliance Metadata Service integration
  • Storage Backends: Local and cloud storage for credentials
  • Post-Quantum Cryptography: liboqs integration for quantum-resistant algorithms

Common Installation Issues

liboqs Loading Failures

Problem: The platform fails to initialize due to liboqs loading issues.

Symptoms:

  • ImportError: "oqs bindings are unavailable"
  • Application startup failures
  • Post-quantum algorithms missing from the list of supported algorithms

Diagnosis Steps:

  1. Check liboqs installation:
python -c "import oqs; print(oqs.get_enabled_sig_mechanisms())"
  2. Verify library paths:
ldd prebuilt_liboqs/linux-x86_64/lib/liboqs.so
  3. Check environment variables:
echo $LD_LIBRARY_PATH
echo $PYTHONPATH

Solutions:

  1. Missing Build Tools: Install the toolchain needed to build liboqs from source:
sudo apt-get install build-essential cmake pkg-config
  2. Library Path Issues: Set appropriate environment variables:
export LD_LIBRARY_PATH=/path/to/liboqs:$LD_LIBRARY_PATH
export PYTHONPATH=/path/to/python/modules:$PYTHONPATH
  3. Version Compatibility: Ensure the liboqs version matches the platform's requirements (liboqs-python reports it through oqs.oqs_version()):
python -c "import oqs; print(oqs.oqs_version())"
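The checks above can be combined into one script. This is a minimal sketch, assuming the liboqs-python bindings (module `oqs`, which provides `oqs_version()` and `get_enabled_sig_mechanisms()`) and the `prebuilt_liboqs/linux-x86_64/lib` layout referenced above; `probe_oqs_bindings` and `find_liboqs` are hypothetical helper names, not part of pqc.py.

```python
import importlib
import os
from pathlib import Path


def probe_oqs_bindings():
    """Try to import the liboqs Python bindings and report what they expose."""
    try:
        oqs = importlib.import_module("oqs")
    except Exception as exc:  # ImportError, or OSError when liboqs.so cannot be loaded
        return {"available": False, "error": f"{type(exc).__name__}: {exc}"}
    return {
        "available": True,
        "liboqs_version": oqs.oqs_version(),
        "signatures": list(oqs.get_enabled_sig_mechanisms()),
    }


def find_liboqs(search_dirs):
    """Return every liboqs shared library (liboqs.so*) found in the given directories."""
    hits = []
    for directory in search_dirs:
        path = Path(directory)
        if path.is_dir():
            hits.extend(sorted(str(p) for p in path.glob("liboqs.so*")))
    return hits


if __name__ == "__main__":
    print(probe_oqs_bindings())
    # Search LD_LIBRARY_PATH plus the bundled prebuilt directory (assumed layout)
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    dirs.append("prebuilt_liboqs/linux-x86_64/lib")
    print(find_liboqs(dirs) or "liboqs.so not found in LD_LIBRARY_PATH or prebuilt_liboqs")
```

If `available` is False but `find_liboqs` reports a library, the Python bindings are most likely missing or not on `PYTHONPATH`; if both come back empty, liboqs itself is missing.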

Section sources

  • pqc.py
  • prebuilt_liboqs/linux-x86_64/include/oqs/common.h

Environment Variable Configuration

Problem: Incorrect or missing environment variables causing startup failures.

Common Variables:

  • FIDO_SERVER_SECRET_KEY: Session encryption key
  • FIDO_SERVER_RP_ID: Relying Party identifier
  • FIDO_SERVER_GCS_BUCKET: Google Cloud Storage bucket
  • FIDO_SERVER_GCS_CREDENTIALS_FILE: Service account credentials

Diagnosis:

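# List the FIDO-related variables that are set (names only, so secrets stay off screen)
env | grep -o '^FIDO_SERVER_[A-Z_]*'

# Verify specific variables without echoing the secret itself
[ -n "$FIDO_SERVER_SECRET_KEY" ] && echo "FIDO_SERVER_SECRET_KEY is set" || echo "FIDO_SERVER_SECRET_KEY is not set"
echo $FIDO_SERVER_RP_ID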

Solutions:

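  1. Secret Key Generation:
# Generate a secure secret key readable only by its owner
openssl rand -hex 32 > secret.key
chmod 600 secret.key
export FIDO_SERVER_SECRET_KEY_FILE=$(pwd)/secret.key
  2. RP ID Configuration: The RP ID must be a bare domain (no scheme, port, or path) that equals the site's host or is a registrable suffix of it:
# Set relying party identifier
export FIDO_SERVER_RP_ID="your-domain.com"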
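A quick way to catch a malformed RP ID is to compare it with the origin users actually browse to, since WebAuthn requires the RP ID to equal the origin's host or be a registrable suffix of it. The sketch below is a hypothetical `check_rp_id` helper, not part of config.py, and it does not consult the Public Suffix List (so an RP ID such as `com` would wrongly pass).

```python
from urllib.parse import urlsplit


def check_rp_id(rp_id, origin):
    """Return a list of problems with an RP ID / origin pair (empty list means it looks OK).

    Hypothetical helper: checks shape and host suffix only, not the Public Suffix List.
    """
    if not rp_id:
        return ["RP ID is empty"]
    problems = []
    if "://" in rp_id or "/" in rp_id or ":" in rp_id:
        problems.append("RP ID must be a bare domain: no scheme, port, or path")
    host = (urlsplit(origin).hostname or "").lower()
    rp = rp_id.lower()
    # The origin host must be the RP ID itself or a subdomain of it
    if not (host == rp or host.endswith("." + rp)):
        problems.append(f"origin host {host!r} is not {rp!r} or a subdomain of it")
    return problems


if __name__ == "__main__":
    print(check_rp_id("example.com", "https://login.example.com"))  # []
    print(check_rp_id("https://example.com", "https://example.com"))
```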

Section sources

  • config.py

Configuration Problems

Incorrect Environment Variables

Problem: Misconfigured environment variables leading to operational failures.

Common Issues:

  • Invalid secret key format
  • Malformed RP ID
  • Incorrect storage backend configuration

Diagnostic Commands:

# Check that the config module and its secret-key helper import cleanly
python -c "
from server.server.config import _resolve_secret_key
print('Config module and secret-key helper imported')
"

# Validate RP ID
python -c "
from server.server.config import determine_rp_id
print(determine_rp_id())
"

Configuration Validation:

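# Example configuration validation
import os

# Importing the app surfaces any configuration errors raised at import time
from server.server.config import app  # noqa: F401

# The secret key can be supplied directly or through a key file
if not (os.environ.get('FIDO_SERVER_SECRET_KEY') or os.environ.get('FIDO_SERVER_SECRET_KEY_FILE')):
    print("Missing FIDO_SERVER_SECRET_KEY (or FIDO_SERVER_SECRET_KEY_FILE)")
if not os.environ.get('FIDO_SERVER_RP_ID'):
    print("Missing required environment variable: FIDO_SERVER_RP_ID")

# Validate storage configuration
if os.environ.get('FIDO_SERVER_GCS_BUCKET'):
    print("Google Cloud Storage enabled")
else:
    print("Using local storage")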

Section sources

  • config.py

Storage Backend Connectivity Issues

Problem: Unable to connect to storage backends (local or cloud).

Local Storage Issues:

  • Permission denied errors
  • Disk space limitations
  • Directory creation failures

Cloud Storage Issues:

  • Authentication failures
  • Network connectivity problems
  • Bucket access permissions

Diagnostic Steps:

  1. Local Storage Test:
# Test directory permissions
mkdir -p server/server/session-credentials/test
touch server/server/session-credentials/test/testfile
rm server/server/session-credentials/test/testfile
rmdir server/server/session-credentials/test
  2. Cloud Storage Test:
# Test GCS connectivity
python -c "
from server.server.cloud_storage import ensure_ready
ensure_ready()
print('Cloud storage ready')
"

Solutions:

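  1. Permission Issues:
# Fix permissions: directories need the execute bit, credential files should not be world-readable
find server/server/session-credentials -type d -exec chmod 750 {} +
find server/server/session-credentials -type f -exec chmod 640 {} +
  2. Network Connectivity: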
# Test network connectivity
curl -I https://storage.googleapis.com
ping storage.googleapis.com
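ICMP is frequently filtered, so a failed `ping` does not prove the service is unreachable. A TCP handshake on port 443 is a more direct test. This is a minimal standard-library sketch; `check_tcp` is a hypothetical helper, and a successful connection only shows that the port is reachable, not that TLS or authentication will succeed.

```python
import socket


def check_tcp(host, port=443, timeout=5.0):
    """Attempt a TCP handshake and return (reachable, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected to {host}:{port}"
    except OSError as exc:  # DNS failure, connection refused, timeout, unreachable network
        return False, f"{host}:{port} unreachable: {exc}"
```

For example, `check_tcp("storage.googleapis.com")` or `check_tcp("mds3.fidoalliance.org")`; the second element of the returned tuple carries the underlying error text.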

Section sources

  • storage.py
  • cloud_storage.py

Device Communication Issues

HID Device Detection Problems

Problem: FIDO/HID devices not detected or communication failures.

Symptoms:

  • "No FIDO devices found" errors
  • Device enumeration failures
  • Communication timeouts

HID Layer Architecture:

classDiagram
class CtapDevice {
+capabilities : int
+call(cmd, data, event, on_keepalive) bytes
+close() void
+list_devices() Iterator~CtapDevice~
}
class CtapHidConnection {
+read_packet() bytes
+write_packet(data) void
+close() void
}
class FileCtapHidConnection {
+handle : int
+descriptor : HidDescriptor
+read_packet() bytes
+write_packet(data) void
+close() void
}
class HidDescriptor {
+path : str
+vid : int
+pid : int
+report_size_in : int
+report_size_out : int
+product_name : str
+serial_number : str
}
CtapDevice --> CtapHidConnection : uses
CtapHidConnection <|-- FileCtapHidConnection : implements
FileCtapHidConnection --> HidDescriptor : manages

Diagram sources

  • base.py

Diagnostic Commands:

# List USB devices
lsusb

# Check HID devices
ls /dev/hidraw*

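# Test device access (reading from the node would block; check permissions instead)
ls -l /dev/hidraw*
[ -r /dev/hidraw0 ] && [ -w /dev/hidraw0 ] && echo "hidraw0 is accessible" || echo "hidraw0 is not accessible"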

Solutions:

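  1. Device Permissions:
# hidraw nodes are root-owned by default; the dialout group (serial ports) does not apply.
# Install a udev rule for FIDO devices (libfido2 ships 70-u2f.rules), then reload and replug:
sudo udevadm control --reload-rules
sudo udevadm trigger

# Temporary workaround until the device is reconnected: grant only your user access
sudo setfacl -m u:$USER:rw /dev/hidraw0
  2. Device Enumeration: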
# Test device discovery
from fido2.hid import CtapHidDevice
devices = list(CtapHidDevice.list_devices())
print(f"Found {len(devices)} devices")

Section sources

  • base.py

CTAP2 Error Codes

Problem: CTAP2 protocol errors during device communication.

Common CTAP2 Error Codes:

| Error Code | Name | Solution |
|---|---|---|
| 0x01 | INVALID_COMMAND | Check command format and parameters |
| 0x02 | INVALID_PARAMETER | Validate input parameters |
| 0x03 | INVALID_LENGTH | Verify data length constraints |
| 0x05 | TIMEOUT | Increase timeout values or retry |
| 0x21 | PROCESSING | Wait for the device to finish processing |
| 0x2F | USER_ACTION_TIMEOUT | Prompt the user to touch the authenticator, then retry |
| 0x31 | PIN_INVALID | Retry with the correct PIN (a reset erases all credentials) |
| 0x35 | PIN_NOT_SET | Set a PIN before use |

Error Handling Implementation:

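import logging

from fido2.ctap import CtapError
from fido2.ctap2 import Ctap2
from fido2.hid import CtapHidDevice

logger = logging.getLogger(__name__)

devices = list(CtapHidDevice.list_devices())
if not devices:
    raise SystemExit("No FIDO devices found")
device = devices[0]

try:
    # Device communication: authenticatorGetInfo; CTAP2 status codes surface as CtapError
    info = Ctap2(device).get_info()
except CtapError as e:
    if e.code == CtapError.ERR.TIMEOUT:
        # Handle timeout - increase timeout or retry
        pass
    elif e.code == CtapError.ERR.PIN_INVALID:
        # Handle PIN issues
        pass
    else:
        # Log unknown error
        logger.error(f"CTAP error: {e}")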

Section sources

  • ctap.py

Metadata Service Connectivity

MDS Validation Errors

Problem: FIDO Metadata Service (MDS) connectivity and validation issues.

Metadata Service Architecture:

sequenceDiagram
participant Client as WebAuthn Client
participant Server as WebAuthn Server
participant MDS as FIDO MDS Service
participant Verifier as Metadata Verifier
Client->>Server : Register/Authenticate Request
Server->>MDS : Fetch Metadata Blob
MDS-->>Server : Metadata Blob + Signature
Server->>Verifier : Validate Metadata
Verifier->>Verifier : Verify Signature
Verifier->>Verifier : Check Revocation Status
Verifier-->>Server : Validation Result
Server-->>Client : Response with Metadata Info

Diagram sources

  • mds3.py

Common MDS Issues:

  1. Network Connectivity: Unable to reach MDS endpoints
  2. Certificate Validation: SSL/TLS certificate issues
  3. Revocation Checking: Device revocation status failures
  4. Metadata Parsing: Corrupted or malformed metadata

Diagnostic Commands:

# Test MDS connectivity
curl -I https://mds3.fidoalliance.org/

# Check certificate chain
openssl s_client -connect mds3.fidoalliance.org:443 -showcerts

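# Download the metadata blob (a large signed JWT) to a file instead of the terminal
curl -sSL -o mds3-blob.jwt https://mds3.fidoalliance.org/
ls -l mds3-blob.jwt

Solutions:

  1. Network Issues:
# Configure proxy if needed
export HTTPS_PROXY=http://proxy.company.com:8080
export HTTP_PROXY=http://proxy.company.com:8080
  2. Certificate Issues:
# Add custom CA certificates and test them against the MDS endpoint
import ssl
import urllib.request

ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations('/path/to/custom/ca-bundle.crt')
with urllib.request.urlopen('https://mds3.fidoalliance.org/', context=ssl_context) as resp:
    print(resp.status)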
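To see whether a downloaded blob is current, decode its payload: an MDS3 blob is a JWT whose payload carries the serial number `no`, the `nextUpdate` date and the `entries` list. This sketch skips signature verification entirely, so use it for diagnostics only; trust decisions must still go through the verifier in mds3.py. `summarize_mds_blob` is a hypothetical helper.

```python
import base64
import json
from datetime import date


def _b64url_decode(segment):
    # JWT segments are base64url without padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def summarize_mds_blob(jwt_text, today=None):
    """Summarize an MDS3 blob WITHOUT verifying its signature (diagnostics only)."""
    parts = jwt_text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 3 JWT segments, got {len(parts)}")
    payload = json.loads(_b64url_decode(parts[1]))
    next_update = date.fromisoformat(payload["nextUpdate"])  # e.g. "2024-05-01"
    today = today or date.today()
    return {
        "serial": payload.get("no"),
        "next_update": next_update.isoformat(),
        "stale": today > next_update,  # past nextUpdate: a newer blob should exist
        "entries": len(payload.get("entries", [])),
    }
```

For example, after `curl -sSL -o blob.jwt https://mds3.fidoalliance.org/`, run `summarize_mds_blob(open("blob.jwt").read())`. A `stale` result suggests the server is caching an outdated blob.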

Section sources

  • mds3.py
  • config.py

Certificate Validation Failures

Problem: X.509 certificate validation errors in attestation chains.

Validation Process:

flowchart TD
Start([Attestation Received]) --> ParseCert["Parse Certificate Chain"]
ParseCert --> CheckFormat{"Valid Format?"}
CheckFormat --> |No| FormatError["Certificate Format Error"]
CheckFormat --> |Yes| VerifyChain["Verify Certificate Chain"]
VerifyChain --> CheckDates{"Certificates Valid?"}
CheckDates --> |No| DateError["Certificate Expired"]
CheckDates --> |Yes| CheckRoot{"Trusted Root?"}
CheckRoot --> |No| TrustError["Untrusted Root"]
CheckRoot --> |Yes| CheckRevocation["Check Revocation"]
CheckRevocation --> Revoked{"Revoked?"}
Revoked --> |Yes| RevocationError["Device Revoked"]
Revoked --> |No| Success["Validation Successful"]
FormatError --> End([Validation Failed])
DateError --> End
TrustError --> End
RevocationError --> End
Success --> End

Diagram sources

  • attestation.py

Common Certificate Issues:

  • Expired certificates
  • Untrusted root certificates
  • Certificate chain validation failures
  • Revoked device certificates

Diagnostic Tools:

# Certificate validation debugging
from cryptography.x509 import load_pem_x509_certificate

def debug_certificate(cert_pem: str) -> None:
    cert = load_pem_x509_certificate(cert_pem.encode())
    print(f"Issuer: {cert.issuer.rfc4514_string()}")
    print(f"Subject: {cert.subject.rfc4514_string()}")
    # The *_utc properties need cryptography >= 42; older releases only have not_valid_before/after
    print(f"Not Valid Before: {cert.not_valid_before_utc}")
    print(f"Not Valid After: {cert.not_valid_after_utc}")
    print(f"Serial Number: {cert.serial_number}")

Section sources

  • attestation.py

Storage Backend Issues

Local Storage Problems

Problem: Credential storage failures in local filesystem.

Common Issues:

  • Disk space exhaustion
  • Permission denied errors
  • File corruption
  • Concurrent access conflicts

Diagnostic Commands:

# Check disk space
df -h server/server/session-credentials/

# Verify permissions
ls -la server/server/session-credentials/

# Test file operations
python -c "
import os
import tempfile
temp_dir = tempfile.mkdtemp()
test_file = os.path.join(temp_dir, 'test')
with open(test_file, 'w') as f:
    f.write('test')
os.remove(test_file)
os.rmdir(temp_dir)
print('File operations successful')
"
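A startup or cron check can flag disk exhaustion before credential writes start failing. This is a minimal sketch with the standard library's `shutil.disk_usage`; the `check_free_space` name and the 100 MB threshold are arbitrary choices for illustration.

```python
import os
import shutil


def check_free_space(path, min_free_mb=100):
    """Return (ok, free_mb) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    free_mb = usage.free / (1024 * 1024)
    return free_mb >= min_free_mb, round(free_mb, 1)


if __name__ == "__main__":
    target = "server/server/session-credentials"
    # Fall back to the current directory when run outside the repository root
    ok, free_mb = check_free_space(target if os.path.isdir(target) else ".")
    print(f"{'OK' if ok else 'LOW'}: {free_mb} MB free")
```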

Solutions:

  1. Disk Space Management:
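# List credential files untouched for 30+ days; deleting them removes the stored
# credentials, so review the list first
find server/server/session-credentials/ -name "*.pkl" -mtime +30 -print
# After review: find server/server/session-credentials/ -name "*.pkl" -mtime +30 -delete

# Monitor disk usage
du -sh server/server/session-credentials/
  2. Permission Fixes:
# Fix ownership
sudo chown -R www-data:www-data server/server/session-credentials/

# Fix permissions (a recursive 755 would also mark every credential file executable)
sudo find server/server/session-credentials/ -type d -exec chmod 750 {} +
sudo find server/server/session-credentials/ -type f -exec chmod 640 {} +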

Section sources

  • storage.py

Cloud Storage Issues

Problem: Google Cloud Storage connectivity and authentication issues.

Common Issues:

  • Service account authentication failures
  • Bucket access permission errors
  • Network connectivity problems
  • Rate limiting and quota exceeded

Authentication Diagnostics:

# Test GCS authentication
gcloud auth list
gcloud config get-value project

# Test bucket access
gsutil ls gs://your-bucket-name/

Configuration Validation:

# Cloud storage configuration test
from server.server.cloud_storage import _build_client, _ensure_bucket

try:
    client = _build_client()
    bucket = _ensure_bucket()
    print(f"Successfully connected to bucket: {bucket.name}")
except Exception as e:
    print(f"Cloud storage configuration error: {e}")

Solutions:

  1. Service Account Setup:
# Download service account key
gcloud iam service-accounts keys create key.json \
    --iam-account=your-service-account@your-project.iam.gserviceaccount.com

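# Set environment variables (keep the key file private)
chmod 600 key.json
export GOOGLE_APPLICATION_CREDENTIALS=$(pwd)/key.json
# Alternatively, point FIDO_SERVER_GCS_CREDENTIALS_FILE at the same key file
export FIDO_SERVER_GCS_BUCKET=your-bucket-name
  2. Network Configuration:
# VPC egress is allowed by default; add a rule only if outbound traffic has been restricted
gcloud compute firewall-rules create allow-gcs-egress \
    --direction=EGRESS \
    --allow=tcp:443 \
    --destination-ranges=0.0.0.0/0 \
    --target-tags=gcs-access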

Section sources

  • cloud_storage.py

Error Code Reference

WebAuthn API Exceptions

Common WebAuthn Errors:

| Error Category | HTTP Status | Description | Resolution |
|---|---|---|---|
| Invalid Request | 400 | Malformed request parameters | Validate input format |
| Unauthorized | 401 | Missing or invalid authentication | Check authentication tokens |
| Forbidden | 403 | Insufficient permissions | Verify user permissions |
| Not Found | 404 | Resource not found | Check resource existence |
| Conflict | 409 | Resource conflict | Resolve conflicting operations |
| Internal Error | 500 | Server-side error | Check server logs |

CTAP2 Status Codes

CTAP2 Command Status Codes:

| Code | Name | Description | Action |
|---|---|---|---|
| 0x00 | SUCCESS | Operation completed successfully | Continue with next step |
| 0x01 | INVALID_COMMAND | Unsupported or invalid command | Check command specification |
| 0x02 | INVALID_PARAMETER | Invalid parameter value | Validate parameter constraints |
| 0x03 | INVALID_LENGTH | Invalid message or item length | Check data size limits |
| 0x05 | TIMEOUT | Operation timed out | Increase timeout or retry |
| 0x21 | PROCESSING | Device is processing request | Wait for completion |
| 0x2F | USER_ACTION_TIMEOUT | User did not confirm presence in time | Prompt the user, then retry |
| 0x31 | PIN_INVALID | PIN verification failed | Retry with the correct PIN |
| 0x35 | PIN_NOT_SET | PIN not configured | Set PIN before use |
| 0x7F | OTHER | Other unspecified error | Retry; if persistent, consult vendor documentation |

Error Code Translation:

def translate_ctap_error(code):
    error_map = {
        0x00: "Success",
        0x01: "Invalid Command",
        0x02: "Invalid Parameter",
        0x03: "Invalid Length",
        0x05: "Timeout",
        0x21: "Processing",
        0x2F: "User Action Timeout",
        0x31: "PIN Invalid",
        0x35: "PIN Not Set",
        0x7F: "Other Error"
    }
    return error_map.get(code, f"Unknown Error (0x{code:02X})")

Section sources

  • ctap.py

Diagnostic Tools and Logging

Server Logs Analysis

Log Configuration: The platform uses Flask's built-in logging with configurable levels.

Log Locations:

  • Application logs: Standard output/stderr
  • Access logs: Werkzeug request log (Flask development server)
  • Error logs: Python exception traces

Log Analysis Commands:

# Tail application logs (assumes stdout/stderr are redirected to this file)
tail -f /var/log/webauthn-server.log

# Filter error logs
grep -i error /var/log/webauthn-server.log

# Search for specific issues
grep -i "device.*not.*found" /var/log/webauthn-server.log
grep -i "metadata.*error" /var/log/webauthn-server.log

Structured Logging Example:

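# Enhanced logging with context
import logging

from flask import request
from werkzeug.exceptions import HTTPException

from server.server.config import app

logger = logging.getLogger(__name__)

@app.before_request
def log_request_info():
    # Debug only: headers and bodies can contain session cookies and credentials
    logger.debug('Headers: %s', request.headers)
    logger.debug('Body: %s', request.get_data())

@app.errorhandler(Exception)
def handle_exception(e):
    # Let HTTP errors such as 404 keep their own status instead of becoming 500s
    if isinstance(e, HTTPException):
        return e
    logger.error('Unhandled exception: %s', str(e), exc_info=True)
    return {'error': 'Internal server error'}, 500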
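If logs are shipped to an aggregator, emitting one JSON object per line makes the grep patterns above more precise. This is a minimal standard-library sketch; `JsonFormatter`, `configure_json_logging` and the `request_id` field are illustrative names, not something the platform already emits.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional context passed as logger.info(..., extra={"request_id": ...})
        request_id = getattr(record, "request_id", None)
        if request_id is not None:
            entry["request_id"] = request_id
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_json_logging(level=logging.INFO):
    """Replace the root handlers with a single JSON-formatting stream handler."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)


if __name__ == "__main__":
    configure_json_logging()
    logging.getLogger("webauthn").info("credential registered", extra={"request_id": "abc123"})
```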

Section sources

  • app.py

Device Communication Traces

HID Layer Debugging:

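# Enable HID debugging (basicConfig installs a handler; without one, DEBUG records are dropped)
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('fido2.hid').setLevel(logging.DEBUG)

# Trace device communication
from fido2.hid import CTAPHID, CtapHidDevice

def debug_device_communication():
    devices = list(CtapHidDevice.list_devices())
    for dev in devices:
        print(f"Device: {dev.descriptor}")
        try:
            # Send a CTAPHID PING; the device should echo the payload back
            payload = b'\x00' * 8
            response = dev.call(CTAPHID.PING, payload)
            print("Ping successful" if response == payload else "Ping returned unexpected data")
        except Exception as e:
            print(f"Ping failed: {e}")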

Packet Capture:

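# Capture USB traffic through the kernel's usbmon interface (needs root; usbmon0 covers all buses)
sudo modprobe usbmon
sudo tshark -i usbmon0

# Monitor hidraw device add/remove events
sudo udevadm monitor --subsystem-match=hidraw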

Section sources

  • base.py

Metadata Service Diagnostics

Metadata Validation Tools:

# Metadata service health check
from server.server.metadata import ensure_metadata_bootstrapped

def check_metadata_health():
    try:
        ensure_metadata_bootstrapped()
        print("Metadata service healthy")
    except Exception as e:
        print(f"Metadata service error: {e}")

# Certificate chain validation
from server.server.attestation import verify_attestation_chain

def validate_attestation_chain(attestation_data):
    try:
        result = verify_attestation_chain(attestation_data)
        if result['root_valid']:
            print("Certificate chain valid")
        else:
            print(f"Certificate chain invalid: {result['errors']}")
    except Exception as e:
        print(f"Chain validation error: {e}")

Section sources

  • mds3.py

Performance Troubleshooting

Startup Performance Issues

Problem: Slow server startup or dependency loading.

Startup Process:

flowchart TD
Start([Server Start]) --> LoadConfig["Load Configuration"]
LoadConfig --> WarmDeps["Warm Dependencies"]
WarmDeps --> LoadMetadata["Load Metadata"]
LoadMetadata --> LoadStorage["Initialize Storage"]
LoadStorage --> LoadDevices["Detect Devices"]
LoadDevices --> Ready["Server Ready"]
WarmDeps --> MetadataCheck{"Metadata Available?"}
MetadataCheck --> |No| RetryMetadata["Retry Metadata"]
MetadataCheck --> |Yes| StorageCheck{"Storage Ready?"}
RetryMetadata --> MetadataCheck
StorageCheck --> |No| RetryStorage["Retry Storage"]
StorageCheck --> |Yes| DeviceCheck{"Devices Found?"}
RetryStorage --> StorageCheck
DeviceCheck --> |No| RetryDevices["Retry Devices"]
DeviceCheck --> |Yes| Ready
RetryDevices --> DeviceCheck

Diagram sources

  • startup.py

Performance Optimization:

# Startup performance monitoring
import concurrent.futures
import logging
import time

from server.server import cloud_storage
from server.server.metadata import ensure_metadata_bootstrapped
from server.server.startup import warm_up_dependencies

logger = logging.getLogger(__name__)

def monitor_startup_performance():
    # perf_counter is monotonic and high-resolution, unlike time.time
    start_time = time.perf_counter()
    try:
        warm_up_dependencies()
        elapsed = time.perf_counter() - start_time
        print(f"Startup completed in {elapsed:.2f} seconds")
    except Exception as e:
        elapsed = time.perf_counter() - start_time
        print(f"Startup failed after {elapsed:.2f} seconds: {e}")

# Parallel dependency loading
# session_metadata_store must be imported from wherever the server defines it
def parallel_warmup():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(ensure_metadata_bootstrapped),
            executor.submit(cloud_storage.ensure_ready),
            executor.submit(session_metadata_store.ensure_session, "__startup__"),
        ]

        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()
            except Exception as e:
                logger.warning(f"Parallel warmup failed: {e}")
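Total startup time does not show which dependency is slow. Running each warm-up step on its own and timing it points at the culprit before anything is parallelized. This is a minimal sketch; `time_steps` is a hypothetical helper, and the step names in the comment mirror the calls used in `parallel_warmup` above.

```python
import logging
import time

logger = logging.getLogger(__name__)


def time_steps(steps):
    """Run (name, zero-argument callable) pairs in order and return {name: seconds}.

    A failing step is logged and still timed; the remaining steps keep running.
    """
    timings = {}
    for name, func in steps:
        start = time.perf_counter()
        try:
            func()
        except Exception:
            logger.exception("warm-up step %s failed", name)
        finally:
            timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Swap these placeholders for the real calls, e.g.
    # ("metadata", ensure_metadata_bootstrapped), ("storage", cloud_storage.ensure_ready)
    demo = [("sleep", lambda: time.sleep(0.1)), ("noop", lambda: None)]
    for name, seconds in sorted(time_steps(demo).items(), key=lambda kv: -kv[1]):
        print(f"{name:<10} {seconds:.3f}s")
```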

Section sources

  • startup.py

Memory and Resource Usage

Resource Monitoring:

# Monitor memory usage
ps aux | grep python
top -p $(pgrep -f "python.*webauthn")

# Check disk usage
du -sh server/server/session-credentials/
du -sh server/server/static/

# Monitor network connections
netstat -tulpn | grep :5000
lsof -i :5000

Memory Optimization:

# Memory profiling
import psutil
import gc

def monitor_memory_usage():
    process = psutil.Process()
    mem_info = process.memory_info()
    print(f"Memory usage: {mem_info.rss / 1024 / 1024:.2f} MB")
    
    # Force garbage collection (RSS may not drop: CPython often keeps freed memory for reuse)
    gc.collect()
    mem_info = process.memory_info()
    print(f"After GC: {mem_info.rss / 1024 / 1024:.2f} MB")
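RSS alone does not say which code holds the memory. The standard library's `tracemalloc` can attribute live allocations to source lines. This is a minimal sketch; `top_allocations` is an illustrative helper, and the profiled callable would be a suspect code path such as a metadata load.

```python
import tracemalloc


def top_allocations(func, limit=5):
    """Run func() under tracemalloc and return [(source_line, bytes)] for the largest sites."""
    tracemalloc.start()
    try:
        result = func()  # keep the return value alive so its memory is still counted
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    stats = snapshot.statistics("lineno")[:limit]
    return [(str(stat.traceback), stat.size) for stat in stats]


if __name__ == "__main__":
    for where, size in top_allocations(lambda: [bytearray(1024) for _ in range(1000)]):
        print(f"{size / 1024:8.1f} KiB  {where}")
```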

Support Resources

Official Documentation

Primary Resources:

Platform-Specific Documentation:

Community Support

Getting Help:

  1. GitHub Issues: Report bugs and feature requests
  2. Stack Overflow: Tag with "post-quantum-webauthn"
  3. Discord Channels: Join community discussions
  4. Mailing Lists: Subscribe to development updates

Bug Reporting Guidelines:

## Bug Report Template

**Environment**:
- OS: [e.g., Ubuntu 22.04]
- Python Version: [e.g., 3.9.7]
- liboqs Version: [e.g., 0.14.1]
- Browser: [e.g., Chrome 115]

**Steps to Reproduce**:
1. [First step]
2. [Second step]
3. [Third step]

**Expected Behavior**:
[Description of expected behavior]

**Actual Behavior**:
[Description of actual behavior]

**Logs**:

[Paste relevant log output here]


**Additional Context**:
[Any additional information that might help diagnose the issue]

Escalation Paths

Issue Severity Levels:

| Severity | Description | Response Time | Escalation Path |
|---|---|---|---|
| Critical | System down, data loss | 1 hour | Immediate team contact |
| High | Major functionality broken | 4 hours | Team lead notification |
| Medium | Feature degradation | 24 hours | Regular support channels |
| Low | Minor issues, enhancements | 1 week | Community forum |

Escalation Procedures:

  1. Self-Diagnosis: Attempt to resolve using documentation
  2. Community Support: Seek help from community channels
  3. Official Support: Contact vendor support for critical issues
  4. Security Issues: Report immediately to security team

Contact Information:

Contributing to the Project

Development Setup:

# Clone repository
git clone https://github.com/rainzhang05/postquantum-webauthn-platform.git
cd postquantum-webauthn-platform

# Set up virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest tests/

Development Guidelines:

  • Follow PEP 8 coding standards
  • Write comprehensive tests
  • Update documentation
  • Submit pull requests with clear descriptions

Section sources

  • requirements.txt
  • test_storage.py
