A starter kit for building ETL adapters that migrate content from your CMS to Arc XP. This starter kit provides both standalone and orchestrated (Temporal) implementations for extracting, identifying, transforming, and loading content into Arc XP. Use this as a learning resource and starting point for your own content migration project.
This starter kit helps you:
- Extract content from your source CMS
- Identify content types and generate Arc XP ANS IDs
- Transform source content into Arc XP ANS (Arc Native Specification) format
- Load transformed content into Arc XP via the Migration Center API
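These four stages form a simple chain. The sketch below is a minimal, hypothetical illustration of that flow. The function names, the image_url source field, and the in-memory input list are assumptions made for the example and are not the starter kit's actual code. In the standalone adapter each stage is a separate script, and in the Temporal adapter the stages run as activities.

```python
from typing import Any, Dict, Iterable, List


def extract(source_items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Hypothetical: a real extractor would call your CMS API here.
    return list(source_items)


def identify(item: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical rule: items with an image_url are images, everything else is a story.
    kind = "image" if "image_url" in item else "story"
    return {**item, "content_type": kind}


def transform(item: Dict[str, Any]) -> Dict[str, Any]:
    # Map a few fields into an ANS-like dict (illustrative only, not complete ANS).
    ans: Dict[str, Any] = {"type": item["content_type"]}
    if item["content_type"] == "story":
        ans["headlines"] = {"basic": item.get("title", "")}
    return ans


def load(docs: List[Dict[str, Any]]) -> int:
    # Hypothetical: a real loader would submit each document to Arc XP.
    return len(docs)


def run_pipeline(source_items: Iterable[Dict[str, Any]]) -> int:
    # Extract -> Identify -> Transform -> Load, passing data directly between stages.
    extracted = extract(source_items)
    transformed = [transform(identify(item)) for item in extracted]
    return load(transformed)
```

For example, run_pipeline([{"title": "Hello"}, {"image_url": "https://example.com/a.jpg"}]) treats the first item as a story and the second as an image, and returns 2, the number of documents handed to load().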
The starter kit includes two adapter implementations:
- Standalone Adapter (/adapter) - For iterative development and testing
- Temporal Adapter (/adapter_temporal) - For production orchestrated execution
This starter kit does not address all Arc XP object types. The starter kit focuses on:
- Stories - Standard articles with text content
- Images - Standalone images and images embedded in stories
Content types not addressed include:
- Videos
- Image galleries
- Other Arc XP content types (authors, redirects, etc.)
Additionally, this starter kit does not address all possible content_elements that can be included in Arc XP stories. The examples demonstrate basic content elements (text, headings, lists, images), but you may need to extend the converters to handle additional content element types specific to your use case.
You can extend the starter kit to support additional content types and content elements by following the customization patterns in the Customization Guide.
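One way to keep content element handling extensible is a registry that maps each source block type to a converter function, so supporting a new element type means adding one function. This is a sketch under assumptions: the source block shape ({"type": ..., "html": ...}), the registry, and the function names are hypothetical, and the ANS element fields used here (text, header, quote) should be checked against the ANS schema version you target.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical registry: source block type -> function returning one ANS content element.
ElementConverter = Callable[[Dict[str, Any]], Dict[str, Any]]
CONVERTERS: Dict[str, ElementConverter] = {}


def register(source_type: str) -> Callable[[ElementConverter], ElementConverter]:
    # Decorator that records a converter under the source block type it handles.
    def decorator(fn: ElementConverter) -> ElementConverter:
        CONVERTERS[source_type] = fn
        return fn
    return decorator


@register("paragraph")
def convert_paragraph(block: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "text", "content": block["html"]}


@register("heading")
def convert_heading(block: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "header", "level": block.get("level", 2), "content": block["html"]}


# Extending: a new element type is one more registered function; the loop below is unchanged.
@register("pullquote")
def convert_pullquote(block: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "type": "quote",
        "subtype": "pullquote",
        "content_elements": [{"type": "text", "content": block["html"]}],
    }


def convert_blocks(blocks: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[Optional[str]]]:
    elements: List[Dict[str, Any]] = []
    unknown: List[Optional[str]] = []
    for block in blocks:
        fn = CONVERTERS.get(block.get("type"))
        if fn is None:
            # Collect unsupported types instead of silently dropping them.
            unknown.append(block.get("type"))
            continue
        elements.append(fn(block))
    return elements, unknown
```

Collecting the unknown block types, rather than discarding them, gives you a running list of element types you still need to map.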
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables (create a .env file):
    BEARER_TOKEN=your_arc_xp_token
    ORG=your_organization_id
    ENV=sandbox  # or production
    WEBSITE=your_website_id
    TEMPORAL_SERVER_URL=localhost:7233  # Optional, defaults to localhost:7233
    EXTRACT_API_URL=http://127.0.0.1:8000/  # Optional, defaults to the fake-news mock CMS
- Start the mock CMS service (for testing):
    fastapi dev fake-news/main.py
- Choose your adapter:
  - For development/testing: See the Standalone Adapter Guide
  - For production: See the Temporal Adapter Guide
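A small settings loader can fail fast when required variables are missing. The sketch below reads the variables listed above from os.environ, or from any mapping passed in. The Settings class and load_settings() name are hypothetical, and the EXTRACT_API_URL default of http://127.0.0.1:8000/ is an assumption based on where fastapi dev serves the mock CMS by default. If you keep these values in .env, a loader such as python-dotenv's load_dotenv() can copy them into os.environ before this runs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    bearer_token: str
    org: str
    env: str
    website: str
    temporal_server_url: str
    extract_api_url: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    # Hypothetical loader: uses os.environ unless a mapping is supplied (handy in tests).
    env = os.environ if environ is None else environ
    missing = [k for k in ("BEARER_TOKEN", "ORG", "ENV", "WEBSITE") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    if env["ENV"] not in ("sandbox", "production"):
        raise ValueError("ENV must be 'sandbox' or 'production'")
    return Settings(
        bearer_token=env["BEARER_TOKEN"],
        org=env["ORG"],
        env=env["ENV"],
        website=env["WEBSITE"],
        # Optional values fall back to the defaults documented above (URL default assumed).
        temporal_server_url=env.get("TEMPORAL_SERVER_URL", "localhost:7233"),
        extract_api_url=env.get("EXTRACT_API_URL", "http://127.0.0.1:8000/"),
    )
```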
- Customization Guide - START HERE - Guide for adapting this starter kit to your CMS
  - Quick start checklist
  - A non-comprehensive selection of customization points for each component
  - Common customization patterns
  - Examples and best practices
- Standalone Adapter - Non-orchestrated implementation
  - Best for: Iterative development, debugging, learning, testing modifications
  - Run each script independently
  - Manual data passing between steps
  - See: Content mapping, extract, identify, transform, load processes
- Temporal Adapter - Orchestrated implementation using Temporal
  - Best for: Production deployments, automated workflows
  - Automatic data flow between steps
  - Built-in error handling and retries
  - Progress tracking via the Temporal UI
  - See: Setup, orchestration, workflow architecture
- Fake-News Mock CMS - Mock CMS API for development and testing
  - FastAPI service that generates test content
  - Includes error fixtures for testing failure scenarios
  - Reference implementation for CMS data structure
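Because the mock CMS ships error fixtures for failure scenarios, an extractor should record failed items and keep going rather than abort the whole run. A minimal sketch, assuming a hypothetical per-item endpoint at {base_url}/articles/{id} (replace it with your CMS's real API). The fetch function is injectable so you can test the logic without a running server.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Tuple

Fetch = Callable[[str], Dict[str, Any]]


def http_fetch(url: str) -> Dict[str, Any]:
    # Plain stdlib GET that parses the JSON response body.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_all(
    base_url: str, ids: List[str], fetch: Fetch = http_fetch
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    items: List[Dict[str, Any]] = []
    failures: Dict[str, str] = {}
    for source_id in ids:
        # "articles/{id}" is a hypothetical path, not the fake-news API.
        url = f"{base_url.rstrip('/')}/articles/{source_id}"
        try:
            items.append(fetch(url))
        except Exception as exc:
            # Record the failure and continue; one bad item should not stop the run.
            failures[source_id] = repr(exc)
    return items, failures
```

The returned failures mapping can feed a retry pass or a report. The Temporal adapter relies on Temporal's built-in retries instead.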
arc-xp-etl-migration-starter/
├── adapter/ # Standalone adapter implementation
│ ├── extract.py # Extract content from CMS
│ ├── identify.py # Identify content types
│ ├── convert_story.py # Transform stories to ANS
│ ├── convert_image.py # Transform images to ANS
│ ├── inventory.py # Track processed items
│ ├── load.py # Load content to Arc XP
│ └── ReadMe.md # Standalone adapter documentation
│
├── adapter_temporal/ # Temporal orchestrated adapter
│ ├── activities.py # Temporal activities (ETL steps)
│ ├── orchestration_workflow.py # Workflow definitions
│ ├── run_worker.py # Start Temporal worker
│ ├── run_workflow.py # Start workflow execution
│ ├── tests/ # Unit tests
│ └── ReadMe.md # Temporal adapter documentation
│
├── fake-news/ # Mock CMS service
│ ├── main.py # FastAPI application
│ ├── fake_content.py # Content generation logic
│ └── ReadMe.md # Mock CMS documentation
│
├── fixtures/ # Test data and examples
│ ├── fake_news_*.json # Sample content fixtures
│ └── workflow_logging_result.txt # Example workflow output
│
├── CUSTOMIZATION_GUIDE.md # Comprehensive customization guide
├── requirements.txt # Python dependencies
└── README.md # This file
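The tree lists inventory.py for tracking processed items. The sketch below is not that file; it is a hypothetical illustration of the idea: a JSON file mapping each source ID to a status, so a re-run can skip items that already loaded. The Inventory class, the "loaded" status string, and the file layout are all assumptions.

```python
import json
from pathlib import Path
from typing import Dict, Union


class Inventory:
    """Hypothetical JSON-backed record of source_id -> status."""

    def __init__(self, path: Union[str, Path]) -> None:
        self.path = Path(path)
        self.items: Dict[str, str] = {}
        if self.path.exists():
            # Resume from a previous run's inventory file.
            self.items = json.loads(self.path.read_text(encoding="utf-8"))

    def is_done(self, source_id: str) -> bool:
        return self.items.get(source_id) == "loaded"

    def mark(self, source_id: str, status: str) -> None:
        self.items[source_id] = status
        self._save()

    def _save(self) -> None:
        # Write to a temporary file, then swap it into place.
        tmp = self.path.with_suffix(self.path.suffix + ".tmp")
        tmp.write_text(json.dumps(self.items, indent=2), encoding="utf-8")
        tmp.replace(self.path)
```

Writing to a temporary file and then calling Path.replace() avoids leaving a half-written inventory if the process stops mid-save.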
The ETL pipeline has four stages:
- Extract - Retrieve content from your source CMS
- Identify - Determine content type and generate Arc XP ANS IDs
- Transform - Convert source content to Arc XP ANS format
- Load - Submit transformed content to Arc XP via Migration Center API
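Identification has two jobs: decide the content type and give each item a stable ANS _id. One common approach, assumed here rather than taken from the starter kit, is to derive the ID deterministically from the organization, content type, and source ID, so re-running a migration targets the same Arc XP object instead of creating duplicates. The 26-character uppercase base32 format is an assumption modeled on typical Arc XP IDs, and the "kind" source field is hypothetical; confirm the ID rules in the Arc XP documentation.

```python
import base64
import hashlib
from typing import Any, Dict


def make_ans_id(org: str, content_type: str, source_id: str) -> str:
    # Hash a namespaced key so the same source item always maps to the same ID.
    key = f"{org}:{content_type}:{source_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Base32 output uses only A-Z and 2-7; keep the first 26 characters (no padding).
    return base64.b32encode(digest).decode("ascii")[:26]


def identify(item: Dict[str, Any], org: str) -> Dict[str, Any]:
    # Hypothetical rule: the source payload declares its type in a "kind" field.
    kind = item.get("kind")
    if kind in ("article", "story"):
        content_type = "story"
    elif kind in ("image", "photo"):
        content_type = "image"
    else:
        raise ValueError(f"Unsupported content kind: {kind!r}")
    source_id = str(item["id"])
    return {
        "source_id": source_id,
        "content_type": content_type,
        "ans_id": make_ans_id(org, content_type, source_id),
    }
```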
Before coding, you must map your source CMS fields to Arc XP ANS fields. This is critical pre-work that determines:
- Which source fields map to which ANS fields
- What transformations are needed (date formats, URL structures, etc.)
- How to handle different content types
See the Pre-Work section in the Standalone Adapter guide for detailed mapping guidance.
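A mapping document translates naturally into a lookup table. This sketch assumes a hypothetical source schema (title, summary, published, slug) and a source date format of YYYY-MM-DD HH:MM:SS in UTC. The target paths are standard ANS story fields, but your own mapping and transformations will differ.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Tuple


def to_iso_utc(value: str) -> str:
    # Assumes source dates look like "2024-03-05 14:30:00" and are already in UTC.
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.isoformat().replace("+00:00", "Z")


# Hypothetical mapping: source field -> (dotted ANS path, transform function).
FIELD_MAP: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "title": ("headlines.basic", str),
    "summary": ("description.basic", str),
    "published": ("first_publish_date", to_iso_utc),
    "slug": ("slug", str.lower),
}


def set_path(doc: Dict[str, Any], dotted: str, value: Any) -> None:
    # Create nested dicts as needed, e.g. "headlines.basic" -> doc["headlines"]["basic"].
    keys = dotted.split(".")
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value


def apply_mapping(source: Dict[str, Any]) -> Dict[str, Any]:
    ans: Dict[str, Any] = {}
    for field, (path, fn) in FIELD_MAP.items():
        if source.get(field) is not None:  # skip fields the source item lacks
            set_path(ans, path, fn(source[field]))
    return ans
```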
Arc XP uses ANS as its native content format. Your adapter must transform your CMS content into ANS format, which includes:
- Required fields: _id, type, version, etc.
- Content elements: Story body, images, headings, lists, etc.
- Circulation: Sections, URLs, publishing metadata
- Migration Center payload: Wrapper for loading into Arc XP
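A minimal story document can be built in code. This is a sketch, not a complete ANS story: the ANS version string is an assumption (use the version your organization targets), circulation and the Migration Center payload wrapper are omitted because their exact shape should come from the Arc XP API documentation, and missing_required() is a hypothetical pre-flight check, not a schema validator.

```python
from typing import Any, Dict, List

ANS_VERSION = "0.10.9"  # assumption: use the ANS schema version your Arc XP org targets
REQUIRED = ("_id", "type", "version")


def build_story(ans_id: str, website: str, headline: str, paragraphs: List[str]) -> Dict[str, Any]:
    return {
        "_id": ans_id,
        "type": "story",
        "version": ANS_VERSION,
        "canonical_website": website,
        "headlines": {"basic": headline},
        # Each paragraph becomes an ANS "text" content element.
        "content_elements": [{"type": "text", "content": p} for p in paragraphs],
    }


def missing_required(doc: Dict[str, Any]) -> List[str]:
    # Cheap check before loading; it does not replace validation against the ANS schema.
    return [field for field in REQUIRED if not doc.get(field)]
```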
Recommended path for getting started:
- Read the Customization Guide - Understand what needs to be customized
- Map your CMS data - Create a mapping document of source → ANS fields
- Start with the Standalone Adapter - Test transformations incrementally
- Customize converters - Modify set_content_elements() and other set_*() methods
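The set_*() pattern gives each part of the ANS document its own method, so customizing for your CMS usually means overriding only the methods whose source data differs. The classes below are hypothetical and only mirror that pattern; read the starter kit's convert_story.py for its real structure. MyCmsStoryConverter assumes a source that stores the body as typed blocks.

```python
from typing import Any, Dict, List


class StoryConverter:
    """Hypothetical converter: each set_*() method fills one part of the ANS document."""

    def __init__(self, source: Dict[str, Any], ans_id: str) -> None:
        self.source = source
        self.ans: Dict[str, Any] = {"_id": ans_id, "type": "story"}

    def convert(self) -> Dict[str, Any]:
        self.set_headlines()
        self.set_content_elements()
        return self.ans

    def set_headlines(self) -> None:
        self.ans["headlines"] = {"basic": self.source.get("title", "")}

    def set_content_elements(self) -> None:
        # Default: treat the body as plain text split on blank lines.
        body = self.source.get("body", "")
        self.ans["content_elements"] = [
            {"type": "text", "content": para.strip()}
            for para in body.split("\n\n")
            if para.strip()
        ]


class MyCmsStoryConverter(StoryConverter):
    """Customization for a hypothetical CMS that stores the body as typed blocks."""

    def set_content_elements(self) -> None:
        elements: List[Dict[str, Any]] = []
        for block in self.source.get("blocks", []):
            if block["type"] == "h2":
                elements.append({"type": "header", "level": 2, "content": block["text"]})
            elif block["type"] == "p":
                elements.append({"type": "text", "content": block["text"]})
            # Other block types are skipped here; a real converter should log them.
        self.ans["content_elements"] = elements
```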
Requirements:
- Python 3.8+
- Arc XP account with Migration Center API access
- (For Temporal adapter) Temporal server (local dev server or cloud instance)
See requirements.txt for Python package dependencies.
Where to get help:
- Customization questions: See CUSTOMIZATION_GUIDE.md
- Standalone adapter: See adapter/ReadMe.md
- Temporal adapter: See adapter_temporal/ReadMe.md
- Mock CMS: See fake-news/ReadMe.md