A minimal notebook-driven starter to import web pages into a WordLift knowledge graph using the WordLift Python SDK. The `kg_import.ipynb` notebook wires up a configuration file and a custom callback so you can adapt the import to your project.
- Python 3.11 or 3.12
- `uv` for dependency and environment management
- access to a WordLift workspace (set the credentials in your environment or `.env` file before running the notebook)
- Install `uv` if you do not have it: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Create the virtual environment and install dependencies: `uv sync`. Add `--all-extras` if you also want the dev tools (`ruff`, `pre-commit`, `pytest`, `nbstripout`).
- Activate the environment (optional) or prefix commands with `uv run`: `source .venv/bin/activate`
- Set any required environment variables in a `.env` file at the repo root (the SDK will read them via `python-dotenv`). This is typically where you place API credentials and workspace settings.
- Start the notebook: `uv run jupyter notebook kg_import.ipynb`
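A `.env` file along these lines is a minimal sketch; the variable name below is an assumption for illustration, so confirm the exact keys your WordLift workspace and the SDK expect:

```bash
# Illustrative only — check the WordLift SDK docs for the actual variable names.
WORDLIFT_KEY=your-workspace-key
```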
- Open `kg_import.ipynb` in your browser and run the cells top-to-bottom. The first cell ensures required packages are present; the second loads your configuration and kicks off the import workflow.
- Edit `config/default.py` to point to your sitemap, page types, concurrency, and import strategy.
- Customize the callback in `app/overrides/web_page_import_protocol.py` to handle the `WebPageImportResponse` objects (e.g., logging, persistence, or additional processing).
- Add templates or additional assets under `data/` as needed.
- Install Git hooks after syncing deps: `uv run pre-commit install`
- Format/lint: `uv run ruff check .`
- Tests (if/when added): `uv run pytest`
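A `.pre-commit-config.yaml` along these lines would wire `ruff` and `nbstripout` into the Git hooks; the `rev` values below are placeholders, so pin the versions this repo actually uses:

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.0  # placeholder — pin your version
    hooks:
      - id: ruff
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # placeholder — pin your version
    hooks:
      - id: nbstripout
```

Running `nbstripout` as a hook keeps notebook outputs out of commits, which keeps diffs on `kg_import.ipynb` reviewable.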
- If the notebook fails to authenticate, double-check the credentials in your `.env` file.
- Re-run `uv sync` after updating dependencies in `pyproject.toml`.