Add detailed architecture diagram to README for Wikimedia Dumps pipeline #5

@alizahh-7

Description

The current README explains the Wikimedia Dumps automation workflow in text only; it lacks a visual representation of how the different components interact.

To improve onboarding for new contributors and make the system easier to understand, a detailed architecture diagram should be added to the README.

Proposed update:

Add a Mermaid-based flowchart that illustrates the complete high-level pipeline, including:

  1. Wikimedia dumps HTML pages (dumps.wikimedia.org)
  2. Crawling and HTML parsing using wiki_dumps_crawler.py
  3. Detection of newly published and finished dumps
  4. Storage of discovered dump URLs in crawled_urls.txt
  5. Publishing workflow via wikimedia_publish.py
  6. Publication of RDF metadata to the Databus API
  7. Availability of the metadata in the Databus knowledge graph for SPARQL queries

The diagram should clearly show:

  1. Data flow between components
  2. Decision points (e.g., presence of new dumps)
  3. The end-to-end automation from dump discovery to Databus publication
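A possible starting point for the diagram, sketching the pipeline and decision point described above (component names are taken from this issue; the exact branch labels and decision logic are assumptions):

```mermaid
flowchart TD
    A[Wikimedia dumps HTML pages<br/>dumps.wikimedia.org] --> B[wiki_dumps_crawler.py<br/>crawling and HTML parsing]
    B --> C{Newly published,<br/>finished dumps?}
    C -- no --> Z[Wait for next run]
    C -- yes --> D[crawled_urls.txt<br/>discovered dump URLs]
    D --> E[wikimedia_publish.py<br/>publishing workflow]
    E --> F[Databus API<br/>RDF metadata publication]
    F --> G[Databus knowledge graph<br/>available for SPARQL queries]
```

GitHub renders fenced ```mermaid blocks natively in the README, so no image file needs to be committed or kept in sync.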

This visual overview will make the repository more accessible to first-time contributors and clarify how the crawler and publishing scripts work together within the DBpedia infrastructure.
