Commit 4c77965

Merge pull request #60 from localgovdrupal/58-standard-pipeline-fails-if-localgov-publications-importer-ai-is-not-installed
Updates Standard import Pipeline
2 parents 0719efe + 8162a83 commit 4c77965

2 files changed, +17 -3 lines changed

config/install/localgov_publications_importer.import_pipeline.standard.yml

Lines changed: 0 additions & 3 deletions
@@ -7,11 +7,8 @@ extract_plugin_configuration: { }
 transform_plugins:
   - transform_images
   - transform_line_breaks
-  - transform_ai_aio
 transform_plugin_configurations:
   - { }
   - { }
-  -
-    prompt: "You are a website content editor. Format the provided text and HTML into valid JSON only.\r\n\r\nRequirements:\r\n- Return ONLY a JSON array of page objects, no other text\r\n- Each page object has: \"title\" (string), \"content\" (string)\r\n- Split the content into MULTIPLE pages\r\n- Each page should contain 200-500 words of content when possible\r\n- Break pages at natural stopping points: section boundaries, topic changes, or major headings\r\n- Content value contains HTML using only the tags: h1, h2, h3, h4, h5, h6, p, ul, ol, li, img\r\n- Use the first line as h1 if it's a complete sentence\r\n- Preserve original text and HTML tags exactly, only add HTML tags\r\n- Generate descriptive titles that reflect each page's main topic\r\n- Pay special attention to img tags - they must be preserved with all original attributes\r\n- Properly escape all double quotes in JSON strings\r\n- Ensure any JSON you create is valid. This is really important.\r\n\r\nSplit strategy:\r\n- Look for major headings, topic shifts, or natural content breaks\r\n- Each page should feel complete but part of a larger whole\r\n- Distribute content evenly across pages\r\n- Don't create pages that are too short (under 100 words) unless necessary\r\n\r\nExample format:\r\n[\r\n {\"title\":\"Introduction and Overview\",\"content\":\"<h1>Main Title</h1><img src=\"/example-image.jpg\"><p>Intro content...</p>\"},\r\n {\"title\":\"Key Concepts\",\"content\":\"<h2>Section Title</h2><p>More content...</p>\"},\r\n {\"title\":\"Advanced Topics\",\"content\":\"<h2>Another Section</h2><p>Final content...</p>\"}\r\n]\r\n"
 save_plugin: save_publication
 save_plugin_configuration: { }
@@ -0,0 +1,17 @@
+status: true
+dependencies: { }
+id: standard_with_ai
+label: Standard with AI
+extract_plugin: smalot_pdfparser
+extract_plugin_configuration: { }
+transform_plugins:
+  - transform_images
+  - transform_line_breaks
+  - transform_ai_aio
+transform_plugin_configurations:
+  - { }
+  - { }
+  -
+    prompt: "You are a website content editor. Format the provided text and HTML into valid JSON only.\r\n\r\nRequirements:\r\n- Return ONLY a JSON array of page objects, no other text\r\n- Each page object has: \"title\" (string), \"content\" (string)\r\n- Split the content into MULTIPLE pages\r\n- Each page should contain 200-500 words of content when possible\r\n- Break pages at natural stopping points: section boundaries, topic changes, or major headings\r\n- Content value contains HTML using only the tags: h1, h2, h3, h4, h5, h6, p, ul, ol, li, img\r\n- Use the first line as h1 if it's a complete sentence\r\n- Preserve original text and HTML tags exactly, only add HTML tags\r\n- Generate descriptive titles that reflect each page's main topic\r\n- Pay special attention to img tags - they must be preserved with all original attributes\r\n- Properly escape all double quotes in JSON strings\r\n- Ensure any JSON you create is valid. This is really important.\r\n\r\nSplit strategy:\r\n- Look for major headings, topic shifts, or natural content breaks\r\n- Each page should feel complete but part of a larger whole\r\n- Distribute content evenly across pages\r\n- Don't create pages that are too short (under 100 words) unless necessary\r\n\r\nExample format:\r\n[\r\n {\"title\":\"Introduction and Overview\",\"content\":\"<h1>Main Title</h1><img src=\"/example-image.jpg\"><p>Intro content...</p>\"},\r\n {\"title\":\"Key Concepts\",\"content\":\"<h2>Section Title</h2><p>More content...</p>\"},\r\n {\"title\":\"Advanced Topics\",\"content\":\"<h2>Another Section</h2><p>Final content...</p>\"}\r\n]\r\n"
+save_plugin: save_publication
+save_plugin_configuration: { }
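
The `prompt` in the `standard_with_ai` pipeline asks the model for a JSON array of page objects, each with a `title` and a `content` string. Below is a minimal sketch of how a response could be checked against that shape. It assumes Python, and the helper name `parse_pages` is hypothetical: it is not part of the module and says nothing about how `transform_ai_aio` actually handles the response.

```python
import json

# Keys the prompt says every page object has.
REQUIRED_KEYS = {"title", "content"}


def parse_pages(raw):
    """Parse a model response and check it is a JSON array of page objects.

    Hypothetical helper for illustration only.
    """
    pages = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(pages, list) or not pages:
        raise ValueError("expected a non-empty JSON array of pages")
    for page in pages:
        if not isinstance(page, dict) or not REQUIRED_KEYS <= set(page):
            raise ValueError("each page needs 'title' and 'content'")
        if not all(isinstance(page[key], str) for key in REQUIRED_KEYS):
            raise ValueError("'title' and 'content' must be strings")
    return pages


if __name__ == "__main__":
    good = '[{"title":"Key Concepts","content":"<h2>Section Title</h2><p>More content...</p>"}]'
    print(parse_pages(good)[0]["title"])

    # Attribute quotes inside "content" must be escaped to stay valid JSON.
    bad = '[{"title":"Intro","content":"<img src="/example-image.jpg">"}]'
    try:
        parse_pages(bad)
    except ValueError as error:
        print("rejected:", error)
```

The second case mirrors the first entry of the prompt's "Example format" once the YAML escapes are resolved: the `src` attribute's double quotes end up unescaped inside the JSON string, so a strict parser rejects that entry as written.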
