Commit 791548a

Merge branch 'dev' into 1052-update-cosmos-to-create-jobs-for-scrapers-and-indexers
2 parents 5fa9a7f + 2420983

File tree

24 files changed: +1696 additions, −172 deletions

.github/workflows/run_full_test_suite.yml

Lines changed: 7 additions & 0 deletions
@@ -4,6 +4,8 @@ on:
   pull_request:
     branches:
       - dev
+    paths-ignore:
+      - '**/*.md'

 jobs:
   run-tests:
@@ -33,5 +35,10 @@ jobs:
         DJANGO_ENV: test
       run: docker-compose -f local.yml run --rm django bash ./init.sh

+    - name: Generate Coverage Report
+      env:
+        DJANGO_ENV: test
+      run: docker-compose -f local.yml run --rm django bash -c "coverage report"
+
     - name: Cleanup
       run: docker-compose -f local.yml down --volumes

CHANGELOG.md

Lines changed: 61 additions & 2 deletions
@@ -12,13 +12,72 @@ For each PR made, an entry should be added to this changelog. It should contain
- etc.

## Changelog

- 1052-update-cosmos-to-create-jobs-for-scrapers-and-indexers
  - Description: The original automation, set up to generate the scrapers and indexers automatically on a collection workflow status change, needed to be updated to reflect the curation workflow more accurately. Generating the jobs during this process also streamlines it.
  - Changes:
    - Updated function nomenclature. Scrapers are Sinequa connector configurations used to scrape all the URLs prior to curation. Indexers are Sinequa connector configurations used to scrape the URLs after curation, which are then used to index content on production. Jobs are used to trigger the connectors and are included as parts of joblists.
    - Parameterized the convert_template_to_job method to include the job_source, streamlining the value added to the `<Collection>` tag in the job XML.
    - Updated the fields that are pertinent to transfer from a scraper to an indexer. Also added a third level of XML processing to facilitate this.
    - scraper_template.xml and indexer_template.xml now contain the templates used for the respective configuration generation.
    - Deleted the redundant webcrawler_initial_crawl.xml file.
    - Added and updated tests on workflow status triggers.
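The job_source parameterization described above can be sketched with the standard library's ElementTree: inject the source value into the `<Collection>` tag of a job template. The template shape and the function signature here are illustrative assumptions, not the project's actual XmlEditor code.

```python
import xml.etree.ElementTree as ET

# Assumed minimal job template shape, for illustration only.
JOB_TEMPLATE = "<JobList><Job><Collection></Collection></Job></JobList>"

def convert_template_to_job(template_xml: str, job_source: str) -> str:
    """Sketch: fill every <Collection> tag in the template from job_source."""
    root = ET.fromstring(template_xml)
    for collection in root.iter("Collection"):
        collection.text = job_source
    return ET.tostring(root, encoding="unicode")

print(convert_template_to_job(JOB_TEMPLATE, "/scrapers/example_collection"))
```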

- 2889-serialize-the-tdamm-tags
  - Description: Have TDAMM serialized in a specific way and exposed via the Curated URLs API, to be consumed into SDE Test/Prod
  - Changes:
    - Changed the `get_tdamm_tag` method in the `CuratedURLAPISerializer` to process the TDAMM tags and pass them to the API endpoint

- 960-notifications-add-a-dropdown-with-options-on-the-feedback-form
  - Description: Generate an API endpoint that publishes all the necessary dropdown options as a list for LRM to consume.
  - Changes:
    - Created a new model `FeedbackFormDropdown`
    - Added the migration file
    - Added the `dropdown_option` field to the `Feedback` model
    - Updated the Slack notification structure by adding the dropdown option text
    - Created a new serializer called `FeedbackFormDropdownSerializer`
    - Added a new API endpoint `feedback-form-dropdown-options-api/` where the list is going to be accessible
    - Added a list view called `FeedbackFormDropdownListView`
    - Added tests
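The list endpoint's behavior can be sketched in plain Python: serialize each dropdown option and return the collection as a JSON list. The field names below are assumptions for illustration, not the actual `FeedbackFormDropdown` schema.

```python
import json

# Hypothetical dropdown options standing in for FeedbackFormDropdown rows;
# the real model's field names may differ.
OPTIONS = [
    {"id": 1, "option": "Bug report"},
    {"id": 2, "option": "Feature request"},
    {"id": 3, "option": "General feedback"},
]

def feedback_form_dropdown_list() -> str:
    """Mimic what FeedbackFormDropdownListView might return:
    a JSON array of serialized dropdown options."""
    return json.dumps(OPTIONS)

print(feedback_form_dropdown_list())
```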

- 1217-add-data-validation-to-the-feedback-form-api-to-restrict-html-content
  - Description: The feedback form API does not currently have any form of backend data validation, which makes it easy for a user with the endpoint to send in data with HTML tags. We need a validation scheme on the backend to prevent this.
  - Changes:
    - Defined a class `HTMLFreeCharField` which inherits from `serializers.CharField`
    - Used regex to catch any HTML content coming in as input to form fields
    - Called this class within the serializer for the necessary fields
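The validation idea can be sketched outside Django: a CharField subclass whose `to_internal_value` rejects any input containing something that looks like an HTML tag. The regex and the plain base class below are assumptions so the sketch runs without DRF; they are not the project's actual implementation.

```python
import re

# Assumed pattern: anything resembling an HTML tag is rejected.
HTML_TAG_RE = re.compile(r"<[^>]+>")

class ValidationError(Exception):
    """Stand-in for rest_framework's ValidationError."""

class CharField:
    """Stand-in for serializers.CharField so the sketch is self-contained."""
    def to_internal_value(self, data):
        return str(data)

class HTMLFreeCharField(CharField):
    """Reject input containing HTML content, as the changelog describes."""
    def to_internal_value(self, data):
        value = super().to_internal_value(data)
        if HTML_TAG_RE.search(value):
            raise ValidationError("HTML content is not allowed in this field.")
        return value

field = HTMLFreeCharField()
print(field.to_internal_value("plain feedback text"))  # passes through unchanged
try:
    field.to_internal_value("<script>alert('x')</script>")
except ValidationError as exc:
    print("rejected:", exc)
```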

- 3227-bugfix-title-patterns-selecting-multi-url-pattern-does-nothing
  - Description: When selecting options from the match pattern type filter, the system does not filter the results as expected. Instead of displaying only the chosen variety of patterns, it continues to show all patterns.
  - Changes:
    - In the `title_patterns_table` definition, corrected the column reference
    - Made `match_pattern_type` searchable
    - Corrected the column references and made the code consistent across all the other tables, i.e., `exclude_patterns_table`, `include_patterns_table`, `division_patterns_table` and `document_type_patterns_table`

- 1190-add-tests-for-job-generation-pipeline
  - Description: Tests have been added to enhance coverage for the config and job creation pipeline, alongside comprehensive tests for XML processing.
  - Changes:
    - Added config_generation/tests/test_config_generation_pipeline.py, which tests the config and job generation pipeline, ensuring all components interact correctly
    - Updated config_generation/tests/test_db_to_xml.py to include comprehensive tests for XML processing

- 1001-tests-for-critical-functionalities
  - Description: Critical functionalities have been identified and listed, along with the critical areas lacking tests
  - Changes:
    - Integrated coverage.py as an indicative tool in the workflow for automated coverage reports on PRs, displayed separately from test results.
    - Introduced docs/architecture-decisions/testing_strategy.md, which includes the coverage report, lists critical areas, and specifically identifies those critical areas that are untested or under-tested.

- 1192-finalize-the-infrastructure-for-frontend-testing
  - Description: Set up comprehensive frontend testing infrastructure using Selenium WebDriver with Chrome, establishing a foundation for automated UI testing.
  - Changes:
    - Added the Selenium testing dependency to `requirements/local.txt`
    - Updated the Dockerfile to support Chrome and ChromeDriver
    - Created BaseTestCase and AuthenticationMixin for reusable test components
    - Implemented the core authentication test suite

- 1195-implement-unit-test-for-forms-on-the-frontend
  - Description: Implemented a comprehensive frontend test suite covering authentication, collection management, search functionality, and pattern application forms.
  - Changes:
    - Added tests for authentication flows
    - Implemented collection display and data table tests
    - Added universal search functionality tests
    - Created search pane filter tests
    - Added pattern application form tests with validation checks

compose/local/django/Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ RUN apt-get update && apt-get install --no-install-recommends -y \
   && wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \
   && apt-get update \
   && apt-get install -y postgresql-15 postgresql-client-15 \
+  && apt-get install -y chromium chromium-driver \
   # cleaning up unused files
   && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
   && rm -rf /var/lib/apt/lists/*
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
from unittest.mock import MagicMock, call, patch

from django.test import TestCase

from sde_collections.models.collection import Collection
from sde_collections.models.collection_choice_fields import WorkflowStatusChoices

"""
Workflow status change → Opens template → Applies XML transformation → Writes to GitHub.

- When the `workflow_status` changes, it triggers the relevant config creation method.
- The method reads a template and processes it using `XmlEditor`.
- `XmlEditor` modifies the template by injecting collection-specific values and transformations.
- The generated XML is passed to `_write_to_github()`, which commits it directly to GitHub.

Note: This test verifies that the correct methods are triggered and XML content is passed to GitHub.
The actual XML structure and correctness are tested separately in `test_db_xml.py`.
"""


class TestConfigCreation(TestCase):
    def setUp(self):
        self.collection = Collection.objects.create(
            name="Test Collection", division="1", workflow_status=WorkflowStatusChoices.RESEARCH_IN_PROGRESS
        )

    @patch("sde_collections.utils.github_helper.GitHubHandler")  # Mock GitHubHandler
    @patch("sde_collections.models.collection.Collection._write_to_github")
    @patch("sde_collections.models.collection.XmlEditor")
    def test_ready_for_engineering_triggers_config_and_job_creation(
        self, MockXmlEditor, mock_write_to_github, MockGitHubHandler
    ):
        """
        When the collection's workflow status is updated to READY_FOR_ENGINEERING,
        it should trigger the creation of scraper configuration and job files.
        """
        # Mock GitHubHandler to avoid actual API calls
        mock_github_instance = MockGitHubHandler.return_value
        mock_github_instance.create_file.return_value = None
        mock_github_instance.create_or_update_file.return_value = None

        # Set up the XmlEditor mock for both config and job
        mock_editor_instance = MockXmlEditor.return_value
        mock_editor_instance.convert_template_to_scraper.return_value = "<scraper_config>config_data</scraper_config>"
        mock_editor_instance.convert_template_to_job.return_value = "<scraper_job>job_data</scraper_job>"

        # Simulate the status change to READY_FOR_ENGINEERING
        self.collection.workflow_status = WorkflowStatusChoices.READY_FOR_ENGINEERING
        self.collection.save()

        # Verify that the XML for both config and job are generated and written to GitHub
        expected_calls = [
            call(self.collection._scraper_config_path, "<scraper_config>config_data</scraper_config>", False),
            call(self.collection._scraper_job_path, "<scraper_job>job_data</scraper_job>", False),
        ]
        mock_write_to_github.assert_has_calls(expected_calls, any_order=True)

    @patch("sde_collections.models.collection.GitHubHandler")  # Mock GitHubHandler in the correct module path
    @patch("sde_collections.models.collection.Collection._write_to_github")
    @patch("sde_collections.models.collection.XmlEditor")
    def test_ready_for_curation_triggers_indexer_config_and_job_creation(
        self, MockXmlEditor, mock_write_to_github, MockGitHubHandler
    ):
        """
        When the collection's workflow status is updated to READY_FOR_CURATION,
        it should trigger indexer config and job creation methods.
        """
        # Mock GitHubHandler to avoid actual API calls
        mock_github_instance = MockGitHubHandler.return_value
        mock_github_instance.check_file_exists.return_value = True  # Assume scraper exists
        mock_github_instance._get_file_contents.return_value = MagicMock()
        mock_github_instance._get_file_contents.return_value.decoded_content = (
            b"<scraper_config>Mock Data</scraper_config>"
        )

        # Set up the XmlEditor mock for both config and job
        mock_editor_instance = MockXmlEditor.return_value
        mock_editor_instance.convert_template_to_indexer.return_value = "<indexer_config>config_data</indexer_config>"
        mock_editor_instance.convert_template_to_job.return_value = "<indexer_job>job_data</indexer_job>"

        # Simulate the status change to READY_FOR_CURATION
        self.collection.workflow_status = WorkflowStatusChoices.READY_FOR_CURATION
        self.collection.save()

        # Verify that the XML for both indexer config and job are generated and written to GitHub
        expected_calls = [
            call(self.collection._indexer_config_path, "<indexer_config>config_data</indexer_config>", True),
            call(self.collection._indexer_job_path, "<indexer_job>job_data</indexer_job>", False),
        ]
        mock_write_to_github.assert_has_calls(expected_calls, any_order=True)
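The behavior these tests pin down can be sketched as a `save()` that compares the old and new workflow status and dispatches to a matching creation method. This is a simplified stand-in for the real Collection model; the creation-method names are assumptions inferred from the tests above.

```python
class Collection:
    """Minimal stand-in: dispatch config creation on workflow status change."""

    def __init__(self, name: str, workflow_status: str):
        self.name = name
        self.workflow_status = workflow_status
        self._saved_status = workflow_status  # status as last persisted
        self.created = []  # records which pipelines ran, for illustration

    def create_scraper_config_and_job(self):  # hypothetical method name
        self.created.append("scraper")

    def create_indexer_config_and_job(self):  # hypothetical method name
        self.created.append("indexer")

    def save(self):
        # Only act when the status actually changed, mirroring the tests.
        if self.workflow_status != self._saved_status:
            if self.workflow_status == "READY_FOR_ENGINEERING":
                self.create_scraper_config_and_job()
            elif self.workflow_status == "READY_FOR_CURATION":
                self.create_indexer_config_and_job()
            self._saved_status = self.workflow_status

c = Collection("Test Collection", "RESEARCH_IN_PROGRESS")
c.workflow_status = "READY_FOR_ENGINEERING"
c.save()
print(c.created)  # prints ['scraper']
```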
Lines changed: 108 additions & 32 deletions
@@ -1,4 +1,7 @@
-import xml.etree.ElementTree as ET
+# docker-compose -f local.yml run --rm django pytest config_generation/tests/test_db_to_xml.py
+from xml.etree.ElementTree import ElementTree, ParseError, fromstring
+
+import pytest

 from ..db_to_xml import XmlEditor

@@ -28,39 +31,112 @@ def elements_equal(e1, e2):
             return False
         return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

-    tree1 = ET.fromstring(xml1)
-    tree2 = ET.fromstring(xml2)
-    return elements_equal(tree1, tree2)
+    tree1 = ElementTree(fromstring(xml1))
+    tree2 = ElementTree(fromstring(xml2))

+    return elements_equal(tree1.getroot(), tree2.getroot())

-def test_update_or_add_element_value():
-    xml_string = """<root>
-    <child>
-        <grandchild>old_value</grandchild>
-    </child>
-    </root>"""

+# Tests for valid and invalid XML initializations
+def test_valid_xml_initialization():
+    xml_string = "<root><child>Test</child></root>"
     editor = XmlEditor(xml_string)
+    assert editor.get_tag_value("child") == ["Test"]

-    # To update an existing element's value
-    updated_xml = editor.update_or_add_element_value("child/grandchild", "new_value")
-    expected_output = """<root>
-    <child>
-        <grandchild>new_value</grandchild>
-    </child>
-    </root>
-    """
-    assert xmls_equal(updated_xml, expected_output)
-
-    # To create a new element and set its value
-    new_xml = editor.update_or_add_element_value("newchild", "some_value")
-    expected_output = """<root>
-    <child>
-        <grandchild>new_value</grandchild>
-    </child>
-    <newchild>
-        some_value
-    </newchild>
-    </root>
-    """
-    assert xmls_equal(new_xml, expected_output)
+
+def test_invalid_xml_initialization():
+    with pytest.raises(ParseError):
+        XmlEditor("<root><child></root>")
+
+
+# Test retrieval of single and multiple tag values
+def test_get_single_tag_value():
+    xml_string = "<root><child>Test</child></root>"
+    editor = XmlEditor(xml_string)
+    assert editor.get_tag_value("child", strict=True) == "Test"
+
+
+def test_get_nonexistent_tag_value():
+    xml_string = "<root><child>Test</child></root>"
+    editor = XmlEditor(xml_string)
+    assert editor.get_tag_value("nonexistent", strict=False) == []
+
+
+def test_get_tag_value_strict_multiple_elements():
+    xml_string = "<root><child>One</child><child>Two</child></root>"
+    editor = XmlEditor(xml_string)
+    with pytest.raises(ValueError):
+        editor.get_tag_value("child", strict=True)
+
+
+# Test updating and adding XML elements
+def test_update_existing_element():
+    xml_string = "<root><child>Old</child></root>"
+    editor = XmlEditor(xml_string)
+    editor.update_or_add_element_value("child", "New")
+    updated_xml = editor.update_config_xml()
+    assert "New" in updated_xml and "Old" not in updated_xml
+
+
+def test_add_new_element():
+    xml_string = "<root></root>"
+    editor = XmlEditor(xml_string)
+    editor.update_or_add_element_value("newchild", "Value")
+    updated_xml = editor.update_config_xml()
+    assert "Value" in updated_xml and "<newchild>Value</newchild>" in updated_xml
+
+
+def test_add_third_level_hierarchy():
+    xml_string = "<root></root>"
+    editor = XmlEditor(xml_string)
+    editor.update_or_add_element_value("parent/child/grandchild", "DeeplyNested")
+    updated_xml = editor.update_config_xml()
+    root = fromstring(updated_xml)
+    grandchild = root.find(".//grandchild")
+    assert grandchild is not None, "Grandchild element not found"
+    assert grandchild.text == "DeeplyNested", "Grandchild does not contain the correct text"
+
+    # Check complete path
+    parent = root.find(".//parent/child/grandchild")
+    assert parent is not None, "Complete path to grandchild not found"
+    assert parent.text == "DeeplyNested", "Complete path to grandchild does not contain correct text"
+
+
+# Test transformations and generic mapping
+def test_convert_indexer_to_scraper_transformation():
+    xml_string = """<root><Plugin>Indexer</Plugin></root>"""
+    editor = XmlEditor(xml_string)
+    editor.convert_indexer_to_scraper()
+    updated_xml = editor.update_config_xml()
+    assert "<Plugin>SMD_Plugins/Sinequa.Plugin.ListCandidateUrls</Plugin>" in updated_xml
+    assert "<Plugin>Indexer</Plugin>" not in updated_xml
+
+
+def test_generic_mapping_addition():
+    xml_string = "<root></root>"
+    editor = XmlEditor(xml_string)
+    editor._generic_mapping(name="id", value="doc.url1", selection="url1")
+    updated_xml = editor.update_config_xml()
+    assert "<Mapping>" in updated_xml
+    assert "<Name>id</Name>" in updated_xml
+    assert "<Value>doc.url1</Value>" in updated_xml
+
+
+# Test XML serialization with headers
+def test_xml_serialization_with_header():
+    xml_string = "<root><child>Value</child></root>"
+    editor = XmlEditor(xml_string)
+    xml_output = editor.update_config_xml()
+    assert '<?xml version="1.0" encoding="utf-8"?>' in xml_output
+    assert "<root>" in xml_output and "<child>Value</child>" in xml_output
+
+
+# Test handling multiple changes accumulation
+def test_multiple_changes_accumulation():
+    xml_string = "<root><child>Initial</child></root>"
+    editor = XmlEditor(xml_string)
+    editor.update_or_add_element_value("child", "Modified")
+    editor.update_or_add_element_value("newchild", "Added")
+    updated_xml = editor.update_config_xml()
+    assert "Modified" in updated_xml and "Added" in updated_xml
+    assert "Initial" not in updated_xml
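The third-level path handling exercised by test_add_third_level_hierarchy can be sketched as: walk the '/'-separated path, create each missing element along the way, then set the leaf's text. This is a simplified take on what update_or_add_element_value appears to do, not the actual XmlEditor implementation.

```python
import xml.etree.ElementTree as ET

def update_or_add_element_value(root: ET.Element, path: str, value: str) -> None:
    """Walk the path segments, creating missing elements, then set leaf text."""
    current = root
    for tag in path.split("/"):
        nxt = current.find(tag)
        if nxt is None:
            nxt = ET.SubElement(current, tag)  # create the missing level
        current = nxt
    current.text = value

root = ET.fromstring("<root></root>")
update_or_add_element_value(root, "parent/child/grandchild", "DeeplyNested")
print(ET.tostring(root, encoding="unicode"))
```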
