Merged
178 commits
eaaec17
fix the issues of doctypes having 0 as a doctype
bishwaspraveen Sep 18, 2024
013d720
Updated scraper and indexer template creation. Updated scraper and in…
dhanur-sharma Oct 15, 2024
3a58fad
Updated indexing templates. convert_template_to_job and transfer fiel…
dhanur-sharma Oct 16, 2024
b3af9c8
Renamed template files and some variable names.
dhanur-sharma Oct 16, 2024
7b1353c
Fixed indexer source
dhanur-sharma Oct 16, 2024
3f95184
Updated scraper and indexer template creation. Updated scraper and in…
dhanur-sharma Oct 15, 2024
a44d3e3
Updated indexing templates. convert_template_to_job and transfer fiel…
dhanur-sharma Oct 16, 2024
9580d63
Renamed template files and some variable names.
dhanur-sharma Oct 16, 2024
52f07be
Fixed indexer source
dhanur-sharma Oct 16, 2024
d69a437
Merge branch '1052-update-cosmos-to-create-jobs-for-scrapers-and-inde…
dhanur-sharma Oct 16, 2024
89fc7d9
Fixing merge conflicts
dhanur-sharma Oct 16, 2024
92c118d
Updated templates to remove version for url
dhanur-sharma Oct 24, 2024
cbadf23
Merge branch 'dev' of https://github.com/NASA-IMPACT/COSMOS into 1052…
dhanur-sharma Dec 4, 2024
1933ee8
Updated worker counts in the template to 3
dhanur-sharma Dec 4, 2024
17226ab
Enabled neural for scraper template
dhanur-sharma Dec 4, 2024
f9068c5
migrate from single tasks.py to tasks folder
CarsonDavis Jan 30, 2025
323196e
add initial inference models and tasks
CarsonDavis Jan 31, 2025
903deef
changed the default to multi url pattern
bishwaspraveen Feb 5, 2025
61c8280
changed the nesessary tests accordingly
bishwaspraveen Feb 5, 2025
e0a87ca
add initial notes on queue functioning
CarsonDavis Feb 5, 2025
1ee6c1f
fixed create division pattern test to fit default match pattern changes
bishwaspraveen Feb 5, 2025
ee79af2
add initial inference app
CarsonDavis Feb 6, 2025
c0dfa60
reconsolidate sde_collections/tasks
CarsonDavis Feb 7, 2025
be78521
Merge branch 'dev' into 1182-ml-classification-queue
CarsonDavis Feb 7, 2025
5c06e32
add local inference pipeline integration tests
CarsonDavis Feb 7, 2025
214be69
fix local_test_inference_integration run path instructions
CarsonDavis Feb 7, 2025
79d0252
update InferenceAPIClient to pass only text to the pipeline
CarsonDavis Feb 7, 2025
7bcec7e
add initial changelog template with classification queue deployment n…
CarsonDavis Feb 7, 2025
881676b
add a verbose name for the InferenceApp and add it to the base settings
CarsonDavis Feb 7, 2025
0a24012
Merge branch 'dev' of https://github.com/NASA-IMPACT/COSMOS into 1052…
dhanur-sharma Feb 10, 2025
e347d7b
Merge branch 'dev' into 1209-bug-fix-document-type-creator-form
bishwaspraveen Feb 10, 2025
108879d
added change log with the PR
bishwaspraveen Feb 10, 2025
3d96230
Updated test cases
dhanur-sharma Feb 10, 2025
c649fe8
Merge branch 'dev' into 1030-resolve-0-value-document-type-in-nasa_sc…
bishwaspraveen Feb 11, 2025
05da883
added a changelog file
bishwaspraveen Feb 11, 2025
4f31fbc
Updated_test_workflow_status_triggers_TC
Feb 12, 2025
259f136
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 12, 2025
bc4e34e
refactor InferenceJob processing pipeline
CarsonDavis Feb 13, 2025
9938f42
expand ModelVersion model
CarsonDavis Feb 13, 2025
1a0bc4e
add INFERENCE_API_URL to base.py
CarsonDavis Feb 13, 2025
ff31aa7
add migrations for new inference models
CarsonDavis Feb 13, 2025
5545eb5
tests for config and job creation
Feb 13, 2025
01024b5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 13, 2025
6e1a569
tests for config and job creation_1
Feb 13, 2025
2179384
Merge branch '1190-add-tests-for-job-generation-pipeline' of https://…
Feb 13, 2025
134ffd6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 13, 2025
764901a
tests for config and job creation_2
Feb 13, 2025
4ab74d2
Merge branch '1190-add-tests-for-job-generation-pipeline' of https://…
Feb 13, 2025
de73421
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 13, 2025
9548277
Update inference/models/inference.py
CarsonDavis Feb 13, 2025
e79eb8f
refactor inference status handling, job evaluation, and naming
CarsonDavis Feb 13, 2025
5fa9a7f
Updated changelog
dhanur-sharma Feb 14, 2025
fcd054d
Merge branch 'dev' into 1030-resolve-0-value-document-type-in-nasa_sc…
bishwaspraveen Feb 19, 2025
3ff6aad
made necessary changes
bishwaspraveen Feb 19, 2025
c5daed3
Merge branch 'dev' into 1030-resolve-0-value-document-type-in-nasa_sc…
bishwaspraveen Feb 20, 2025
db33aae
changes js code to preserve y scroll position while saving
bishwaspraveen Feb 20, 2025
2ac4560
added change log
bishwaspraveen Feb 20, 2025
7140f51
Fixes issue #1014
Feb 21, 2025
87520f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 21, 2025
78fecf1
Updated test_import_fulltexts.py
Feb 21, 2025
0528153
Merge branch '1014-add-logs-when-importing-urls-so-we-know-how-many-w…
Feb 21, 2025
188fab0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 21, 2025
dac0e0f
Update test_workflow_status_triggers.py
Feb 21, 2025
02f1290
Merge branch '1014-add-logs-when-importing-urls-so-we-know-how-many-w…
Feb 21, 2025
8f0cf45
add latest inference pipeline data to git
CarsonDavis Feb 21, 2025
23c8d8e
add file specifications to compose and add celerybeat to local
CarsonDavis Feb 21, 2025
2199843
Updated CHANGELOG.md
Feb 21, 2025
016442f
Modifications post PR review
Feb 24, 2025
e012f18
Merge branch 'dev' into 1014-add-logs-when-importing-urls-so-we-know-…
saifrk Feb 24, 2025
bb160ee
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 24, 2025
904b983
Address the TC failure
Feb 24, 2025
9a23ef9
Merge branch '1014-add-logs-when-importing-urls-so-we-know-how-many-w…
Feb 24, 2025
9fb6ad5
Updated the TC failure_2
Feb 24, 2025
71e1795
Merge branch 'dev' of https://github.com/NASA-IMPACT/COSMOS into 1190…
Feb 24, 2025
6cfa5f7
Address the comment made in the PR review
Feb 24, 2025
995e549
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 24, 2025
1804266
Updated the associated TCs
Feb 24, 2025
f15edcf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 24, 2025
2f72222
Update the TCs
Feb 24, 2025
1d4ba0c
Merge branch 'dev' into 1014-add-logs-when-importing-urls-so-we-know-…
CarsonDavis Feb 25, 2025
02eee25
refactor and test inference api client
CarsonDavis Feb 27, 2025
65f68d3
rename inference api client test to specify local running
CarsonDavis Feb 27, 2025
63bb8a0
add path indicators to tops of files
CarsonDavis Feb 27, 2025
7340cd8
pass api_client to external job and allow specification of api_url
CarsonDavis Feb 27, 2025
de79b7b
explicitly handle null values in text batcher
CarsonDavis Feb 27, 2025
0e854c3
add batch tests and clarify batch limits
CarsonDavis Feb 27, 2025
f94d115
add inference integration tests and update external job to allow blan…
CarsonDavis Feb 27, 2025
df42889
reorder imports in models/__init__.py
CarsonDavis Feb 27, 2025
fe46f35
add documentation todo in inference pipeline
CarsonDavis Feb 27, 2025
b314e68
Merge branch 'dev' into 1190-add-tests-for-job-generation-pipeline
dhanur-sharma Feb 27, 2025
96430de
Added a separate get_total_count() function
Feb 28, 2025
4f5deef
Updated the TCs
Feb 28, 2025
59019f3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 28, 2025
d25604a
Merge branch 'dev' into 1014-add-logs-when-importing-urls-so-we-know-…
saifrk Feb 28, 2025
2420983
Merge pull request #1225 from NASA-IMPACT/1190-add-tests-for-job-gene…
bishwaspraveen Feb 28, 2025
791548a
Merge branch 'dev' into 1052-update-cosmos-to-create-jobs-for-scraper…
dhanur-sharma Feb 28, 2025
1dd1a63
Merge pull request #1072 from NASA-IMPACT/1052-update-cosmos-to-creat…
bishwaspraveen Feb 28, 2025
bf84557
Merge branch 'dev' into 3228-bugfix-preserve-scroll-position--documen…
dhanur-sharma Feb 28, 2025
f2c730b
Updated Message
Feb 28, 2025
651fc9f
Merge pull request #1228 from NASA-IMPACT/3228-bugfix-preserve-scroll…
dhanur-sharma Feb 28, 2025
2d7b7cb
Merge branch 'dev' into 1030-resolve-0-value-document-type-in-nasa_sc…
bishwaspraveen Feb 28, 2025
7895580
Merge branch 'dev' into 1014-add-logs-when-importing-urls-so-we-know-…
saifrk Feb 28, 2025
472e9f9
Updated message2
Feb 28, 2025
44294ed
Merge branch '1014-add-logs-when-importing-urls-so-we-know-how-many-w…
Feb 28, 2025
a4f3cb3
Update CHANGELOG.md
dhanur-sharma Feb 28, 2025
16aae30
Merge pull request #1229 from NASA-IMPACT/1014-add-logs-when-importin…
dhanur-sharma Feb 28, 2025
3d11f2e
Merge branch 'dev' into 1030-resolve-0-value-document-type-in-nasa_sc…
dhanur-sharma Feb 28, 2025
32de8a6
Merge pull request #1031 from NASA-IMPACT/1030-resolve-0-value-docume…
dhanur-sharma Feb 28, 2025
7a83380
Fixes issue_#1196
Mar 2, 2025
4acb1c0
update readme to incorporate new classification changes
CarsonDavis Mar 5, 2025
17b4bdb
added a function to handle escaping
bishwaspraveen Mar 5, 2025
2252750
added change log file
bishwaspraveen Mar 5, 2025
a35be75
Merge branch 'dev' into 1209-bug-fix-document-type-creator-form
bishwaspraveen Mar 5, 2025
e9a2888
made html changes to reflect pattern type changes on forms
bishwaspraveen Mar 5, 2025
cd311b2
Update CHANGELOG.md
bishwaspraveen Mar 5, 2025
b680ab9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 5, 2025
cca44da
Merge branch 'dev' into 1182-ml-classification-queue
CarsonDavis Mar 6, 2025
ed83801
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2025
3388dea
Merge pull request #1244 from NASA-IMPACT/1101-bug-fix-quotes-not-esc…
CarsonDavis Mar 6, 2025
9131057
Merge branch 'dev' into 1196-arrange-the-show-100-csv-customize-colum…
CarsonDavis Mar 6, 2025
485cfdb
Merge branch 'dev' into 1209-bug-fix-document-type-creator-form
CarsonDavis Mar 6, 2025
a173a80
Merge pull request #1216 from NASA-IMPACT/1209-bug-fix-document-type-…
CarsonDavis Mar 6, 2025
9d68d85
Remove old styling after updating new styling
Mar 7, 2025
dead9e6
Merge branch '1196-arrange-the-show-100-csv-customize-columns-boxes-t…
Mar 7, 2025
45176d8
Merge branch 'dev' into 1196-arrange-the-show-100-csv-customize-colum…
saifrk Mar 7, 2025
65cf30d
Updated Changelog
Mar 7, 2025
6a68fbb
Merge branch '1196-arrange-the-show-100-csv-customize-columns-boxes-t…
Mar 7, 2025
241088e
Fixes Issue #1240
Mar 7, 2025
c2f91de
Added ChangeLog
Mar 7, 2025
abd7ae4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 7, 2025
baedb88
Fixes Issue #1246
Mar 7, 2025
3632801
Merge pull request #1247 from NASA-IMPACT/1246-minor-enhancement-docu…
CarsonDavis Mar 7, 2025
7c7c5b3
Merge branch 'dev' into 1240-fix-code-scanning-alert-inclusion-of-fun…
CarsonDavis Mar 7, 2025
5a349fd
add initial ml integration
CarsonDavis Mar 7, 2025
a48c9a7
Latest Updated Changes
Mar 8, 2025
7479bb6
Merge branch 'dev' into 1196-arrange-the-show-100-csv-customize-colum…
saifrk Mar 8, 2025
f47aba8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 8, 2025
5ff3e26
Added tenacity to base requirements
dhanur-sharma Mar 11, 2025
619f912
Replace fetch_and_replace_full_text with fetch_full_text
dhanur-sharma Mar 12, 2025
92a5420
model_version assignment in refresh_status_and_store_results
dhanur-sharma Mar 13, 2025
db13d73
Added classification utils to save results to tdamm_tag_ml
dhanur-sharma Mar 14, 2025
e7a9ba3
Fixed threshold TypeError bug
dhanur-sharma Mar 15, 2025
2321ff7
Update reevaluate_progress_and_update_status for queued jobs
dhanur-sharma Mar 15, 2025
55e358e
Update process_inference_job_queue to refresh progress for queued jobs
dhanur-sharma Mar 15, 2025
d46ae13
InferenceJob failure on no external jobs created, updated_at, only CO…
dhanur-sharma Mar 16, 2025
ddc3dee
Retry load_model to 60 seconds
dhanur-sharma Mar 16, 2025
6365c10
Reevaluate progress for only PENDING InferenceJobs
dhanur-sharma Mar 16, 2025
3f18253
update the retry method in inference api client
CarsonDavis Mar 17, 2025
f56d7f3
Updated TDAMMTags
dhanur-sharma Mar 17, 2025
4b21b1c
Completed at log and updated classification threshold default
dhanur-sharma Mar 17, 2025
09b122e
Updated existing tests
dhanur-sharma Mar 17, 2025
630363b
Updated TDAMMTags for NOT_TDAMM
dhanur-sharma Mar 17, 2025
ab9bd9d
Added tests for classification utils
dhanur-sharma Mar 17, 2025
d2b9f73
Merge branch '1182-ml-classification-queue' into integrate_classifica…
dhanur-sharma Mar 18, 2025
b7f6edf
Updated imports for pre-commit resolution
dhanur-sharma Mar 18, 2025
a22ade6
Added https URL to allow CORS
dhanur-sharma Mar 18, 2025
067af7a
Added changelog
dhanur-sharma Mar 18, 2025
a00c38c
Merge pull request #1250 from NASA-IMPACT/1249-add-https-link-to-cors…
CarsonDavis Mar 18, 2025
6878c42
Merge branch 'dev' into 1196-arrange-the-show-100-csv-customize-colum…
CarsonDavis Mar 18, 2025
5fd192c
Merge pull request #1242 from NASA-IMPACT/1196-arrange-the-show-100-c…
CarsonDavis Mar 18, 2025
80e9f4c
Merge branch 'dev' into 1240-fix-code-scanning-alert-inclusion-of-fun…
CarsonDavis Mar 18, 2025
bfa1aa3
Merge pull request #1245 from NASA-IMPACT/1240-fix-code-scanning-aler…
CarsonDavis Mar 18, 2025
a646e32
Merge pull request #1248 from NASA-IMPACT/integrate_classification_queue
CarsonDavis Mar 18, 2025
bb222c0
Merge branch 'dev' into 1182-ml-classification-queue
CarsonDavis Mar 18, 2025
4e51b6f
Set default INFERENCE_API_URL value
dhanur-sharma Mar 18, 2025
ae83fde
Set default INFERENCE_API_URL value
dhanur-sharma Mar 18, 2025
5144c27
Merge branch '1182-ml-classification-queue' of https://github.com/NAS…
dhanur-sharma Mar 18, 2025
7eeaf41
Updated test import full text
dhanur-sharma Mar 18, 2025
17e6456
Added run command for import full text
dhanur-sharma Mar 18, 2025
df188ff
Updated test factories
dhanur-sharma Mar 19, 2025
fd14a15
Updated test_identical_url_in_both in test_migrate_dump
dhanur-sharma Mar 19, 2025
59eead4
Updated test_tdamm_tags to use the value from post_generation decorator
dhanur-sharma Mar 19, 2025
0bf0f98
Updated commented commands to run tests
dhanur-sharma Mar 19, 2025
49d174d
Updated test_workflow_status_triggers
dhanur-sharma Mar 19, 2025
3ba0a63
Updated test_batch
dhanur-sharma Mar 19, 2025
bf8fc55
Updated production.yml
dhanur-sharma Mar 19, 2025
f6647ff
Removed the port config change in production.yml
dhanur-sharma Mar 19, 2025
460565d
Merge pull request #1219 from NASA-IMPACT/1182-ml-classification-queue
CarsonDavis Mar 19, 2025
72 changes: 72 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,24 @@ For each PR made, an entry should be added to this changelog. It should contain
- etc.

## Changelog

- 1209-bug-fix-document-type-creator-form
- Description: The pattern type dropdown on the pattern creation form should default to multi-URL, since the doc type creator form is used mostly for multi-URL pattern creation. The same default should apply to doc types, division types, and titles.
- Changes:
- Set the default value of `match_pattern_type` in the `BaseMatchPattern` class to `2`
- Changed `test_create_simple_exclude_pattern` test within `TestDeltaExcludePatternBasics`
- Changed `test_create_division_pattern` and `test_create_document_type_pattern_single` within `TestFieldModifierPatternBasics`

- 1052-update-cosmos-to-create-jobs-for-scrapers-and-indexers
- Description: The original automation set up to generate the scrapers and indexers automatically based on a collection workflow status change needed to be updated to more accurately reflect the curation workflow. Generating the jobs as part of this process streamlines the workflow further.
- Changes:
- Updated function nomenclature. Scrapers are Sinequa connector configurations used to scrape all the URLs prior to curation. Indexers are Sinequa connector configurations used to scrape the URLs after curation, which are then used to index content on production. Jobs trigger the connectors and are included as parts of joblists.
- Parameterized the convert_template_to_job method to include the job_source to streamline the value added to the `<Collection>` tag in the job XML.
- Updated the fields that are pertinent to transfer from a scraper to an indexer. Also added a third level of XML processing to support this.
- scraper_template.xml and indexer_template.xml now contain the templates used for generating the respective configurations.
- Deleted the redundant webcrawler_initial_crawl.xml file.
- Added and updated tests on workflow status triggers.
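The `job_source` parameterization described in this entry can be sketched as follows. This is a minimal illustration using only the standard library, not the project's `XmlEditor`: the template string, the `render_job_xml` helper, and the `"scrapers"` source name are assumptions made for the example (the diff itself only confirms the `/{job_source}/{config_folder}/` format and the earlier hard-coded `/SDE/` value).

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down job template; the real job template has more fields.
JOB_TEMPLATE = "<Sinequa><Collection></Collection></Sinequa>"


def render_job_xml(template: str, job_source: str, config_folder: str) -> str:
    """Return the template with <Collection> set to /<job_source>/<config_folder>/."""
    root = ET.fromstring(template)
    collection = root.find("Collection")
    if collection is None:
        # Mirror "update or add" semantics: create the tag when it is missing.
        collection = ET.SubElement(root, "Collection")
    collection.text = f"/{job_source}/{config_folder}/"
    return ET.tostring(root, encoding="unicode")


# "scrapers" is an assumed source name; "SDE" matches the value used before this change.
scraper_job = render_job_xml(JOB_TEMPLATE, "scrapers", "example_collection")
indexer_job = render_job_xml(JOB_TEMPLATE, "SDE", "example_collection")
```

Passing the source as an argument lets one template serve both scraper and indexer jobs instead of keeping a separate method per job type.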

- 2889-serialize-the-tdamm-tags
- Description: Have TDAMM serialized in a specific way and exposed via the Curated URLs API to be consumed into SDE Test/Prod
- Changes:
@@ -36,13 +54,38 @@ For each PR made, an entry should be added to this changelog. It should contain
- Used regex to catch any HTML content coming in as an input to form fields
- Called this class within the serializer for necessary fields

- 1030-resolve-0-value-document-type-in-nasa_science
- Description: Around 2000 of the docs coming out of the COSMOS api for nasa_science have a doc type value of 0.
- Changes:
- Added `obj.document_type != 0` as a condition in the `get_document_type` method within the `CuratedURLAPISerializer`
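A minimal sketch of the guard described above, assuming a plain function rather than the actual serializer method. The label mapping is hypothetical and only serves to show that both `None` and `0` are treated as "no document type".

```python
from typing import Optional

# Hypothetical mapping for illustration; the real choices live in the COSMOS models.
DOCUMENT_TYPE_LABELS = {1: "Images", 2: "Data", 3: "Documentation"}


def get_document_type(document_type: Optional[int]) -> Optional[str]:
    """Return a label for a document type, treating None and 0 as not set."""
    if document_type is not None and document_type != 0:
        return DOCUMENT_TYPE_LABELS.get(document_type)
    return None
```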

- 1014-add-logs-when-importing-urls-so-we-know-how-many-were-expected-how-many-succeeded-and-how-many-failed
- Description: When the URLs of a collection are imported into COSMOS, a Slack notification is sent. It includes the name of the imported collection, the count of existing curated URLs, the total URL count reported by the server, the number of URLs successfully imported from the server, the delta URLs identified, and the delta URLs marked for deletion.
- Changes:
- The get_full_texts() function in sde_collections/sinequa_api.py now yields total_count along with rows.
- The fetch_and_replace_full_text() function in sde_collections/tasks.py captures total_server_count and triggers send_detailed_import_notification().
- Added send_detailed_import_notification() in sde_collections/utils/slack_utils.py to structure the notification.
- Updated the associated tests affected by this functionality.
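The idea of yielding the server's total alongside each batch of rows can be sketched as below. The function names, the page dictionaries, and the summary keys are assumptions for illustration; they do not reproduce the real `get_full_texts()` or the Slack notification code.

```python
from typing import Dict, Iterator, List, Tuple


def iter_full_text_batches(pages: List[Dict]) -> Iterator[Tuple[List[Dict], int]]:
    """Yield (rows, total_count) for each page of a hypothetical server response."""
    for page in pages:
        yield page["rows"], page["total_count"]


def import_with_counts(pages: List[Dict]) -> Dict[str, int]:
    """Consume the generator and tally expected versus imported URLs."""
    total_server_count = 0
    imported = 0
    for rows, total_count in iter_full_text_batches(pages):
        total_server_count = total_count  # assumed to be the same on every page
        imported += len(rows)
    return {"total_server_count": total_server_count, "imported": imported}


pages = [
    {"rows": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}], "total_count": 3},
    {"rows": [{"url": "https://example.com/c"}], "total_count": 3},
]
summary = import_with_counts(pages)
```

Carrying the total with every batch means the caller can report expected versus imported counts without a second request to the server.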

- 3228-bugfix-preserve-scroll-position--document-type-selection-behavior-on-individual-urls
- Description: Upon selecting a document type on any individual URL, the page refreshes and returns to the top. This is not necessarily a bug but an inconvenience, especially when working at the bottom of the page. Fix the JS code.
- Changes:
- Added a constant `scrollPosition` within `postDocumentTypePatterns` to store the y-coordinate position on the page
- Modified the ajax reload to navigate to this position upon posting/saving the document type changes.

- 3227-bugfix-title-patterns-selecting-multi-url-pattern-does-nothing
- Description: When selecting options from the match pattern type filter, the system does not filter the results as expected. Instead of displaying only the chosen variety of patterns, it continues to show all patterns.
- Changes:
- In `title_patterns_table` definition, corrected the column reference
- Made `match_pattern_type` searchable
- Corrected the column references and made code consistent on all the other tables, i.e., `exclude_patterns_table`, `include_patterns_table`, `division_patterns_table` and `document_type_patterns_table`

- 1190-add-tests-for-job-generation-pipeline
- Description: Tests have been added to enhance coverage for the config and job creation pipeline, alongside comprehensive tests for XML processing.
- Changes:
- Added config_generation/tests/test_config_generation_pipeline.py which tests the config and job generation pipeline, ensuring all components interact correctly
- config_generation/tests/test_db_to_xml.py is updated to include comprehensive tests for XML Processing

- 1001-tests-for-critical-functionalities
- Description: Critical functionalities have been identified and listed, and critical areas lacking tests have been listed as well
- Changes:
@@ -65,3 +108,32 @@ For each PR made, an entry should be added to this changelog. It should contain
- Added universal search functionality tests
- Created search pane filter tests
- Added pattern application form tests with validation checks

- 1101-bug-fix-quotes-not-escaped-in-titles
- Description: Title rules that include single quotes show up correctly in the sinequa frontend (and the COSMOS api) but not in the delta urls page.
- Changes:
- Added `escapeHtml` function in the `delta_url_list.js` file to handle special character escaping correctly.
- Called this function while retrieving the titles in `getGeneratedTitleColumn()` and `getCuratedGeneratedTitleColumn()` functions.

- 1240-fix-code-scanning-alert-inclusion-of-functionality-from-an-untrusted-source
- Description: Ensured all external resources load securely by switching to HTTPS and adding Subresource Integrity (SRI) checks.
- Changes:
- Replaced protocol‑relative URLs with HTTPS.
- Added SRI (integrity) and crossorigin attributes to external script tags.
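For readers unfamiliar with SRI, the `integrity` attribute value is an algorithm name followed by the base64-encoded digest of the resource. The sketch below shows how such a value could be computed; the helper names and the example URL are assumptions, and in practice the hash is usually taken from the CDN's published snippet.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script tag with integrity and crossorigin attributes."""
    return f'<script src="{url}" integrity="{sri_hash(content)}" crossorigin="anonymous"></script>'


# Hypothetical resource used only to exercise the helpers.
example_tag = script_tag("https://cdn.example.com/lib.js", b"console.log('hello');")
```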

- 1196-arrange-the-show-100-csv-customize-columns-boxes-to-be-in-one-line-on-the-delta-urls-page
changelog-update-Issue-1001
- Description: Format the 'Show 100', 'CSV', and 'Customize Columns' buttons to sit on a single line for better use of space.
- Changes:
- Updated delta_url_list.css and delta_url_list.js files with necessary modifications

- 1246-minor-enhancement-document-type-pattern-form-require-document-type-or-show-appropriate-error
- Description: In the Document Type Pattern Form, if the user does not select a Document Type while filling out the form, an appropriate error message is displayed.
- Changes:
- Added a JavaScript validation check on form submission to ensure the document type (stored in a hidden input) is not empty.
- Display an error message and prevent form submission if the field is empty.

- 1249-add-https-link-to-cors_allowed_origins-for-sde-lrm
- Description: The feedback form API was throwing CORS errors. To fix this, the appropriate HTTPS link for sde-lrm needed to be added.
- Changes:
- Added `https://sde-lrm.nasa-impact.net` to `CORS_ALLOWED_ORIGINS` in the base settings.
1 change: 1 addition & 0 deletions compose/local/django/start
@@ -1,3 +1,4 @@
#compose/local/django/start
#!/bin/bash

set -o errexit
1 change: 1 addition & 0 deletions compose/production/django/Dockerfile
@@ -1,3 +1,4 @@
# compose/production/django/Dockerfile
# define an alias for the specific python version used in this file.
FROM python:3.10.14-slim-bullseye AS python

1 change: 1 addition & 0 deletions compose/production/django/start
@@ -1,3 +1,4 @@
# compose/production/django/start
#!/bin/bash

set -o errexit
1 change: 1 addition & 0 deletions compose/production/traefik/traefik.yml
@@ -1,3 +1,4 @@
# compose/production/traefik/traefik.yml
log:
  level: INFO

24 changes: 24 additions & 0 deletions config/celery.py
@@ -0,0 +1,24 @@
# config/celery.py
import os

from celery import Celery
from celery.schedules import crontab

# Set the default Django settings module
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.local")

app = Celery("cosmos")

# Configure Celery using Django settings
app.config_from_object("django.conf:settings", namespace="CELERY")

# Load task modules from all registered Django app configs
app.autodiscover_tasks()

app.conf.beat_schedule = {
    "process-inference-queue": {
        "task": "inference.tasks.process_inference_job_queue",
        # Only run between 6pm and 7am
        "schedule": crontab(minute="*/5", hour="18-23,0-6"),
    },
}
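To make the schedule window concrete, the following standalone sketch checks which wall-clock times the expression `minute="*/5", hour="18-23,0-6"` would match. It deliberately avoids importing Celery and is only an illustration of the cron semantics; `would_fire` is a hypothetical helper, and the times are interpreted in whatever timezone Celery is configured to use.

```python
from datetime import datetime

# Hours matched by the cron expression hour="18-23,0-6"
ALLOWED_HOURS = set(range(18, 24)) | set(range(0, 7))


def would_fire(moment: datetime) -> bool:
    """Return True if a minute="*/5", hour="18-23,0-6" schedule matches this minute."""
    return moment.hour in ALLOWED_HOURS and moment.minute % 5 == 0


first_run = would_fire(datetime(2025, 3, 18, 18, 0))   # 6:00 pm, start of the window
last_run = would_fire(datetime(2025, 3, 19, 6, 55))    # 6:55 am, last slot before 7 am
outside = would_fire(datetime(2025, 3, 19, 7, 0))      # 7:00 am, outside the window
```

Because the hour range ends at 6, the final run of the night starts at 6:55 am, which is consistent with the "between 6pm and 7am" comment in the file.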
10 changes: 6 additions & 4 deletions config/settings/base.py
@@ -84,6 +84,7 @@
"feedback",
"sde_collections",
"sde_indexing_helper.users",
"inference",
]

# https://docs.djangoproject.com/en/dev/ref/settings/#installed-apps
@@ -92,6 +93,7 @@
CORS_ALLOWED_ORIGINS = [
"http://localhost:3000",
"http://sde-lrm.nasa-impact.net",
"https://sde-lrm.nasa-impact.net",
"https://sde-qa.nasa-impact.net",
"https://sciencediscoveryengine.test.nasa.gov",
"https://sciencediscoveryengine.nasa.gov",
@@ -288,11 +290,9 @@
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-result_serializer
CELERY_RESULT_SERIALIZER = "json"
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#task-time-limit
-# TODO: set to whatever value is adequate in your circumstances
-CELERY_TASK_TIME_LIMIT = 5 * 60
+CELERY_TASK_TIME_LIMIT = 30 * 60
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#task-soft-time-limit
-# TODO: set to whatever value is adequate in your circumstances
-CELERY_TASK_SOFT_TIME_LIMIT = 60
+CELERY_TASK_SOFT_TIME_LIMIT = 25 * 60
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#beat-scheduler
CELERY_BEAT_SCHEDULER = "django_celery_beat.schedulers:DatabaseScheduler"
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#worker-send-task-events
@@ -349,3 +349,5 @@
LRM_QA_PASSWORD = env("LRM_QA_PASSWORD")
LRM_DEV_TOKEN = env("LRM_DEV_TOKEN")
XLI_TOKEN = env("XLI_TOKEN")
INFERENCE_API_URL = env("INFERENCE_API_URL", default="http://host.docker.internal:8000")
TDAMM_CLASSIFICATION_THRESHOLD = env("TDAMM_CLASSIFICATION_THRESHOLD", default="0.5")
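Note that the default for `TDAMM_CLASSIFICATION_THRESHOLD` is the string `"0.5"`, so any code comparing it with a numeric confidence needs to convert it first; comparing a float with a string raises `TypeError` in Python 3. A minimal sketch of that conversion follows. The function name and the use of `>=` are assumptions, not the project's classification utilities.

```python
def is_above_threshold(confidence: float, raw_threshold: str = "0.5") -> bool:
    """Compare a model confidence against a threshold read from settings as text."""
    threshold = float(raw_threshold)  # settings/env values may arrive as strings
    return confidence >= threshold    # assumed inclusive comparison


def compare_without_conversion(confidence: float, raw_threshold: str = "0.5"):
    """Show what happens when the string is compared directly."""
    try:
        return confidence >= raw_threshold
    except TypeError:
        return None  # float >= str is not supported in Python 3
```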
73 changes: 43 additions & 30 deletions config_generation/db_to_xml.py
@@ -148,35 +148,51 @@ def convert_template_to_scraper(self, collection) -> None:
scraper_config = self.update_config_xml()
return scraper_config

def convert_template_to_plugin_indexer(self, scraper_editor) -> None:
def convert_template_to_job(self, collection, job_source) -> None:
"""
assuming this class has been instantiated with the scraper_template.xml
assuming this class has been instantiated with the job_template.xml
"""
self.update_or_add_element_value("Collection", f"/{job_source}/{collection.config_folder}/")
job_config = self.update_config_xml()
return job_config

def convert_template_to_indexer(self, scraper_editor) -> None:
"""
assuming this class has been instantiated with the final_config_template.xml
"""

transfer_fields = [
"KeepHashFragmentInUrl",
"CorrectDomainCookies",
"IgnoreSessionCookies",
"DownloadImages",
"DownloadMedia",
"DownloadCss",
"DownloadFtp",
"DownloadFile",
"IndexJs",
"FollowJs",
"CrawlFlash",
"NormalizeSecureSchemesWhenTestingVisited",
"RetryCount",
"RetryPause",
"AddBaseHref",
"AddMetaContentType",
"NormalizeUrls",
"Throttle",
]

double_transfer_fields = [
("UrlAccess", "AllowXPathCookies"),
("UrlAccess", "UseBrowserForWebRequests"),
("UrlAccess", "UseHttpClientForWebRequests"),
("UrlAccess", "BrowserForWebRequestsReadinessThreshold"),
("UrlAccess", "BrowserForWebRequestsInitialDelay"),
("UrlAccess", "BrowserForWebRequestsMaxTotalDelay"),
("UrlAccess", "BrowserForWebRequestsMaxResourcesDelay"),
("UrlAccess", "BrowserForWebRequestsLogLevel"),
("UrlAccess", "BrowserForWebRequestsViewportWidth"),
("UrlAccess", "BrowserForWebRequestsViewportHeight"),
("UrlAccess", "BrowserForWebRequestsAdditionalJavascript"),
("UrlAccess", "PostLoginUrl"),
("UrlAccess", "PostLoginData"),
("UrlAccess", "GetBeforePostLogin"),
("UrlAccess", "PostLoginAutoRedirect"),
("UrlAccess", "ReLoginCount"),
("UrlAccess", "ReLoginDelay"),
("UrlAccess", "DetectHtmlLoginPattern"),
("IndexerClient", "RetryTimeout"),
("IndexerClient", "RetrySleep"),
]

triple_transfer_fields = [
("UrlAccess", "BrowserLogin", "Activate"),
("UrlAccess", "BrowserLogin", "RemoteDebuggingPort"),
("UrlAccess", "BrowserLogin", "BrowserLogLevel"),
("UrlAccess", "BrowserLogin", "ShowDevTools"),
("UrlAccess", "BrowserLogin", "SuccessCondition"),
("UrlAccess", "BrowserLogin", "CookieFilter"),
]

for field in transfer_fields:
@@ -187,18 +203,15 @@ def convert_template_to_plugin_indexer(self, scraper_editor) -> None:
f"{parent}/{child}", scraper_editor.get_tag_value(f"{parent}/{child}", strict=True)
)

+for grandparent, parent, child in triple_transfer_fields:
+self.update_or_add_element_value(
+f"{grandparent}/{parent}/{child}",
+scraper_editor.get_tag_value(f"{grandparent}/{parent}/{child}", strict=True),
+)
+
scraper_config = self.update_config_xml()
return scraper_config

-def convert_template_to_indexer(self, collection) -> None:
-"""
-assuming this class has been instantiated with the indexer_template.xml
-"""
-self.update_or_add_element_value("Collection", f"/SDE/{collection.config_folder}/")
-indexer_config = self.update_config_xml()
-
-return indexer_config
-
def _mapping_exists(self, new_mapping: ET.Element):
"""
Check if the mapping with given parameters already exists in the XML tree
Expand Down
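The single, double, and triple transfer lists above all follow the same pattern: read a value at a slash-separated path in the scraper config and write it to the same path in the indexer config. The sketch below reproduces that pattern with the standard library only. The XML snippets and the `set_path_value` and `transfer` helpers are assumptions for illustration and are not the project's `XmlEditor` methods.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, heavily trimmed configs used only for this example.
SCRAPER_XML = """
<Sinequa>
  <Throttle>5</Throttle>
  <UrlAccess>
    <AllowXPathCookies>true</AllowXPathCookies>
    <BrowserLogin><Activate>true</Activate></BrowserLogin>
  </UrlAccess>
</Sinequa>
"""
INDEXER_TEMPLATE = "<Sinequa><Throttle>1</Throttle></Sinequa>"


def set_path_value(root: ET.Element, path: str, value: str) -> None:
    """Set the text at a slash-separated path, creating missing elements on the way."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    node.text = value


def transfer(source: ET.Element, target: ET.Element, paths: List[str]) -> None:
    """Copy each path's text from source to target; fail loudly if the source lacks it."""
    for path in paths:
        element = source.find(path)
        if element is None:
            raise ValueError(f"{path} not found in source config")
        set_path_value(target, path, element.text or "")


scraper = ET.fromstring(SCRAPER_XML)
indexer = ET.fromstring(INDEXER_TEMPLATE)
transfer(scraper, indexer, ["Throttle", "UrlAccess/AllowXPathCookies", "UrlAccess/BrowserLogin/Activate"])
```

Treating every field as a path of arbitrary depth would let one loop cover all three lists; the diff keeps them separate, which may simply reflect how the method grew over time.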
90 changes: 90 additions & 0 deletions config_generation/tests/test_config_generation_pipeline.py
@@ -0,0 +1,90 @@
from unittest.mock import MagicMock, call, patch

from django.test import TestCase

from sde_collections.models.collection import Collection
from sde_collections.models.collection_choice_fields import WorkflowStatusChoices

"""
Workflow status change → Opens template → Applies XML transformation → Writes to GitHub.

- When the `workflow_status` changes, it triggers the relevant config creation method.
- The method reads a template and processes it using `XmlEditor`.
- `XmlEditor` modifies the template by injecting collection-specific values and transformations.
- The generated XML is passed to `_write_to_github()`, which commits it directly to GitHub.

Note: This test verifies that the correct methods are triggered and XML content is passed to GitHub.
The actual XML structure and correctness are tested separately in `test_db_to_xml.py`.
"""


class TestConfigCreation(TestCase):
    def setUp(self):
        self.collection = Collection.objects.create(
            name="Test Collection", division="1", workflow_status=WorkflowStatusChoices.RESEARCH_IN_PROGRESS
        )

    @patch("sde_collections.utils.github_helper.GitHubHandler")  # Mock GitHubHandler
    @patch("sde_collections.models.collection.Collection._write_to_github")
    @patch("sde_collections.models.collection.XmlEditor")
    def test_ready_for_engineering_triggers_config_and_job_creation(
        self, MockXmlEditor, mock_write_to_github, MockGitHubHandler
    ):
        """
        When the collection's workflow status is updated to READY_FOR_ENGINEERING,
        it should trigger the creation of scraper configuration and job files.
        """
        # Mock GitHubHandler to avoid actual API calls
        mock_github_instance = MockGitHubHandler.return_value
        mock_github_instance.create_file.return_value = None
        mock_github_instance.create_or_update_file.return_value = None

        # Set up the XmlEditor mock for both config and job
        mock_editor_instance = MockXmlEditor.return_value
        mock_editor_instance.convert_template_to_scraper.return_value = "<scraper_config>config_data</scraper_config>"
        mock_editor_instance.convert_template_to_job.return_value = "<scraper_job>job_data</scraper_job>"

        # Simulate the status change to READY_FOR_ENGINEERING
        self.collection.workflow_status = WorkflowStatusChoices.READY_FOR_ENGINEERING
        self.collection.save()

        # Verify that the XML for both config and job are generated and written to GitHub
        expected_calls = [
            call(self.collection._scraper_config_path, "<scraper_config>config_data</scraper_config>", False),
            call(self.collection._scraper_job_path, "<scraper_job>job_data</scraper_job>", False),
        ]
        mock_write_to_github.assert_has_calls(expected_calls, any_order=True)

    @patch("sde_collections.models.collection.GitHubHandler")  # Mock GitHubHandler in the correct module path
    @patch("sde_collections.models.collection.Collection._write_to_github")
    @patch("sde_collections.models.collection.XmlEditor")
    def test_ready_for_curation_triggers_indexer_config_and_job_creation(
        self, MockXmlEditor, mock_write_to_github, MockGitHubHandler
    ):
        """
        When the collection's workflow status is updated to READY_FOR_CURATION,
        it should trigger indexer config and job creation methods.
        """
        # Mock GitHubHandler to avoid actual API calls
        mock_github_instance = MockGitHubHandler.return_value
        mock_github_instance.check_file_exists.return_value = True  # Assume scraper exists
        mock_github_instance._get_file_contents.return_value = MagicMock()
        mock_github_instance._get_file_contents.return_value.decoded_content = (
            b"<scraper_config>Mock Data</scraper_config>"
        )

        # Set up the XmlEditor mock for both config and job
        mock_editor_instance = MockXmlEditor.return_value
        mock_editor_instance.convert_template_to_indexer.return_value = "<indexer_config>config_data</indexer_config>"
        mock_editor_instance.convert_template_to_job.return_value = "<indexer_job>job_data</indexer_job>"

        # Simulate the status change to READY_FOR_CURATION
        self.collection.workflow_status = WorkflowStatusChoices.READY_FOR_CURATION
        self.collection.save()

        # Verify that the XML for both indexer config and job are generated and written to GitHub
        expected_calls = [
            call(self.collection._indexer_config_path, "<indexer_config>config_data</indexer_config>", True),
            call(self.collection._indexer_job_path, "<indexer_job>job_data</indexer_job>", False),
        ]
        mock_write_to_github.assert_has_calls(expected_calls, any_order=True)
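Both tests rely on `assert_has_calls(..., any_order=True)`, which checks that each expected call happened without requiring a particular order. The standalone sketch below shows the difference from the default ordered check; the paths and XML strings are placeholders, not the collection's real config paths.

```python
from unittest.mock import MagicMock, call

write_to_github = MagicMock()

# Calls arrive job-first here, the reverse of the expected list below.
write_to_github("jobs/example.xml", "<job/>", False)
write_to_github("configs/example.xml", "<config/>", True)

expected_calls = [
    call("configs/example.xml", "<config/>", True),
    call("jobs/example.xml", "<job/>", False),
]

# Passes: with any_order=True only the presence of each call matters.
write_to_github.assert_has_calls(expected_calls, any_order=True)

# The default ordered check fails because the calls happened in the other order.
try:
    write_to_github.assert_has_calls(expected_calls)
    ordered_check_passed = True
except AssertionError:
    ordered_check_passed = False
```

Using `any_order=True` keeps the tests from depending on whether the config or the job file is written first.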