CHANGELOG.md: 35 additions & 0 deletions
@@ -148,3 +148,38 @@ For each PR made, an entry should be added to this changelog. It should contain
   - Changes:
     - Added `data-order` attribute to URL count columns for proper numeric sorting
     - Updated SearchPane comparisons to use `@data-order` values instead of string-based loose equality checks to ensure correct numeric filtering
+
+- 1182-ml-classification-queue
+  - Description: The inference API will provide confidence levels with its classification results to COSMOS. We need a robust job-processing mechanism that batches URLs according to the load the API can handle, tracks every individual job sent to the API, and ultimately evaluates the status of the jobs tied to each collection based on the results retrieved. It must also translate the classification labels returned by the API into the tags used internally within COSMOS.
+  - Changes:
+    - Added new environment variables `INFERENCE_API_URL` and `TDAMM_CLASSIFICATION_THRESHOLD`, set in the base settings.
+    - New models added:
+      - ModelVersion: Tracks multiple versions of classification models, along with their API identifiers
+      - InferenceJob: Manages inference jobs for collections of URLs with a specific model version
+      - ExternalJob: Represents a batched job sent to the inference API, with multiple ExternalJobs per InferenceJob
+      - Status Tracking: Enum classes for job status tracking (queued, pending, completed, failed, cancelled, etc.)
+    - BatchProcessor: Handles batching of URLs for efficient API processing
+      - Text Length Management: Batching based on total text length with a configurable maximum (default 10,000 characters)
+      - Oversized Text Handling: Automatic truncation of URL text that exceeds the maximum batch size
+      - Iterator Management: Safe handling of QuerySet iterators, including proper cleanup
+    - InferenceAPIClient: Handles direct interaction with the Inference API
+      - Model Management: Loading, unloading, and status checking for models
+      - Job Submission: Support for batch submission with proper error handling
+      - Retry Logic: Robust retry mechanisms for model loading operations
+      - Health Checking: API health verification before operations
+    - ClassificationThresholdProcessor: Handles class-based thresholding of classification results
+      - Separate classmethods for the TDAMM and division classifiers
+      - Config file that defines the threshold for each class
+    - Celery Integration: Scheduled processing of the inference job queue at a configurable interval; executes `process_inference_job_queue`
+      - Time-Based Execution: Configured to run during off-hours on weekdays (6pm-7am) and at all times on weekends
+    - Concurrency and Safety:
+      - AdvisoryLock: Utility class for managing Postgres advisory locks
+      - Transaction Management: Context managers for safe lock acquisition and release
+      - ID Generation: Hash-based lock ID generation from string names
+    - Updated TDAMMTags to remove redundant tags (MMA_M_EM, MMA_O_BI, MMA_O_BH, MMA_O_N) and add a missing one (MMA_S_FBOT). Also updated the enum value for the NOT_TDAMM tag.
+    - Classification results from the inference API need to be translated to our TDAMMTags model. This is handled by `map_classification_to_tdamm_tags` in `classification_utils`, which will hold any relevant utilities for subsequent classifiers.
+    - The collections run through the pipeline are currently limited to the following:
+      - imagine_the_universe
+      - physics_of_the_cosmos
+      - stsci_space_telescope_science_institute
+    - Once the front end has been updated to allow for tag edits, all astrophysics collections will be marked to be run through the pipeline
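
The BatchProcessor entry describes batching by total text length, with truncation of oversized text. A minimal sketch of that idea follows. The function name `batch_by_text_length` and the `(url, text)` input pairs are hypothetical and not taken from the COSMOS code; only the 10,000-character default comes from the changelog, and truncating to exactly the maximum is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

# Default maximum stated in the changelog entry.
MAX_BATCH_TEXT_LENGTH = 10_000


def batch_by_text_length(
    items: Iterable[Tuple[str, str]],
    max_length: int = MAX_BATCH_TEXT_LENGTH,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches of (url, text) pairs whose combined text length is at most max_length."""
    batch: List[Tuple[str, str]] = []
    batch_length = 0
    for url, text in items:
        # Assumption: text longer than a whole batch is cut down to max_length.
        if len(text) > max_length:
            text = text[:max_length]
        # Close the current batch if adding this text would exceed the limit.
        if batch and batch_length + len(text) > max_length:
            yield batch
            batch, batch_length = [], 0
        batch.append((url, text))
        batch_length += len(text)
    # Emit whatever is left over.
    if batch:
        yield batch
```

Because the function is a generator, it can consume a lazy source (such as a QuerySet iterator) without loading every URL into memory at once.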
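
The ID Generation item mentions hash-based lock IDs derived from string names. Postgres advisory lock functions take a signed 64-bit integer key, so one way to derive such a key is sketched below. The function name `lock_id_from_name` is hypothetical, and the use of SHA-256 is an assumption: the changelog does not say which hash the AdvisoryLock utility uses.

```python
import hashlib


def lock_id_from_name(name: str) -> int:
    """Map a string name to a stable signed 64-bit integer usable as an advisory lock key."""
    # Assumption: SHA-256 is used; any stable hash would serve the same purpose.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Keep the first 8 bytes and interpret them as a signed 64-bit integer,
    # which matches the bigint key type of Postgres advisory locks.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

The resulting integer could then be passed as the key to `pg_advisory_lock` or `pg_advisory_xact_lock`. Python's built-in `hash()` is not suitable here because string hashes are randomized per process, so different workers would compute different keys for the same name.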