This repository contains a detailed automated pipeline for processing and analyzing clinical trial data. The pipeline consists of multiple interdependent scripts that are executed in sequence to perform various tasks.
To start the automated pipeline, follow these steps:
- Fill in the necessary SLURM job details in each script.
- Replace the placeholders with the full paths to the respective script files.
- Run sbatch Everything.sh to start the pipeline; the script handles the job dependencies as outlined within it.
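Everything.sh itself is not reproduced here, so the following is only a minimal sketch of how chained SLURM submissions generally work, written in Python for illustration. The stage script name and job ID are hypothetical; `--parsable` and `--dependency=afterok:<jobid>` are standard sbatch options.

```python
import subprocess
from typing import List, Optional


def build_sbatch_command(script: str, depends_on: Optional[str] = None) -> List[str]:
    """Build an sbatch command, optionally waiting for an earlier job to succeed."""
    cmd = ["sbatch", "--parsable"]  # --parsable makes sbatch print only the job ID
    if depends_on is not None:
        # afterok: start only if the earlier job finished with exit code 0
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script)
    return cmd


def submit_chain(scripts: List[str]) -> List[str]:
    """Submit scripts in order, each one depending on the previous job."""
    job_ids: List[str] = []
    previous: Optional[str] = None
    for script in scripts:
        result = subprocess.run(
            build_sbatch_command(script, previous),
            check=True, capture_output=True, text=True,
        )
        # With --parsable the output is "jobid" or "jobid;cluster"
        previous = result.stdout.strip().split(";")[0]
        job_ids.append(previous)
    return job_ids


if __name__ == "__main__":
    # Hypothetical stage script and job ID, shown only to illustrate the command shape.
    print(build_sbatch_command("stage2_filter.sh", depends_on="12345"))
```

Only `build_sbatch_command` can be tried without a cluster; `submit_chain` needs a working SLURM installation.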
- SLURM: The pipeline uses the SLURM job scheduler to manage job submission and dependencies.
- Python: The pipeline includes Python scripts for data processing and filtering.
- Dependencies: Any dependencies specific to each script should be documented within the script itself.
This Python script manages the configuration and metadata used in a clinical trial summarization pipeline. It allows users to modify:
- Medical text/file fields
- Devices
- Search terms used for detecting devices in raw clinical trial text
- View current pipeline values
- Add, change, or delete:
  - Medical fields
  - Devices (and corresponding search terms)
  - Search terms associated with each device
- Automatically updates and rewrites the configuration file
- Updates master JSON lists per device
- Rebuilds historical PastDevices.json records
[MedicalFields]
text_fields = summary, intervention, detailed_description
file_fields = nct_id
[Devices]
devices = Apple_HealthKit, Garmin_Watch, Fitbit
[AlgSearchTerms]
garmin = Garmin_Fenix
google[ -]?fit = Google_Fit
[FormattedAlgSearchTerms]
Garmin_Fenix = garmin
Google_Fit = google[ -]?fit
Displays:
- Medical fields
- Tracked devices
- Search term mappings
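As a rough sketch of what viewing the current values involves, the example configuration can be read with Python's standard configparser module. The function names here are illustrative and not taken from the script; the sample text repeats part of the example config.

```python
import configparser

SAMPLE_CONFIG = """
[MedicalFields]
text_fields = summary, intervention, detailed_description
file_fields = nct_id

[Devices]
devices = Apple_HealthKit, Garmin_Watch, Fitbit

[AlgSearchTerms]
garmin = Garmin_Fenix
google[ -]?fit = Google_Fit
"""


def load_config(text: str) -> configparser.ConfigParser:
    # Only "=" separates key and value; interpolation is off so regex
    # characters in search terms are kept exactly as written.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case (the default lower-cases keys)
    parser.read_string(text)
    return parser


def split_list(value: str) -> list:
    """Split a comma-separated config value into clean items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def summarize(parser: configparser.ConfigParser) -> dict:
    """Collect medical fields, devices, and search term mappings."""
    return {
        "text_fields": split_list(parser["MedicalFields"]["text_fields"]),
        "file_fields": split_list(parser["MedicalFields"]["file_fields"]),
        "devices": split_list(parser["Devices"]["devices"]),
        "search_terms": dict(parser["AlgSearchTerms"]),
    }


if __name__ == "__main__":
    for name, value in summarize(load_config(SAMPLE_CONFIG)).items():
        print(name, value)
```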
You'll be prompted to choose:
- Medical Fields
- Devices
- Search Terms
- Choose Devices from the "Change pipeline values" menu.
- Select Add.
- Input the new device name (e.g., Google_Pixel_Watch).
- Enter associated search terms (e.g., pixel watch, google smartwatch).
  - See How to Create Search Terms for more information
  - Use _ for leading/trailing spaces (e.g., _fitbit = " fitbit").
  - Type 0 to finish adding terms.
- Confirm your entries.
The script then:
- Adds the device to config.ini
- Adds search terms to AlgSearchTerms and FormattedAlgSearchTerms
- Alphabetizes entries
- Creates a new devices_master_lists/<device>.json from Template.json
- Updates Devices.json and PastDevices.json
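The config part of adding a device can be sketched as follows. This is not the script's actual code: the function names are hypothetical, and because the example config does not show how several terms for one device are stored in FormattedAlgSearchTerms, joining them with `|` is purely an assumption of this sketch. Creating the master list JSON files is left out.

```python
import configparser

SAMPLE = """
[Devices]
devices = Garmin_Fenix, Fitbit

[AlgSearchTerms]
garmin = Garmin_Fenix
fitbit = Fitbit

[FormattedAlgSearchTerms]
Garmin_Fenix = garmin
Fitbit = fitbit
"""


def make_parser(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case
    parser.read_string(text)
    return parser


def sort_section(parser: configparser.ConfigParser, section: str) -> None:
    """Rewrite a section with its keys in case-insensitive alphabetical order."""
    items = sorted(parser.items(section), key=lambda kv: kv[0].lower())
    parser.remove_section(section)
    parser.add_section(section)
    for key, value in items:
        parser.set(section, key, value)


def add_device(parser: configparser.ConfigParser, device: str, terms: list) -> None:
    """Register a device and its search terms in all three config sections."""
    devices = [d.strip() for d in parser["Devices"]["devices"].split(",") if d.strip()]
    if device in devices:
        raise ValueError(f"{device} is already tracked")
    devices.append(device)
    parser["Devices"]["devices"] = ", ".join(sorted(devices, key=str.lower))

    terms = [t.lower() for t in terms]  # searches run on lower-cased text
    for term in terms:
        parser["AlgSearchTerms"][term] = device
    # Assumption: multiple terms are stored as one "|"-joined value.
    parser["FormattedAlgSearchTerms"][device] = "|".join(terms)

    sort_section(parser, "AlgSearchTerms")
    sort_section(parser, "FormattedAlgSearchTerms")


if __name__ == "__main__":
    config = make_parser(SAMPLE)
    add_device(config, "Google_Pixel_Watch", ["pixel watch", "google smartwatch"])
    print(config["Devices"]["devices"])
```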
- Choose Devices → Change
- Select the device to rename
- Enter the new name
The script updates:
- Devices, Formatted Search Terms, and AlgSearchTerms
- Master list filenames
- JSON content references
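A rename has to touch every place the device name appears. The sketch below illustrates that idea on plain Python structures; the function names and the in-memory layout are hypothetical stand-ins for the config sections and the devices_master_lists/ directory, not the script's own implementation.

```python
from pathlib import PurePosixPath


def rename_device(devices, alg_terms, formatted_terms, old, new):
    """Return copies of the three mappings with `old` replaced by `new`."""
    if old not in devices:
        raise KeyError(f"unknown device: {old}")
    new_devices = sorted((new if d == old else d for d in devices), key=str.lower)
    # AlgSearchTerms maps search term -> device, so the device is the value.
    new_alg = {term: (new if dev == old else dev) for term, dev in alg_terms.items()}
    # FormattedAlgSearchTerms maps device -> search term, so the device is the key.
    new_formatted = {
        (new if dev == old else dev): terms for dev, terms in formatted_terms.items()
    }
    return new_devices, new_alg, new_formatted


def master_list_path(device: str) -> PurePosixPath:
    """Path of the per-device master list, following devices_master_lists/<device>.json."""
    return PurePosixPath("devices_master_lists") / f"{device}.json"


if __name__ == "__main__":
    result = rename_device(
        ["Fitbit", "Garmin_Fenix"],
        {"garmin": "Garmin_Fenix", "fitbit": "Fitbit"},
        {"Garmin_Fenix": "garmin", "Fitbit": "fitbit"},
        old="Garmin_Fenix",
        new="Garmin_Watch",
    )
    print(result)
    print(master_list_path("Garmin_Fenix"), "->", master_list_path("Garmin_Watch"))
```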
- Choose Devices → Delete
- Select the device
- Remove the device from all config sections
- Update Devices.json and PastDevices.json
- Delete the corresponding .json from devices_master_lists/
Choose Search Terms from the main menu.
- Choose a device
- Input new search terms
- Script updates the mappings and config file
- Changing search terms replaces all current search terms for a device
- Old terms are removed from the config
- Choose the device
- Select the search term to delete
- Note: Deletion is blocked if it's the only remaining term
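The guard on deletion can be illustrated with a small sketch. It assumes the term-to-device mapping is held as a plain dictionary shaped like the AlgSearchTerms section; the function name is hypothetical.

```python
def delete_search_term(term_to_device: dict, term: str) -> dict:
    """Return a copy of the mapping without `term`, unless it is the device's last term."""
    if term not in term_to_device:
        raise KeyError(f"unknown search term: {term}")
    device = term_to_device[term]
    remaining = [t for t, d in term_to_device.items() if d == device and t != term]
    if not remaining:
        # Mirrors the rule above: a device must keep at least one search term.
        raise ValueError(f"cannot delete the only search term for {device}")
    return {t: d for t, d in term_to_device.items() if t != term}


if __name__ == "__main__":
    mapping = {
        "myfitnesspal": "MyFitnessPal",
        "my fitness pal": "MyFitnessPal",
        "garmin": "Garmin_Fenix",
    }
    print(delete_search_term(mapping, "my fitness pal"))
```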
- Go to clinicaltrials.gov and type in potential device names in the Other Terms field.
- Try several terms similar to the device you have. Typos are common in clinical trial documentation.
- For example, for the MyFitnessPal device, the following search terms are needed to capture them all: myfitnesspal, myfitness pal, my fitnesspal, my fitness pal
- This process will involve hand-checking whether the search terms are too broad or too narrow.
- In some cases, such as the Apple HealthKit, the search term healthkit was not sufficiently narrow, so a regular expression was produced to search for other relevant terms in addition to health kit, such as iPhone, Apple, and/or iOS.
- You may also find the regular expressions defined in the config.ini file for each existing device helpful when ideating on new devices.
- Notes:
  - During the search process, all text is converted to lower-case, so all search terms will also be lower-case.
  - It is not necessary to create regular expressions for new search terms. Inputting each search term individually has the same effect as a regular expression that joins the terms with an or. Regular expressions are useful for multi-word device names or devices that require more complex searches.
  - Please reach out if you need help identifying sufficient search terms.
- Iterate through several search terms until you feel sufficiently confident in your search terms.
- Add them to the pipeline using one of the methods above, depending on the context in which you are adding the search terms.
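The notes on lower-casing and on individual terms versus a single regular expression can be checked with a short sketch. It assumes matching is done with Python's re.search on lower-cased text, which is an assumption about the pipeline rather than something stated above; the compact pattern is one possible equivalent of the four MyFitnessPal terms.

```python
import re

# The four individual terms from the MyFitnessPal example.
TERMS = ["myfitnesspal", "myfitness pal", "my fitnesspal", "my fitness pal"]

# One regular expression covering the same four spellings.
PATTERN = r"my ?fitness ?pal"


def matches_any(text: str, terms: list) -> bool:
    """True if any term (treated as a regex) occurs in the lower-cased text."""
    lowered = text.lower()
    return any(re.search(term, lowered) for term in terms)


def matches_regex(text: str, pattern: str) -> bool:
    """True if the single pattern occurs in the lower-cased text."""
    return re.search(pattern, text.lower()) is not None


if __name__ == "__main__":
    samples = [
        "Participants log meals in MyFitnessPal.",
        "Diet is tracked with the My Fitness Pal app.",
        "Steps are counted with a wrist-worn tracker.",
    ]
    for sample in samples:
        print(matches_any(sample, TERMS), matches_regex(sample, PATTERN), sample)
```

Both functions agree on every sample, which is the sense in which a list of individual terms behaves like one alternation.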
To reinitialize all downstream files from the updated config, run:
sbatch helpful_files/RestartMasterLists.sh
- Always use Title_Case for device names.
- Use _ to signify spaces in search terms (e.g., _fitbit → " fitbit").
- Be cautious with deletions: downstream summaries may be affected.
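The underscore convention is likely useful because INI parsers such as Python's configparser strip whitespace around keys and values, so a literal leading space would not survive in config.ini. A minimal sketch of the decoding step follows, assuming only a leading or trailing underscore is converted; the function name is hypothetical.

```python
def decode_term(stored: str) -> str:
    """Turn a leading or trailing "_" in a stored search term into a space."""
    if stored.startswith("_"):
        stored = " " + stored[1:]
    if stored.endswith("_"):
        stored = stored[:-1] + " "
    return stored


if __name__ == "__main__":
    term = decode_term("_fitbit")
    text = "participants wear a fitbit for 12 weeks"
    # The leading space requires the term to start a new word.
    print(repr(term), term in text)
```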
Ensure your config.ini contains the following:
[OpenAIAPI]
key = your-api-key-here
model = gpt-4-0125-preview
Please be cautious when running the pipeline, ensuring that the required permissions and resources are available.
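One way to check the [OpenAIAPI] section before a run is sketched below with Python's configparser. The function name is hypothetical, and rejecting the placeholder value is an added safety check, not something the pipeline is known to do.

```python
import configparser

PLACEHOLDER_KEY = "your-api-key-here"


def load_openai_settings(config_text: str) -> tuple:
    """Return (key, model) from the [OpenAIAPI] section, or raise if unusable."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("OpenAIAPI"):
        raise KeyError("config.ini has no [OpenAIAPI] section")
    section = parser["OpenAIAPI"]
    key = section.get("key", "").strip()
    model = section.get("model", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise ValueError("replace the placeholder with a real API key")
    if not model:
        raise ValueError("no model specified in [OpenAIAPI]")
    return key, model


if __name__ == "__main__":
    sample = "[OpenAIAPI]\nkey = example-key\nmodel = gpt-4-0125-preview\n"
    print(load_openai_settings(sample))
```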
For any questions or assistance, contact reneedw@cs.stanford.edu.